Season 1 – Episode 13 – Impact of Generative AI on Research Integrity
Discusses the impact of generative artificial intelligence on research integrity.
Episode Transcript
Daniel Smith: Welcome to On Tech Ethics with CITI Program. Our guest today is Mohammad Hosseini, who is an assistant professor at Northwestern University. Mohammad’s work explores a broad range of research ethics issues such as recognizing contributions in academic publications, citations and publication ethics, gender issues in academia, and employing artificial intelligence and large language models in research. Today we’re going to discuss the impact of artificial intelligence on research integrity.
Before we get started, I want to quickly note that this podcast is for educational purposes only. It is not designed to provide legal advice or legal guidance. You should consult with your organization’s attorneys if you have questions or concerns about the relevant laws and regulations that may be discussed in this podcast. In addition, the views expressed in this podcast are solely those of our guest. And on that note, welcome to the podcast Mohammad.
Mohammad Hosseini: Thank you so much. It’s great to be here.
Daniel Smith: It’s a pleasure to have you as well. I gave you a brief introduction, but can you tell us more about yourself and what you currently focus on at Northwestern University?
Mohammad Hosseini: Absolutely. For a long time I was mostly investigating the ethics of authorship and attribution of credit and responsibilities, as you just mentioned. But since ChatGPT’s release in November of 2022, my research focus has slightly shifted. I’m still exploring similar issues and I’m basically in the same orbit of research integrity and ethics of research, but now I’m mostly focused on the impact of generative AI on different stages and processes in research.
And this has been quite a journey because I think the debate about generative AI has been a very noisy debate, and it’s taken a lot of time and energy just to stay on top of everything and try to keep up with what’s being published, what’s being said, what’s being developed, and so on.
Daniel Smith: Let’s talk about the impact of artificial intelligence and large language models on research integrity. I think many people are familiar with systems based on large language models such as ChatGPT, but can you provide us with a quick overview of them for those who are not?
Mohammad Hosseini: Absolutely. I can take a stab at providing a simple description of what these systems are. Large language models are a type of artificial intelligence that uses machine learning to process, generate, and sometimes translate human language. If I want to provide a quick overview of how large language models work, I need to start with five terms and concepts that lead up to large language models.
The first one is artificial intelligence. This is the broad concept of machines that are able to carry out tasks in a way that we would consider “smart,” quote unquote. And AI can encompass anything from computer programs that play chess to those that interpret complex data or predict weather patterns. Given the different forms of AI we have seen so far, I think it refers to technologies that can perform human tasks or mimic human behaviors that require intelligence. And by that I mean tasks that require judgment, reasoning, or decision-making. So this is AI.
One step further is machine learning, which is a technique that allows computers to learn from data and can be supervised or self-trained. If supervised, you can imagine it like teaching a kid, where the teacher gives the computer a picture book and says, “Hey, this is a caterpillar.” We help it recognize what a caterpillar looks like, and it gets better and better over time with practice. With the pictures we show it, it can recognize different types of caterpillars, and it can recognize caterpillars in different positions and against different backgrounds and so on. We also have unsupervised or self-trained machine learning, and this is when the computer gets no help; it looks at a bunch of photos and then tries to figure out patterns and what a caterpillar might look like, all on its own.
The third concept is deep learning. Deep learning takes machine learning, which is a method used in AI, one step further. Deep learning uses layers of neural networks to sift through piles of unstructured data, from words and pictures to videos, or whatever kind of data we have. And it uses these data to learn from them and make sense of them without being told exactly what to look for.
The fourth one is artificial neural networks, and these are inspired by biological neural networks in the brain. These networks process information in ways that mimic how the human brain operates, in a very rudimentary way, of course. And this is the foundation of the fifth concept, generative AI. Generative AI systems are those that use artificial neural networks to generate new content after having learned from a data set. And large language models are then trained on enormous data sets to process language patterns and generate text that is coherent and relevant to context, and also sometimes indistinguishable from text that is written by humans.
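To make the idea of “trained on data, and then it predicts” concrete, here is a toy sketch in Python. It only counts which word follows which in a tiny training text and then picks the most frequent follower; real large language models use deep neural networks trained on billions of tokens, so this illustrates the prediction loop, not how ChatGPT is actually built.

```python
# Toy next-word predictor: learn word patterns from training text,
# then predict the most likely next word. This is only an illustration
# of "trained on data, then predicts," not a real language model.
from collections import defaultdict, Counter

training_text = "the cell divides the cell grows the cell divides again"
words = training_text.split()

# Count which word tends to follow each word in the training data.
next_word_counts = defaultdict(Counter)
for current, following in zip(words, words[1:]):
    next_word_counts[current][following] += 1

def predict_next(word):
    """Return the follower seen most often in training, if any."""
    followers = next_word_counts.get(word)
    return followers.most_common(1)[0][0] if followers else None

print(predict_next("cell"))  # -> "divides" (seen twice, vs. "grows" once)
```

Because the model can only reproduce patterns from its training data, anything outside that data is guesswork, which is a small-scale version of the errors discussed later in the episode.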
Daniel Smith: That’s a really helpful overview and I think breaks it down nicely. And I want to talk in a moment about how people are using tools like ChatGPT in research. But first, just for some helpful context, since we’re talking about research integrity today as well, can you briefly define research integrity and describe its importance?
Mohammad Hosseini: Absolutely. I’m going to use the definition that is provided by the European Network of Research Ethics and Research Integrity. And according to them, research integrity is the attitude and habit of researchers to conduct research according to appropriate ethical, legal and professional frameworks, obligations and standards. Basically the attitude and habit of following norms and doing good research.
In my understanding, we can describe the importance of research integrity using three pillars. The first one is that the scientific enterprise needs collaborators and researchers who are morally responsible and are aware of the norms and ethics of research. Ultimately, this is a requirement because otherwise the science that is generated by their work or by their collaboration cannot be trusted. And this trust is a key pillar because without it, public support for research can decrease and the public might be more susceptible to pseudoscience and misinformation. So in this sense I think research integrity is like an enabler of trust.
The second one is that to the extent that research integrity improves research methods, it also fosters a critical attitude towards research and enables us to probe all kinds of geographic, political, and cultural contexts of research activities, and ultimately foster ethical research. The third pillar, I think, is that it enables us to look at research from above, almost like putting research and its results on a hospital bed and dissecting it, whether to find maladies, tumors, sicknesses, and so on, or to find healthy bits that can be reused and give life to future research.
I think in that sense these three are the main pillars that describe the significance of research integrity. The first one is trust. The second one is that it fosters a critical attitude towards research. And the third one is that it allows us to look at research from above and dissect it and think about how to improve it.
Daniel Smith: I really like that breakdown. And now that we have a better understanding of both AI and research integrity, I want to get to how people are currently using tools like ChatGPT in research, and how do you think the ways in which people use them will evolve over time, particularly as new AI tools become available or they improve over time or new features are added and so on?
Mohammad Hosseini: Right. In the last year, one thing I have witnessed is how versatile the research community is. And I think for that, it might seem a little bit like a self-serving comment, but I feel like we all deserve a lot of credit for how versatile we are and how creative we are.
I’ve seen a range of different use cases, and I think in that sense we can maybe even think about a spectrum. Recently, the results of a global postdoc survey were reported in Nature by Linda Nordling. According to that survey, 63% of researchers who use AI are using it to refine text. There are fewer who are using it to generate, edit, or troubleshoot code. There are some who are using it to find or summarize the literature, some who are using it to prepare manuscripts, some who are using it to prepare presentation materials, and some who are using it to improve experimental protocols. So there is a very large spectrum of use cases.
And because there is this large spectrum of use cases, I think we need to think about use cases in specific contexts. So recently, a colleague of mine, Kristi Holmes, who’s the director of Galter Library here at Northwestern University, and I interviewed some librarians about their use cases. And what we found was very interesting, which is that among people who use AI, those who have coding skills are more likely to be positive about its impact. Maybe they can even be described as heavy users, because they use it either for tasks related to programming, like help with writing, improving, or documenting code, or for other tasks like knowledge management, summarizing, or developing tools that are very specific to their own context.
There are also others who are using it, for instance, in medical research. Many people might know that protein folding was a problem researchers had been working on before AI, but once AI was used, researchers were able to scale up what existed. That was a problem that had been around since the ’60s, and it finally experienced a major leap in 2022 because of AI.
Another example is using AI for prediction in healthcare. A recent example is a predictive model that was developed by researchers at NYU to improve the accuracy of clinical predictions related to readmission, mortality, comorbidity, and so on. And these are really important predictions in a healthcare setting because they allow hospitals to manage their resources and to know what kind of patients need to go where and what kind of patients need to be treated by whom. It’s essential to what hospitals do. And for that, the NYU researchers trained a large language model on hundreds of thousands of patient notes and then fine-tuned it to improve its predictions.
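As a rough illustration of the general recipe described here (take a pretrained language model and fine-tune it on clinical notes to predict an outcome such as readmission), a minimal sketch using the Hugging Face transformers library might look like the following. The model name, notes, and labels are placeholders, not the NYU team’s actual data or pipeline.

```python
# Hypothetical sketch: fine-tune a pretrained language model to predict
# 30-day readmission from clinical notes. Data, labels, and the base
# model are placeholders, not the NYU researchers' actual setup.
import torch
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          Trainer, TrainingArguments)

notes = ["Patient admitted with heart failure exacerbation ...",
         "Post-operative course uncomplicated, discharged home ..."]
labels = [1, 0]  # 1 = readmitted within 30 days, 0 = not readmitted

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2)

class NoteDataset(torch.utils.data.Dataset):
    """Wraps tokenized note text and outcome labels for the Trainer."""
    def __init__(self, texts, outcome_labels):
        self.encodings = tokenizer(texts, truncation=True, padding=True)
        self.outcome_labels = outcome_labels
    def __len__(self):
        return len(self.outcome_labels)
    def __getitem__(self, idx):
        item = {k: torch.tensor(v[idx]) for k, v in self.encodings.items()}
        item["labels"] = torch.tensor(self.outcome_labels[idx])
        return item

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="readmission-model",
                           num_train_epochs=3,
                           per_device_train_batch_size=8),
    train_dataset=NoteDataset(notes, labels),
)
trainer.train()  # fine-tunes the pretrained model on the labeled notes
```

In practice, a system like this would be trained on hundreds of thousands of de-identified notes and evaluated carefully for accuracy and bias before informing any clinical decision.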
And I mean, I can go on for another hour just talking about examples, which I think is a good thing. To my earlier point, it shows that the research community has been very versatile and creative in terms of using these systems to our benefit. This is not to say that all kinds of use cases are flawless or generate unbiased results. No, we’ve got to be super careful about how we use them, but I think the fact that we find new ways of using these tools every day is fascinating.
Daniel Smith: Speaking of some of those issues, what are some of the research integrity issues that these tools raise?
Mohammad Hosseini: That’s a spot-on question. I think at the moment, some of the issues that we have seen are those that have been reported. What I mean is that there are issues that we can talk about now, but there are also issues that we cannot exactly anticipate right now.
Some of the issues pertain to errors. We know that these systems make errors because they basically predict; they are not connected. ChatGPT is not connected to Wikipedia, and it’s not a search engine. It is trained on existing data, and then it predicts. So it is maybe, I don’t know, 98% accurate all the time, but then the 2% that is inaccurate can be very inaccurate. And we have different kinds of errors, like fabricated citations by systems like ChatGPT. You see a name that is known in the field, you see a year in which that person published something, but then you see page numbers and bibliographic information that are incorrect. These are all examples of random errors.
The other one is bias. Bias can result from biased training data, like bias in the literature. We know that there are all kinds of biases and prejudices in our existing literature, and because these systems are trained on that biased content, they also generate biased content. There’s also bias in algorithms. This is a whole different issue that is not related to using biased or unbiased training data: you can have unbiased training data, but if you use a biased algorithm, you’re still getting biased content as a result. You can also apply these systems in a biased way. We know that systems like ChatGPT are very sensitive to what kind of verbiage is used, and depending on the verbiage, they can generate completely different content. Again, biases of all kinds can exist.
They also don’t have moral agency, and I think this is more of a philosophical question, but to our understanding at the moment they don’t have moral agency. And this is a big problem when it comes to using these systems in research, because we’ve worked so hard to try to make a case for employing and training researchers who are equipped with moral decision-making, who can make responsible decisions. And now all of a sudden we have this tool that generates content, collaborates, and helps us in so many ways, but we cannot hold it responsible. We cannot hold it to account. And that is a big, big problem that we are still trying to find ways to deal with.
Another problem, which I think is more of a [inaudible 00:15:30] problem, is the black box problem. And that is about the fact that we cannot always know how these systems generate what they generate. We cannot reverse engineer a response to find out why it is biased or why it looks the way it looks. And this is one of those problems that I think the engineering community has been busy addressing, but it takes a while and we are not there yet.
Daniel Smith: I want to take a quick break to tell you about CITI Program’s new Safe Research Environments course. This course will provide you with the knowledge and tools to contribute to safer, more inclusive research and academic environments, whether on campus or in the field. You can learn more about this course and others at citiprogram.org. And now, back to the conversation with Mohammad.
You mentioned some pretty significant issues that need to be addressed with these tools, especially when we’re talking about research integrity, where the intent is to increase trust in research, as you mentioned earlier. How are some of these issues currently being addressed? I know you mentioned that there are some engineering approaches to fixing these problems, but can you talk through some of the main ways in which they are currently being addressed or trying to be addressed?
Mohammad Hosseini: Currently, the approach is to be cautious. And one thing that has been very positive, especially this year, has been this movement, as I would like to call it, to come up with guidelines about how to use these tools. And in that sense I think universities are doing their best. This is a moving target, so it’s a very difficult space to regulate and to come up with norms for, because new tools are developed and then in two or three months they’re gone, or they just function completely differently.
Earlier this year, for example, we had AI Classifier, which was developed by OpenAI, the company that also developed ChatGPT. AI Classifier was meant to identify AI-generated text. It had a text box, you copied and pasted in your content, and it told you to what extent the text looked like it was generated by AI. That tool was there for a couple of months and then it disappeared from the market. Or take ChatGPT: at the beginning of the year we were still working with GPT-3.5, and then I think it was around March when the paid version was released, and that was GPT-4. And then there was this range of plugins that were introduced that could do all kinds of things.
So it’s a very difficult space to regulate, but universities are still trying their best to come up with guidelines and suggestions on how to use these tools. We have had some draconian measures that suggest, “Oh no, you should not use this,” or, “We don’t accept content that is generated by AI.” But I think those measures are still a little bit immature, because there’s no way for us to identify with 100% accuracy which content, especially text, was generated by AI. I think with images they are going to be a little more successful, but with text it’s very complicated to figure out which content was generated by AI or which content weaves AI-generated text together with human-written text.
Daniel Smith: Absolutely. And it’ll be interesting to me to see how these tools and technologies and policies evolve in order to address some of these issues.
But I want to go back to what you were talking about with the different uses of AI and people’s feelings about them, not only in research but also in education and healthcare. I know you and some colleagues recently conducted an exploratory survey about using ChatGPT in education, healthcare, and research, and in it you explored some of people’s feelings about the use of these tools. Can you tell us more about the results of that survey?
Mohammad Hosseini: Absolutely, yeah. This was a survey that we conducted during an event called Let’s ChatGPT! in February of this year, in 2023. The event was organized by the Institute for Artificial Intelligence in Medicine here at Northwestern in Chicago. I think we had 420 respondents with different roles. We had medical students, graduate students and postdocs, clinical faculty, research faculty, admin people, and so on.
And we found that those who had used ChatGPT were more interested in using it in a wide range of contexts. We also found that those who were in earlier stages of their career were more positive about using AI. Our interpretation was that this could be because more experienced researchers have witnessed the introduction and the rise and fall of more technologies, and are generally more likely to be critical of them. Or it could be that we have a generational divide between millennials and those who are older, and we just have different outlooks toward technology depending on how old someone is or what stage of their career they are at.
The other interesting thing was that, as you rightly said, we explored the use of ChatGPT in three contexts, but of those three, the greatest uncertainty among our respondents was about using ChatGPT in education. It wasn’t in research, it wasn’t in healthcare; the greatest uncertainty was about using it in education. This was clarified in the Q&A of our event, and to our understanding it was mostly due to the inaccuracies of content generated by ChatGPT. People are afraid that these systems will be used to explain medical concepts, to find fast solutions, or as a substitute for students’ own work and understanding of the material. You might take shortcuts or not do the assignment and so on, which is harmful for education. And I think the concern is absolutely spot on.
Another point is about risking clinical reasoning skills if these systems are used more often. For instance, we had this person who said something very interesting. They said writing clinical notes helps students internalize the clinical reasoning that goes into decision-making. And so until this knowledge is cemented, using AI would be harmful for medical students. And I think this is one of those things that we don’t think about when we start using these systems, and it’s one of those things that needs time to show us the consequences. We need to see a whole generation that grew up with ChatGPT to see what its impact is going to be on education.
There was this hype at the beginning, when ChatGPT was released, about how to assess students with it, how to use it for evaluation, or how to use it for teaching, which I think are all really cool and creative ideas, but we haven’t seen their long-term impact. And that points to the lack of empirical evidence about the effectiveness of these systems when they’re used for teaching different students. I think people who have any experience with teaching a class know that different students have different talents and learn in different ways.
People like Sam Altman, who’s the CEO of OpenAI, or others who are in those roles, are definitely biased because they’re making billions of dollars from this technology. So of course they are going to heavily promote it and say, “Yeah, the education system can be versatile, we have dealt with previous technologies, and this is just another technology.” The point is, we haven’t seen the consequences of using this technology in the long run. And there are people who say, “Oh, we should use this as a mentor.” I think it’s great to use it as a mentor, but we should do some experiments with it before going into full-on ChatGPT mentorship mode.
Daniel Smith: In terms of those long-term negative impacts on society, such as the impact of the spread of misinformation, how do you think these tools need to evolve to help mitigate those impacts?
Mohammad Hosseini: Yeah, I think that’s quite an interesting and complicated question. I think these tools have to evolve transparently, and this is what we have not seen yet. They still haven’t shared the training data. Step one of transparency: how did you train the model? They haven’t told us. And there have been several executive orders, at least here in the US, that demand that AI developers be more transparent and so on.
But I doubt we can achieve the level of transparency that is required from these companies, because they enjoy proprietary rights, copyright, and so on, and they have all the lobbies in the world to bypass all kinds of rules and regulations. Earlier in the summer, for example, we heard about the voluntary commitments, which I wrote about in the Chicago Tribune. And I think a voluntary commitment is a joke because it’s so vague. And beyond that, even when we have established laws and commitments are mandatory, these companies often find ways of navigating around them and pushing the boundaries of what’s permissible. So transparency is needed, it’s just difficult to enforce.
I also think that these models need to evolve in ways that benefit specific communities, and for that we need a data-driven approach. But at the moment what we see is that these tools are developed for those who, well, one, can afford them and can use them. But also, the tools that we are seeing right now mainly solve problems that are … maybe this may sound a little bit unorthodox, but I think they are problems that are faced by people who sit in an office. The challenges we face, like global warming and so on, are problems that need people who go out on the ground, into the physical world. Solving those problems doesn’t just need a computer; it needs going out, and these tools cannot do that yet. So we need to try to develop these tools in ways that would enable us to solve real-world issues.
In terms of misinformation in particular, I think we were already challenged by this problem, and AI has just exacerbated it to a whole different level. Recently I saw an article published in the New York Times about AI-generated content being used to spread misinformation about the war in Gaza. I mean, scary stuff. But it is there and it is being used to do all kinds of crazy things. And I think for this we need to educate specific communities and the public, basically, and support them with information literacy. Information literacy, I think, should be included in the curriculum of K-12 students from early on, because this is going to be huge. And if we don’t start training students and all kinds of people, we are going to be in a mess in no time.
And I think AI companies should do more to create awareness about their tools and services. But I also understand that this is costly, and they might not be interested in talking about their tools in ways that discredit them. It took us almost 100 years, maybe even more, to convince tobacco companies to write on the label that this product kills. I hope it will not take us that long, but a similar approach would be needed for AI companies to tell users that these tools can be used to tell lies, that these tools can be used to generate misinformation. Yeah, I think we are a long way from that. But in short, I think we need more education and information literacy.
Daniel Smith: In terms of education and information literacy and just understanding these tools better, are there currently any additional resources out there that you would suggest to our listeners so they can learn more and improve their own understanding of how these systems work and the level of scrutiny that they should give to the information that they’re getting from them?
Mohammad Hosseini: I’ve seen a lot of good content on the internet, and especially in the scholarly space. I’ve seen many universities that are doing an amazing job in terms of educating their own researchers. Believe it or not, there’s not a single day that I don’t receive an invitation to attend a workshop or round table or conference or seminar, you name it. There are all kinds of events about it. And I think this is, in a way, a good way of educating ourselves about it. I understand that we are putting the onus on the community to educate itself, and a community, especially in academia, that has already burnt out five times over. But I think we have a responsibility to educate ourselves about it.
In my space, which is the university space, there are a lot of resources. For instance, my university has a dedicated institute called the Institute for Artificial Intelligence in Medicine that organizes all kinds of AI-related events, from round tables and workshops to brainstorming sessions and whatnot. And I’ve seen the same at other institutions. I’ve seen a lot of public libraries that are doing really cool campaigns about information literacy and AI education, obviously here in the US. And I think this is the way to go, especially with libraries. Public libraries, I think, are becoming central hubs for their communities, educating them for and with people who know the content and know the context, in ways that are beneficial for the entire community.
Daniel Smith: There definitely is an abundance of resources out there. Like you mentioned, I am constantly seeing events and workshops and webinars and so on. And like you were also just mentioning, even within my local community, I know the public library hosts weekly chats with the community about AI, addressing people’s concerns and just talking through the technology and so on. So I second all of those suggestions as well.
On a final note, do you have any thoughts you would like to share that we did not already touch on today?
Mohammad Hosseini: Sure. Given that your listeners are probably mostly active in research ethics and integrity domains, I have a suggestion for them, and that is: if you can, try one or more of these tools regularly, because many people are using them. And as I mentioned, in the survey that was conducted by Nature, people are using them in different ways. This is key, because even if you think you don’t need these tools, please be cognizant of the fact that other people are using them. And specifically, if you are in the research ethics and research integrity landscape, it’s very important to use these systems and learn about their strengths and flaws, because many researchers are using them.
And if we are to identify the problems that are caused as a result of using them, or if we are to speak about these problems, we need to use them, we need to be aware of them. I understand that there might be some reluctance in terms of what’s going to happen with my data or what’s going to happen with my privacy and so on. But there are a lot of ways around that. You don’t have to use your real name, and you don’t have to use your primary email account. You can just open a dummy email account and start using one of these systems, just to see how it works. And then whenever you’re tired of it, whenever you’re dissatisfied, just delete your account and you’re good to go.
But don’t think that because you don’t need it, you don’t have to use it. You have to use it because others are using it. That’s my final suggestion for the listeners.
Daniel Smith: I think that’s a wonderful suggestion and a great place to leave our conversation today. So thank you again, Mohammad. It was great talking with you.
Mohammad Hosseini: Thank you so much.
Daniel Smith: And I also invite everyone to visit www.citiprogram.org to learn more about our courses and webinars on research, ethics and compliance. You may be interested in our Essentials of Responsible AI course, which discusses the principles, governance approaches, practices, and tools for responsible AI development and use. And with that, I look forward to bringing you all more conversations on all things Tech Ethics.
How to Listen and Subscribe to the Podcast
You can find On Tech Ethics with CITI Program available from several of the most popular podcast services. Subscribe on your favorite platform to receive updates when episodes are newly released. You can also subscribe to this podcast by pasting “https://feeds.buzzsprout.com/2120643.rss” into your podcast app.
Recent Episodes
- Season 1 – Episode 12: Ethical and Policy Issues for Xenotransplantation Clinical Trials
- Season 1 – Episode 11: Technological and Safety Considerations for Autonomous Vehicles
- Season 1 – Episode 10: Human Subjects Research Ethics in Space
- Season 1 – Episode 9: Impact of Recent Social Media Platform Changes
Meet the Guest
Mohammad Hosseini, PhD – Northwestern University
Mohammad Hosseini is an assistant professor in the Department of Preventive Medicine at Northwestern University Feinberg School of Medicine. Born in Tehran, Iran, he holds a BA in business management (Eindhoven, 2013), an MA in Applied Ethics (Utrecht, 2016), and a PhD in Research Ethics and Integrity (Dublin, 2021).
Meet the Host
Daniel Smith, Associate Director of Content and Education and Host of On Tech Ethics Podcast – CITI Program
As Associate Director of Content and Education at CITI Program, Daniel focuses on developing educational content in areas such as the responsible use of technologies, humane care and use of animals, and environmental health and safety. He received a BA in journalism and technical communication from Colorado State University.