On Tech Ethics: The Impact of ChatGPT on Academic Integrity

Season 1 – Episode 3 – The Impact of ChatGPT on Academic Integrity

Our guest Chirag Shah, PhD, founding director of InfoSeeking Lab and founding co-director of the Center for Responsibility in AI Systems & Experiences (RAISE), discusses the impact of AI on academic integrity, with a focus on ChatGPT.

Episode Transcript

Daniel Smith: Welcome to On Tech Ethics with CITI Program. Our guest today is Chirag Shah, and we are going to discuss the impact of AI on academic integrity. And I should note that we are going to focus on ChatGPT given its recent widespread use and attention. Now, before we get started, I want to quickly introduce our guest. Chirag is a professor of information and computer science at the University of Washington in Seattle. He’s the founding director for the InfoSeeking Lab and the founding co-director of RAISE, which is the Center for Responsibility in AI Systems and Experiences. He has also worked at industrial research labs at Spotify, Amazon, and Microsoft.

In addition, I want to quickly note that this podcast is for educational purposes only. It is not designed to provide legal advice or legal guidance. You should consult with your organization’s attorneys if you have questions or concerns about the relevant laws and regulations that may be discussed in this podcast. In addition, the views expressed in this podcast are solely those of our guests. All right, so welcome to the podcast, Chirag.

Chirag Shah: Thank you, Daniel. Nice to be here. And I’m glad you had that little disclaimer because, yes, there will be a lot of opinions here, but I also want to stress that those are my own.

Daniel Smith: I gave just a very brief introduction to you, but can you just start off by telling us a bit more about yourself and possibly your current research interests?

Chirag Shah: Yeah. So I work in information access. And by that I mean search systems and recommendation systems. And I specifically look at making these systems intelligent, proactive, personalized. And as we start doing this, we realize that there are also issues with fairness, equity, bias in these things. So a lot of my work also revolves around trying to identify these issues, mitigate them. And so I find myself in this very interesting intersection where, on one hand, I’m building these AI systems for information access, but I’m also aware of their potential harms and trying to make sure that we can do due diligence to avoid that and still benefit from the AI technology. So a lot of my research at the University of Washington, as well as in collaborations with other places, including industry, is in intelligent information access systems and what falls under, broadly speaking, responsible AI.

Daniel Smith: So before we move into some of the specific issues that these systems raise, can you quickly define artificial intelligence? I know there’s a lot of varying definitions out there, and there’s different types of AI which might influence the definition. But to you, what is a succinct definition for our audience?

Chirag Shah: Yeah. So AI refers to making systems that mimic human behaviors and tasks. And this is a classical definition, this is a classical notion of what AI is meant to do. And so we are all familiar with a lot of these sci-fi stories and movies in which we have these autonomous systems, robots usually, that are able to act like humans, think like humans. And so that’s sort of the beginning of it.

But then of course the challenge is how do you know if the system is actually intelligent, or just acting or faking being intelligent? And so that’s where a lot of the AI things today come down to machine learning, which doesn’t necessarily care about systems looking like or behaving like humans as long as they can do tasks like humans. And so again, there’s a broad field of artificial intelligence, and then there is this rather large subfield of it, machine learning. And so most of the things that we encounter in AI today are really more about machine learning than the classical notion of AI.

Daniel Smith: So you mentioned the classical notion of robots and things like that. But recently something that has got a lot of public attention is large language models like ChatGPT. So can you briefly tell our listeners what those systems, large language models, do and how they work?

Chirag Shah: Yeah. So the idea behind language models is essentially, and this comes from natural language processing originally, but it comes from human languages. The way we understand languages, we have of course our vocabulary, but that’s not how we learn language. We learn language by hearing phrases and sentences in a certain sequence, certain way, in certain context. And so language models are designed to capture that, which allows them to understand if a phrase is coming from language A or B. Or given language A, what kind of phrases could we generate?

A large language model is essentially just doing this at a really large scale. So the most famous example is OpenAI’s GPT or generative pre-trained models. So that’s built on a huge corpus taken mostly from Wikipedia and other sources. And so when you input that much text, that much language, you’re able to have this machine, this model, start identifying certain patterns. So even as we are speaking here, if you’re an English-speaking person, you can start to even complete our sentences. And we have this ability to complete sentences because we have seen enough of those instances where the same words are used in that sequence, and we know intuitively and empirically what typically follows. So our ability to finish sentences is essentially a demonstration of how language models, or now large language models, work.

So they’re able to learn from large amounts of data like that. And because of that, they’re able to capture these patterns that allow them to do auto completion. And once you start doing auto completion, you can also then start generating completely new sentences without even seeing anything before. So that’s what typically these things start doing and that’s what we are seeing. So ChatGPT, for example, or many other chat based products that are built on top of GPT, are able to generate sentences in response to some question or query or conversation because they have this built-up knowledge of the English language, how it works, what the patterns are, and so they’re able to connect those concepts and generate new sentences rather than simply retrieving or extracting them.
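As a rough illustration of the autocompletion idea described above, the following Python sketch counts which word tends to follow which in a small text and then extends a prompt with the most frequent next word. It is a toy bigram counter, not how GPT models are actually built; the corpus and the function names `train_bigrams` and `complete` are made up for this example.

```python
from collections import Counter, defaultdict


def train_bigrams(text):
    """Count, for each word, how often each other word follows it."""
    follows = defaultdict(Counter)
    words = text.lower().split()
    for current, nxt in zip(words, words[1:]):
        follows[current][nxt] += 1
    return follows


def complete(follows, prompt, max_words=5):
    """Extend the prompt by repeatedly appending the most frequent next word."""
    words = prompt.lower().split()
    if not words:
        return ""
    for _ in range(max_words):
        candidates = follows.get(words[-1])
        if not candidates:
            break  # the last word was never seen with a follower
        words.append(candidates.most_common(1)[0][0])
    return " ".join(words)


# Hypothetical toy corpus; a real model is trained on vastly more text.
corpus = "the cat sat on the mat . the cat ate the fish . the dog sat on the rug ."
model = train_bigrams(corpus)
print(complete(model, "the dog", max_words=3))  # prints "the dog sat on the"
```

The sketch only ever looks at the previous word and has nothing to say about words it has never seen, which is one way to see why scale matters so much for the models discussed here.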

Daniel Smith: So that’s a really helpful explanation, especially as I think we’ve all probably at least seen, or maybe even experimented a bit ourselves with, how things like ChatGPT work. But on that note, what are some ways in which you have seen people using ChatGPT, particularly in an academic context?

Chirag Shah: Yeah. So these days people are using ChatGPT for all kinds of things, including things that perhaps they shouldn’t be using it for. So it’s very easy to find a lot of examples. I mean, originally the intent is to give you answers. So it’s designed for getting you answers. So you can have questions. And unlike a typical search engine where you put in that question or a bunch of keywords and you get a bunch of links or sometimes an extracted passage, ChatGPT is able to generate that passage. So it’s not giving you some document links, it’s also not just giving you some passage that it has extracted from a webpage, it’s actually creating an answer for you for that moment. And so that’s a really powerful thing, to have a personalized answer generated for you.

But that being said, that ability also allows it to do all kinds of other generations. So people have been using it for generating reports. And so beyond just asking questions, you can actually ask it to analyze something. One of the examples that I saw recently is somebody pointing ChatGPT to a paper and asking it to summarize it. And so ChatGPT is able to do that. I’ve seen people using ChatGPT for ideation, generating some concepts, writing proposals, creating arguments, and even writing full papers or full sections of a paper. So again, these are instances that I would argue people shouldn’t be using ChatGPT for, but we are certainly seeing people using it. There are people who have used ChatGPT to construct their exercise and diet plans. All kinds of financial consulting things. Again, these things are being used, these are cases, but it remains to be seen how advisable these things are for an average user with low information literacy or low motivation for cross-checking some of these answers.

Daniel Smith: So yeah, obviously given those wide uses that you just mentioned, I think it’s compelling for a lot of people to try it for different things. And you mentioned some of the reasons why it should give them pause. But I just wanted to circle back on that and get you to share a bit more on what are some of the major limitations of large language models from an information seeking perspective. And given those limitations, why should it give folks pause?

Chirag Shah: Yeah. So as I described how these things work, I mean think about it for a second, they’re trained on this GPT, which is based on this English textual corpus. So now imagine if all you have done in your life is just read a ton of stuff. Now clearly that’s going to give you a lot of knowledge, but is that enough for you to be able to do, say, reasoning? So this is similar to going to a new place. Clearly there is some value we derive from going to an actual place versus reading all about it. So ChatGPT and these tools are currently at a place where they’re able to read a lot about a place, but they’re not able to visit it. So they don’t actually know what it is like to be there and do things, feel things. They’re only going about this based on what everybody has written about that place.

So imagine if you were to construct your own model and your own understanding of a place that way. Clearly, I mean, you would come up with a pretty good understanding of the place, assuming that a lot has been written and talked about it. But we could all agree that it’s not the same as actually being there. So a lot of the limitations of these tools come from the way these models are built, that they lack real reasoning.

From an information access standpoint, there’s also a lack of transparency. So in a typical search or [inaudible 00:15:59] system, you would have something returned to you with the source. So you’d know it came from a certain website, a certain blog, or whatever it is. Same thing in social media, you can check that. ChatGPT and other GPT based tools, they are synthesizing information. So they’re making stuff up based on what they know, what they’ve read, what they’ve connected. In other words, there is no track back to their sources. And so you just have to trust this information on its face, and you don’t have the ability to learn about how this tool came to synthesize this answer. And what if that answer is really bad, or wrong or biased? You don’t have the ability to question that and validate that. So that’s a big problem.

These things, because they have an amazing ability for natural language generation, are able to converse with us just like a human being would. So that kind of gives us a false sense of authority and trust. And clearly for a lot of things, they are able to give us good answers, even right answers, and they’re delivered to us in a language that we understand. They’re not just a machine language. They’re not just a bunch of links or bullet points. They’re actual responses like a human being would write. So that creates this false sense of trust in this authority that makes it hard for us to question that one out of 10 or two out of 10 times that it will give us a bad answer, a wrong answer, or a completely biased answer.

So there’s a real danger there where there’s a lack of transparency, and essentially, it’s just trust me, I know what I’m doing. And we take that, we believe that, so that can actually be really problematic. And we’ve already seen many, many cases of that bad behavior from ChatGPT, even though it has a lot of great guardrails. They’ve done a really great job not getting into some of the controversial topics or avoiding some of the questions. Even then, people have been able to find things where clearly it’s showing biases. And an unsuspecting user may not realize that there are these challenges because they have no ability to question this, to check, validate. So that’s actually a big problem from the information seeking perspective.

And one final thing I would say, when we use other tools for seeking information, whether it’s a search engine or a recommendation system, there’s usually a little bit of process involved. Sure, you start with a question or keywords, and then you get a bunch of answers, but you have to sift through that to find something that works for you. And then you find something that changes your perception of what’s good or right, or bad, and then maybe you revise your question or query and so on. And we’ve all done that. So that process is actually quite important.

I mean, this is essentially an educational process that we are not simply doing, let me throw a question and just get the answer and be done. We’re actually going through this process that allows us to learn. And so when these AI agents start giving us straightforward answers, cutting down that whole process, to me that’s one of the biggest dangers of doing this. Because imagine you’re in a classroom, and if you argue that the ultimate goal of this is to just get the right answer, then why are we wasting all our time learning to do that math, or going through that reasoning process? Why not just give everybody just the answer and be done with it?

So cutting down that process, which of course has some friction, I mean we have to work. There’s some effort. But there’s a value to that effort. There is value to us putting in that work, for our learning process as well as just reasoning processes. So cutting that down to just get us to the answer, even if that’s the same answer we would’ve achieved by working a little harder, it’s very important that we do it that way. So I think there’s no disclaimer given here, there’s no protection given to an average user, that these are some of the bad things with it. So these are some of my thoughts on limitations of this LLM based ChatGPT.

Daniel Smith: I think they’re all great points. And I think they kind of highlight something that is not a new concept, which is issues with academic integrity like cheating and plagiarism, and things like that. And it almost seems like, given some of the limitations and concerns that you just mentioned, that this tool could lead to academic integrity issues. And I think that this has already been a concern raised among instructors across all different grade levels and into higher education and beyond. So my question is what impact do you think that large language models will have on existing academic integrity issues in the long term? Will it amplify them? Will it create new issues? What are your thoughts on that?

Chirag Shah: Yeah, I mean we are already starting to see this, right? I think it was in New York that they banned ChatGPT in schools because of the issue of plagiarism. There are other places, not just in the US, but a lot of other countries also already. I mean, this is, I don’t know, two months old, and we’ve already had waves of these concerns and actual practical actions because of this issue. That’s not even in the future, that’s happening right now.

So there’s already this challenging task called plagiarism detection, which in academic context has always existed. And it’s always a challenging thing to explain to the students what is academic integrity and what is plagiarism? What is okay, what’s not okay? But typically we deal with this in like, oh yeah, you take something from some source, you have to quote that or cite that source. If you work in collaboration, you have to act according to this plan, and then so on.

This ChatGPT based thing takes it to the next level where how do you explain to somebody to what extent is this okay, or is this not okay at all? Because this is not as simple as going to someplace and asking for some information where you can quote the source. These things are based on generative AI, which means they’re actually generating new things that may not exist on someplace, somewhere in a specific form that you can quote or cite. And so how do you do that? So even if you say it’s okay to use ChatGPT, it’s not clear how you can do the attribution because this is simply having somebody else do your homework. And in that case, most educators are not going to be okay with that, and then they’re going to deny this. But how do you detect this?

So it’s already challenging to detect this kind of plagiarism, which is a new kind of plagiarism. Now we already have tools often built into our course management systems that allow us to find the possibility of plagiarism, but they are almost outdated now for dealing with this kind of generative AI. Now there are of course some works in progress and some things have come out, but I think it’s going to be this cat and mouse chase where of course these GPT based things are going to keep getting more sophisticated, which means we’ll have to have even more sophisticated tools that would allow us to detect the plagiarism. Because no matter how much we enforce this, we’re not going to be able to stop this. People will use these tools to write their reports and do their homework. And so I think it’s going to continue being a challenging thing.

The better thing to do really is to educate people about this, and really emphasize why we are doing what we’re doing in classrooms. Because I can see already students questioning what’s so wrong with, say, ChatGPT writing my research proposal? Because ultimately it’s not the proposal that’s real research, it’s my data collection and analysis, and that’s the real research. But that goes back to the question: why is it not real research? Should writing the proposal be part of the research? Now, I would say yes. And that means that it’s not okay for you to just use this tool to do that part of the research, because I want you to actually learn and write that yourself.

But I think there are a lot of these fuzzy boundaries here. I mean, we go back to tools like the calculator. I mean we all use calculators, and so you could argue what’s wrong with that? I mean, it’s just giving us simple answers that otherwise we would just waste time working out. And that’s not even the real problem. We have bigger things to do. So it’s okay that we use a calculator. So how is this ChatGPT that much different from a calculator?

And so I think, again, it’s more of a rhetorical question at this point, and I’m happy to answer that too. But the point I’m trying to make is these things are going to keep resonating within our academic environments from teachers, from students, from parents, from policy makers. Where are the boundaries, and how do we enforce them? So, so far, the challenge was really about enforcing the boundaries. And we’ve had these tools and we’ve had these guidelines and training about plagiarism and things like that in our education system. Now they’ve become very challenging. But the very notion of where the boundaries are, that also is now being questioned. So I think we are in new uncharted territory where it’s no longer just about these tools, but how do we think about the value of education, the very value of education, and some of these education processes? We have to start questioning that because our students are definitely already starting that.

Daniel Smith: Yeah, absolutely. And I think, or at least I’ve seen already, some different guidelines and rules popping up. I’ve seen universities that provide kind of tips and guidelines for if people are going to use or try to use large language models for different activities. And more recently, I’ve actually seen academic journals come out with guidelines of their own. And one example of that is the journal Nature no longer will allow an LLM tool to be accepted as a credited author on a research paper. And their reasoning for that is because attribution of authorship carries with it the accountability for the work, and the tools can’t take such responsibility for that work, which I think is kind of an interesting point that they’ve made. Are there some other strategies or tools that people can use to better navigate these issues as more formal guidelines and rules like those are still in development?

Chirag Shah: Yeah, I think a lot of the work, I mean, I’ve seen proposals and I’ve seen some tools. There’s a GPTZero tool. There are other approaches. Some of them work well within a closed context, some don’t. So for instance, one approach for a limited set of questions, let’s say you’re doing a homework assignment, and you have specific questions that you’re giving for writing essay-like responses. One approach is to use ChatGPT to generate lots of responses for the same question. Every time ChatGPT is going to have at least some differences in those responses. And then use that to check against student responses for plagiarism detection. So anyone can do it. You don’t need a special tool, software, or anything like that. But I mean it does require some effort on the teacher or the TA side to have this little bank of possible answers to compare student responses against for possible plagiarism detection.

I have personally not tested how well this works, but I can see that working to at least a good extent. Obviously this can’t work for any kind of questions, for any kind of open-ended things. But that’s where there are some other tools being developed that I’ve seen. I think it’s still early to see how effective some of these are. And then as I say, there’s going to be this cat and mouse game where people will keep finding ways to bypass some of these guardrails because, one, they’re fascinated by this, two, they can see immediate value in using this to construct their responses. And so yeah, we are in uncharted territory.
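The answer-bank comparison described above could be sketched roughly as follows in Python. This is only an illustration under stated assumptions: the guest does not name a similarity measure, so word-trigram overlap (Jaccard similarity) is swapped in here as one simple choice, and the function names and sample texts are hypothetical. As noted in the conversation, the approach itself is untested, so a high score would at most be a prompt for a human to look closer, not proof of plagiarism.

```python
def shingles(text, n=3):
    """Return the set of n-word sequences (shingles) in the text."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def jaccard(a, b):
    """Share of shingles the two sets have in common, from 0.0 to 1.0."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def max_similarity(student_answer, answer_bank, n=3):
    """Highest overlap between a student answer and any bank answer."""
    student = shingles(student_answer, n)
    return max(
        (jaccard(student, shingles(answer, n)) for answer in answer_bank),
        default=0.0,
    )


# Hypothetical bank of generated answers to a single homework question.
bank = [
    "photosynthesis converts light energy into chemical energy in plants",
    "plants use sunlight to make sugar from water and carbon dioxide",
]
student = "photosynthesis converts light energy into sugar that plants store"
print(round(max_similarity(student, bank), 2))
```

Exact word overlap is easy to defeat by paraphrasing, which fits the cat and mouse dynamic described in the conversation.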

Daniel Smith: Absolutely. And I think that is a perfect note to end today’s conversation on. I definitely encourage our listeners to check out Chirag’s work to learn more about some of the issues we discussed today. I’ve included a few additional resources in our show notes to get you started. I also invite you to check out CITI Program’s On Research Podcast, which is hosted by my colleague Darren Gaddis. In an upcoming episode, Darren is going to talk about the application of AI in research with Chirag. In that conversation, they’re going to talk about AI in quantitative and qualitative research, some benefits of using AI in a research setting, and some ethical issues to consider. You can subscribe to On Research wherever you listen to your podcasts. Okay. So that is all for today, and I look forward to bringing you more conversations on all things tech ethics in the near future.

How to Listen and Subscribe to the Podcast

You can find On Tech Ethics with CITI Program available from several of the most popular podcast services. Subscribe on your favorite platform to receive updates when new episodes are released. You can also subscribe to this podcast by pasting “https://feeds.buzzsprout.com/2120643.rss” into your podcast app.

Meet the Guest

Chirag Shah, PhD – University of Washington

Chirag Shah is Professor of Information and Computer Science at the University of Washington (UW) in Seattle. He is the Founding Director of the InfoSeeking Lab and Founding Co-Director of the Center for Responsibility in AI Systems & Experiences (RAISE). He works on intelligent information access systems with a focus on fairness and transparency.

Meet the Host

Daniel Smith, Associate Director of Content and Education and Host of On Tech Ethics Podcast – CITI Program

As Associate Director of Content and Education at CITI Program, Daniel focuses on developing educational content in areas such as the responsible use of technologies, humane care and use of animals, and environmental health and safety. He received a BA in journalism and technical communication from Colorado State University.