Ted Pedersen [00:00:00] The goal of computer science is to bring new ideas, changes, hopefully betterment to the world. If the scientists and engineers wash their hands of social problems and reality of life that we have, who is going to do that in the context of large language models?
Patricia [00:00:38] Welcome to another episode of The AI Purity Podcast, the show where we explore the complex intersection of artificial intelligence, ethics and its social impact. I’m your host, Patricia, and today we are honored to have with us a distinguished figure in the field of natural language processing and computational linguistics. With over two decades of experience teaching at the University of Minnesota Duluth and serving as a principal investigator at the Minnesota Supercomputing Institute, our guest brings a wealth of knowledge and insight into today’s discussion. His research not only explores the intricate workings of language technology, but also delves into the profound social implications of artificial intelligence. In recent years, he has been at the forefront of examining the potential harms of AI and NLP, shedding light on the ethical considerations that underpin the development and use of these technologies. Today, we have the privilege of learning from his expertise as we explore the impact of AI on society, the ethical foundations guiding its development, and the imperative of responsible innovation in the realm of artificial intelligence. Join us as we embark on a thought-provoking conversation and welcome to the podcast, Dr. Ted Pedersen. How are you doing, Dr. Ted?
Ted Pedersen [00:01:47] Well, thank you! It’s a pleasure to be here.
Patricia [00:01:49] We’re really excited. So, we want to ask you, first and foremost, how did you first become interested in the field of natural language processing and computational linguistics?
Ted Pedersen [00:01:59] Well, it goes back quite a while. I long had an interest in languages and learning languages. And so, in high school and as an undergraduate, I studied German, and I really enjoyed it, but I wasn’t terribly successful at it. So, it was kind of frustrating, but it was very thought-provoking. And as I was finishing my undergraduate degree, I had a free elective and decided to take Spanish, which worked a little better. I felt like I was learning a little better, and I began to pursue that even after graduation, when I was working, by taking classes at night, and practicing, and listening to the radio, and things like that. And when I went back to graduate school, I was taking a class in compiler construction, where you do a lot of parsing and generating of programming languages. And I realized that, well, this certainly feels a lot like what we do in a language like German or Spanish when we’re trying to learn: we parse it, we try to generate. And I thought perhaps I had discovered some brilliant idea, that you could use compiler technology to process natural language. It turns out that a lot of people had realized that many years before, and it was one of the ways that machine translation used to be approached. But when I discovered that there was this long history of that kind of work, it brought me into natural language processing, and machine translation, then and now, has remained one of the big applications of natural language processing. So, that was what got me started.
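To make that compiler analogy concrete, here is a minimal sketch, in Python, of the same recursive-descent parsing idea applied to a toy fragment of English. The grammar, the sentence, and the function names are invented purely for illustration; they are not from any system discussed in the interview.

```python
# A toy context-free grammar, parsed in the recursive-descent style
# familiar from compiler construction. Purely illustrative.
GRAMMAR = {
    "S":   [["NP", "VP"]],            # sentence -> noun phrase + verb phrase
    "NP":  [["Det", "N"]],            # noun phrase -> determiner + noun
    "VP":  [["V", "NP"], ["V"]],      # verb phrase -> verb (+ noun phrase)
    "Det": [["the"], ["a"]],
    "N":   [["student"], ["essay"]],
    "V":   [["writes"], ["reads"]],
}

def parse(symbol, tokens, pos):
    """Try to expand `symbol` starting at tokens[pos].
    Returns (parse_tree, next_position) or None on failure."""
    for production in GRAMMAR.get(symbol, []):
        children, cur, ok = [], pos, True
        for part in production:
            if part in GRAMMAR:                      # non-terminal: recurse
                result = parse(part, tokens, cur)
                if result is None:
                    ok = False
                    break
                subtree, cur = result
                children.append(subtree)
            elif cur < len(tokens) and tokens[cur] == part:  # terminal word
                children.append(part)
                cur += 1
            else:
                ok = False
                break
        if ok:
            return (symbol, children), cur
    return None

tokens = "the student writes the essay".split()
result = parse("S", tokens, 0)
if result and result[1] == len(tokens):
    print(result[0])   # nested (symbol, children) tuples, i.e. a parse tree
```

Early rule-based machine translation worked broadly in this spirit, analyzing the source sentence into a structure before generating the target language, though at far greater scale and complexity than this sketch suggests.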
Patricia [00:03:44] And can you explain the significance of your research to develop systems and methods that automatically discover the meaning of written text and understand its content or context?
Ted Pedersen [00:03:55] Sure. So, the general idea has been that there’s a lot of knowledge that we as human beings record, and convey, and share in written text. And as we all know, that amount of information is staggering. And so to be able to automatically go through, you know, large amounts of text to determine what it’s about, what kind of information is contained in there, what isn’t contained in there, is very practical. First of all, to help manage all of that information and also kind of fascinating to see all the different ways that similar ideas can be expressed, for example.
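As a rough, hypothetical illustration of that last point, the sketch below uses a simple bag-of-words comparison in Python to show how two sentences that express a similar idea in different words still share measurable content. The stopword list and example sentences are made up for the demo, and real research methods are far more sophisticated than this.

```python
# Toy bag-of-words similarity: sentences that express a similar idea with
# different wording still overlap in their content words.
from collections import Counter
import math

STOPWORDS = {"the", "a", "an", "of", "is", "are", "to", "and", "in", "there"}

def bag_of_words(text):
    words = [w.strip(".,").lower() for w in text.split()]
    return Counter(w for w in words if w not in STOPWORDS)

def cosine_similarity(a, b):
    dot = sum(a[w] * b[w] for w in set(a) & set(b))
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

s1 = "The amount of written information online is staggering."
s2 = "There is a staggering quantity of text recorded online."
print(cosine_similarity(bag_of_words(s1), bag_of_words(s2)))  # nonzero: shared content words
```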
Patricia [00:04:43] Dr. Ted, you’ve been an educator for quite a few years now. Do you think AI can help address the needs of diverse learners in educational settings? And if so, can you cite some instances where you’ve observed this?
Ted Pedersen [00:04:55] I guess the starting point for an answer is to talk about what we mean when we say artificial intelligence. I think now, and over the last year or so, artificial intelligence has come to mean the large language models that we see, like ChatGPT, for example. And that’s really kind of dominated the attention, with good reason, because it has a wide range of applicability. So, when I’m speaking about AI, I’m probably for the most part referring to those large language models; there are other parts of AI, but I don’t think many of them have reached the same kind of critical mass. So then, to answer the question a little more specifically, I think maybe the verdict is still out on that. We are early in the days of wide deployment of large language models, and certainly they are being used in educational settings, both by teachers and students, with and without, you know, the knowledge of each other. And I certainly think it opens up another way to access, and find, and create information. And it may well work better for certain kinds of learners. There are important caveats, though. Large language models depend on large amounts of language, which is typically available for languages like English, and Mandarin, and Spanish, and German, and so forth. And there are a lot of languages in the world that are not well represented online, and that large language models don’t really do terribly well with. So, there is a potential for a kind of divide opening up between the languages and people who use the more popular, more online languages as compared to those who don’t. So that would be something I would be concerned about.
Patricia [00:06:58] Could you share some misconceptions that some people may have about the capabilities of large language models in education?
Ted Pedersen [00:07:06] Yeah, I think there are a lot of misconceptions right now, and I think that’s the big battle. First, and most importantly, the large language model does not understand what it is generating or what it is answering. It is stitching together content that it has found online, and it stitches it together in marvelous and creative ways, but it’s only doing that in such a way as to create a reasonable and sort of plausible grammatical response. It doesn’t reflect any kind of understanding, and it doesn’t reflect any kind of fact-checking. And so, there’s a danger, I think, of very plausible-sounding replies or responses coming out of a large language model that contain wild misinformation, or are just not correct, just don’t have the right facts. And so, I think students and educators alike need to be aware of those limitations, in particular because it does sound very plausible. You know, the large language models have kind of a voice of authority. They sound very confident. And unfortunately, the information is sometimes just not that good. And, you know, students and instructors alike can both be misled by that. So, I think understanding those kinds of limitations is really important.
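To give a feel for why fluent output need not be checked against anything, here is a deliberately tiny, hypothetical next-word generator in Python. It is orders of magnitude simpler than a real large language model, and the training text is invented, but it makes the same basic point: the only thing steering the output is which words tend to follow which, not whether the result is true.

```python
# Toy next-word generator: it strings words together based purely on how
# often they followed each other in its (made-up) training text. Nothing
# here checks whether the output is true; fluency is the only objective.
import random
from collections import defaultdict

training_text = (
    "the model writes a confident answer . "
    "the model writes a plausible answer . "
    "a confident answer sounds correct . "
    "a plausible answer sounds correct ."
)

# Count which words follow which (a bigram table).
follows = defaultdict(list)
tokens = training_text.split()
for prev, nxt in zip(tokens, tokens[1:]):
    follows[prev].append(nxt)

def generate(start, length=8):
    word, output = start, [start]
    for _ in range(length):
        if word not in follows:
            break
        word = random.choice(follows[word])   # pick a statistically likely next word
        output.append(word)
    return " ".join(output)

print(generate("the"))   # fluent-looking, but never fact-checked
```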
Patricia [00:08:38] You were saying earlier that AI-generated text platforms like ChatGPT are being used in schools by students, maybe educators, without them even knowing about each other. In your experience, have you seen or can you share any examples of successful implementations of AI in education that you’ve come across?
Ted Pedersen [00:08:57] Sure. So, these are things that I have discussed with colleagues and heard about, not done myself. I’ve actually taken a very cautious approach myself, but I know that some of my colleagues have, for example, had students ask ChatGPT to respond to certain questions or to generate a summary of something, and then asked the students to go back and critique it, to look for the places where it’s right and where it’s wrong. And I think that’s actually a really valuable exercise, because students then begin to appreciate that, well, this is not always right, I can’t always trust this. And it can also, I think, turn the tables a little bit and help the students feel a little more empowered or maybe a little more knowledgeable, so that they’re not just intimidated by these models, and I think that’s a very good idea. I guess one thing I’ve done, I don’t do it all the time, but I will at times share the output of ChatGPT for a question or a prompt that I may have given to the students, just to kind of discuss the kind of information that’s included and do the same kind of exercise where I point out that some of this isn’t very good, you know, this is not something that you should be relying upon. And, you know, please be aware of that.
Patricia [00:10:27] And do you think AI can be integrated in a way that can improve accessibility and inclusivity in education?
Ted Pedersen [00:10:35] I mean, I think so. Despite all of the cautions that I’ve already raised, I do feel like the large language models do represent an important kind of advance, if we think of them less as authoritative oracles that provide correct answers and so forth, and more as a way of making information more easily available, as a kind of upper level of a Google search, in a sense, almost to give you ideas about what you can look for or perhaps ideas to pursue, a kind of brainstorming approach. And I think that can be very helpful to a lot of different kinds of students. And the information that is being provided by these models is certainly, you know, information that’s out online. So I think we need to think of it as a kind of redistribution or redisplay of that information in a different way than, let’s say, we would see from a Google search, subject to the caveats that I’ve already mentioned.
Patricia [00:11:48] And earlier, you were saying that you would sometimes show prompt examples in class. What are your thoughts on the potential biases that may exist in AI-generated content, and how can educators mitigate them?
Ted Pedersen [00:12:01] Yeah, you have to be really careful, because there is a lot of misinformation and even hateful, and racist, and sexist content online that these models are being trained on. And I know that OpenAI and some of the other providers are trying to include filters and things like that, but even with that, there is a danger that that kind of content will come through, and that’s where I think it’s important. As an educator, you can’t anticipate all of the different examples that might slip through, but I think it’s very important to talk candidly with the students about the limitations, that this is not always going to be correct. It may, in fact, contain misconceptions, misinformation, biases, discrimination, and many other negative kinds of things. And so, it’s really a kind of information awareness that I think we need to advocate for and work towards when using these tools.
Patricia [00:13:09] And speaking of these limitations, what do you think are the key principles or guidelines that should govern the responsible use of AI in educational contexts?
Ted Pedersen [00:13:19] Yeah, that’s a really important question. And I think the first thing I would say is there needs to be a lot of transparency, both coming from the educators towards the students and from the students to their professors and instructors. And by that, I mean, I think the dynamics become very negative if students are using it in a way that makes them feel like they’re maybe breaking the rules, or they’re not sure, and so they become maybe a little furtive and secretive about it, and they may turn in something that doesn’t make a whole lot of sense, and then when they’re asked about it, it becomes very awkward. And so, I think it’s really important, first, for educators to be clear about the boundaries of when you can and can’t use these kinds of tools. And it’s very important for students to also be clear: “Yes, I did use this, and this is how I used it.” And I think that should be a part of any sort of assignments that we are giving that allow for the use of those kinds of tools, for the students to be very clear about what it gave them, and then how they adapted that, or modified that, or used that. And it becomes a different kind of assignment. It’s more of a data archeology kind of assignment, where they are tracing the development of what they have done. Where did it start with the language model? How did they revise it and what did they get? Which I think could be really a very useful kind of experience. But I think the key is that everybody needs to be upfront about it. And if instructors are using ChatGPT to generate summaries, or questions, or to use in grading, which I would have some reservations about, but let’s suppose that they are, they need to be very transparent about that. I also think it’s important that we understand the copyright implications of using these tools. The content that large language models are trained on oftentimes includes a lot of copyrighted material, and there’s litigation pending all about that. It’s also important for us as educators to remember that students’ work is copyrighted. When students create something, they have the copyright on it; whether they assert it or not, they have it. And so, if we want to submit it to a language model for some kind of commenting or whatever, we can’t just do that indiscriminately. And so, I think it’s important that we have a renewed appreciation for the intellectual property considerations that arise now as we are seeing these models used more and more.
Patricia [00:16:07] You recently gave a presentation entitled “ChatGPT: Implications for the Classroom,” and there were a few reasons enumerated why educators should be concerned, like students stand to lose cognitive function, and AI could make instructors and writing centers obsolete. Do you really believe that this is an inevitability or a possibility in the near future?
Ted Pedersen [00:16:32] I don’t think that’s inevitable at all. I will say that presentation, I worked on that with two colleagues who are members of our writing center here on campus. And so, they’ve been on the front lines thinking about these kinds of issues. It does concern me that students may bypass the opportunity to work with human tutors and instead ask ChatGPT to, you know, give feedback on their essays and things like that, simply as a convenience, because I think that’s bypassing a very important kind of relationship and experience that students can have working with a human tutor who can really guide them and help them in their writing in a much broader sense than automatic generative content would provide. As far as students losing cognitive functioning, I think that only happens if we don’t really pay attention and, as instructors, ignore the existence of large language models and just kind of hope students don’t use them but don’t really address it, and kind of bury our heads in the sand. And there, it may be that students write less; whether that would be a great enough reduction to reduce cognitive function, I don’t actually know. I think my biggest concern might be that schools, you know, K-12, colleges, universities, these are challenging times as far as budgets go. And it would be a very unfortunate thing if administrators in some area decided that, well, you know, we don’t need quite as many tutors, we can have the students go to ChatGPT, or things like that. I don’t know of any particular cases like that, but I know that the budget cutting that is happening at colleges, universities, and K-12 schools is pretty brutal. And so, it does worry me that this might be seen as a cost-saving measure, which I don’t think would be a positive change at all.
Patricia [00:18:48] And in what other ways do you anticipate that AI’s continuous improvement, or ChatGPT’s, for example, could pose challenges or concerns for educators?
Ted Pedersen [00:18:59] In terms of concerns for educators, I think the fluency and the style of the output is going to continue to improve, and I suspect that there will be, and there is to some extent already, the ability to kind of tune the output to a certain level. And sometimes now what will give away a ChatGPT-generated essay is that it’s just written in a style and at a level that doesn’t fit the student or the class. It’s like an 18th-century Victorian author, you know, or something, and it just doesn’t fit. And I think if you can begin to tune that output to either a student’s style or a particular grade level, it could become harder for instructors to notice that and intervene if they feel the need to. It’s also going to be more and more integrated in a lot of different tools. And one of my larger concerns is that students may not even be aware that they are using a large language model when they’re using, you know, a website or a tool that is simply advertised as helping them compose an essay or something like that. It’s going to be built into the background, or the technology is going to be more in the background. And so, even if students are told not to use that kind of tool, they may not even realize that they are. They may think it’s just a feature of word processing. And so, I think those could be challenging problems. And it, again, has to do with, you know, educating and informing as to where these tools are. On the more positive side, I am hopeful that the language models may begin to do better about sourcing where their material is coming from and to make it clear this information is from here. And that, I think, could be very helpful in terms of verifying the correctness of the information and any particular bias or other perspectives that the original source material might have.
Patricia [00:21:12] And how can educators strike a balance between leveraging AI tools for writing support and ensuring that students develop essential writing skills and competencies?
Ted Pedersen [00:21:22] Yeah, I think that’s a great question, and I think this is something that I know a lot of people are working on. I think students need to be able to write, and they need to be able to write essentially from scratch, without, you know, starting with anything. Because when we write, and when we write in certain ways, we are essentially learning to think. And you don’t want to constrain a student’s thinking, or the development of their thinking, or their imagination by always having them start with something else, something that’s created by either a teacher or a large language model. Students need to have that blank page, that blank canvas upon which to project their views and ideas, and I think that’s really important. And I think students need to be convinced of the value of that, not just in terms of their writing skills, but in terms of their ability to think and the development of their imaginations. I really feel like the expression that we do when we write, or when we draw, or when we paint, or, you know, whatever we do, is a very important thing for students to develop beyond just the skill of writing. I think once they have those kinds of foundations, then there are, you know, interesting things that they can certainly do with large language models and other tools, but I think that foundation does need to be there first.
Patricia [00:22:55] I totally agree. And on the flip side, what advice would you have to give to educators who are grappling with the decision of whether or how to incorporate AI writing tools into their teaching practices?
Ted Pedersen [00:23:08] I think there are a few things that I would suggest, and one of them is don’t feel like you’re going to be left behind or you’re going to be regarded as, you know, an old fuddy-duddy if you don’t embrace AI immediately. I think it’s okay to be cautious on this, and it’s important to educate ourselves. I also think, for certain kinds of writing, it’s important to encourage students and to tell them why you believe it’s important. So, there are certain things that I assign where I really want to hear the student’s voice, and I really want to hear the student’s opinion, and I want to hear them connected to their own experiences in, let’s say, other computer science classes or their education in general. And a large language model cannot do that. And so I, you know, explain that. And I think for the most part, that’s fairly convincing. If the students understand that you want to hear that and not some kind of version of perfection, you know, for the most part, they respond. And so, I think if you have a need like that as an educator, or if that’s one of the things you want to achieve in assignments, it’s really important to explain that. If you are going to use the AI tools, I think it’s very important to give the students some guidance and some boundaries. I think just unrestricted use of anything, any way you want, is inviting problems. And so, I think you have to give the students some guidance, you know: “You may use ChatGPT to generate three ideas for a story you will write. Then you pick one of those, and you can use ChatGPT to start the story, and then you will modify it, and you’ll show me all these steps.” You know, just provide some structure like that, because I think just saying, “Eh. Use it any way you want,” doesn’t really help the students. They often aren’t quite sure of what they’re getting or what they’re doing, even.
Patricia [00:25:21] Dr. Ted, let’s move away a little bit from AI and education and talk about the social impacts of AI and large language models. Please tell us, what inspired you to start assessing the social impacts of large language models and artificial intelligence?
Ted Pedersen [00:25:33] Well, I think it’s just been seeing the incredible reach of these techniques and how that’s developed over just a few years, even before ChatGPT became a household name. There were clear indications in the natural language processing research community that things were shifting and changing towards these large language models. And with that, what we noticed very early on are these issues of bias and even hatred, and hate speech, and sexism being generated by these models before ChatGPT was released, which I guess was at the end of November of 2022. There were large language models out there, and they were research tools, and they were not particularly well-filtered. And so, there was a lot of very disturbing content that was being generated. And it just draws your attention, and it became fairly clear that this was the direction the field was going, and it was clear that there was both this kind of harmful output and also, you know, very significant intellectual property concerns about the data that the models were being trained on. So, I think it was a combination of those issues that really made this a priority for me.
Patricia [00:27:04] Dr. Ted, you are currently co-editing a book entitled “The Relationship of AI and Islamophobia,” and you said online that you are increasingly thinking about where social issues and computing intersect, particularly how AI amplifies and generates racism and Islamophobia. Could you please elaborate on some of your findings and how you perceive AI can amplify racism and Islamophobia in society?
Ted Pedersen [00:27:29] Sure. So, I think the reality is that when you look online, just at social media and other content that we have out there, there is an awful lot of hateful content. There is an awful lot of misinformation, and all of that is being swept up into these large language models, and it’s being stitched together in various different ways that preserve the hatred or even increase it, by blending together perhaps 2 or 3 particularly vitriolic examples. And so, you know, this is how the amplification can occur. And unfortunately, there’s also a feedback loop that can occur, where hateful content in social media and other forums is clearly being generated by large language models. And so, they’re adding to the hatred and racism and Islamophobia that is online, so that the next time around, when the large language models are trained again, they find even more, and even more, and that’s, you know, dangerous for the large language models themselves. And I think it also is tending to make some of our online experiences a whole lot less pleasant, to say the least, and sometimes outright hateful and dangerous. So, I think those are kind of some of the concerns that have motivated that work.
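As a purely hypothetical back-of-the-envelope picture of that feedback loop, a few lines of Python can show how a small share of problematic content could grow across retraining rounds once model output starts flowing back into the training data. Every number below is invented for illustration; it is a toy model of the dynamic described above, not a measurement of any real system.

```python
# Toy model of the feedback loop: each "generation" of a model is trained on
# a corpus that now includes some of the problematic text the previous
# generation helped add to the web. All parameters are invented.

def simulate(rounds=5, initial_fraction=0.02, amplification=1.5, scrape_rate=0.5):
    """Track the fraction of problematic content in the training corpus per round.
    amplification > 1: the model emits more such content than it found;
    scrape_rate: how much of that output is swept back into training data."""
    fraction = initial_fraction
    history = [fraction]
    for _ in range(rounds):
        generated = fraction * amplification            # problematic text the model puts online
        fraction = min(1.0, fraction + generated * scrape_rate)
        history.append(fraction)
    return history

for i, f in enumerate(simulate()):
    print(f"round {i}: {f:.1%} of corpus is problematic")
```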
Patricia [00:29:11] And have you encountered any challenges or resistance in raising awareness about the intersection of AI, and racism, and Islamophobia? And if so, how have you navigated them?
Ted Pedersen [00:29:22] The typical objection, when there is one, is that, well, you’re a computer scientist, which is true, I’m a computer scientist, and you’re sort of getting too far out of your lane here. You know, you should leave the racism and Islamophobia for the sociologists or other people, and you should kind of just focus on the nuts and bolts of computing. And you hear that actually more so from people in computer science, or engineering, and so forth. And it’s kind of disappointing, because, well, it’s more than disappointing, it’s disturbing, because the goal of computer science, the goal of engineering, the goal of science, is to bring new ideas, changes, hopefully betterment to the world. And if the scientists and engineers wash their hands of the social problems and reality of life that we have, I mean, who is going to do that in the context of large language models? The technology is understandable, but it is non-trivial. And so, if computer scientists aren’t looking at that and connecting it to various kinds of social issues, well, who is going to do that? And so, I think we do have a responsibility throughout science and engineering to care about the impact of what we do on the world, both for good and bad.
Patricia [00:30:56] Dr. Ted, do you believe there are inherent biases in AI systems, and do you think this could contribute to perpetuating discriminatory attitudes towards marginalized communities?
Ted Pedersen [00:31:06] Yes, there’s a short answer to that: yes. And the reason is that many of these systems are based on historical data that encodes and reflects the bias and the discrimination that has existed in society. And so, these models are taught that history. And they learn the racism, they learn the sexism, they learn the Islamophobia, and they will continue making decisions that perpetuate that unless there is some kind of intervention.
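Here is one way to picture that mechanism: a deliberately tiny, entirely synthetic Python example in which a "model" that simply copies the most common historical decision ends up reproducing the bias baked into those records. The data, groups, and outcomes are invented for illustration; real systems are far more complex, but the principle that skewed history produces skewed predictions is the same.

```python
# Toy illustration of how historical data teaches a model historical bias.
# The "records" are synthetic, and the "model" is just a lookup of past
# outcomes, but whatever pattern is in the history reappears in the output.
from collections import defaultdict

# (group, qualified, decision), reflecting an invented, biased past.
historical_records = (
    [("group_a", True, "hired")] * 90
    + [("group_a", True, "rejected")] * 10
    + [("group_b", True, "hired")] * 30
    + [("group_b", True, "rejected")] * 70
)

outcomes = defaultdict(lambda: defaultdict(int))
for group, qualified, decision in historical_records:
    outcomes[(group, qualified)][decision] += 1

def predict(group, qualified):
    # "Learn" by copying whatever decision was most common in the past.
    counts = outcomes[(group, qualified)]
    return max(counts, key=counts.get)

print(predict("group_a", True))  # hired
print(predict("group_b", True))  # rejected, purely because history says so
```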
Patricia [00:31:43] Well, do you think it’s the lack of diversity in the tech industry that influences the design and implementation of these biased AI systems?
Ted Pedersen [00:31:52] I certainly think it’s a part of it, yes. I think the value of diverse teams, and of being aware of the different kinds of discrimination and hatred that you can find online, is very important. So, for myself, as a white man, I have a certain sort of experience online. I have a long history of being online, and I have a certain experience in social media and other places that has, generally speaking, been fairly positive. Not always, but if you talk to, for example, a black woman who is active online, active in social media, it’s a whole different experience. There’s a lot of targeting and a lot of hate, and I wouldn’t necessarily realize that. And so, if I’m working on some kind of social media application, let’s say, I think it would be very important to have different people with these different experiences as a part of that, because otherwise, you may well overlook some pretty serious safety factors. One of the conjectures, and I think it’s a reasonable one, is that one of the reasons we have so many problems with social media and hate speech online now is that back when we were developing the internet and the web and so forth, it was largely done by fairly isolated groups of men, and often white men, who just weren’t really thinking about harassment and other kinds of problematic online content and behavior, and that left a lot of open doors for this kind of behavior and content even today.
Patricia [00:33:43] And so, to correct that, what do you think are the ethical considerations that developers and policymakers can take into account to mitigate the negative impacts of AI on racial and religious minorities?
Ted Pedersen [00:33:56] I think, first of all, being very conscious of what you’re using for training data, if you are developing a kind of large language model. I think, unfortunately, what has happened is that the amounts of data that have been used are overwhelming, and it’s not really clear sometimes what we’re getting. And so, I think taking a more judicious and curated approach to that could certainly be an important step, to eliminate, you know, some of the obviously problematic content from training data, for example. I think in other settings, social media is driven by engagement, and likes, and so forth, and not necessarily quality or informativeness. And I think some social media platforms are pretty clearly relying on a certain amount of hate to drive engagement, and I think that’s just the wrong model for developing those kinds of platforms, because it creates a terrible experience. So, I think somehow changing or limiting the incentives on social media could be very helpful to improve the quality of the experiences that people have there.
Patricia [00:35:19] How do you think interdisciplinary collaborations between computer science, social sciences, and humanities contribute to a more nuanced understanding of the ways that AI impacts social issues?
Ted Pedersen [00:35:33] Oh, I think it’s critical. I think, you know, as a computer scientist, I have a certain background and a, you know, particular focus on, you know, the academic world, and people in other disciplines, you know, sociology, and political science, and ethnic studies, and black studies, and so forth have a whole different background that is just as deep and just as rich as mine is in technology. And it is remarkable, actually, what you learn when you interact with people like that. I should also say, I think, history and historians are very important. I think we neglect and misunderstand sometimes that the racism, and sexism, and so forth that we see online is not just because of the online environment. It is a reflection of a very long history that sometimes goes back way before we had computers even. And so, I think the historical perspective is very important in addition to the social sciences and other humanities.
Patricia [00:36:49] And on the other side, what role do you think individual users and consumers of AI technologies play in challenging and combating some of these discriminatory practices?
Ted Pedersen [00:37:00] Yeah. I think individual users can have a lot of power. First, if they see content that is objectionable, they can report that both to the company and perhaps, you know, share it in other forums, so that there is attention paid to it. The large technology companies are very sensitive to bad publicity. And so, if users are raising concerns about certain kinds of output or content, I think it can actually have some impact. Now, it may lead to a kind of short-term fix or, you know, almost a kind of denial by the company, but it at least draws their attention to it. So, I think individual users can play an important role. And, you know, I also think that when a platform becomes unusable or hostile, just making the choice to leave can, you know, have an impact as well. Because without users, these sites cannot continue.
Patricia [00:38:13] Dr. Ted, was there something that prompted your shift in focus toward investigating the potential harms of language technology and its broader societal implications?
Ted Pedersen [00:38:23] I think it was a few things, and some of it is just about me. I’ve had a pretty long history in natural language processing, as you mentioned. And for many years, I was working at a pretty technical, low level and dealing with, you know, statistical problems or, you know, machine learning algorithms or, you know, things of that nature. And at that time, there wasn’t a lot of impact on the world, in my opinion, of natural language processing technologies. So, it felt like the right place to be. Now, as natural language processing and large language models are having real impact in the world, it just doesn’t feel right to me, for me personally, to be spending my time in the weeds, if you will, working out what seem to me sometimes to be, you know, important but narrow technical problems, just because the impacts are out there in the world. And I think for people who have a computer science background and who can speak with at least some authority about those issues, it’s important to do so. And so, I think that’s a big part of what’s motivated me.
Patricia [00:39:42] And can you elaborate on any key findings or insights you’ve discovered regarding the ethical foundations, or lack thereof, in the decision making processes behind language technology development?
Ted Pedersen [00:39:54] Yeah. I mean, I think the big one is the unrestricted use of whatever content is scrapeable by the large language model companies. I can’t defend that, because it is violating copyright in many cases, and it is also sweeping up and amplifying some of this hateful, racist kind of content without really a second thought. It’s more so the idea that, well, we need as much data as we can possibly get, so we’re going to take everything, but I think that’s indefensible in an ethical sense. And I think legally, there are some questions regarding the copyright portion of that that are to be decided, but I think the ethics of it are not really defensible at all. It’s really just a matter of convenience and because we can, which is not a good reason.
Patricia [00:40:53] And in your opinion, what are the main challenges in mitigating the negative social impacts of NLP and AI technologies?
Ted Pedersen [00:41:01] Well, I think it’s the fact now that large language models in particular are out in the world, and they’re getting integrated into lots of other tools that are not obviously large language models. And so, their reach is spreading and their visibility is decreasing. We don’t always know when we are getting their output. And so, I think that makes it particularly concerning, because people may well get some kind of misinformation or hate speech that they can’t really trace the origins of. And so, they may just accept it, or not question it, or not know what to do about it, and I think that’s a very worrisome thing.
Patricia [00:41:49] What do you see as some of the most pressing ethical concerns or risks associated with the widespread adoption of NLP and AI in society?
Ted Pedersen [00:41:59] I think it’s this transparency question. I do worry quite a bit about what happens when large language models are kind of everywhere, and we don’t even realize it or see them. I think that potentially gives a lot of power to a few, you know, companies that make those available and encourage their deployment. And so, I think being more transparent about where they are and how they’re being used is important, so that we know when we are using them, in effect. And then also, I really think that there need to be some pretty significant changes in the content that these models are being trained on, in particular with respect to copyright or other kinds of, you know, content that they don’t really have authorization or permission to use. I think making progress on that could be a very important step forward.
Patricia [00:43:04] And how do you think policymakers should approach regulating the development and use of NLP and AI technologies to minimize their negative social effects?
Ted Pedersen [00:43:15] Yeah, I think policymakers should not be overwhelmed, and they should not be intimidated by the technology. They should use what I would almost say is common sense. If, you know, I were building a publishing empire or writing a book by taking copyrighted content and slightly reorganizing it, they would find me to be out of bounds and violating the law and so forth. And I think sometimes, because of the technology and because of the apparent wonder of it, and the size of it, and the kind of glowing switches, and whistles, and bells, and all the rest, there is a tendency to kind of back away from it. But I think instead, it’s important to look at what it really is doing and what it’s really using, and not be overwhelmed by it all. We still need copyright. We need to respect copyright as we have for a long time. We need to not allow humans or language models to slander people, or to threaten them, or to make their lives unlivable. And so, you know, we need to feel empowered to put limits on some of these kinds of technologies, I think, and policymakers are the people who can actually do that.
Patricia [00:44:38] So, we’ve been talking about the negative impact of AI in society. On the flip side, are there any recent advancements or trends in NLP and AI research that you’ve seen to be significantly positive for society?
Ted Pedersen [00:44:52] Going back to my first answer about how I got into this through machine translation, that problem of translating automatically from one human language to another: the progress that we’ve seen in translation over the last 5 to 10 years has, to my mind, been staggering. And the quality for many language pairs is much better. This does not mean that translation is solved and that all languages can be translated to all other languages, but the situation is so much better, and that is so useful. It opens up worlds to people, to be able to access language and literature, you know, in a language you don’t know. And I think progress there is certainly continuing, and it’s very much based on the kind of large language model technology; you can ask those to do translations for you, and they do them very well. I think this is a tremendous advancement, and I would like to see it extended to languages that are not as present online, because I think that would be even more wonderful, really. So, I think that’s a great development. I also think that this ability now to engage in a dialog with something like ChatGPT and explore ideas or, you know, just kind of have a conversation about something that gets you thinking… That’s not necessarily a bad thing. You know, depending on what you’re doing and why you’re doing it, I think it can actually be kind of invigorating, and it can help you, you know, get started on something. And so, I think the brainstorming potential in certain areas, with proper kinds of guidance, can actually be very helpful. And I think sometimes students in particular can really benefit from that, because sometimes you do just get stuck at the beginning of something, and you need a few ideas to jolt you along. I think that’s very positive and encouraging. And then I guess the final point I would make is that, in a very large sense, it is kind of satisfying to see the field that I’ve been a part of since the 1990s reach a point where there are some real big impacts, good and bad, around the world with regular people. And I think all of us in the field believed that was a possibility, and that was often why we were doing it: we thought language technology could help people and could make the world in some way better. And I think that can still happen. I think we need to be very careful about where we’re at now, but I think there is still that potential, and that’s very satisfying to see.
Patricia [00:47:48] Dr. Ted, earlier, you said that you believe there is a need for greater transparency and accountability in the development and use of NLP and AI technologies. How do you think this can be achieved?
Ted Pedersen [00:47:59] I think for students and educators, it is being open about it and not stigmatizing the use of these tools. If you do not want your students to use them, then be clear about that. Be clear about the reasons. If you want them to use those tools, or if you will allow them to use those tools, give them some guidelines. Make it okay for them to talk about it, and to ask questions, and to ask if something is going too far. So, open up that kind of communication. For the tools themselves, I think as apps and applications are developed, there should be a kind of disclosure. When we are using chatbots now, I think the right thing to do in that case, and many companies do this, is they say, “You are communicating with our automated chat agent,” you know, and “We will connect you with a human next,” or something, making it clear what you are dealing with. And for large language models, I think some kind of disclaimer or disclosure that, you know, you’re starting to use a tool now that is going to be generating text based on this large language model, I think would be very important and very helpful. And so, I’m kind of thinking in those terms.
Patricia [00:49:22] How do you see public awareness and understanding of the social impacts of NLP and AI evolving in the coming years?
Ted Pedersen [00:49:30] I think it’s an open question, and I think it depends a lot on the kinds of education that students get in school at all levels about large language models, and also, in particular, on parents of younger children. I think it’s very important for the parents to understand what these models are and aren’t, because if a teacher says, “I don’t want my sixth graders using ChatGPT for the writing assignments,” some parents will, I have heard, be worried. Like, why can’t my student use the most recent technology for their writing? It seems like that would be helpful to them. And so, I think having conversations about that at all levels of education is especially important, because otherwise, there will be these kinds of misconceptions, and using large language models will just be seen as a kind of competitive advantage that you have to use at all costs, because if you don’t, you’re falling behind, and I think that’s not right.
Patricia [00:50:38] And looking ahead, what do you think will be the most important considerations for ensuring that NLP and AI technologies are developed and deployed in ways that benefit society as a whole?
Ted Pedersen [00:50:50] I’m going to go back to transparency again. I think the large language model companies need to be more transparent about what they’re doing. What kind of data are they using? What are these models? Who is using them? I think making it more apparent and more visible is extremely important. And I think, in particular, if we are able to know better the kinds of data that these models are trained on, that will help us understand better the kinds of risks that they could represent.
Patricia [00:51:25] And Dr. Ted, I would like to get your position on the use of AI text detectors in school as a flip side to the use of ChatGPT.
Ted Pedersen [00:51:33] Yeah, I think it’s a very interesting technical problem, and I think it’s a very interesting issue. I think there’s a place for it, but I think we as educators have to be very careful. Because, going back to an earlier point I made, students own the copyright on what they create. So, a student turns in an essay to me. They have the copyright on it. Whether they assert it or not, they do. And so, I do not have the right to send that somewhere else, you know, without that student’s consent, to, let’s say, an AI detector, or plagiarism detector, or some other kind of repository, because I’d, in fact, be making a copy of it. And so, educators in particular need to be respectful of copyright. I think the way that we manage Turnitin for plagiarism detection here at the University of Minnesota is that when students are submitting work in a class that’s using Turnitin, the students actually submit it to Turnitin, not the instructors. And I think that’s actually a good model, because then the students are fully aware of what’s happening to their work. And I think if something is done in that way, it can be a very useful exercise and a useful service both for students and educators alike, but I think we have to think through how we manage the kind of intellectual property rights of students, because I think it’s really important that we be good examples of that as educators.
Patricia [00:53:09] Thank you, Dr. Ted. And do you have any last message, advice or insight for our listeners out there? Users of AI technologies, educators, students, and policymakers?
Ted Pedersen [00:53:19] Yeah, goodness. I think maybe the big message is don’t be overwhelmed. Don’t be intimidated. Don’t be afraid to ask questions. Don’t be afraid to admit to what you don’t know, because there’s a lot here that’s very new, and it’s all moving really fast. And I think we’re in a time now where all of us probably need to know more than we do. I mean, I certainly feel that about myself. And so, I think nobody should feel like they’ve been left behind. This is just a new thing that’s moving really fast, and it’s okay to ask for help, to ask for guidance, to talk about it, and to maybe admit what we don’t know or what we’re not sure about.
Patricia [00:54:05] Thank you so much, Dr. Ted, for the valuable insights that you’ve shared with us today! And of course, thank you to our listeners for joining us on another enlightening episode of The AI Purity Podcast. Stay tuned for more in-depth discussions and exclusive insights into the world of artificial intelligence, text analysis, and beyond. Don’t forget to visit our website, www.ai-purity.com, and share this podcast to spread the word about the remarkable possibilities that AI Purity offers. We’ll also link Dr. Ted Pedersen’s YouTube channel down below in the description, so you can see more of him and hear more of his insights. Until next time, keep exploring, keep innovating, and keep unmasking the AI. Thank you so much again, Dr. Ted, for this opportunity. We wish you good luck, and thank you so much!
Ted Pedersen [00:54:46] Thank you!
Patricia [00:54:47] Thank you! Goodbye!