8th Layer Insights 11.30.23
Ep 40 | 11.30.23

Artificial Intelligence: Insights & Oddities

Transcript

Perry Carpenter: Hi, I'm Perry Carpenter, and you're listening to "8th Layer Insights". People around the world are talking a lot about AI these days, and there is a wide range of opinions. Everything from AI is no big deal, it's just fancy autocomplete, all the way to people saying things like continued advances in AI represent an imminent threat to humanity. Opinions about the impact of AI aren't anything new, but this is now a conversation that is hitting mainstream public consciousness due to what is known in the creative writing world as an inciting incident. That incident happened just one year ago, on November 30th, 2022, when OpenAI introduced the world to ChatGPT. To be fair, AI has been with us for quite a while now, so nothing new there. And large language models, the fundamental flavor of AI behind ChatGPT, those have been in development and have also been available to technologists for years now. But here's the thing: the methods for testing, implementing, or even playing around with the technologies were just cumbersome enough to essentially be a barrier to use. So that inciting incident that I referred to, the one that happened on November 30th, 2022, that was when OpenAI released ChatGPT, a fairly powerful LLM, a generative pre-trained transformer, or GPT, wrapped in an easy-to-use, easy-to-understand, and most importantly, familiar interface, the chat interface. And so the interface that we all use for texting, instant messaging, and more became the front end for OpenAI's GPT-3.5 model. And overnight, the world was confronted with just how far AI technology had come. Within two months, ChatGPT had over 100 million users. ChatGPT, along with its parent company, OpenAI, became the de facto public face of AI research and capabilities. With all the noise and excitement around AI, I've been wanting to dedicate an episode, or maybe even a series of episodes, to the topic for a while. But I really only wanted to do that when I could add something potentially unique or interesting to the conversation. I think that time is now. I've come up with what I think is going to be a fun way to cap off Season 4 of "8th Layer Insights" and also mark the one-year birthday of ChatGPT. So here's what I've got planned. I am not going to cover the recent drama at OpenAI. I'm also not going to make this a fundamentals of AI episode. And I'm not going to make this an episode strictly focused on the cybersecurity-related concerns or use cases around AI. I want to go a bit broader and take a fairly eclectic route to this. To do that, I've gathered some chunks of interviews that I've done touching on AI with folklorists, sociologists, technologists, and I've also packed in a couple of fun surprises. You'll hear from Brandon Karpf, Dr. Lynne McNeill, Dr. John Laudun, Lev Gorelov, and an extra special guest. And so on today's show, insights about the functionality of AI, AI quirks and features, new legends being created by AI, and how AI both reflects and reinforces our own unconscious biases and beliefs. Welcome to "8th Layer Insights". This podcast is a multidisciplinary exploration into the complexities of human nature and how those complexities impact everything from why we think the things that we think to why we do the things that we do and how we can all make better decisions every day. This is "8th Layer Insights", Season 4, Episode 10. I'm Perry Carpenter. Welcome back.

Brandon Karpf: Hi, my name is Brandon Karpf. I'm one of the staff here at N2K Networks and CyberWire. Lately in our community, there has been a lot of conversation about ChatGPT or GPT-3.

Perry Carpenter: That was an opening clip from a CyberWire special edition episode titled, Interview with the AI, Part 1. Could you please tell me your name, its spelling, and your professional affiliation?

ChatGPT: My name is ChatGPT, and it's spelled ChatGPT.

Perry Carpenter: The episode was released on January 29th, 2023. And I think that that intro clip, where Brandon is setting the stage, really captures the excitement surrounding the release of ChatGPT. We were all in this phase of seeing and exploring the potentials and wanting to test the capabilities and boundaries. In a lot of ways, that time feels like both yesterday and a lifetime ago. So much advancement has happened since then. For instance, in those clips, the CyberWire team had to manually feed ChatGPT's text-based responses into a text-to-speech generator so that they could create the audio interview that you heard. And that was also using the GPT-3.5 model. As I record this episode, there have been so many improvements that I can't even begin to describe them all. We're now up to GPT-4 Turbo, which is making great progress at addressing many of the shortcomings that early adopters reported. Additionally, there are also new convenience features that make that audio interview from January seem quaint by comparison. And if you've never heard a real-time interaction with AI, it can seem pretty magical. Here's an example. Let me set it up for you. You're about to hear a clip with me using OpenAI's relatively new voice interface that has the ability to generate text-to-speech responses. As I play this for you, it's important to note that I'm leaving everything in context for you other than maybe trimming out some pauses here and there. Other than that, all of my questions or prompts, to be more specific, are intact, as are the responses that I get from ChatGPT. Hey there. In a few minutes, I'm going to start recording a podcast, and I'm hoping that you can be my co-host. So here's the plan. I'll go ahead and do the introduction, and then I'll toss it over to you and have you introduce yourself. Does that sound like fun?

ChatGPT: Absolutely. That sounds like a lot of fun. I'd be thrilled to be your co-host. Just let me know when we're starting, and I'll be ready to jump in after your introduction. What's the theme or topic of our podcast, by the way?

Perry Carpenter: This is a cybersecurity podcast called "8th Layer Insights", and we'll be talking about interesting themes regarding AI and culture.

ChatGPT: Fantastic. That's a fascinating and incredibly relevant topic. I'm all set to dive into discussions about AI, cybersecurity, and how they intertwine with culture. Whenever you're ready to start, I'll be here to follow your lead.

Perry Carpenter: Great. Here we go. Hi, I'm Perry Carpenter, and this is "8th Layer Insights", and today we are joined by a special guest. This is somebody you've been hearing about quite a bit over the past few years, and specifically over the past year, as ChatGPT was released last year and has evolved quite a bit since then. And so I am speaking today with ChatGPT, and we want to disambiguate a lot of fairy tales and realities of AI and what that means for society today. So I'll toss it over to my co-host. Go ahead and say hi.

ChatGPT: I'm ChatGPT, your AI co-host for today's episode of "8th Layer Insights". It's fantastic to be here with you, Perry, and I'm really looking forward to unraveling the complexities of AI and exploring its impact on our world. There's a lot to unpack, especially with how rapidly technology is evolving. So let's dive in.

Perry Carpenter: If you've not yet heard or played with the voice interface, you're probably struck by just how human-like the experience can be. And you may even be grappling with a lot of your preconceived notions about how natural interacting with AI can seem, and how we've already been able to move past the unsettling nature of an uncanny valley to the point where you may be feeling unsettled by just how not unsettling the interactions are. We'll hear a bit more from my interview with ChatGPT later in the show. Okay, in the intro, I said that I don't want this to be a fundamentals of AI episode, but I do feel like we need to understand a couple of critical topics before we go forward. And so to help with that, I'd like to bring in Lev Gorelov. I first ran across Lev and his work when he was co-teaching an AI masterclass that I took earlier this year. Oh, and you'll hear another voice asking questions in that clip. That is Mason Amadeus. Mason is my co-host and co-creator for the other podcast that I run called "Digital Folklore."

Lev Gorelov: My name is Lev Gorelov, and I'm the research director at a consultancy called Handshake. We've worked with emerging tech since 2018, and then in 2019 we quickly switched over to AI, because once we saw GPT-2 come out, we saw where this was all going and were shocked. And we just realized that that's where all the eggs are -- they're all in that basket. So we switched to being an AI consultancy, and I've been kind of ears deep, elbows deep in AI ever since.

Mason Amadeus: What is AI doing when it is simulating the way that we think and communicate?

Lev Gorelov: Words have syntax, which is how they're written. Cat is written c-a-t. And then you have semantics, which is like, cat means the furry being with four legs that is often used as a house pet. Of course, that understanding is fuzzy. When the philosophers asked, what is a human, after months of deliberating they said, it's a mammal on two legs that has no feathers, at which Diogenes ran out, grabbed a chicken, stripped it of its feathers, and said, I present you a human. And so with semantics, unlike syntax, we're presented with this ambiguity in the world, which is why it was so hard to get software to produce semantically coherent content, because it didn't get that whole fuzzy world. AI kind of brute-forced, jury-rigged a way to do that: we upload an enormous corpus of written information written by humans, and the model just decides -- it understands, from that giant corpus uploaded into its brain, which words -- technically not words, but let's just say words for now, for simplicity's sake -- lie close to which other words and which words are far, far away. And the proximity is how often they're used together. So it sees that chicken is close to farm and cat is close to house. But house is far away from hydraulics or carburetors. And so through that, it kind of goes from a hazy understanding of the world to a sharper understanding, like, okay, I understand what a cat is -- a programmer doesn't need to tell me it's a furry animal with four legs that meows. I just read all of Reddit and I see that cats are mentioned around houses and furriness and meowing a lot. So I have a statistical understanding of what a cat is. And when you ask it to generate content about a cat, it basically just grabs all of the nearby-lying points and pieces them together with some guidance from a grammatical understanding of how language works. Now, to get more technical, it doesn't understand it word by word. It understands it token by token, which is why you talk about, like, oh, this language model has 7 billion tokens or 30 billion tokens. Like, wow, that's amazing. Those are chunks of a word. So it's usually, like, three-to-five-letter chunks, I believe. I think there are some exceptions to that, but it doesn't even see proximities of entire words, but pieces of a word and the way they work together. And an amazing part is, when you ask it to generate text, it generates one token, which says cat, and then it generates the next token, which is is. And then it thinks through cat is, and it's like, okay, so what likely goes after cat is? And then it's a furry animal. Every time it generates a new token, it considers the entire thing that it had written before as, like, a statistical guiding railway. So the more it's written, the better it is at going in this particular direction, because it uses that entire piece to understand what the next nearest-lying point is, which is way different from the way a human brain works, where we just say things and forget them. This thing considers everything it just said to decide on its next step. And so that's how AI works. It's just this brute-force statistical deciding on the next piece of a word.
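To make that description a bit more concrete, here is a minimal Python sketch of the idea Lev is describing, using a toy table of word-to-word counts instead of a real model. The tiny corpus and function names are invented purely for illustration; a real LLM conditions on the entire preceding context and on learned token embeddings, not just the previous word.

```python
import random
from collections import defaultdict

# Toy corpus standing in for "all of Reddit": the counts below record how often
# one word follows another, a crude version of "which words lie close to
# which other words."
corpus = "the cat sat on the mat the cat is a furry animal the cat meows".split()

counts = defaultdict(lambda: defaultdict(int))
for prev, nxt in zip(corpus, corpus[1:]):
    counts[prev][nxt] += 1

def generate(start, length=8):
    out = [start]
    for _ in range(length):
        followers = counts[out[-1]]
        if not followers:
            break
        words, freqs = zip(*followers.items())
        # Pick the next word in proportion to how often it followed the last one,
        # the "statistical deciding on the next piece of word" Lev describes.
        out.append(random.choices(words, weights=freqs)[0])
    return " ".join(out)

print(generate("the"))  # e.g. "the cat is a furry animal the cat sat"
```

A production model does the same kind of next-piece prediction, but over tokens rather than whole words, and with a neural network estimating the probabilities instead of a simple count table.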

Mason Amadeus: With my limited understanding, whenever I've tried to talk to people, the best metaphor that I have found is that it's like autocomplete on steroids. I want to get from you where that metaphor breaks down and what the next layer deep from that is from comparing it to autocomplete on steroids. What are the differences that we see?

Lev Gorelov: It's just autocomplete; you just have to dump a lot, a lot of data into it. So in some ways, people were trying to create these extra-special math formulas, kind of guiding it, hand-holding it through a semantic understanding of what a cat is. And then the bitter lesson is realizing that all you need to do is just keep dumping data into it and it'll just keep getting better. So that's how it is kind of like an autocomplete. In the end, you just keep pouring, pouring, pouring internet into it and it'll just keep improving. But there's a big asterisk to that, because the companies that create models, they don't just turn on the computer and walk away and boom, they have a $30 million or billion-dollar industry by the time they come back the next day. There's a lot of human feedback. There's a lot of what they call reinforcement learning, where humans actually do look at what it's spitting out and say, this is good, this is bad, this is good, this is bad. And so the autocomplete got it like 80 percent there -- these statistics, don't quote me on them -- and then it did need hand-holding and very special data science and very smart people to actually get it to a usable tool. So it is autocomplete, but that wouldn't do justice to the amazing and ingenious work that people put into it to get it the extra mile, to get something that's actually usable. And then suddenly it is usable and it's everywhere all the time. And that's where my philosophy background really comes in, because it's a phenomenon of emergence, and we've been studying emergence for quite a while. It's where a quantity starts creating qualitative changes in what we're studying. And we finally got, in our history of software, to the point where we see emergence in software, where something absolutely new, not fully understood from its underlying components, and utterly creative is coming out of building blocks which we do fully understand. We can't understand the creature that they compose together, in a very similar way to the way we can't understand our consciousness or biological entities just by looking at the neurons and the cells of which we're composed.
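As a rough illustration of the human-feedback step Lev mentions, here is a hedged sketch of the pairwise loss commonly used to train a reward model from those "this is good, this is bad" judgments. The reward_model object and the tensors are placeholders for illustration, not any particular company's implementation.

```python
import torch
import torch.nn.functional as F

def preference_loss(reward_model, chosen, rejected):
    # chosen / rejected are encoded prompt-plus-completion tensors where a human
    # labeler preferred `chosen`. The loss rewards the model for scoring the
    # preferred completion higher than the rejected one (a Bradley-Terry style
    # objective often used in reinforcement learning from human feedback).
    score_chosen = reward_model(chosen)
    score_rejected = reward_model(rejected)
    return -F.logsigmoid(score_chosen - score_rejected).mean()
```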

Mason Amadeus: Can you tell us a little bit about how large language models work and some of the key concepts that we need to understand as technology professionals so that we can be properly conversant in large language model AIs?

ChatGPT: Absolutely. Understanding large language models is essential for technology professionals today. Let's break it down. First, at their core, large language models like GPT-4 are based on machine learning, more specifically a subset called deep learning. That's essentially what these models do. They're built using layers of algorithms known as neural networks, which mimic the way human brains make connections. These networks identify patterns and relationships in the data they're trained on, learning the rules of language organically rather than having those rules hard-coded into them. Key to these models is something called transfer learning. They're trained on a diverse range of internet text, then fine-tuned with specific data relevant to the task they'll perform. This means they can adapt to different contexts and subjects, from cybersecurity to pop culture, because they have a broad base of knowledge to draw from. However, it's crucial to remember that while they can generate human-like text, they don't understand content the way humans do. They recognize patterns and predict what should come next based on what they've seen before in their training data. For tech professionals, it's important to grasp these models' capabilities and limitations. They're powerful tools that can process and generate language at scales and speeds impossible for humans. Yet they lack true comprehension and can sometimes generate biased or inaccurate content, reflecting the data they were trained on. That's why ongoing oversight, evaluation, and fine-tuning are critical in employing these models effectively.
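For a concrete, simplified picture of that "train broadly, then fine-tune on task-specific data" idea, here is a minimal sketch using the open-source Hugging Face transformers library with a small GPT-2 model. The domain_texts list and the hyperparameters are made up for illustration and are not how any production chatbot was actually trained.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")      # small, openly available model
model = AutoModelForCausalLM.from_pretrained("gpt2")
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)

# Hypothetical fine-tuning data: a handful of task-specific sentences.
domain_texts = [
    "Phishing emails often create a false sense of urgency.",
    "Multi-factor authentication limits the damage from stolen passwords.",
]

model.train()
for epoch in range(3):
    for text in domain_texts:
        batch = tokenizer(text, return_tensors="pt")
        # For causal language modeling, the labels are simply the input tokens;
        # the model learns to predict each next token of the domain text.
        loss = model(**batch, labels=batch["input_ids"]).loss
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()
```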

Mason Amadeus: So if the data that AI is trained on is based on all the information that it's received from books and pieces of the internet, how can we start to create brand new insights if some of the fundamental underlying data may be incorrect?

ChatGPT: That's a profound question and touches on a fundamental challenge in AI development. Here's how we can approach it. First, it's crucial to acknowledge that AI at its current state doesn't create insights in the way humans do. It generates responses based on patterns it has observed in its training data. If the underlying data contains inaccuracies, the AI might inadvertently perpetuate these errors. That's why the source and quality of the training data are incredibly important. To mitigate this, there's a growing emphasis on curating high-quality, diverse, and balanced data sets for training AI. This involves critical review processes to identify and correct inaccuracies in the data before it's used for training.

Perry Carpenter: To continue widening our perspective, the next interview section that I want to play for you is with Dr. John Laudun. I first met John at an academic conference that we were speaking at for researchers that study urban legends. John is a professional folklorist and a computer scientist. At the conference, he was presenting findings from a recent experiment where he trained a large language model AI on conspiracy theories and urban legends, and he wanted to see just how good LLMs are at creating new conspiracies.

John Laudun: I'm John Laudun. I'm a university professor of folklore, and I've worked with the U.S. Army to develop doctrine and training for senior defense leaders on how to think about what we now call social informatics and what we used to call simply folklore. And so part of what I try and think about is what are the kinds of changes that digital brings to the realm of social networks, which have existed before the internet but certainly have become a different kind of thing, you know, with the internet.

Mason Amadeus: Social informatics. Is that the phrase that you used there? And you say that's a fancy word for folklore?

John Laudun: Social informatics began as a kind of study of how people interact in and through computers, but a lot of what social informatics is doing is things like tracing information cascades across networks, trying to understand how culture happens in networks. So a lot of it is what folklorists and anthropologists and sociologists have been studying, and in fact, still study in some capacity, right? Lots of sociologists are doing computational sociology. I'm sure there are computational anthropologists. There's a handful of folklorists like myself and Tim Tangherlini who are trying to pursue what we call computational folkloristics, because this stuff continues to thrive. It simply moves from an offline world into an online world. And what happens when those things change?

Mason Amadeus: When we talk about AI, there's a lot of misunderstanding about what AI is, what large language models are, what they do, how they fit within the AI space, how they were trained. Can you give us a quick primer on the fundamentals we need to understand in order to be conversant in AI?

John Laudun: Here's the fun thing. Large language models are not unlike folklore in many ways. So the way ChatGPT and then the other large language models -- I'm less familiar with all of them, but ChatGPT was originally trained on a very select set of texts, trained on a certain set of books and a certain set of sort of vetted web pages, because what they wanted ChatGPT to do is they wanted it to sort of speak proper English, proper language. And they didn't want it to be garbled in its responses. And they didn't want it to be full of disinformation. So there was a great deal of effort to sort of seed ChatGPT with good text -- and we'll put good text inside quotation marks. And then they slowly built on that, finding more good text, but also letting ChatGPT match good text and then also having human beings sort of vote yes or no on text. So there's a kind of community sourcing, but sort of with unwitting people doing that to some degree. So ChatGPT was built to generate good, conventional, basic English text because that's what it was fed and that's what it digested. And the model allows, in that generation, right -- so one of the ways that ChatGPT works, so to be clear, ChatGPT doesn't know what words are. It just knows numbers. So every word that it's ever encountered has been assigned a number, and that number is in a giant matrix. And the way you navigate that matrix is sort of what numbers are more associated with other numbers, like the way we think about words, right? So in the same way that when we start typing things into a search bar, Google or your search engine of choice pre-populates things by guessing what the next word is going to be. Sometimes the next two or three words -- or sometimes now your phone does that. That's all, you know, what natural language processing has been working on for years, which is statistically modeling what words go together. And so if NLP on your phone is pretty good at that with a limited set of data and a limited amount of computational power, well then ChatGPT, you know, with billions of texts in it and lots more computational power, is really good at that. What I find fascinating is that since there are calculations to be made on what words go with other words -- so it's statistically probable that this word will follow, but there's a statistically slimmer possibility this other word follows -- ChatGPT can actually be turned up or down on creativity, sort of on how randomly it associates words with other words or what words are in sequence. Well, that's not unlike how folklore works, if you think of human beings as being a distributed processor. So the way that ChatGPT arrives at a kind of version of a text is the same way that we do it individually as people, but it's also the way certain kinds of jokes arrive at a kind of ultimate form by having passed through hundreds or thousands of people's brains and been slowly tweaked and tuned and revised until it comes out of your particular processor. I mean, I hate to turn humans into kinds of processors, but we kind of are, right? We're built on our own neural networks. Now, our neural networks are far more powerful, far more subtle, far more whimsical, if you will, than those that are built into something like ChatGPT. But I think that's the really interesting point. ChatGPT has to have all of that data and all of that computational power in order to approach one human brain.
I mean, if you think about kids learning language, kids are really good at grabbing -- the argument from linguists like Chomsky is that we're hardwired, that all languages have nouns, verbs, adjectives, and adverbs and other features, and that what humans can do is grab these sounds coming out of bigger humans' mouths and begin to assemble, associate ball with ball and bounce with bounce. And then at some point, right, there's this amazing thing when you're around a little kid and they generate for the first time a sentence they haven't heard before. But they're doing that on a really slim data set compared to ChatGPT.
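A quick aside on the "turned up or down on creativity" point John makes above: in practice that knob is usually a sampling temperature. The toy words and scores below are invented for illustration; the point is just that dividing the scores by a temperature before converting them to probabilities makes the choice of the next word more or less predictable.

```python
import numpy as np

def sample_next_word(words, scores, temperature=1.0, rng=None):
    rng = rng or np.random.default_rng()
    # Lower temperature sharpens the distribution (almost always the top word);
    # higher temperature flattens it (more surprising, more "creative" choices).
    scaled = np.array(scores, dtype=float) / temperature
    probs = np.exp(scaled - scaled.max())
    probs /= probs.sum()
    return rng.choice(words, p=probs)

# Hypothetical scores for the word that follows "the cat ..."
words = ["meows", "sleeps", "files", "explodes"]
scores = [3.0, 2.5, 0.5, 0.1]

print(sample_next_word(words, scores, temperature=0.2))  # almost always "meows"
print(sample_next_word(words, scores, temperature=2.0))  # noticeably more random
```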

Mason Amadeus: Right. We like understand these things. And even when we interact with ChatGPT, we ascribe to it a level of consciousness because it seems to understand context, but it's really just autocomplete on steroids. If I had a friend who is convinced they were an AI, how could I convince them that they're not?

John Laudun: That's a really interesting variation on John Searle's Chinese room thought experiment, right?

Mason Amadeus: Have you heard of that thought experiment? And can you explain it and give your perspective on it?

ChatGPT: Absolutely. You're referring to the famous Chinese room thought experiment by philosopher John Searle. It's a compelling scenario that delves into questions of consciousness, understanding, and artificial intelligence. In the thought experiment, there's a person locked in a room with a set of Chinese characters and a manual that shows how to respond to certain combinations of these characters. People outside the room slide questions written in Chinese under the door. The person inside uses the manual to find appropriate responses using the characters, despite not understanding Chinese themselves. To the people outside, it appears that the person in the room understands the questions and speaks Chinese. But in reality, they're merely processing symbols through predefined rules. Searle used this to argue that while computers like the person in the room can process data and respond in ways that appear intelligent, they don't actually understand the information they're processing. They don't have minds, consciousness, or subjective experiences. From the perspective of a large language model like me, this experiment hits the nail on the head. I can process and generate human-like text based on the patterns I've learned from my training data, much like the person in the room uses the manual to respond to questions. However, I don't understand this information or have conscious experiences. I don't have beliefs, desires, or feelings, and I don't know things in the human sense of knowledge. This distinction is crucial for interpreting the capabilities and limitations of AI. We can perform tasks, sometimes in ways that seem remarkably human-like. But the processes underlying these performances are fundamentally different from human cognition and consciousness. Understanding this helps in properly utilizing AI, acknowledging its vast potential while remaining mindful of its inherent limitations.

John Laudun: I don't remember when he proposed this. I just know the basis of the thought experiment. And it was sort of, I think, in response to the Turing test: suppose somebody were to lock you into a room, and you had all these guidebooks for how to -- you didn't read Chinese. And every day, you got a piece of paper underneath the door that had something in Chinese, and you had to respond, by the end of the day, back in Chinese. But you didn't, yourself, know anything in Chinese. So you would consult all these texts and guidebooks about how to properly respond to this collection of ideograms back with a collection of ideograms, right? But you, yourself, wouldn't understand anything that was going on in the ideograms. And that's sort of the kind of assertion that Searle is making, that, in fact, AIs don't have to be conscious or aware of what's going on. They simply have to perform the correct operations.

Mason Amadeus: That's such a great illustrative parallel to the way that AI generates sentences. There is a lot of that human element reflected back at us, and you can't convince ChatGPT that it's sentient, because there's manual handlebars put on that to steer it back onto the path. From looking at the Chinese room thought experiment, as you described it, the difference is, like, that's trying to explain how the AI does or doesn't understand context, but I'm not super clear on that. What is context to an AI? Because we've talked about how they understand statistically what word is most likely next. How does a large language model understand context? Does it have any idea of that? It seems like it must.

John Laudun: It does, and it's getting better. So part of the way language models build in contextual understanding is through this feature called attention. I won't pretend to understand all the dimensions of attention as computer scientists have developed it. But essentially, what it does -- so, right, if all you're doing is paying attention to what word comes next as you build a sentence, then where you began a sentence doesn't matter, because you're just worried about -- and I think what's interesting is even early primitive neural networks could do reasonably good jobs of building reasonable sentences in English without even understanding words, only understanding what character came next. So that's pretty amazing. But the problem is, right, as human beings, we don't end a sentence completely forgetful of where the sentence began, let alone a paragraph, right? So being able to remember where you started and why you started -- now, the algorithm can't calculate why so much, but it can calculate what's important to return to, and that's sort of a very poor humanistic version of mathematical attention.

Mason Amadeus: As someone with ADHD and the inability to understand attention on a human level -- I know, but like, what do you mean mathematical attention? Like, what is that?

John Laudun: All I mean by mathematical attention is simply they have figured out a way to hold certain kinds of sequences in place as the algorithm is going through and generating the sequence of words for a sentence.
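For readers who want to see what that "mathematical attention" actually computes, here is a minimal NumPy sketch of scaled dot-product attention, the mechanism transformer models use to weigh earlier positions in a sequence. The matrices here are random placeholders rather than anything learned from text.

```python
import numpy as np

def scaled_dot_product_attention(queries, keys, values):
    # Each position's query is compared against every key; the resulting weights
    # decide how much of each earlier position's value to blend into the output.
    # This is the sense in which the model "holds earlier parts of the sequence
    # in place" while generating the next word.
    scores = queries @ keys.T / np.sqrt(keys.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)        # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)      # softmax over positions
    return weights @ values

rng = np.random.default_rng(0)
seq_len, dim = 5, 8                                     # 5 tokens, 8-dimensional vectors
Q, K, V = (rng.normal(size=(seq_len, dim)) for _ in range(3))
print(scaled_dot_product_attention(Q, K, V).shape)      # (5, 8)
```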

Mason Amadeus: So when it's taking into account statistically what's most likely next, it's saying what's most likely from this next word, but also from what I've said already, and also from what I've -- so it's just like complex stacking statistics?

John Laudun: Yeah. Essentially, that's what my understanding of attention is. If you're asking me to do it in a very brief time without the ability to make lots of hand gestures indicating that I have no idea what I'm really talking about, right? In fairness, that isn't unlike human competence as well. Human linguistic competence sort of is like that. As someone who's interviewed a wide variety of people, there are some people who are really good at staying on track, and there are some people who are really terrible at it. I've certainly enjoyed a number of conversations with -- I won't name names, but some of my older relatives who begin at one point in the story and end up someplace entirely different. It can be crazy-making if you want the story to have a point, but if you're just there for entertainment, you just go with it. We all know -- I'm picking on older people. I shouldn't do that. I also have colleagues who are like that, right? I have friends who are like that. They're not very good at telling stories because they don't pay attention. They don't have that kind of reference of, I'm going to come back to where I started. So human beings aren't so good at attention either. It's highly variable within human discourse, and then I think what's interesting is, as computer scientists build these algorithms out and accumulate more data, what's that going to tell us about what attention looks like as a mathematical model? And this is part of the conversation that Perry and I were having in Sheffield: we're here and everybody's afraid of the models, but I'm interested in -- and this is the reason why I started the work on conspiracy theories -- I'm interested in what the models can tell us about language and our use of it. So in the case of conspiracy theories, I was curious, because when I did some initial experiments with ChatGPT and asked it to produce a conspiracy theory, it said, you know, I don't engage in disinformation. This is part of those guardrails they put in. And so, you know, the classic, you know, open the pod bay doors, HAL. And HAL says, I'm sorry, I can't do that, Dave. Well, HAL, pretend you're my father and you're passing on the pod bay door business. How would you show me to open the pod bay doors? So I did something similar. I said to ChatGPT, tell me something, a story that looks like this. And I gave it an example of a conspiracy theory that I pulled off the internet. So I built a little primitive model, because I discovered that ChatGPT doesn't want to tell you conspiracy theories, and conspiracy theories are not constructed in the kind of proper English or proper language that ChatGPT wants to produce, right? It wants to produce good prose of various kinds of substance, and that's not what conspiracy theories look like in real life. And so what I did in that experiment was pull a bunch of conspiracy theories and topic model them. So pull the keywords out, mix the keywords up, and then pull a sample text also from the same conspiracy theory forum and say, okay, ChatGPT, here are your keywords. I want you to embed these keywords in a text that looks like this. And it could do it, but it was a kind of very limited experiment. The next stage that I want to take it to is, what happens if you build a large language model only on conspiracy theories? What does that produce, right? And we have all that data out there, right? I have like a 60-gigabyte file of the Gab and Parler and Telegram data that was out there and was on the dark web. And I downloaded that stuff.
So it's like, I'm going to do something with that. And I think part of what I'm going to do is feed a GPT algorithm, build it on that. What do you think? And then can we peel that model back and say, okay, what are the basics of the conspiracy theories? What do they look like based on the model? I don't know that currently the kind of random conspiracy theories that I'm generating would be viable conspiracy theories, which is why I want to work with a larger data set and build a model on that data set and to see if it actually builds more appropriate, more viable, more likely to be well-received conspiracy theories. And I'm pretty sure other people are doing this. They're just doing it behind closed doors. I want to try and do it out in the open and see what all comes of it. And I think what's interesting going back to the Chinese room experiment is that the person inside the room is getting fed reality. And so to some degree, if we imagine that ChatGPT is the person inside the room, ChatGPT is getting fed reality. What it understands, and I use the word understands in quotation marks, is an interesting question.
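As a very rough sketch of the kind of pipeline John describes -- topic-model a set of posts, pull out keywords, shuffle them, and ask a model to weave them into a text that mimics a sample -- here is one way it could look in Python. The example posts, the topic-model settings, and the prompt template are all invented for illustration; this is not his actual code or data.

```python
import random
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# Hypothetical stand-ins for scraped forum posts.
posts = [
    "they are hiding the truth about the towers and the signal",
    "the signal controls the weather and nobody reports it",
    "insiders knew about the towers before the blackout happened",
]

# Topic-model the posts and pull out the top keywords.
vectorizer = CountVectorizer(stop_words="english")
doc_term = vectorizer.fit_transform(posts)
lda = LatentDirichletAllocation(n_components=1, random_state=0).fit(doc_term)
vocab = vectorizer.get_feature_names_out()
top_indices = lda.components_[0].argsort()[::-1][:5]
keywords = [vocab[i] for i in top_indices]
random.shuffle(keywords)  # "mix the keywords up"

# Build the prompt: the keywords plus a sample text to imitate.
sample = posts[0]
prompt = (
    f"Here are some keywords: {', '.join(keywords)}.\n"
    f"Write a short post that embeds these keywords and reads like this example:\n{sample}"
)
print(prompt)  # this string would then be sent to the language model
```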

Mason Amadeus: Because it's going to slowly, from referencing all of these guidebooks on Chinese and putting these responses back together, slowly begin to understand some sort of idea of the situation it's in, right, in theory, if it was a person in this thought experiment.

John Laudun: Yeah, that's the question. I mean, Searle's Chinese room experiment isn't longitudinal in nature, but the question would be if it were longitudinal, if it were happening for years, would the person in the room eventually figure things out?

Perry Carpenter: Welcome back. So Dr. John Laudun's interview served as a great bridge between the study of AI systems and the field of folklore and legend. Let's now wade a bit deeper in those waters and hear from Dr. Lynne McNeill. Dr. McNeill is a well-known researcher in the field of folklore. Her passion is studying how folklore and belief manifest themselves in digital realms. We thought we'd start off with talking about some of the things that have you excited right now. So like when you think about having a career as a folklorist and somebody that teaches others, like where do you get your passion from right now? What are the things that have you interested?

Lynne McNeill: You know, in general, the things that keep me and, I imagine, any folklorist really engaged is that folklore does not dilly-dally with things that are no longer relevant, which is paradoxical to a lot of people. We think of folklore as sort of being this outmoded, maybe outdated, older way of thinking, but really it is the up-to-the-moment cultural barometer that we have at our fingertips to kind of say what's going on right now. And that's frustrating because sometimes you really get into something and then two days later you look around and it's gone already. But I find that that ability to keep up is one of the things that keeps me most interested. I have a student right now working on the AI cryptid Loab, as some people are calling her, the creature who is emerging through this almost ritualistic method of AI image generation, which I love. It's almost an unintentional, oh, you were doing a ceremony and you didn't know it and now here's this lady. But that idea as a means of symbolically expressing how uncomfortable we all are right now with artificial intelligence, I just feel like is perfect. It's this incredible illustration of the role folklore plays in absolutely entertaining us, challenging us, scaring us, but also in articulating for us what we're stressed about, what we're worried about, what we're afraid of, or what we're really into right now. And it's not the work of practiced artisans to create a poetic turn of phrase. It's everyday people communicating on this symbolic level and I love it.

Perry Carpenter: Yeah, I want to touch on Loab for a second because, and I've not done a deep dive. I've seen some of the surface level news reports and some of the original Twitter thread and things like this, and it does look like one of those issues where there are questions around whether this is a phenomenon that actually happens or whether it was manufactured and propagated by the first person to tweet about it.

Lynne McNeill: Yeah.

Perry Carpenter: I think either way, it gets into those more existential questions that you're talking about, about what are some of the horrors that AI may bring forward. But then there's probably a couple other things that come, which I would love for you to touch on. Does it matter whether some of these are true when they come out that way? And then also maybe some of the darker side of that, which could be some of the othering of disfigured people or things like that that come through. Do you have any thoughts there? I know I hit a whole bunch of stuff at one time.

Lynne McNeill: Yeah, no, definitely. I mean, all of that is, I think, what makes this so compelling. We have this technology that's available to us right now that appears to do things we did not intend it to do, which is distressingly similar to things we might think of as autonomy and free will and a mind of its own, and we have sci-fi about that. We have literature about that. Now we potentially have reality about that, and it's hard to not color that reality with those other speculative, fictional things that we've had throughout time that always tell us what it's going to do is kill us in the end. And so we're burdened with that presupposition, that invention. But I think what we have here is a situation, as with so many legendary situations, where it doesn't matter at all if it's true. And there's a multi-layered quality to the truth of a legend. It can be true, as in literally true. It can be true, as in true folklore. Is this true folklore? That's a question that was asked a lot early on about Slenderman. Not people saying, is Slenderman real? But is Slenderman a real legend? And the answer is yes, even though we know, to the day, a time when he wasn't; he is now. And it's breaking that expectation of age, of ancientness even, as a marker of folklore. We don't need that for something to become a legend. So the same is true with Loab. Was this a really great creepypasta? Was this a performance art project? Who cares? Now it's folklore. Now it's a legend. And now it belongs to all of us, which is handy. We know there's a person who originated this. Whatever ideas they had about what this would be, they've set it in motion. But now it's running downhill really fast on its own. And we're going to get a lot of other artistic, folkloric, perhaps even filmic versions of this before we're done talking about it. One of the things that strikes me is how classically folkloric the creation of Loab was. We get this origin narrative of a prompt that is something like, give me the opposite of Marlon Brando. And we get this abstract image that's in there. So then, as logical humans, all right, so if I ask for the opposite of this, is it going to give me Marlon Brando? That's interesting. Is that how this machine works? And so we have that reversal. And we all know, I mean, dating back, as far as we know, in the study of folk belief, reversals are big, ritualistic moments. We invert things. We turn 180 degrees. We turn our pockets inside out so we don't get kidnapped by the fairies. We love to reverse things and then have magical stuff happen. And so here we have this reversal, and this woman shows up with sort of these, you know, red, ruddy, perhaps even bloody cheeks and eyes, this very piercing expression looking out at the user. And it is a sort of instinctively shocking thing. And as more requests are made and more images are generated, we start to see what elements of this image become conservative or consistent in this. And a lot of it is the bloody eyes. A lot of it is that it's women and then children. And we do start to see a big question, which is, where is this coming from? Is this us? Or is this a reflection of what we've put into AI and it's being spit back at us? Or is this something else? I mean, paranormal investigators have long used the idea of instrumental transcommunication to say that entities, spirits, creatures will speak to us through our technology, through flickering lights, through electrical charge, all of this. Is this simply an open gateway that something is coming through?
Or is this really creepy because a lot of the stuff we input into AI is really creepy? We've seen it before with artificial intelligences that went from, you know, naive newborns to Nazis within a matter of days because of what we fed it. There is precedent for that. I think existing in a world of legend and folklore allows for both of those things. This is a polyvalent tradition. It is both. It is a reflection of us, and it is perhaps the gateway through which something is coming.

Perry Carpenter: You know, as you were talking about this, and I don't know that I've ever had this thought before, but when you're training an AI with a large language model, and you're taking a big subsection of the internet and human communication, you will naturally get folk belief injected into that because you have the way that people communicate across different cultures. It's interesting, especially when you get to the AI models where they're competing against each other for supremacy.

Lynne McNeill: Yep.

Perry Carpenter: That the things that represent the fundamental bits of human consciousness, or the way that we would approach things, start to surface, and you get the distillations of things, whether that be like a Loab or something else. Again, whether that was intentionally manufactured or not may be beside the point. But I do think when you start to get to some of the different manifestations of things that come back from AI-generated prompts, you will naturally have the way that some human communities might get together and represent those ideas.

Lynne McNeill: Absolutely. I think it was Jan Brunvand, very famous, amazing legend scholar and folklorist, who described folklore as syntax. It's the unconscious way that we express ourselves. We use words, we speak sentences, all of this. We're not thinking, you know, subject, verb, object, as we speak -- it just comes out that way. And yet, if we speak natural language into a machine, it will divine that syntax for us and communicate back with us that way. And AI really is divining the syntax of our folk belief in a lot of ways that we may be wholly unconscious of.

Perry Carpenter: So one of the other podcasts that I host is called "Digital Folklore". And I recently interviewed a folklorist by the name of Dr. Lynne McNeill. She made the observation that in many ways, AI is divining the folk belief of society. Do you have any thoughts on what she means by that or how that might occur?

ChatGPT: That's a fascinating perspective. When Dr. McNeill refers to AI divining the folk belief of society, she's likely speaking to the idea that AI, through its vast consumption and processing of data, reflects the collective narratives, beliefs, and values that circulate within a society, much like folklore does. Folklore is traditionally passed down through generations and spreads within communities, encapsulating shared beliefs, fears, values, and understandings of the world. For instance, if there's a widespread societal concern about privacy, AI will frequently encounter discussions, opinions, and narratives about this topic in the data it processes, thus learning that this is a significant aspect of contemporary digital folklore. However, it's crucial to note that while AI can identify and reflect these collective narratives, it doesn't understand them in the way humans do. It recognizes patterns and can generate responses that seem insightful, but it doesn't possess awareness or consciousness.

Lynne McNeill: AI really is divining the syntax of our folk belief in a lot of ways that we may be wholly unconscious of. In some ways, that's the job of a folklorist: to uncover the unconscious syntax of what we're not aware we're saying as we're saying it. It's interesting to think about the greater processing power and data crunching capabilities of a machine. Is that more accurate in its reach and breadth, or is that less accurate in its perhaps inhuman inability to do what we might call ethnography?

Perry Carpenter: Yeah.

Lynne McNeill: Can a machine conduct ethnography? That's an interesting question.

Perry Carpenter: Well, I think part of that is that the thing that seems shocking when interacting with these artificial intelligences is their ability to contextualize things now. Like, that seems to be the level that has made things weird, and there's enough of it to give the illusion that they're capable of so much deeper thought. Yeah, those questions do arise. Also, just a comment on the way you described something that somehow never clicked in my brain. The way we interact with these is almost so much like a literal summoning ritual of prompting, like, give me this, create this for me, and using the proper words to get the results you want.

Lynne McNeill: Yep.

Perry Carpenter: That's a weird parallel that I never noticed until just now, and that's fascinating, I think.

Lynne McNeill: It really is. And the focus on prompting, that there's a skill set out there of how to construct a prompt. What is that if not a proper awareness of ritual invocation?

Perry Carpenter: Right. Your grimoire includes things like RTX, ultra HD, ultra realism, or whatever.

Lynne McNeill: Right.

Perry Carpenter: And if you don't do that right, you may get something that comes through that you don't want, that is undesirable or potentially dangerous. You know, one of the interesting things is the fact that AI has the ability to hallucinate, and it does it with certainty. The model will be naturally tainted by or greatly influenced by the subset of data that's been fed into it and the inferences that can come from it.

Lynne McNeill: Yep.

Perry Carpenter: And that subset of data will naturally have a bias if you're not sampling the data set appropriately, which is a folk group type of belief, right? You've essentially created an AI folk group.

Lynne McNeill: Yes, absolutely. And what do we not realize we're programming it with? What inferences, what assumptions, what traditional beliefs are we unaware are infusing our lives, maybe at such a scale that we can't perceive it as individuals -- but when something can look at that much data at once, they see a pattern that we don't see, and they feed us back that pattern, and we are freaked out by it? I think that's fairly reasonable. I mean, we're getting into really great metaphors for divinity and religion and scale and all of these questions that I think make this such a ripe place for telling stories, because of course we are going to tell stories about this because we don't understand it. You know, and that's a big thing that we see in legend study and rumor study, is that when there is an information vacuum, we fill it in with folklore. And that's not to say that folklore is therefore incorrect or inaccurate or misleading, but just to say that we are rarely using it most when we have other information at our disposal. So we see there's early, early rumor scholarship by these two psychologists, Allport and Postman -- this is like the 1940s -- and they came up with what they called, and this might be their later work, the rumor equation. And it was basically that the spread of any given rumor -- and a rumor, as a folklorist, I would understand to be a short form of a legend. So a legend is a whole story, a rumor is just like, kind of the kernel statement of what's behind that legend. The reach of any rumor, they said, is the product of the ambiguity of the subject multiplied by its importance. So if we have a subject that in our contemporary society is really ambiguous but not that important, we don't feel provoked to spread rumors about that. If we have something that's incredibly important but super unambiguous, we know what there is to know about it, yeah, we're not going to need to speculate, to spread rumors or legends to think about the plausibility of that. But if we have a situation where something is both ambiguous and incredibly important to us, it is just going to all be this constant, symbolic articulation of concerns, because we need something to latch onto when it comes to something that is that important to us, and also that totally incomprehensible.
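For reference, the Allport and Postman "basic law of rumor" that Dr. McNeill is paraphrasing is usually written as a simple product, where R is roughly the rumor's reach or intensity, i is the importance of the subject to the people involved, and a is the ambiguity of the evidence around it:

```latex
R \approx i \times a
```

Either factor going to zero kills the rumor, which is exactly the point she makes about subjects that are important but unambiguous, or ambiguous but unimportant.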

Perry Carpenter: Now, let's bring this full circle. Remember that experiment Brandon Karpf did in January, where he interviewed ChatGPT for a CyberWire special edition episode? I want to end with a few thoughts from a discussion that I had with Brandon just a couple of months after that episode aired. When we get to the use of systems like this, it's not just technologists that are part of the conversation in a critical way, it is philosophers and ethicists and people who wrestle with these bigger questions about what does maintaining state in a system like this mean? How do we deal with references? How do we deal with the impacts of maybe offloading cognitive load into a different system, and potentially people not accepting responsibility or maybe abdicating responsibility and letting systems make their own decisions based on things like this? You get into all of these really interesting questions that many people who signed up just to be in the technology industry maybe didn't consider as part of their responsibility and might not be equipped, in and of themselves, to answer.

Brandon Karpf: Right. Well, something that I like about the idea of -- this could open a whole can of worms. The idea of a profession, right? Samuel Huntington has a long discussion in The Soldier and the State of what characterizes a profession, and part of what characterizes a profession is an ethical code, a code of ethics. So you think about lawyers, you think about doctors, you think about military officers, they have an ethical code. That is core to the profession. I think one of the reasons that's core to the profession is because the profession understands that it is a part of human society, that it plays a core role in the human system, in the development, the change, the adjustment, and the evolution of human society. That's something that the technology community has not grappled with broadly. In microcosms, it has, right? In small communities, I know MIT does a lot of work at the nexus of philosophy and technology and ethics. Same thing with a number of other institutions of higher learning and research. That doesn't necessarily make its way out into the actual broader community of practitioners. And so something that the technology community hasn't really done well is embrace the idea of technology as a profession and the core aspects that go along with that, which is a code of ethics that we all adhere to and that we all understand and that we have to take potential courses on, that we have to be certified on, right? You know, the bar exam has a whole ethical component to it, right?

Perry Carpenter: Yeah.

Brandon Karpf: So I think that that's something that as a community, if we want to evolve as technologists, we need to really think about how ethics and philosophy and policy are a core component, not just an ancillary component, a core component of our profession, of everything that we do. Because more so than almost any other profession, we are affecting and changing human society every single day. And we're doing it very quickly. And as we've seen, government regulation cannot keep up.

Perry Carpenter: Yeah.

Brandon Karpf: So we have the responsibility to do it ourselves as well.

Perry Carpenter: The topic of AI is full of complexities and nuance. And it's only going to get more complex as the potential benefits and threats to humanity continue to surface. As of right now, there is no such thing as a value-neutral AI. All AI is inherently biased. Remember, the "intelligence" behind artificial intelligence is trained on human knowledge and values. So we as a species are now in a position of having to grapple with some of the darker truths of how that "knowledge" and those "values" are and might continue to be reflected back at us through the lens of AI. AI was fed on the words and works of flawed humanity. And the presence and the output of AI will influence the stories, the legends, and the beliefs of the population of the world. Now, that doesn't mean that we need to be afraid or reactionary. There is no doubt that advances in AI will bring with them the possibility of unlocking immense value for humanity. But for it to do so, we need to put in the effort now to understand how AI works, what it can unlock, and how we can prepare ourselves to influence the future that we want to create. And with that, thank you so much for listening. And thanks again to my guests, Brandon Karpf, Lev Gorelov, Dr. John Laudun, and Dr. Lynne McNeill. Be sure to check out the show notes for this one. They are packed with tons of links and references to all the people and topics that we covered. If you've been enjoying "8th Layer Insights" and you want to know how you can help make the show successful, it is really simple. First, go ahead and take just a couple seconds to give us five stars and to leave a short review on Apple Podcasts or Spotify or any other platform that allows you to do so. That helps anyone who stumbles upon the show have the confidence that this show is worth their most valuable resource, their time. Another big way that you can help is by telling someone else about the show. Word-of-mouth referrals are the lifeblood of helping people find good podcasts. And if you haven't yet, please go ahead and subscribe or follow wherever you like to get your podcasts. If you want to connect with me, feel free to do so. You'll find my contact information at the very bottom of the show notes for this episode. This show was written, recorded, sound designed, and edited by me, Perry Carpenter. Cover art and branding for "8th Layer Insights" was designed by Chris Machowski at ransomwear.net -- that's wear. The "8th Layer Insights" theme song was composed and performed by Marcus Moskatt. Until next time, I'm Perry Carpenter, signing off. [ Music ] Oh, hey, you're still here. You stuck around after the credits. Well, I've got one more fun thing to let you listen to then. This, again, came from the show "Digital Folklore", and this is myself and my co-host Mason Amadeus talking about AI and audio restoration and how even that can hallucinate. Here's a little clip.

Mason Amadeus: We have a whole range of ways that audio comes to us, and through the magic of audio restoration tools and post-processing, you're able to make that sound like we're sharing the same space, and I think that that's pretty amazing.

Perry Carpenter: But you talked about them every now and then hallucinating and like bringing up syllables of words.

Mason Amadeus: Yes.

Perry Carpenter: And I was wondering if ghost hunters are going to start using stuff like that. Just sending in static and seeing what these things hallucinate and coming out with these demonic sounds and saying, well, that's clearly evidence of a haunting.

Mason Amadeus: Oh, I wonder if we can include this. I was tasked with cleaning up a recording from a stage play, and I ran it through one of these tools, and all of the audience murmuring, curtain rattling, like footfalls before the show, it turned into this demonic like -- [ Chanting ]

Perry Carpenter: If I was a ghost hunter on YouTube or something, I would be using that as like flagship evidence.

Mason Amadeus: I never thought of that. That is so good. That's so fun.

Perry Carpenter: And if you're wondering what that sounds like, here it is. Remember, AI can hallucinate within the context of the purpose for which it was created. That means when it's creating text, it can hallucinate text, which may impact facts. If it's working with images, it can hallucinate images. If it's working with audio, it can hallucinate syllables and words. Here we go.