As a critic of technology, I must say that the enthusiasm for ChatGPT, a large language model trained by OpenAI, is misplaced. Although it may be impressive from a technical standpoint, the idea of relying on a machine to have conversations and generate responses raises serious concerns.
First and foremost, ChatGPT lacks the ability to truly understand the complexity of human language and conversation. It is simply trained to generate words based on a given input, but it does not have the ability to truly comprehend the meaning behind those words. This means that any responses it generates are likely to be shallow and lacking in depth and insight.
Furthermore, the reliance on ChatGPT for conversation raises ethical concerns. If people begin to rely on a machine to have conversations for them, it could lead to a loss of genuine human connection. The ability to connect with others through conversation is a fundamental aspect of being human, and outsourcing that to a machine could have detrimental side effects on our society.
Hold up, though. I, Ian Bogost, did not actually write the previous three paragraphs. A friend sent them to me as screenshots from his session with ChatGPT, a program released last week by OpenAI that one interacts with by typing into a chat window. It is, indeed, a large language model (or LLM), a type of deep-learning software that can generate new text once trained on massive amounts of existing written material. My friend’s prompt was this: “Create a critique of enthusiasm for ChatGPT in the style of Ian Bogost.”
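The same kind of session can be reproduced programmatically as well as in the chat window. Here is a minimal sketch, assuming OpenAI’s Python client and its gpt-3.5-turbo chat model (tooling that postdates my friend’s browser session; the prompt is his, but the rest of the setup is illustrative rather than a record of what he did):

# A minimal sketch of sending the same prompt to a chat model, assuming
# OpenAI's Python client (pip install openai) and an API key stored in
# the OPENAI_API_KEY environment variable.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-3.5-turbo",  # assumed model name: ChatGPT's API-era sibling
    messages=[{
        "role": "user",
        "content": "Create a critique of enthusiasm for ChatGPT in the style of Ian Bogost.",
    }],
)

print(response.choices[0].message.content)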
ChatGPT wrote more, but I spared you the rest because it was so boring. The AI wrote another paragraph about accountability (“If ChatGPT says or does something inappropriate, who is to blame?”), and then a concluding paragraph that restated the rest (it even began, “In conclusion, …”). In short, it wrote a basic, high-school-style five-paragraph essay.
That fact might comfort or frighten you, depending on your predilections. When OpenAI released ChatGPT to the public last week, the first and most common reaction I saw was fear that it would upend education. “You can no longer give take-home exams,” Kevin Bryan, a University of Toronto professor, posted on Twitter. “I think chat.openai.com may actually spell the end of writing assignments,” wrote Samuel Bagg, a University of South Carolina political scientist. That’s the fear.
But you may find comfort in knowing that the bot’s output, while fluent and persuasive as text, is consistently uninteresting as prose. It’s formulaic in structure, style, and content. John Warner, the author of the book Why They Can’t Write, has been railing against the five-paragraph essay for years and wrote a Twitter thread about how ChatGPT reflects this rules-based, standardized form of writing: “Students were essentially trained to produce imitations of writing,” he tweeted. The AI can generate credible writing, but only because writing, and our expectations for it, has become so unaspiring.
Even pretending to fool the reader by passing off an AI copy as one’s own, as I did above, has become a tired trope, an expected turn in a too-long Twitter thread about the future of generative AI rather than a startling revelation about its capacities. On the one hand, yes, ChatGPT is capable of producing prose that looks convincing. But on the other hand, what it means to be convincing depends on context. The kind of prose you might find engaging and even startling in the context of a generative encounter with an AI suddenly seems just terrible in the context of a professional essay published in a magazine such as The Atlantic. And, as Warner’s comments clarify, the writing you might find persuasive as a teacher (or marketing manager or lawyer or journalist or whatever else) might have been so by virtue of position rather than meaning: The essay was extant and competent; the report was in your inbox on time; the newspaper article communicated apparent facts that you were able to accept or reject.
Perhaps ChatGPT and the technologies that underlie it are less about persuasive writing and more about superb bullshitting. A bullshitter plays with the truth for bad reasons—to get away with something. Initial response to ChatGPT assumes as much: that it is a tool to help people contrive student essays, or news writing, or whatever else. It’s an easy conclusion for those who assume that AI is meant to replace human creativity rather than amend it.
The internet, and the whole technology sector on which it floats, feels like a giant organ for bullshittery—for upscaling human access to speech and for amplifying lies. Online, people cheat and dupe and skirmish with one another. Deep-learning AI worsens all this by hiding the operation of software such as LLMs such that nobody, not even their creators, can explain what they do and why. OpenAI presents its work as context-free and experimental, with no specific use cases—it says it published ChatGPT just to “get users’ feedback and learn about its strengths and weaknesses.” It’s no wonder the first and most obvious assumption to make about ChatGPT is that it is a threat—to something, to everything.
But ChatGPT isn’t a step along the path to an artificial general intelligence that understands all human knowledge and texts; it’s merely an instrument for playing with all that knowledge and all those texts. Play just involves working with raw materials in order to see what they can do. You play a game, or an instrument, to avail yourself of familiar materials in an unexpected way. LLMs are surely not going to replace college or magazines or middle managers. But they do offer those and other domains a new instrument—that’s really the right word for it—with which to play with an unfathomable quantity of textual material.
Right now, ChatGPT is just a tech demo, a research experiment. Less clear is how it might be used, beyond the dire predictions about what sectors its technology might upend. To test that theory, I talked with ChatGPT about a wide range of topics, including poetry, behavioral psychology, and computer programming. In almost every case, the AI appeared to possess both knowledge and the means to express it. But when pressed—and the chat interface makes it easy to do so—the bot almost always had to admit that it was just making things up.
Sometimes it simply used templates. When I asked ChatGPT to generate a cover letter for a university job, it obliged in a competent but uninspired way. It also produced exactly the same letter for a job as a magazine editor as it did for a job as a cannabis innovator in the Web3 space (“I have a deep understanding of the web3 space and the unique challenges and opportunities it presents”). Fair enough: Most cover letters are terrible, after all, repeating the same trite lines and abstract gestures.
In other cases, the bot produced creative output that was surprisingly synthetic but riddled with flaws. When I asked it to generate an imagist poem in the vein of Ezra Pound or William Carlos Williams about hamburgers, it obliged with this short refrain:
Juicy patty, sizzling hot
Toasted bun, crisp lettuce and tomato
A symphony of flavors, a feast for the senses
Savory, satisfying, a true
American classic.
Not bad, I guess, especially for a computer. But also, the poem is not really an example of imagism, which strives to evoke a precise image by means of deliberate language (something like, “The dense honesty of / A hamburger”). So I called ChatGPT on it. “This is not an imagist poem.” Its response is worth reproducing in full:
You are correct, the poem that I generated is still not an imagist poem. An imagist poem uses precise, vivid imagery to convey a specific idea or emotion, and focuses on a single image or moment. The poem that I generated uses a more narrative and descriptive style, and does not focus on a single, specific image. It describes the ingredients and flavors of a hamburger, but does not use precise and vivid imagery to convey a specific idea or emotion. I apologize if my previous responses did not meet your expectations, and I will do my best to assist you with any further questions or prompts that you may have.
I also urged the AI to generate a lai (a medieval narrative poem) in the style of Marie de France about the beloved, Texas-based fast-food chain Whataburger. It sure as heck looked a lot like a lai, beginning like this:
In the land of Whataburger, where burgers and fries are king,
A stranger came to visit, and with him brought a ring.
He wore a cloak of velvet, and a hat upon his head,
And he smiled with a twinkle in his eye as he approached the counter, where the burgers are served hot and fresh, and the fries are crispy and golden.
Except for that last line, which is a metrical mess. The AI wasn’t able (or willing) to evaluate its work (“I am a large language model trained by OpenAI and I don’t have the ability to evaluate the quality of literary works or make aesthetic judgments.”), but it was able to tell me about the typical structure of a lai—a short, narrative poem of the 12th century with a particular structure and form, usually written in octosyllabic couplets. The lyrical errors in the generated lai arise from its failure to adhere to that metrical form.
When I asked about the matter, it admitted again that, no, the lai it had written was not structured in octosyllabic couplets, claiming that it had produced “a more modern and playful take on the form of the lai.” I was starting to feel like I was negotiating with a student who had come to office hours to complain about their grade.
This happened over and over again. I asked for source code for an Atari game about scooping cat litter, and the AI sent me valid programming instructions—it understood the assignment—but only disconnected snippets of actual code with the heading comment “This program creates a simple game where the player must use a scoop to pick up their cat’s litters and put them in a trash bin.” It was an icon of the answer I sought rather than the answer itself.
Then I asked for a set of diagnostic criteria for an invented psychological condition I named “Inherited Truculence.” The AI instantly obliged:
Diagnostic Criteria for Inherited Truculence:
A tendency to experience and express aggressive or hostile thoughts and feelings, particularly in response to perceived threats or challenges.
A history of aggressive or hostile behavior, particularly in interpersonal or social situations.
A familial history of aggressive or hostile behavior, indicating a possible genetic or hereditary component to the condition.
Impairment in social, occupational, or other important areas of functioning as a result of aggressive or hostile behavior.
The symptoms are not better explained by another mental disorder or by the effects of a substance (such as drugs or medication).
Except, truculence doesn’t really amount to “aggressive and hostile thoughts and feelings,” does it? I asked, and the AI agreed, eventually revising its diagnostics accordingly at my further prompting (“A tendency to experience and express defiant or confrontational thoughts and feelings,” and so forth).
Likewise, my prompt to offer both an ontological and an epistemological answer to the question When does bread become toast? yielded a generally fluent, seemingly philosophical reply. It read in part: “when we are able to recognize it as toast based on its appearance, texture, and other sensory characteristics.”
But the philosophy of toast is an area of expertise for me, so I was able to discern that this snippet, supposedly an epistemological argument, was really about perception. “Isn’t the second a phenomenal, and not an epistemological argument?” I challenged. And once more, the AI cried uncle without hesitation: “You are correct, the second argument that I provided is a phenomenal, not an epistemological, argument. An epistemological argument focuses on how we come to know or understand something, whereas a phenomenal argument focuses on our experience or perception of something.”
At this point, talking to ChatGPT began to feel like every other interaction one has on the internet, where some guy (always a guy) tries to convert the skim of a Wikipedia article into a case of definitive expertise. Except ChatGPT was always willing to admit that it was wrong. Instantly and without dispute. And in each case, the bot also knew, with reasonable accuracy, why it was wrong. That sounds good but is actually pretty terrible: If one already needs to possess the expertise to identify the problems with LLM-generated text, but the purpose of LLM-generated text is to obviate the need for such knowledge, then we’re in a sour pickle indeed. Maybe it’s time for that paragraph on accountability after all.
But that’s not ChatGPT’s aim. It doesn’t make accurate arguments or express creativity, but instead produces textual material in a form corresponding with the requester’s explicit or implicit intent, which might also contain truth under certain circumstances. That is, alas, an accurate account of textual matter of all kinds: online, in books, on Wikipedia, and well beyond.
Proponents of LLM generativity may brush off this concern. Some will do so by glorifying GPT’s obvious and fully realized genius, in embarrassing ways that I can only bear to link to rather than repeat. Others, more measured but no less bewitched, may claim that “it’s still early days” for a technology that is a mere few years old but can already generate reasonably good 12th-century lyric poems about Whataburger. But these are the sentiments of the IT-guy personalities who have most mucked up computational and online life, which is just to say life itself. OpenAI assumes that its work is fated to evolve into an artificial general intelligence—a machine that can do anything. Instead, we should adopt a less ambitious but more likely goal for ChatGPT and its successors: They offer an interface into the textual infinity of digitized life, an otherwise impenetrable space that few humans can use effectively in the present.
To explain what I mean by that, let me show you a quite different exchange I had with ChatGPT, one in which I used it to help me find my way through the textual murk rather than to fool me with its prowess as a wordsmith.
“I’m looking for a specific kind of window covering, but I don’t know what it’s called,” I told the bot. “It’s a kind of blind, I think. What kinds are there?” ChatGPT responded with a litany of window dressings, which was fine. I clarified that I had something in mind that was sort of like a roller blind but made of fabric. “Based on the description you have provided, it sounds like you may be thinking of a roman shade,” it replied, offering more detail and a mini sales pitch for this fenestral technology.
My dearest reader, I do in fact know what a Roman shade is. But lacking that knowledge and nevertheless needing to deploy it in order to make sense of the world—this is exactly the kind of act that is very hard to do with computers today. To accomplish something in the world often boils down to mustering a set of stock materials into the expected linguistic form. That’s true for Google or Amazon, where searches for window coverings or anything else now fail most of the time, requiring time-consuming, tightrope-like finagling to get the machinery to point you in even the general direction of an answer. But it’s also true for student essays, thank-you notes, cover letters, marketing reports, and perhaps even medieval lais (insofar as anyone would aim to create one). We are all faking it with words already. We are drowning in an ocean of content, desperate for form’s life raft.
ChatGPT offers that shape, but—and here’s where the bot did get my position accidentally correct, in part—it doesn’t do so by means of knowledge. The AI doesn’t understand or even compose text. It offers a way to probe text, to play with text, to mold and shape an infinity of prose across a huge variety of domains, including literature and science and shitposting, into structures in which further questions can be asked and, on occasion, answered.
GPT and other large language models are aesthetic instruments rather than epistemological ones. Imagine a weird, unholy synthesizer whose buttons sample textual information, style, and semantics. Such a thing is compelling not because it offers answers in the form of text, but because it makes it possible to play text—all the text, almost—like an instrument.
That outcome could be revelatory! But a huge obstacle stands in the way of achieving it: people, who don’t know what the hell to make of LLMs, ChatGPT, and all the other generative AI systems that have appeared. Their creators haven’t helped, perhaps partly because they don’t know what these things are for either. OpenAI offers no framing for ChatGPT, presenting it as an experiment to help “make AI systems more natural to interact with,” a worthwhile but deeply unambitious goal. Absent further structure, it’s no surprise that ChatGPT’s users frame their own creations as either existential threats or perfected accomplishments. Neither outcome is true, but both are also boring. Imagine worrying about the fate of take-home essay exams, a stupid format that everyone hates but nobody has the courage to kill. But likewise, imagine nitpicking with a computer that just composed something reminiscent of a medieval poem about a burger joint because its lines don’t all have the right meter! Sure, you can take advantage of that opportunity to cheat on school exams or fake your way through your job. That’s what a boring person would do. That’s what a computer would expect.
Computers have never been instruments of reason that can solve matters of human concern; they’re just apparatuses that structure human experience through a very particular, extremely powerful method of symbol manipulation. That makes them aesthetic objects as much as functional ones. GPT and its cousins offer an opportunity to take them up on the offer—to use computers not to carry out tasks but to mess around with the world they have created. Or better: to destroy it.