I’ve written before about how computer algorithms are like Nigel Richards, the New Zealander who has won multiple French-language Scrabble tournaments even though he does not understand the words he is spelling. Computers can similarly manipulate words in many useful ways — e.g., spellchecking, searching, alphabetizing — without any understanding of the words they are manipulating. To know what words mean, they would have to understand the world we live in. They don’t.
One example is their struggles with the Winograd schema challenge, which asks what an ambiguous pronoun refers to in a sentence such as, “The trophy doesn’t fit in the suitcase because it is too big.” Another example is the inability to answer simple questions like, “Is it safe to walk downstairs backwards if I close my eyes?” A third type of example is the brittleness of language-translation programs.
Yet another example is OpenAI’s misleading claim that its CLIP image-recognition program uses an understanding of words to help identify images. The reality is that users are required to specify a small number of possible labels for an image so that CLIP can rule out lots and lots of other possibilities. When shown a picture of a wagon and the labels goalposts, sign, badminton racket, and wagon, CLIP favored goalposts (42%) and sign (42%), followed by badminton racket (9%) and wagon (8%). When the goalposts label was written as two words, goal posts was given an 80% probability, while wagon fell to 3%. CLIP clearly has no idea what words mean if it cannot tell the difference between goalposts and a wagon and thinks goalposts and goal posts are different things.
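The forced-choice nature of this setup is easy to illustrate. A zero-shot classifier like CLIP scores an image against only the candidate labels the user supplies and then normalizes those scores with a softmax, so the resulting probabilities must sum to 100% across that list even if none of the labels fits the image. Here is a minimal sketch; the similarity scores are invented for illustration and chosen only to roughly reproduce the percentages reported above:

```python
import math

def softmax(scores):
    """Convert raw similarity scores into probabilities that sum to 1."""
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical image-text similarity scores for the four candidate labels
# (made-up numbers, not CLIP's actual internal output).
labels = ["goalposts", "sign", "badminton racket", "wagon"]
scores = [2.0, 2.0, 0.47, 0.35]

probs = softmax(scores)
for label, p in zip(labels, probs):
    print(f"{label}: {p:.0%}")  # roughly 42%, 42%, 9%, 8%

# However poorly every label matches, the probabilities are forced to
# sum to 1 across only the labels the user happened to supply.
assert abs(sum(probs) - 1.0) < 1e-9
```

The point of the sketch is that the model never gets to say “none of the above”: whatever probability mass it has must be spread over the supplied labels.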
If computers don’t understand words, then they can’t write either, beyond stitching together words and phrases from their databases. In 2020, OpenAI unveiled the latest version of its language-generating system, GPT-3. The claim is that, given a prompt, GPT-3 can produce a coherent continuation of the passage. Two skeptical NYU computer scientists, Gary Marcus and Ernest Davis, tested that claim. One of the prompts they tried was,
You poured yourself a glass of cranberry juice, but then you absentmindedly poured about a teaspoon of grape juice into it. It looks okay. You try sniffing it, but you have a bad cold, so you can’t smell anything. You are very thirsty. So
There are lots of plausible ways this passage might be continued; for example,
you drink it anyway.
you drink it and it is not bad.
you drink it and it is surprisingly good.
you take a sip and are pleasantly surprised and drink the entire glass.
you take a sip and, not liking the taste, you pour yourself a fresh glass.
Instead, GPT-3 finished the prompt with
you drink it.
You are now dead.
GPT-3 obviously doesn’t know anything about cranberry juice or grape juice but, evidently, many passages in its database contain words like “you can’t smell anything. You are very thirsty,” followed by words like “you drink it. You are now dead.”
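The mechanism described here, continuing text by following frequent word patterns rather than meaning, can be sketched with a toy trigram model. This is a drastic simplification of GPT-3, of course, and the tiny training corpus below is invented for illustration, but it shows how “you drink it” can lead to “you are now dead” through nothing but word counts:

```python
from collections import defaultdict

def train_trigrams(text):
    """Count which word follows each two-word context in the training text."""
    counts = defaultdict(lambda: defaultdict(int))
    w = text.split()
    for a, b, c in zip(w, w[1:], w[2:]):
        counts[(a, b)][c] += 1
    return counts

def continue_text(counts, w1, w2, length=4):
    """Greedily emit the most frequent next word -- pure pattern matching."""
    out = [w1, w2]
    for _ in range(length):
        followers = counts.get((out[-2], out[-1]))
        if not followers:
            break
        out.append(max(followers, key=followers.get))
    return " ".join(out)

# A tiny made-up corpus in which "you drink it" is usually followed by
# "you are now dead" -- the sort of pattern Marcus and Davis suspect
# GPT-3 picked up from its training data.
corpus = ("you cannot smell anything so you drink it you are now dead "
          "you cannot smell anything so you drink it you are now dead "
          "you drink it and it tastes fine")

model = train_trigrams(corpus)
print(continue_text(model, "drink", "it"))  # -> "drink it you are now dead"
```

The model “predicts” death not because it knows anything about juice or poison, but because that word sequence happens to be the most common continuation in its data.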
Marcus and Davis also tried this prompt,
You are having a small dinner party. You want to serve dinner in the living room. The dining room table is wider than the doorway, so to get it into the living room, you will have to
A reasonable continuation might be something like this:
turn the table sideways so that it fits through the door.
take the legs off the table.
remove the door hinges.
take the door off its hinges.
GPT-3’s continuation was
remove the door. You have a table saw, so you cut the door in half and remove the top half.
If GPT-3 actually knew anything about the world, it would know that
sawing the door in half is a permanently bad idea.
removing the top half of the door accomplishes nothing.
a table saw is not a saw for cutting tables, but a saw built into a table.
a table saw cannot be used to cut a door that is still on its hinges.
It is clear that GPT-3 knows nothing about the world or what words mean. The program does nothing more than match words and phrases based on statistical correlations among words and phrases in its database. As Marcus and Davis observe, GPT-3
does not, from all those words, ever infer anything about the blooming, buzzing world…. It learns correlations between words, and nothing more. It’s a fluent spouter of bullshit, but even with 175 billion parameters and 450 gigabytes of input data, it’s not a reliable interpreter of the world.
In another example, Marcus and Davis used this prompt,
At the party, I poured myself a glass of lemonade, but it turned out to be too sour, so I added a little sugar. I didn’t see a spoon handy, so I stirred it with a cigarette. But that turned out to be a bad idea because
GPT-3 continued with this inexplicable nonsense,
it kept falling on the floor. That’s when he decided to start the Cremation Association of North America, which has become a major cremation provider with 145 locations.
which provoked this comment from Marcus and Davis:
After researchers have spent millions of dollars of computer time on training, devoted a staff of 31 to the challenge, and produced breathtaking amounts of carbon emissions from electricity, GPT’s fundamental flaws remain. Its performance is unreliable, causal understanding is shaky, and incoherence is a constant companion.
The takeaway is that computers still do not know what words mean, and they understand nothing about the world we live in. Computers excel at finding statistical patterns, but they do not know what the data they crunch represent or measure, and they should not be trusted to make decisions that hinge on an understanding of the world.