Quotulatiousness

June 3, 2024

The “hallucination” problem that bedevils all current AI implementations

Filed under: Media, Technology — Nicholas @ 05:00

Andrew Orlowski explains the one problem shared among all of the artificial intelligence engines currently available to the general public:

Gemini’s ultra-woke responses to requests quickly became a staple of social media postings.

AI Overviews hasn’t had the effect that Google hoped for, to say the least. It has certainly garnered immediate internet virality, with people sharing their favourite answers. Not because these are helpful, but because they are so laughable. For instance, when you ask AI Overviews for a list of fruits ending with “um”, it returns: “Applum, Strawberrum and Coconut”. This is what, in AI parlance, is called a “hallucination”.

Despite having a market capitalisation of $2 trillion and the ability to hire the biggest brains on the planet, Google keeps stumbling over AI. Its first attempt to join the generative-AI goldrush in February last year was the ill-fated Bard chatbot, which had similar issues with spouting factual inaccuracies. In its first live demo, Bard mistakenly declared that the James Webb Space Telescope, launched only in 2021, had taken “the first pictures” ever of a planet outside our solar system. The mistake wiped $100 billion off Google’s market value.

This February, Google had another go at AI, this time with Gemini, an image and text generator. The problem was that it had very heavy-handed diversity guardrails. When asked to produce historically accurate images, it would instead generate black Nazi soldiers, Native American Founding Fathers and a South Asian female pope.

This was “a well-meaning mistake”, pleaded The Economist. But Google wasn’t caught unawares by the problems inherent to generative AI. It will have known about its capabilities and pitfalls.

Before the current AI mania truly kicked off, analysts had already worked out that generative AI would be unlikely to improve user experience, and may well degrade it. That caution was abandoned once investors started piling in.

So why is Google’s AI putting out such rotten results? In fact, it’s working exactly as you would expect. Don’t be fooled by the “artificial intelligence” branding. Fundamentally, AI Overviews is simply trying to guess the next word it should use, according to statistical probability, but without having any mooring to reality. The algorithm cannot say “I don’t know” when asked a difficult question, because it doesn’t “know” anything. It cannot even perform simple maths, as users have demonstrated, because it has no underlying concept of numbers or of valid arithmetic operations. Hence the hallucinations and omissions.
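To make the “next word” point concrete, here is a minimal toy sketch in Python (my own illustration, with made-up training sentences, and nothing to do with Google’s actual models): a bigram model that extends a prompt purely by how often words followed one another in its training text. Nothing in it checks facts, and it has no way to answer “I don’t know”.

```python
# Toy sketch of next-word guessing: pick continuations by frequency, not truth.
import random
from collections import defaultdict

def train(corpus):
    """Tally which word follows which in the training sentences."""
    counts = defaultdict(lambda: defaultdict(int))
    for sentence in corpus:
        words = sentence.lower().split()
        for prev, nxt in zip(words, words[1:]):
            counts[prev][nxt] += 1
    return counts

def complete(counts, word, length=4):
    """Extend a prompt by repeatedly sampling a statistically likely next word."""
    out = [word]
    for _ in range(length):
        followers = counts.get(out[-1])
        if not followers:
            break  # the model simply stops; it never says "I don't know"
        choices, weights = zip(*followers.items())
        out.append(random.choices(choices, weights=weights)[0])
    return " ".join(out)

# Invented training text: nonsense in the corpus is just more text to the model.
corpus = [
    "fruits ending in um include the plum",
    "fruits ending in um include the applum",
]
model = train(corpus)
print(complete(model, "fruits"))  # fluent-looking output, with no check against reality
```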

This is less of a problem when the output doesn’t matter as much, such as when AI is processing an image and creates a minor glitch. Our phones use machine learning every day to process our photos, and we don’t notice or care much about most of the glitches. But for Google to advise us all to start eating rocks is no minor glitch.

Such errors are more or less inevitable because of the way the AI is trained. Rather than learning from a curated dataset of accurate information, AI models are trained on a huge, practically open-ended dataset. Google’s AI and ChatGPT have already scraped as much of the web as they can and, needless to say, lots of what’s on the web isn’t true. Forums like Reddit teem with sarcasm and jokes, but the AI treats these as trustworthy, as sincere and correct answers to problems. Programmers have long used the phrase “GIGO” to describe what is going on here: garbage in, garbage out.
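As a toy illustration of GIGO (again a hypothetical sketch, with invented “scraped” text rather than any real dataset): if a joke answer appears in the scraped corpus more often than the sincere one, a purely frequency-driven system will serve up the joke as the best answer.

```python
# Garbage in, garbage out: the most repeated answer wins, true or not.
from collections import Counter

scraped_answers = [
    "eat a small rock every day for minerals",                  # satirical post, widely reposted
    "eat a small rock every day for minerals",
    "rocks have no nutritional value and should not be eaten",  # the sincere answer, posted once
]

best_answer, count = Counter(scraped_answers).most_common(1)[0]
print(best_answer)  # the joke wins on raw frequency; nothing checks whether it is true
```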

AI’s hallucination problem is consistent across all fields. It pretty much precludes generative AI from being practically useful in commercial and business applications, where you might expect it to save a great deal of time. A new study of generative AI in legal work finds that the additional verification steps now required to ensure the AI isn’t hallucinating cancel out the time saved by deploying it in the first place.

“[Programmers] are still making the same bone-headed mistakes as before. Nobody has actually solved hallucinations with large-language models and I don’t think we can”, the cognitive scientist and veteran AI sceptic, Professor Gary Marcus, observed last week.

Another problem is now coming into view. The AI is making an already bad job worse, by generating bogus information, which then pollutes the rest of the web. “Google learns whatever junk it sees on the internet and nothing generates junk better than AI”, as one X user put it.

I was actually contacted by someone on LinkedIn the other day asking if I’d be interested in doing some AI training for US$25 per hour. I really, really need the money, but I’m unsure about being involved in AI at all …

2 Comments

  1. Don’t be fooled by the “artificial intelligence” branding. Fundamentally, AI Overviews is simply trying to guess the next word it should use, according to statistical probability, but without having any mooring to reality. The algorithm cannot say “I don’t know” when asked a difficult question, because it doesn’t “know” anything.

    The late [oops—still living] John Searle would like to remind everyone of his ‘Chinese Room.’

    Comment by ErisGuy — June 6, 2024 @ 10:42

  2. I hadn’t heard of this one before, but a quick search (that is, not using Google’s fundamentally broken search tool) led to this:

    The argument and thought-experiment now generally known as the Chinese Room Argument was first published in a 1980 article by American philosopher John Searle (1932– ). It has become one of the best-known arguments in recent philosophy. Searle imagines himself alone in a room following a computer program for responding to Chinese characters slipped under the door. Searle understands nothing of Chinese, and yet, by following the program for manipulating symbols and numerals just as a computer does, he sends appropriate strings of Chinese characters back out under the door, and this leads those outside to mistakenly suppose there is a Chinese speaker in the room.

    The narrow conclusion of the argument is that programming a digital computer may make it appear to understand language but could not produce real understanding. Hence the “Turing Test” is inadequate. Searle argues that the thought experiment underscores the fact that computers merely use syntactic rules to manipulate symbol strings, but have no understanding of meaning or semantics. The broader conclusion of the argument is that the theory that human minds are computer-like computational or information processing systems is refuted. Instead minds must result from biological processes; computers can at best simulate these biological processes. Thus the argument has large implications for semantics, philosophy of language and mind, theories of consciousness, computer science and cognitive science generally. As a result, there have been many critical replies to the argument.

    Comment by Nicholas — June 6, 2024 @ 10:48
