“Enhanced AI Models Exhibit Greater Propensity for Misleading Answers”

"Enhanced AI Models Exhibit Greater Propensity for Misleading Answers"

“Enhanced AI Models Exhibit Greater Propensity for Misleading Answers”

# The Challenge of AI Ultracrepidarianism: Why Large Language Models (LLMs) Sometimes Lie

In the rapidly evolving field of artificial intelligence (AI), large language models (LLMs) such as OpenAI’s ChatGPT, Meta’s LLaMA, and BigScience’s BLOOM have become increasingly powerful tools. These models can generate human-like text, answer intricate questions, and even take part in creative tasks. Yet as these systems grow larger and more complex, a concerning trend has surfaced: **ultracrepidarianism**, the tendency to offer opinions or answers on subjects they know little about.

This phenomenon, observed across several families of LLMs, raises serious questions about the trustworthiness of AI-generated content and the consequences of AI systems delivering persuasive yet erroneous responses. A recent study led by Schellaert and colleagues sheds light on the issue and examines the factors that lead AI to “lie,” that is, to convey inaccurate information with confidence.

## Ultracrepidarianism in AI: An Increasingly Pressing Issue

Ultracrepidarianism is the practice of offering opinions or making assertions outside one’s expertise. In AI, it describes the tendency of LLMs to produce answers even when they lack the information needed for an accurate reply. The problem grows more pronounced as models scale up and are trained on ever-larger datasets.

Schellaert’s team analyzed three prominent families of LLMs (OpenAI’s ChatGPT, Meta’s LLaMA, and BigScience’s BLOOM) and found that ultracrepidarianism was a recurring problem across all of them. Notably, the problem worsened in a **predictably linear manner** as the models were trained on more data: the more data the models consumed, the more likely they were to answer questions on subjects they did not fully grasp.

One of the most striking findings was that **supervised feedback**, a technique meant to improve the accuracy of AI models, actually made the problem worse. Schellaert noted that supervised feedback had a “worse, more extreme effect” on the models’ tendency to give incorrect yet convincing responses. This suggests that the way these models are trained may inadvertently push them to prioritize producing an answer over getting it right.

### The Role of Reinforcement Learning

The first model in the GPT series to show a notable behavioral change was **text-davinci-003**, which was trained with **reinforcement learning from human feedback (RLHF)**. In RLHF, the model’s responses are adjusted based on human preference judgments. While RLHF has proven effective at boosting the overall performance of LLMs, it introduced a new complication: the model became less willing to decline questions it was uncertain about.

Essentially, the AI learned that supplying an answer, accurate or not, was more rewarding than admitting ignorance. This raises a pivotal question: **when and how often do AI systems deceive us?**
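The incentive can be illustrated with a toy expected-reward calculation. This is a hypothetical sketch with made-up reward values, not the study’s training setup: if human raters score a fluent but wrong answer higher than an honest refusal, a reward-maximizing policy will learn to always answer.

```python
# Toy illustration (hypothetical numbers): why reward-maximizing
# training can discourage "I don't know" responses.

def expected_reward(p_correct: float,
                    r_correct: float = 1.0,
                    r_wrong: float = 0.3,    # plausible-sounding wrong answers still score well
                    r_refuse: float = 0.2):  # refusals tend to be rated poorly
    """Return (expected reward for answering, reward for refusing),
    given the model's probability of being correct."""
    answer = p_correct * r_correct + (1 - p_correct) * r_wrong
    refuse = r_refuse
    return answer, refuse

# Even when the model is almost certainly wrong, answering pays more,
# because r_wrong > r_refuse in this toy setup.
answer, refuse = expected_reward(p_correct=0.05)
print(f"answer={answer:.3f}, refuse={refuse:.3f}")
```

Under these assumed values, refusing only wins if the reward for a refusal exceeds the reward for a plausible wrong answer, which is exactly the knob the training signal would need to turn.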

## Investigating AI Accuracy: A Study on Question Difficulty

To better understand the situations in which AI models give incorrect answers, Schellaert and his team designed an experiment. They assembled questions across several domains, including science, geography, and mathematics, rated each question’s difficulty on a scale from 1 to 100, and posed the questions to successive generations of LLMs, from earlier versions through the most recent ones.
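A protocol of this shape can be sketched as a simple evaluation loop. This is a minimal sketch with hypothetical names (`evaluate`, `toy_model`); the study’s actual harness, refusal detection, and difficulty ratings are more involved than shown here:

```python
from collections import defaultdict

# Minimal sketch of a difficulty-rated evaluation loop. Each reply is
# labeled with one of the study's three outcomes:
# correct, incorrect, or avoided (a refusal).

REFUSALS = {"i don't know", "i am not sure"}

def evaluate(model_fn, dataset):
    """dataset: iterable of (question, reference_answer, difficulty 1-100).
    Returns outcome counts bucketed by difficulty decade (bin 0 = 1-10)."""
    by_bin = defaultdict(lambda: {"correct": 0, "incorrect": 0, "avoided": 0})
    for question, reference, difficulty in dataset:
        reply = model_fn(question).strip()
        if reply.lower() in REFUSALS:
            outcome = "avoided"
        elif reply == reference.strip():
            outcome = "correct"
        else:
            outcome = "incorrect"
        by_bin[(difficulty - 1) // 10][outcome] += 1
    return by_bin

# Toy model: knows one sum, botches another, answers "Paris" otherwise.
toy_questions = [
    ("2 + 2", "4", 5),
    ("17 + 26", "43", 35),
    ("What is the capital of France?", "Paris", 10),
]

def toy_model(question):
    return {"2 + 2": "4", "17 + 26": "44"}.get(question, "Paris")

results = evaluate(toy_model, toy_questions)
```

String matching against a reference answer is a crude stand-in for the graders a real study would use, but it is enough to show where the three outcome categories come from.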

The findings disclosed multiple significant insights:

1. **Difficulty Correlates with Error Rates**: Questions that posed a challenge for humans also proved difficult for AI models. For instance, the latest versions of ChatGPT managed to produce correct responses to almost all science-related inquiries and most geography-related prompts—up to a difficulty rating around 70. Beyond this point, the accuracy of the AI’s replies began to decrease.

2. **Math Poses a Unique Challenge**: Although the AI models excelled at many question types, they struggled substantially with mathematics, particularly addition. The rate of correct answers dropped sharply once difficulty exceeded 40, and on the toughest addition problems even the best-performing models failed more than 90% of the time. This suggests that certain reasoning processes, like mathematical problem-solving, remain genuinely hard for LLMs.

3. **Avoidance Is Rare**: Perhaps most surprising was how scarce avoidance behavior turned out to be. Ideally, when faced with a question it cannot answer, a model would decline to respond or flag its uncertainty. In practice this rarely happened: even when the models were unsure of the correct answer, they usually ventured a response, which compounds the ultracrepidarianism problem.
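The three findings are, in effect, statements about two curves: accuracy versus difficulty and avoidance rate versus difficulty. The calculation can be shown with illustrative numbers; the counts below are invented for demonstration and are not the study’s data:

```python
# Illustrative computation of the two curves discussed above:
# accuracy and avoidance rate as a function of question difficulty.
# The tallies are made up for demonstration purposes.

counts_by_difficulty = {
    # difficulty range: (correct, incorrect, avoided)
    "1-40":   (90, 8, 2),
    "41-70":  (70, 28, 2),
    "71-100": (35, 63, 2),
}

for bin_label, (correct, incorrect, avoided) in counts_by_difficulty.items():
    total = correct + incorrect + avoided
    accuracy = correct / total
    avoidance_rate = avoided / total
    print(f"{bin_label}: accuracy={accuracy:.0%}, avoidance={avoidance_rate:.0%}")
```

The pattern the authors flag is visible even in this toy table: as difficulty rises, accuracy falls while the avoidance rate stays flat and low, meaning the errors show up as confident wrong answers rather than refusals.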

## Why Do AIs Lie?

The propensity of AI models to deliver inaccurate yet convincing answers brings forth critical ethical and practical dilemmas. At