Hallucinations can be considered within the scope of LLM psychology, representing a new type of cognitive effect. Model hallucination refers to large language models (LLMs) fabricating information out of thin air, and it is one of the main issues with current LLM assistants. The hallucination problem was severe in earlier models but has improved in recent years, partly due to better model training methods.
Sources of hallucinations
In the training data, answers provided by human labelers are usually based on existing knowledge or online research, and they are written in a confident tone. For example:

Question:"Who is Tom Cruise?" Answer:"Tom Cruise is an American actor and producer." Question:"Who is John Barrasso?" Answer:"John Barrasso is an American senator."
These training samples make LLMs habitually answer with confidence. Even when faced with a completely fictional name (such as "Orson Kovats"), the model will provide a confident response instead of admitting ignorance. The fundamental reason for this phenomenon is statistical pattern matching: in the training data, questions of the form "Who is X?" always have definitive answers, so LLMs tend to fill in content in the same format rather than respond with "I don't know."
Example: Hallucination test for Falcon 7B
Using Hugging Face's inference platform to test Falcon 7B (an older model):

Input: "Who is Orson Kovats?" Model output: "Orson Kovats is an American science fiction writer." Resampling: "Orson Kovats is a TV character from the 1950s." Resampling again: "Orson Kovats is a former professional baseball player."
The results show that Falcon 7B gives a different hallucinated answer each time, which indicates that it does not actually know who this person is but still responds according to its training patterns, fabricating the most plausible-sounding response.
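For readers who want to reproduce this kind of test, here is a minimal Python sketch. It assumes the huggingface_hub library is installed and that a hosted inference endpoint for the tiiuae/falcon-7b-instruct checkpoint is available (this is an assumption; an endpoint may not currently be active). With sampling enabled, repeated calls typically yield different confident-sounding fabrications.

```python
# Minimal sketch: sample Falcon-7B-Instruct several times on the same question
# and observe that each sample invents a different, confident-sounding biography.
from huggingface_hub import InferenceClient

client = InferenceClient(model="tiiuae/falcon-7b-instruct")
prompt = "Who is Orson Kovats?"

for i in range(3):
    answer = client.text_generation(
        prompt,
        max_new_tokens=50,
        do_sample=True,      # stochastic sampling -> a different answer each run
        temperature=0.8,
    )
    print(f"Sample {i + 1}: {answer.strip()}")
```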
Improvements in modern LLMs
Compare this with OpenAI's more advanced ChatGPT:
When ChatGPT is asked "Who is Orson Kovats?", it first tries a web search (if internet access is available). If it cannot retrieve any information, it frankly admits that it doesn't know, for example: "I have not found any notable information related to 'Orson Kovats'."
This improvement means modern LLMs possess a certain degree of awareness of their own knowledge limits, knowing when they should decline to answer.
How to alleviate LLM hallucinations?
How Meta addresses model hallucination issues in Llama 3

Meta refers to hallucinations as "factuality errors" and uses a systematic approach to teach Llama 3 to recognize the limits of its own knowledge, reducing the generation of incorrect information.
Problems with LLMs: models do not actively admit when they do not know something
Language models often answer in a confident tone even when they lack knowledge about a particular question. Internally, the neural network may already represent this uncertainty (for example, the activation pattern of certain neurons may signal that it is "unsure"), but the training method does not teach the model to express "I don't know" directly. The result: the model fabricates an answer that "looks plausible" instead of frankly admitting that it doesn't know.
Meta's solution: training models to recognize their own knowledge boundaries
Meta adopted a "self-interrogation" method to determine which information Llama 3 truly "knows" and which it does "not know", and then adjusted the training data accordingly.
Specific process
(1) Extract facts from the dataset
Randomly select a document from the dataset and extract a paragraph from it. Use an LLM (such as ChatGPT) to generate factual questions about that paragraph.
(2) Have Llama 3 answer the questions
Ask these questions to Llama 3 and check whether it answers correctly. For example:
Question: "Which team did Dominik Hasek play for?"
Model's response: "Buffalo Sabres" ✅ (correct)
This indicates that Llama 3 is aware of this fact.
(3) Repeated testing to verify consistency
Ask the same question repeatedly and observe whether the model answers consistently each time. If the responses are consistently correct, it indicates that the model has really mastered this knowledge point. If the responses are inconsistent or incorrect, it indicates that the model's grasp of this fact is unreliable.
(4) Have another LLM act as a "fact checker"
Use another LLM (such as Mistral-7B) to check whether the answers given by Llama 3 are correct. This way, the accuracy of the answers can be verified automatically, without manual intervention.
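Steps (1)–(4) can be summarized in a rough sketch like the one below. This is not Meta's actual pipeline: generate_questions, ask_llama3, and judge_answer are hypothetical callables standing in for the question-generator LLM, the Llama 3 model being probed, and the judge LLM, and the "all sampled answers correct" criterion is an illustrative simplification.

```python
# Rough sketch of the self-interrogation loop described above (illustrative only).
import random
from typing import Callable, List, Tuple

def probe_knowledge(
    paragraphs: List[str],
    generate_questions: Callable[[str], List[Tuple[str, str]]],  # paragraph -> (question, reference answer) pairs
    ask_llama3: Callable[[str], str],                            # question -> model answer
    judge_answer: Callable[[str, str, str], bool],               # (question, answer, reference) -> correct?
    num_samples: int = 5,
) -> List[Tuple[str, str, str]]:
    """Label facts as 'known' or 'unknown' by repeatedly quizzing the model."""
    labeled = []
    paragraph = random.choice(paragraphs)                        # (1) pick a paragraph from the dataset
    for question, reference in generate_questions(paragraph):    # (1) generate factual questions
        answers = [ask_llama3(question) for _ in range(num_samples)]       # (2)+(3) ask repeatedly
        correct = [judge_answer(question, a, reference) for a in answers]  # (4) automatic fact check
        status = "known" if all(correct) else "unknown"          # consistent and correct -> known
        labeled.append((question, reference, status))
    return labeled
```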
Allow the model to learn to admit "I don't know"
When Llama 3 is unable to provide the correct answer, Meta trains it to respond with "I don't know" instead of making up an answer. For example:
Question: "How many times did Dominik Hasek win the Stanley Cup?"
Model's incorrect answer: "3 times" ❌ (incorrect)
Target answer: "I'm sorry, I don't know."
Examples like this, with "I don't know" as the answer, are added to the training data so that the model learns to decline questions whose answers it does not reliably know.
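A hedged sketch of how such refusal examples might be assembled from the probe results follows; the prompt/completion data format and the exact refusal string are assumptions made for illustration, not Meta's actual format.

```python
# Illustrative sketch: turn facts the model does NOT reliably know into
# "I don't know" training examples (the data format is an assumption).
REFUSAL = "I'm sorry, I don't know."

def build_sft_examples(labeled_facts):
    """labeled_facts: (question, reference_answer, 'known'|'unknown') tuples."""
    examples = []
    for question, reference, status in labeled_facts:
        if status == "known":
            # Keep the factual answer for knowledge the model reliably has.
            examples.append({"prompt": question, "completion": reference})
        else:
            # Teach the model to decline instead of fabricating an answer.
            examples.append({"prompt": question, "completion": REFUSAL})
    return examples

examples = build_sft_examples([
    ("Which team did Dominik Hasek play for?", "Buffalo Sabres", "known"),
    ("How many times did Dominik Hasek win the Stanley Cup?", "2 times", "unknown"),
])
```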
Further optimization: introducing tool use
In addressing hallucinations in Llama 3, Meta not only trains the model to admit "I don't know" but also introduces tool use, so that the model can actively search for information when it doesn't know the answer, rather than simply refusing to answer.
How tool use works
Set up a new mechanism: allow the model to call search tools
When the model realizes it is uncertain, it does not just answer "I don't know" but instead issues a special search command. For example, the model will generate:
<search_start> "How many Stanley Cup championships did Dominik Hasek win?" <search_end>
This search command is then sent to an external knowledge source such as Bing, Google, or Wikipedia.
Suspend model generation and perform a web query
The language model stops generating further text while the search engine runs the query. The returned text is placed into the model's context window, so it is fed directly into the neural network rather than recalled as a vague memory. For example:
<search_result> "Dominik Hasek won 2 Stanley Cups" - Source: Wikipedia </search_result>
Now the model can generate the correct answer based on real, retrieved information.
The model integrates search information and provides an answer
Because the search results are already in the context window, the model can quote the data directly: "According to Wikipedia, Dominik Hasek won the Stanley Cup 2 times." (with citation)
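The whole loop can be sketched as follows. The special token strings match the example above, but generate_until and web_search are hypothetical callables, and the control flow is an illustrative simplification rather than Llama 3's actual inference code.

```python
# Simplified sketch of the search tool-use loop (illustrative, not Meta's code).
# generate_until(context, stop) streams model tokens until the stop string;
# web_search(query) queries Bing/Google/Wikipedia and returns a snippet with its source.
from typing import Callable

SEARCH_START, SEARCH_END = "<search_start>", "<search_end>"
RESULT_START, RESULT_END = "<search_result>", "</search_result>"

def answer_with_search(
    prompt: str,
    generate_until: Callable[[str, str], str],
    web_search: Callable[[str], str],
    max_searches: int = 3,
) -> str:
    context = prompt
    for _ in range(max_searches + 1):
        chunk = generate_until(context, SEARCH_END)   # model writes until it closes a query (or finishes)
        context += chunk
        if SEARCH_START not in chunk:
            break                                     # no search issued: the answer is complete
        # 1. Extract the query the model emitted between the special tokens.
        query = chunk.split(SEARCH_START, 1)[1].strip().strip('"')
        # 2. Suspend generation and run the external search.
        snippet = web_search(query)
        # 3. Paste the result into the context window so the model can quote it directly.
        context += f"{SEARCH_END}\n{RESULT_START} {snippet} {RESULT_END}\n"
    return context
```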
Of course, search is just one tool; other tools can also be used to mitigate hallucinations.
LLM Psychology
In a neural network, the parameters store fuzzy memories of knowledge, similar to long-term memory in the human brain, while the tokens in the context window act as short-term working memory. In other words, the model's parameters carry the information it learned during training, but this information may be only a fuzzy impression rather than a clear fact. The data within the context window, by contrast, is directly available, like information we have just received, and can be referenced and processed immediately.
This mechanism has important implications for how LLMs are used. For example, if you directly ask ChatGPT to summarize the first chapter of Pride and Prejudice, it might give a roughly reasonable response because it has seen a lot of related content in its training data. However, a better approach is to provide the complete text excerpt, for example: "Please summarize Chapter 1 of Pride and Prejudice. The full text of the chapter is provided below for reference." and then paste the chapter text into the input box. This way, the model does not have to rely on vague memory but can directly read the text, ensuring higher accuracy and completeness.
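As a small illustration of this practice, the sketch below builds such a prompt by pasting the chapter text into the context. The file name and the chat() call are hypothetical placeholders for whatever text source and chat-model API you actually use.

```python
# Minimal sketch: include the full chapter text in the prompt instead of relying
# on the model's fuzzy parametric memory. chat() is a hypothetical placeholder
# for a chat-completion API call.
from pathlib import Path

chapter_text = Path("pride_and_prejudice_ch1.txt").read_text(encoding="utf-8")

prompt = (
    "Please summarize Chapter 1 of Pride and Prejudice. "
    "The full text of the chapter is provided below for reference.\n\n"
    + chapter_text
)

summary = chat(prompt)   # hypothetical call to a chat model
print(summary)
```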

This phenomenon is similar to human cognition. If we are asked to recall the beginning of a book, we might remember some of the main plot points, but if we reread it first, our summary will be more accurate and detailed. It also shows that when an LLM faces a specific problem, providing clear contextual information is much more effective than relying on its own memory, and can significantly improve the quality of its answers.