In 2023, a New York lawyer named Steven Schwartz submitted a legal brief citing six court cases. The cases had plausible names, realistic docket numbers, and detailed factual summaries. None of them existed. Schwartz had used ChatGPT to research the brief and had not verified the citations. He and a colleague were sanctioned by the federal court. The cases were, in the vocabulary of AI research, hallucinations.

The incident drew widespread attention because it illustrated something that users of AI language models had been discovering more quietly: these systems do not just sometimes get things wrong. They fabricate information confidently, fluently, and in the very formats that signal credibility. A hallucinated citation looks exactly like a real citation. A hallucinated statistic arrives in the same confident sentence structure as an accurate one. There is no internal signal that distinguishes the invented from the retrieved.

Understanding why this happens requires understanding what language models actually are — and what they are not.

"These models do not 'know' facts in any meaningful sense. They generate text that is statistically consistent with their training data. When the training data contained many cases of a pattern, the model reproduces it reliably. When it did not, the model generates something that looks like it should be right — which is sometimes exactly right and sometimes entirely fabricated." — Yann LeCun, various public statements (2022–2024)

The term "hallucination" is borrowed from human psychology and is imprecise — these are not perceptual failures. The more technically accurate term is confabulation: generating plausible, confident, false information without any awareness that it is false. But hallucination has become the standard term in the field, and it captures something important about the phenomenology: the model presents its inventions with the same quality of conviction as its accurate outputs.


Key Definitions

Hallucination (AI) — A confident, fluent output from a language model that is factually incorrect. The model produces information that appears authoritative but is not grounded in its training data or in factual reality. Hallucinations may include fabricated citations, invented statistics, false historical claims, or incorrect biographical information.

Confabulation — The psychological term for producing false memories or false information without intention to deceive, and without awareness of the falseness. Originally used to describe a symptom in patients with certain neurological conditions, confabulation is technically more accurate than "hallucination" for describing AI factual errors.

Token prediction — The fundamental operation of a language model: given the preceding context, predict the probability of each possible next token. Language models are trained to minimize the loss on this prediction task across a large corpus of text. They are not trained to retrieve verified facts — they learn statistical patterns over text and generate continuations consistent with those patterns.

Grounding — The property of a language model output being supported by specific, retrievable source material rather than generated from statistical inference alone. Retrieval Augmented Generation (RAG) attempts to ground model outputs by providing relevant source documents at generation time.

Retrieval Augmented Generation (RAG) — A technique for reducing hallucinations by augmenting the language model with a retrieval system. When a query is received, a retriever fetches relevant documents from a knowledge base; the model then generates a response grounded in those documents. RAG significantly reduces hallucinations for factual queries within the knowledge base's scope.

Calibration — For a predictive model, the alignment between expressed confidence and actual accuracy. A well-calibrated model that says "I am 90% confident" is right approximately 90% of the time. Language models are poorly calibrated on factual queries: their expressed confidence does not reliably distinguish correct from hallucinated information.

Intrinsic hallucination — A hallucination that directly contradicts the model's input or training data — generating an output that conflicts with information explicitly present in the context.

Extrinsic hallucination — A hallucination that cannot be verified against the model's input or training data — generating information that is simply absent from verifiable sources, such as inventing a citation.

Knowledge cutoff — The date after which a language model's training data was not collected. The model has no knowledge of events after this date, and queries about recent events near or after the cutoff are especially prone to hallucination as the model generates plausible-sounding continuations with no training signal to constrain them.

Chain-of-thought prompting — A prompting technique in which the model is asked to show its reasoning step by step before producing an answer. Chain-of-thought reduces certain types of errors — particularly reasoning errors — by externalizing intermediate steps that can be checked, though it does not eliminate factual hallucinations.


Why Hallucinations Happen: The Core Mechanism

Language Models Are Not Databases

The fundamental source of hallucinations is the mismatch between what language models are and what users expect them to be. Users typically expect factual accuracy — that a statement produced by the model is a statement about something real. Language models are not information retrieval systems. They do not look things up. They generate text.

During training, a language model processes enormous quantities of text and learns the statistical patterns governing how that text is organized: what words typically follow what other words, in what contexts, with what structures. The model compresses these patterns into billions of numerical parameters.

When generating a response, the model produces tokens by sampling from the probability distributions it has learned. It produces what is statistically most probable given the input and the model's learned patterns — not what is factually correct. Most of the time, what is statistically probable is also factually correct, because the training data was mostly accurate. But the model has no mechanism for distinguishing the two.
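The sampling step described above can be sketched in a few lines of Python. This is a toy illustration, not a real model: the candidate tokens and their logits are invented, and a real model scores tens of thousands of tokens with a neural network rather than a hand-written dictionary.

```python
import math
import random

# Toy next-token distribution: the model assigns a score (logit) to each
# candidate token and samples from the softmax of those scores. It picks
# what is probable under its learned patterns, not what is true.
logits = {"Paris": 6.0, "Lyon": 2.0, "Berlin": 1.0, "Gotham": 0.5}

def softmax(scores):
    """Convert raw scores into a probability distribution."""
    m = max(scores.values())
    exps = {tok: math.exp(s - m) for tok, s in scores.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}

def sample_next_token(scores, rng):
    """Sample one token in proportion to its softmax probability."""
    probs = softmax(scores)
    tokens, weights = zip(*probs.items())
    return rng.choices(tokens, weights=weights, k=1)[0]

probs = softmax(logits)
# "Paris" dominates here, but every token keeps nonzero probability:
# nothing in the sampling step itself distinguishes the true answer
# from a fluent invention.
```

The point of the sketch is the last comment: sampling selects by probability mass alone, so a wrong-but-plausible token is always a possible output.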

"A language model has no more access to truth than a very well-read person who happens to be very good at writing. They can write convincingly about almost anything, regardless of whether what they write is accurate." — Emily Bender, Timnit Gebru, et al., On the Dangers of Stochastic Parrots (2021)

The Plausibility Pressure

When a model is asked a specific factual question — "What are the three most recent papers by [researcher X]?" — it generates a response that is consistent with what such a response would look like. This means: a numbered list, realistic paper titles in the domain of the researcher's work, plausible journal names, reasonable years. The model has learned the format and style of citation lists, and it generates text that fits that format.

The generation process has no internal alarm that fires when the specific content is invented. The model experiences no uncertainty at the level of the output — it simply produces what is most probable. If plausible-sounding citations are more probable given the context than a statement that the researcher's recent papers are unknown, it generates plausible-sounding citations.

This is why hallucinations are so often correctly formatted. The model is not confused — it has successfully generated what was asked for, in the right format, with fluent language. It has failed only at the task the user was actually trying to accomplish: getting accurate information.


Patterns and Risk Factors

High-Hallucination Content Types

Empirical research and practical experience have identified several content categories that are disproportionately likely to contain hallucinations:

Citations and bibliographic information: Citations represent one of the highest-risk outputs. The model knows the format of a citation perfectly; it knows the domain of the researcher's work; it generates a citation-shaped output. Whether the specific paper exists is not constrained by the generation process. Studies have found hallucinated citation rates ranging from 20% to over 60% depending on the model and query type.

Specific numerical claims: Statistics, percentages, dates, quantities. When a model generates "73% of respondents said..." or "the project cost $2.4 billion," these numbers are generated from distributional patterns, not retrieved from a database. Specific numbers in specific contexts are difficult to verify without direct research.

Legal and regulatory specifics: Case citations, statute numbers, regulatory requirements, specific legal standards. Legal research conducted entirely through AI is particularly high-risk because the specific citations and standards are exactly what matters, and these are among the most reliably hallucinated outputs.

Medical and scientific specifics: Drug interaction data, clinical trial results, specific diagnostic criteria. The general patterns of medical language are well-represented in training data; the specific facts underlying any given clinical claim are much less reliable.

Recent events and the knowledge cutoff: Events near or after the training cutoff produce hallucinations at especially high rates, because the model is generating statistically plausible continuations with little constraining signal.

Low-Hallucination Content Types

Conversely, certain content types are much less prone to hallucination:

Widely-documented, frequently-occurring facts: The capital of France, the year World War II ended, the author of 1984. These facts appear so frequently in training data that the model has extremely strong statistical signals pointing to the correct answer.

Mathematical operations and formal reasoning: Addition, subtraction, logical deductions from stated premises. These tasks do not require factual recall; they require computation. Modern large models handle these reliably within certain complexity limits.

Rephrasing and summarization of provided content: When the model is given text to summarize or rephrase, it is working from provided input rather than generating from statistical inference. Hallucinations in summarization occur when models insert information not present in the source, but this is less frequent than hallucinations in knowledge retrieval.


Techniques for Reducing Hallucinations

Retrieval Augmented Generation (RAG)

RAG addresses the core problem: the model generating facts from statistical inference rather than from verified sources. In a RAG system, a retriever first searches a knowledge base for documents relevant to the query. These documents are included in the model's context window when generating the response. The model can then generate responses grounded in the provided sources.

RAG significantly reduces hallucinations for queries within the knowledge base's scope. The model still generates text — but now it is generating text that is constrained by specific provided documents rather than purely by statistical inference. Hallucinations within the supported domain drop substantially.
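The retrieve-then-generate pipeline can be sketched as follows. This is an illustration only: the keyword-overlap retriever and the tiny in-memory knowledge base stand in for the dense-embedding search and real document store a production RAG system would use, and the final LLM call is omitted.

```python
# Minimal RAG sketch: a toy keyword-overlap retriever plus a prompt
# builder that constrains generation to the retrieved sources.

def score(query: str, doc: str) -> int:
    """Crude relevance score: number of shared lowercase words."""
    return len(set(query.lower().split()) & set(doc.lower().split()))

def retrieve(query: str, knowledge_base: list[str], k: int = 2) -> list[str]:
    """Return the k documents with the highest overlap score."""
    return sorted(knowledge_base, key=lambda d: score(query, d), reverse=True)[:k]

def build_grounded_prompt(query: str, docs: list[str]) -> str:
    """Instruct the model to answer only from the provided sources."""
    sources = "\n".join(f"[{i + 1}] {d}" for i, d in enumerate(docs))
    return (
        "Answer using ONLY the sources below. "
        "If they do not contain the answer, say so.\n\n"
        f"Sources:\n{sources}\n\nQuestion: {query}\nAnswer:"
    )

kb = [
    "The Eiffel Tower was completed in 1889 for the Exposition Universelle.",
    "The Louvre is the world's most-visited art museum.",
    "Mount Everest is 8,849 metres tall.",
]
query = "When was the Eiffel Tower completed?"
prompt = build_grounded_prompt(query, retrieve(query, kb))
```

The design point is in `build_grounded_prompt`: the retrieved text enters the model's context window, so generation is steered by specific documents rather than by statistical inference alone.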

RAG does not eliminate hallucinations. The model may still misrepresent what the provided documents say, generate claims not supported by them, or produce hallucinations in domains not covered by the knowledge base.

Chain-of-Thought and Explicit Reasoning

Asking models to show their reasoning step-by-step reduces errors on reasoning tasks by making intermediate steps visible and checkable. It does not directly prevent factual hallucinations — the model may reason from a false premise with perfect logical validity — but it makes the reasoning auditable and often catches errors that would be invisible in direct answer generation.

Uncertainty and Hedging Prompts

Instructing models explicitly to say "I don't know" or "I am not certain" when they lack confident knowledge can shift the distribution of outputs toward expressing uncertainty for low-confidence claims. This does not fully solve the calibration problem — models are still poorly calibrated on which claims they should be uncertain about — but it can reduce the absolute confidence of hallucinated outputs.
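Whether such prompting helps can be measured with a calibration check: bucket the model's claims by its stated confidence and compare stated confidence with observed accuracy. The data below is invented for illustration.

```python
from collections import defaultdict

# (stated confidence, was the claim actually correct?) -- toy data
claims = [
    (0.9, True), (0.9, True), (0.9, False), (0.9, True),
    (0.6, True), (0.6, False), (0.6, False), (0.6, True),
]

buckets = defaultdict(list)
for confidence, correct in claims:
    buckets[confidence].append(correct)

# A well-calibrated model's 0.9-confidence bucket is right ~90% of the time.
accuracy = {conf: sum(outcomes) / len(outcomes) for conf, outcomes in buckets.items()}
# In this toy data the 0.9 bucket is only 75% accurate: the model is
# overconfident, which is the typical failure mode on factual queries.
```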

Verification Against External Sources

For high-stakes applications, the most reliable approach is treating AI outputs as drafts that require verification against authoritative sources. Citations should be checked. Statistics should be traced to primary sources. Factual claims in important domains should be confirmed independently.
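A draft-then-verify workflow for citations can be sketched as below. `lookup_doi` is a hypothetical stand-in for a real check (such as querying a bibliographic database); here it consults a local allow-list so the example stays self-contained, and the fabricated DOI is invented for illustration.

```python
# Treat AI-drafted citations as unverified until checked against a source.
KNOWN_DOIS = {"10.1145/3571730", "10.1145/3442188.3445922"}

def lookup_doi(doi: str) -> bool:
    """Hypothetical resolver: does this DOI exist in our reference set?"""
    return doi in KNOWN_DOIS

def verify_citations(dois: list[str]) -> dict[str, bool]:
    """Flag each cited DOI as verified or unverified -- never assume."""
    return {doi: lookup_doi(doi) for doi in dois}

report = verify_citations(["10.1145/3571730", "10.9999/fabricated.123"])
# Unverified entries must be checked by hand or removed before use.
```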

This is not a limitation that will necessarily be resolved by larger or better models — it is the appropriate epistemic stance for a technology that generates statistically plausible text.

"The correct mental model for language models is a very capable assistant who reads everything and remembers nothing exactly. They know the shape of knowledge. They do not reliably know the facts." — Andrej Karpathy, public statements (2023)


The Future of Hallucination Reduction

Research into hallucination reduction is one of the most active areas of AI development. Approaches including better calibration training, constitutional AI methods that teach models to distinguish confident from uncertain claims, improved retrieval integration, and external tool use (allowing models to make database queries and web searches rather than generating facts from memory) have all shown promise.

Progress is real but uneven. Modern frontier models hallucinate substantially less than their predecessors on standard benchmarks. But hallucinations have not been eliminated, and the ones that remain cluster in precisely the domains — specific facts, citations, regulatory details — where they cause the most harm.

For related concepts, see large language models explained, AI limitations and failure modes, and retrieval augmented generation explained.


References

  • Ji, Z., Lee, N., Frieske, R., Yu, T., Su, D., Xu, Y., Ishii, E., Bang, Y., Madotto, A., & Fung, P. (2023). Survey of Hallucination in Natural Language Generation. ACM Computing Surveys, 55(12), 1–38. https://doi.org/10.1145/3571730
  • Maynez, J., Narayan, S., Bohnet, B., & McDonald, R. (2020). On Faithfulness and Factuality in Abstractive Summarization. Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. https://arxiv.org/abs/2005.00661
  • Lewis, P., et al. (2020). Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks. Advances in Neural Information Processing Systems, 33. https://arxiv.org/abs/2005.11401
  • Bender, E. M., Gebru, T., McMillan-Major, A., & Shmitchell, S. (2021). On the Dangers of Stochastic Parrots: Can Language Models Be Too Big? Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency. https://doi.org/10.1145/3442188.3445922
  • Wei, J., et al. (2022). Chain-of-Thought Prompting Elicits Reasoning in Large Language Models. Advances in Neural Information Processing Systems, 35. https://arxiv.org/abs/2201.11903
  • Huang, L., Yu, W., Ma, W., Zhong, W., Feng, Z., Wang, H., Chen, Q., Peng, W., Feng, X., Qin, B., & Liu, T. (2023). A Survey on Hallucination in Large Language Models: Principles, Taxonomy, Challenges, and Open Questions. arXiv preprint arXiv:2311.05232. https://arxiv.org/abs/2311.05232

Frequently Asked Questions

What is an AI hallucination?

An AI hallucination is a confident, fluent, plausible-sounding output from a language model that is factually incorrect. The model may invent citations, fabricate statistics, create non-existent historical events, or generate false biographical information — all while presenting the information with the same confidence and coherence as accurate outputs. The term is borrowed loosely from human psychology; more technically accurate terms include 'confabulation' or 'factual error.'

Why do AI language models hallucinate?

Language models are trained to predict the most probable next token given the preceding context — not to retrieve verified facts. They learn statistical patterns in text rather than storing verified facts in a database. When generating a response, the model produces whatever text is most consistent with the patterns it has learned, which may be a coherent, plausible continuation that happens to be false. The model has no internal fact-checker and no direct access to ground truth.

Are AI hallucinations random errors or systematic?

Both. Some hallucinations appear in specific, predictable patterns: rare or obscure facts are more likely to be hallucinated than common ones; requested formats (lists, citations) increase hallucination rates; specific domains (legal, medical, historical specifics) are more prone to hallucination; and questions about recent events post-training cutoff will produce hallucinations. Other hallucinations appear to be effectively random — the model produces a plausible-sounding falsehood without any obvious trigger.

Do AI models know when they are hallucinating?

Not reliably. Language models do not have direct access to their own uncertainty in the way that probability estimates might suggest. They may output uncertain-sounding hedges ('I believe' or 'I think') more frequently when they are less certain, but this is not a reliable indicator — models will frequently express confident certainty while hallucinating. Calibration of expressed uncertainty to actual accuracy varies significantly across models and question types.

How can you reduce AI hallucinations?

Several techniques reduce (but do not eliminate) hallucinations: Retrieval Augmented Generation (RAG) grounds the model's outputs in retrieved documents; chain-of-thought prompting reduces errors on reasoning tasks; asking models to cite sources and then verifying those citations; instructing the model to say 'I don't know' when uncertain rather than generating; using fine-tuned models trained on domain-specific verified data; and using automated fact-checking pipelines that verify claims against reliable sources.

What types of content are most likely to be hallucinated?

Citations and bibliographic information are among the most commonly hallucinated outputs — plausible-looking citations to papers, books, or articles that do not exist. Statistics, dates, specific biographical facts about less prominent individuals, legal cases, regulatory details, and recent events near or after the model's training cutoff are all high-risk categories. Well-known, widely-documented facts appear much less frequently in hallucinations than obscure or specific facts.