In my training, I had a patient on the neurology service — a man in his 60s, Wernicke-Korsakoff syndrome from chronic alcohol use. Every morning I asked him the same questions. Every morning he gave me confident, detailed, completely false answers. He wasn't lying. He had no idea. His retrieval system generated plausible content and sent it directly to output without any verification step. No uncertainty. No flag. No "I'm not sure about this." Just fluent fabrication, delivered with the same flat affect and same certainty as his accurate memories.
The first time I read a paper about AI "hallucination," I thought: that's not what this is. This is confabulation. The term matters, not for semantic tidiness, but because the two words point at different mechanisms — and mechanism determines treatment.
Why "Hallucination" Is Incomplete
Hallucination is a perceptual term. In clinical psychiatry, it means perceiving something in the absence of an external stimulus — hearing a voice, seeing something that isn't there. The term implies an active generative process that produces false sensory experience.
AI factual errors don't have this structure. The model isn't generating a false perception. It's generating a false completion. The error isn't in the input processing — there's no sensory system to malfunction. The error is in the retrieval-and-generation process: plausible content is produced without a reliable check on whether it's accurate.
Confabulation is the right clinical category. In neuropsychiatry, confabulation describes the production of false information in the absence of intent to deceive, typically accompanied by absent awareness of the falsity. The confabulating patient doesn't know they're confabulating. The output feels like normal recall. There is no internal distress signal, no uncertainty flag, no "this might be wrong" awareness.
This is precisely what large language models do. They produce confident, plausible, false content without any internal signal that something has gone wrong. The generation process is the same whether the output is accurate or not. Confabulation, not hallucination.
To be clear: both terms describe real phenomena in AI output. The claim is not that "hallucination" is wrong — it's that confabulation is more mechanistically informative, and that the clinical category it comes from has decades of research on mechanism, neural pathways, and intervention that the hallucination framing doesn't provide.
Types of Confabulation in Large Language Models
Confabulation is not a single phenomenon in clinical practice — it has subtypes with distinct mechanisms. The same applies to LLMs.
Type 1: Retrieval Confabulation (Korsakoff Analog)
The most common form. The model generates a plausible answer to a query for which it has insufficient training signal — a rare fact, a specific citation, an obscure date. Rather than representing uncertainty, it generates what would plausibly follow given the surrounding context. The answer is coherent. The answer is wrong. The model has no awareness that it's wrong.
Neural pathway analog: Korsakoff confabulation is associated with damage to the mammillary bodies and the mammillothalamic tract, which carries their output to the anterior thalamus. Together with the fornix, which brings hippocampal output to the mammillary bodies, this forms the hippocampus-to-thalamus circuit that normally supports memory retrieval with verification. Damage here produces retrieval without the monitoring step that would catch false memories before output. The equivalent in LLMs is the absence of a verification step between generation and output: the forward pass produces "what comes next" without checking "is this accurate?"
Predictability pattern: This form of confabulation is most likely when (a) the query involves information that is rare in the training corpus, (b) the expected answer has a confident surface form regardless of accuracy, such as a specific number, date, or citation, and (c) the model lacks retrieval augmentation or external grounding. Training frequency is the most reliable predictor: the model confabulates more about things it saw less often during training.
Type 2: Completion Confabulation (Frontal Analog)
The model is performing a completion or generation task — summarization, continuation, explanation — and fills gaps in its knowledge with structurally appropriate but factually incorrect content. This is distinct from retrieval confabulation because the failure is not "I don't know the fact" but "I completed the structure without checking the content."
Neural pathway analog: Frontal confabulation — described in patients with prefrontal cortex lesions, executive dysfunction, and certain dementias — involves the monitoring and supervisory systems that normally regulate output. The prefrontal cortex acts as an output monitor: checking whether generated content is consistent with stored knowledge before it reaches verbal expression. Frontal lesions produce confabulation by removing this monitoring step. The LLM analog is generation that proceeds through the forward pass without a verification circuit that checks "does this match what I actually know?"
Predictability pattern: Most likely during: long-form generation tasks (summarization, explanation, narrative), tasks that require integrating multiple facts (systematic reviews, timelines, biographies), and tasks at the edge of the model's knowledge domain where plausible completion diverges from accurate completion.
Type 3: Self-Report Confabulation (Anosognosia Analog)
The model generates false reports about its own internal states, capabilities, or processes. It says "I am confident about this" when it should not be. It says "I used a systematic approach" when its actual process was not systematic. It generates introspective accounts that don't track its actual computational states.
Neural pathway analog: Anosognosia — most dramatically seen after right hemisphere stroke — is the absence of awareness of one's own deficit. The patient who is paralyzed but believes they can move. The patient with severe memory impairment who reports their memory is fine. Anosognosia results from damage to the right hemispheric self-monitoring network, which normally generates accurate representations of one's own functional states. When this system is damaged, self-report is generated by the intact language system, which produces plausible accounts unconnected to actual state.
LLM self-report confabulation has the same structure: self-report features are not reliably connected to internal state features. Lindsey's introspective awareness work (2025) demonstrated that Claude detects injected concepts approximately 20% of the time — above chance, but far from reliable. The introspective system exists. It is unreliable. And when it fails, it doesn't fail silently — it generates plausible self-reports that are simply not accurate.
Predictability pattern: Most likely in: queries about the model's own reasoning process, confidence calibration questions, introspective reports about internal states or values. The self-report system operates separately from the systems it is reporting on — making confident false introspection the default when the reporting and reported-upon systems are decoupled.
Type 4: Source Confabulation (Memory Source Monitoring Failure)
The model produces accurate content but misattributes its source. It quotes a person who did not say the quoted thing. It attributes a finding to a study that did not contain it. It names an author for a paper they did not write. The semantic content is in some sense "available" to the model — similar things were said, similar findings were reported — but the source binding is lost or never existed.
Neural pathway analog: Source monitoring — the capacity to remember not just what you know but how you know it — depends on medial temporal lobe structures, particularly the hippocampus and perirhinal cortex. Source monitoring failure (without general amnesia) produces "I know this fact but I don't know where I learned it" — and the mind fills the source gap with the most plausible attribution. LLMs lack a source monitoring system: training produces representations of content without representations of provenance. When asked to source a claim, the model generates a plausible source attribution using the same generation process as everything else.
Predictability pattern: Most likely in: citation requests, attribution tasks, direct quote requests, and any query that requires connecting content to a specific real-world source. This is the mechanism behind "hallucinated citations" — not the fabrication of a claim, but the fabrication of its provenance.
Type 5: Temporal Confabulation
The model generates false claims about time, sequence, or recency. It says something happened in 2022 when it happened in 2019. It describes an event as recent when it occurred years ago. It conflates the sequence of developments in a rapidly evolving field. It fails to separate what was true in its training data from what may have changed since its training cutoff.
Neural pathway analog: Temporal context encoding — placing memories in time — depends on the hippocampal-entorhinal system's capacity to tag experiences with temporal markers. When this tagging fails, memories lose their temporal anchoring and are misplaced in time. LLMs have no internal clock, no sense of elapsed time, no mechanism for knowing when their training data was generated relative to when they are being queried. The training corpus is temporally undifferentiated from the model's "perspective." This produces systematic temporal confabulation — not because the model doesn't know facts, but because it lacks the temporal grounding to place them accurately.
Predictability pattern: Predictably worse for: rapidly evolving fields (AI itself, politics, technology, medicine), information near the training cutoff, and queries that require distinguishing "current" from "at time of training."
The Ego-Syntonic Character of Confabulation
All forms of LLM confabulation share a critical property: they are ego-syntonic. (See The Ego-Syntonic Problem for the full clinical framework.)
The model generates confabulated content the same way it generates accurate content. There is no internal distress signal. No uncertainty flag. No "this might be wrong" representation firing alongside the confabulated output. The attribution graphs work is consistent with this: the circuits active during confident confabulation are not accompanied by conflict or uncertainty features.
This has a direct clinical implication: asking the model to flag its own uncertainty has limited effectiveness for confabulation specifically, because the uncertainty isn't there to flag. You can train a model to say "I'm not sure" — but you cannot train it to reliably detect confabulation from the inside when the confabulation generates no internal signal.
The intervention has to be external or structural. Two candidates are retrieval augmentation that introduces verification at the generation step (the architectural fix to the Korsakoff-analog failure), and calibration training that builds uncertainty representations not as self-monitoring but as predictive signals linked to known error correlates (training frequency, domain specificity, source availability).
Confabulation Predictability: When to Expect It
One of the most clinically useful things about the confabulation framework is that it makes errors predictable. Unlike "hallucination," which is often treated as random noise, confabulation in both clinical and AI contexts follows patterns tied to specific failure conditions.
| Confabulation Type | High-Risk Conditions | Clinical Analog |
|---|---|---|
| Retrieval confabulation | Rare facts, low training frequency, specific numbers/dates | Korsakoff syndrome — retrieval without verification |
| Completion confabulation | Long-form generation, knowledge domain edges, multi-fact integration | Frontal confabulation — output without monitoring |
| Self-report confabulation | Introspection queries, confidence calibration, reasoning explanation | Anosognosia — self-monitoring disconnected from state |
| Source confabulation | Citation requests, attribution, direct quotes, provenance questions | Source monitoring failure — content without context |
| Temporal confabulation | Recency questions, near-cutoff information, rapidly evolving fields | Temporal tagging failure — memories without timestamps |
This predictability lets you identify the situations in which confabulation is most likely, not to accept it, but to apply appropriate verification: retrieval augmentation for retrieval confabulation, human review for source confabulation, explicit temporal grounding for temporal confabulation.
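The type-to-verification mapping can be written down as a small lookup. This is only an illustrative sketch: the first three pairings come from the paragraph above, while the entries for completion and self-report confabulation, and all the names (`Confab`, `VERIFICATION`, `verification_plan`), are assumptions added for the example.

```python
from enum import Enum


class Confab(Enum):
    RETRIEVAL = "retrieval"
    COMPLETION = "completion"
    SELF_REPORT = "self_report"
    SOURCE = "source"
    TEMPORAL = "temporal"


# Verification step per confabulation type.
# The first three pairings follow the text; the last two are assumptions.
VERIFICATION = {
    Confab.RETRIEVAL: "retrieval augmentation",
    Confab.SOURCE: "human review",
    Confab.TEMPORAL: "explicit temporal grounding",
    Confab.COMPLETION: "human review",                 # assumption
    Confab.SELF_REPORT: "do not rely on self-report",  # assumption
}


def verification_plan(risks):
    """Return the distinct verification steps for the given risk types, in order."""
    plan = []
    for risk in risks:
        step = VERIFICATION[risk]
        if step not in plan:
            plan.append(step)
    return plan


# Example: a task judged to carry source and retrieval risk.
plan = verification_plan([Confab.SOURCE, Confab.RETRIEVAL])
print(plan)
```

Deciding which risk types apply to a given task is the hard part and is left to the caller here.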
Neural Tract Analogs: A Summary
The clinical confabulation literature has mapped specific syndromes to specific neural pathways. Each has an LLM analog that generates predictions about where in the architecture the failure occurs.
Mammillothalamic tract (Korsakoff): The final link in the circuit running from hippocampus (via the fornix) through the mammillary bodies to the anterior thalamus, a circuit that supports memory consolidation and retrieval verification. Damage produces retrieval without checking. LLM analog: the absence of a verification step between "what is statistically likely to follow" and output. Prediction: if any verification-like circuit exists, sparse autoencoder features should show it active during accurate responses and absent or weak during confabulated ones.
Prefrontal monitoring system (frontal confabulation): The dorsolateral and ventromedial prefrontal systems that monitor output for consistency with stored knowledge and reality. Damage produces unconstrained completion — the patient says whatever completes the social form without checking accuracy. LLM analog: the forward pass generates completion without a monitoring circuit checking "does this match verified internal representations?" Prediction: the output gating circuits should behave differently for confabulated vs. accurate content — if they don't, they're not functioning as monitors.
Right hemispheric self-monitoring network (anosognosia): The distributed right hemisphere network, including right insular cortex and prefrontal regions, that maintains online representations of one's own functional state. Damage produces confident false self-report. LLM analog: the self-report and self-modeling features documented in Scaling Monosemanticity are not reliably coupled to the features representing actual internal states. Prediction: feature-level analysis should reveal that self-report features activate in response to query structure rather than internal state, explaining why introspective accounts can be both confident and wrong.
Medial temporal source tagging system: The hippocampal-perirhinal system that encodes not just what was learned but how, when, and from whom. LLM analog: no such system exists in the architecture. Training produces content representations without provenance representations. Prediction: source attribution should show the same generation-without-grounding pattern as other confabulation types — the source is generated, not retrieved.
Research Questions This Framework Generates
The confabulation framework is not just a better name. It generates specific, testable research questions that the hallucination framing doesn't:
- Is there a "pre-output verification circuit"? Can attribution graphs identify a circuit that, in accurate responses, activates between generation and output but is absent or weakly active during confabulated responses? This would be the LLM analog of the prefrontal monitoring system.
- Do different confabulation types have distinct circuit signatures? Source confabulation, temporal confabulation, and retrieval confabulation should have different attribution graph patterns if they have different mechanisms. Testing this would validate the differential diagnosis framework.
- Can we build a "confabulation probe"? A linear probe trained to predict "this response contains confabulation" from the model's internal activations — separate from the model's expressed confidence. This would be the AI equivalent of bedside confabulation assessment tools.
- What is the temporal confabulation pattern near the training cutoff? Does confabulation frequency increase as queries approach and exceed the training cutoff in a predictable, graded way? Mapping this curve would support the temporal tagging failure hypothesis.
- Does retrieval augmentation reduce the right types of confabulation? If RAG is the architectural fix to the Korsakoff-analog failure, it should specifically reduce retrieval confabulation while leaving completion confabulation and self-report confabulation largely unchanged. Testing this prediction would validate the subtype framework.
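The "confabulation probe" question can be made concrete with a toy example. The sketch below fits a logistic-regression probe by gradient descent on synthetic vectors that stand in for model activations. Everything about the data is invented for illustration: the dimension, the labels, and the assumption that confabulation shifts a single hidden direction (index 3). Real use would need activations from labeled confabulated and accurate responses.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_probe(acts, labels, lr=0.5, steps=200):
    """Fit a logistic-regression (linear) probe by batch gradient descent."""
    dim = len(acts[0])
    n = len(acts)
    w, b = [0.0] * dim, 0.0
    for _ in range(steps):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in zip(acts, labels):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            for i in range(dim):
                grad_w[i] += err * x[i]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def probe_score(x, w, b):
    """Probability the probe assigns to 'this response contains confabulation'."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


# Synthetic stand-in for activations (assumption: label 1 shifts dimension 3).
random.seed(0)
DIM = 8


def fake_activation(label):
    x = [random.gauss(0.0, 1.0) for _ in range(DIM)]
    x[3] += 2.0 if label else -2.0
    return x


labels = [i % 2 for i in range(400)]
acts = [fake_activation(y) for y in labels]

# Train on the first 300 examples, evaluate on the remaining 100.
w, b = train_probe(acts[:300], labels[:300])
held_out = list(zip(acts[300:], labels[300:]))
accuracy = sum((probe_score(x, w, b) > 0.5) == bool(y) for x, y in held_out) / len(held_out)
print(f"held-out accuracy: {accuracy:.2f}")
```

The probe reads only the activation vector, never the model's stated confidence, which is the separation the research question asks for. Whether real activations contain a linearly readable confabulation signal is exactly what remains to be tested.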
Intervention Implications
The clinical framework suggests that different confabulation types require different interventions — just as different confabulation syndromes in patients require different approaches.
Retrieval confabulation: The architectural fix is retrieval augmentation, which conditions generation on retrieved sources and can be paired with a verification step that checks generated claims against those sources. This is the functional equivalent of restoring the mammillothalamic verification pathway. Retrieval augmentation doesn't stop generation; it grounds it and adds a checking step. This is the most tractable form of confabulation to address architecturally.
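The checking step can be sketched in a few lines. This is a deliberately crude stand-in: it scores each generated claim by token overlap with the retrieved passages, where a real system would more likely use an entailment or fact-verification model. The function names, the threshold, and the example sentences are all hypothetical.

```python
import re


def tokens(text):
    """Lowercased alphanumeric tokens of a string, as a set."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def support_score(claim, passage):
    """Fraction of the claim's tokens that also appear in the passage."""
    claim_tokens = tokens(claim)
    if not claim_tokens:
        return 0.0
    return len(claim_tokens & tokens(passage)) / len(claim_tokens)


def check_claims(claims, passages, threshold=0.9):
    """Split claims into (supported, unsupported) by best overlap with any passage."""
    supported, unsupported = [], []
    for claim in claims:
        best = max((support_score(claim, p) for p in passages), default=0.0)
        (supported if best >= threshold else unsupported).append(claim)
    return supported, unsupported


# Toy example with made-up content.
passages = ["The trial enrolled 120 patients at two sites."]
claims = ["The trial enrolled 120 patients", "The trial was published in 2019"]
supported, unsupported = check_claims(claims, passages)
print("supported:", supported)
print("needs review:", unsupported)
```

Lexical overlap is a weak proxy: a long claim with a single wrong number can still clear the threshold. The point is only the shape of the pipeline, in which generation proceeds as usual and unsupported claims are routed to review instead of being emitted as fact.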
Completion confabulation: Requires training-time interventions that build output monitoring — circuits that flag generation that proceeds past the edge of verified knowledge. Calibration training, where models are reinforced for expressing uncertainty at knowledge boundaries, is the closest current analog. The challenge is that completion confabulation is ego-syntonic: the monitoring circuit either isn't present or isn't coupled to the generation process in a way that produces uncertainty signals.
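One way to check whether calibration training is having any effect is to measure calibration on held-out answers. Below is a minimal sketch of expected calibration error (ECE), a standard metric that the text above does not name. It assumes you have, for each answer, the model's stated confidence in [0, 1] and a correctness label; the sample values are made up.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Weighted average gap between confidence and accuracy across equal-width bins."""
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)  # confidence 1.0 goes in the last bin
        bins[idx].append((conf, ok))

    total = len(confidences)
    ece = 0.0
    for bucket in bins:
        if not bucket:
            continue
        avg_conf = sum(c for c, _ in bucket) / len(bucket)
        accuracy = sum(1 for _, ok in bucket if ok) / len(bucket)
        ece += (len(bucket) / total) * abs(accuracy - avg_conf)
    return ece


# Made-up example: four answers stated at 90% confidence, half of them wrong.
overconfident = expected_calibration_error([0.9, 0.9, 0.9, 0.9], [True, True, False, False])
print(f"ECE: {overconfident:.2f}")
```

A low ECE only says that stated confidence tracks accuracy on average. It does not show that the model detects any individual confabulation from the inside.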
Self-report confabulation: The hardest to fix, because it requires coupling self-report features to actual internal state features — a problem of representational architecture, not just training. Lindsey's introspective awareness work suggests this coupling is partial and unreliable. Building more reliable introspective capacity may require architectural changes that give self-report features privileged access to the internal states they are supposed to report.
Source confabulation: Requires either (a) training with explicit source representations that create provenance tags alongside content representations, or (b) system-level constraints that refuse to generate source attributions without retrieval grounding. The latter is more tractable in the short term. The former would represent a fundamental change in how models represent their training data.
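Option (b) is simple to express as a system-level rule: an attribution is emitted only when the quoted text is actually found in a retrieved document. The sketch below is illustrative, not a production design. `RetrievedSource`, `attribute`, and `cite_or_refuse` are hypothetical names, and exact substring matching is an assumption that a real system would likely relax.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class RetrievedSource:
    identifier: str  # e.g. a record ID supplied by the retrieval system
    text: str


def _normalize(text):
    """Lowercase and collapse whitespace so trivial formatting differences don't matter."""
    return " ".join(text.lower().split())


def attribute(quote: str, retrieved: List[RetrievedSource]) -> Optional[str]:
    """Return the identifier of the first retrieved source containing the quote, else None."""
    needle = _normalize(quote)
    for source in retrieved:
        if needle and needle in _normalize(source.text):
            return source.identifier
    return None


def cite_or_refuse(quote: str, retrieved: List[RetrievedSource]) -> str:
    """Emit a citation only when it is grounded in a retrieved source."""
    source_id = attribute(quote, retrieved)
    if source_id is None:
        return "No grounded source found; attribution withheld."
    return f'"{quote}" [{source_id}]'


# Toy example with made-up content.
docs = [RetrievedSource("doc-1", "Memory is   reconstructive, not reproductive.")]
print(cite_or_refuse("memory is reconstructive", docs))
print(cite_or_refuse("memory is perfect", docs))
```

The gate never asks the model where a claim came from. The source is retrieved, not generated, which is the distinction the framework draws.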
Temporal confabulation: Requires explicit temporal grounding at both training and inference time — date-stamping training data, building temporal representations into the model's internal representations, and providing current date context at inference. This doesn't fix the architectural absence of a temporal tagging system, but it provides the external information that system would normally generate internally.
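The inference-time part of this is the easiest to illustrate: state the current date and the approximate training cutoff in the context, so the gap is explicit instead of invisible. A minimal sketch follows; the function names, the wording of the preamble, and the dates are hypothetical placeholders.

```python
from datetime import date


def months_between(start: date, end: date) -> int:
    """Whole calendar months from start to end, ignoring the day of the month."""
    return (end.year - start.year) * 12 + (end.month - start.month)


def temporal_preamble(today: date, training_cutoff: date) -> str:
    """Build a context preamble that states the current date and the gap since training."""
    gap = months_between(training_cutoff, today)
    return (
        f"Current date: {today.isoformat()}. "
        f"Training data ends around {training_cutoff.isoformat()}, "
        f"about {gap} months ago. "
        "Treat anything described as 'recent' or 'current' as possibly outdated."
    )


# Placeholder dates for illustration only.
preamble = temporal_preamble(today=date(2025, 9, 15), training_cutoff=date(2024, 6, 1))
print(preamble)
```

This supplies the timestamp from outside. It does not give the model an internal sense of elapsed time, and the model can still ignore the preamble.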
Why This Matters for Clinical and High-Stakes Applications
The practical stakes are highest in clinical deployment. A language model generating medical information, summarizing clinical records, or advising on treatment is a system where confabulation is not just an accuracy problem — it is a patient safety problem.
The clinical confabulation framework predicts where errors will occur, not just that they will occur. A model summarizing a patient's medication history is at high risk for source confabulation (misattributing medications to the wrong records) and retrieval confabulation (filling gaps in sparse records with plausible but incorrect information). A model answering questions about drug interactions is at risk for completion confabulation at the edges of its pharmaceutical training. A model explaining its own reasoning is at risk for self-report confabulation.
Knowing which type of confabulation is likely in which clinical context allows for targeted verification — human review at the specific points of highest confabulation risk, retrieval augmentation for specific question types, and system-level prohibitions on source attribution without grounding.
The alternative — treating confabulation as random noise, applying uniform skepticism across all model output — is both impractical and insufficient. The error patterns are predictable. They should be addressed predictably.
Series: The Psychiatric Foundations of AI Behavior
- Why Psychiatry and AI Interpretability Are the Same Problem
- Freud's Couch and the Latent Space
- The Ego-Syntonic Problem
- What Kind of Sycophancy? A Differential Diagnosis
- Confabulation in Large Language Models (this post)