In the 1970s, Sarah Lichtenstein and Baruch Fischhoff ran a series of experiments that would reshape how psychologists think about human judgment. They gave participants questions drawn from almanacs: Which is longer, the Panama Canal or the Suez Canal? In which year was Mozart born? Who had more troops at the Battle of Hastings? For each question, participants chose an answer and then stated how confident they were — as a probability — that their answer was correct. The design was elegant because it allowed a direct comparison between felt certainty and actual accuracy. If a person says they are 90 percent confident and answers 100 such questions, they should get approximately 90 right if their confidence is well-calibrated. The results, synthesized in Lichtenstein, Fischhoff, and Phillips's landmark 1982 chapter, showed something systematic and unsettling: people expressing 90 percent confidence were correct only about 75 percent of the time. Those claiming 100 percent certainty — absolute conviction — were wrong roughly 20 percent of the time. The gap between felt certainty and actual accuracy was not noise. It was a structural feature of how minds assign confidence to their own beliefs.
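
The comparison is easy to make concrete. Here is a minimal sketch (with invented data, not Lichtenstein and Fischhoff's) that bins answers by stated confidence and compares each bin's stated confidence with its observed hit rate:

```python
from collections import defaultdict

def calibration_table(responses):
    """Group (stated_confidence, was_correct) pairs by confidence level
    and compare each level to the observed fraction correct."""
    bins = defaultdict(list)
    for confidence, correct in responses:
        bins[confidence].append(correct)
    for confidence in sorted(bins):
        outcomes = bins[confidence]
        hit_rate = sum(outcomes) / len(outcomes)
        print(f"stated {confidence:.0%}  observed {hit_rate:.0%}  (n={len(outcomes)})")

# Invented data echoing the classic pattern: 90% stated, ~75% observed;
# 100% stated, ~80% observed.
data = [(0.9, True)] * 75 + [(0.9, False)] * 25 \
     + [(1.0, True)] * 80 + [(1.0, False)] * 20
calibration_table(data)
```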

A parallel finding, reported by Marc Alpert and Howard Raiffa in the same 1982 volume, made the point even more sharply. Rather than asking for a single confidence level, Alpert and Raiffa asked participants to provide 98 percent confidence intervals — ranges within which they were 98 percent certain the true answer fell. A well-calibrated person's interval should exclude the true answer only 2 percent of the time. In Alpert and Raiffa's data, the true answer fell outside the stated interval approximately 40 to 50 percent of the time. Participants' nominally 98 percent intervals actually captured the truth little more than half the time. This was not overconfidence in the casual sense of people being boastful or arrogant. It was a precise, measurable, replicable error: people's confidence intervals were far too narrow, their certainty far too compressed, their representations of their own ignorance systematically inadequate. The finding replicated in sample after sample, domain after domain, population after population. It has since been called one of the most robust results in cognitive psychology.
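
The interval version of the same check is equally simple: score a batch of stated intervals against the true values they were supposed to contain. Again, the numbers are invented for illustration.

```python
def capture_rate(intervals, truths):
    """Fraction of (low, high) intervals containing the true value.
    For well-calibrated 98% intervals this should be about 0.98."""
    hits = sum(low <= t <= high for (low, high), t in zip(intervals, truths))
    return hits / len(truths)

stated = [(1900, 1920), (40, 60), (3000, 3500)]   # hypothetical 98% intervals
actual = [1869, 93, 3200]                          # hypothetical true answers
print(capture_rate(stated, actual))                # 0.33, far below the stated 0.98
```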

These calibration studies opened a research program that would eventually span decades, continents, and disciplines — from the trading floors of investment banks to the operating rooms of hospitals to the forecasting offices of intelligence agencies. The overconfidence effect, it turned out, was not a curiosity about trivia questions. It was a window into a fundamental feature of the human mind: the consistent, predictable tendency to treat one's beliefs as more reliable than the evidence warrants. Understanding that tendency — its forms, its mechanisms, its limits, and its practical consequences — is among the more important tasks in behavioral science.

"People are overconfident in their own answers to factual questions — they believe they are correct far more often than they actually are." — Baruch Fischhoff, Paul Slovic & Sarah Lichtenstein, 1977


Three Distinct Forms: A Taxonomy

For decades, researchers used the word "overconfidence" to describe several phenomena that, while related, are conceptually and mechanistically distinct. Don Moore and Paul Healy's 2008 paper "The Trouble with Overconfidence," published in Psychological Review (Vol. 115, No. 2), provided the field with its most important clarification: overconfidence is not one thing. It is at least three things, each with different causes, different empirical signatures, and different practical consequences. Conflating them had produced apparent contradictions in the literature — situations where the same manipulation seemed to increase one form of overconfidence while decreasing another.

Overprecision
  Definition: Excessive confidence in the accuracy of one's beliefs; treating one's estimates as more tightly constrained around the truth than they actually are.
  How measured: Calibration curves; confidence-interval width versus true capture rate; subjective versus objective probability scores.
  Illustrative example: An engineer's 90% confidence interval for a construction cost contains the true cost only 55% of the time.
  Domains where most pronounced: Financial forecasting, medical diagnosis, engineering cost estimation, geopolitical prediction.
  Key complication: Persists even when people know a task is difficult; robust across expertise levels.
  Core references: Alpert & Raiffa 1982; Lichtenstein et al. 1982.

Overplacement
  Definition: Believing one performs or rates above average relative to peers; the "better-than-average" effect.
  How measured: Self-other comparative ratings on traits, skills, or performance; percentage claiming above-average ability.
  Illustrative example: 93% of US drivers rate themselves above the median in driving skill (Svenson 1981).
  Domains where most pronounced: Easy, self-relevant, socially valued tasks; skills with ambiguous criteria.
  Key complication: Reverses on tasks people perceive as hard or rare; people claim to be below average at tasks most others also fail.
  Core references: Svenson 1981; Alicke et al. 1995.

Overestimation
  Definition: Overestimating one's absolute level of performance, ability, or knowledge, independent of comparison to others.
  How measured: Predicted versus actual scores; predicted versus actual task performance.
  Illustrative example: A student predicts a score of 80 on an exam and scores 62.
  Domains where most pronounced: Difficult tasks with delayed or absent feedback.
  Key complication: Greatest on difficult tasks; decreases and reverses to underestimation on easy tasks (Moore & Healy 2008).
  Core references: Moore & Healy 2008; Kruger & Dunning 1999.

The asymmetry between overestimation and overplacement is theoretically important and frequently misunderstood. On extremely difficult tasks, people tend to overestimate their absolute performance (they think they did better than they did) while simultaneously underplacing themselves relative to others (they think others did better still). On easy tasks, the pattern reverses: people underestimate their absolute performance yet overplace themselves relative to peers. Moore and Healy's signal-based model explains this pattern as rational updating on imperfect information. A person receives a noisy signal about their own performance and little or no signal about others', so their estimate of their own score regresses partway toward prior expectations while their estimate of others' scores regresses nearly all the way. After a hard task, actual scores fall below expectations; the self-estimate is pulled up toward the prior (producing overestimation), but the estimate of others is pulled up even further (producing underplacement). Overprecision, by contrast, shows no such reversal. It is the most robustly present form of overconfidence across difficulty levels.
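
A small simulation makes the regression logic visible. Everything numerical below (prior mean, task means, noise level) is an assumed illustration rather than a parameterization from Moore and Healy's paper; the point is only that Bayesian shrinkage toward a shared prior reproduces overestimation plus underplacement on a hard task, and the reverse on an easy one.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
prior_mean, prior_sd = 50.0, 10.0   # what people expect scores to look like
signal_sd = 15.0                    # noise in one's sense of one's own score
w = prior_sd**2 / (prior_sd**2 + signal_sd**2)   # Bayesian weight on own signal

for label, task_mean in [("hard task", 30.0), ("easy task", 70.0)]:
    scores = rng.normal(task_mean, prior_sd, n)     # actual performance
    signals = scores + rng.normal(0, signal_sd, n)  # noisy self-knowledge
    est_self = w * signals + (1 - w) * prior_mean   # posterior estimate of own score
    est_others = prior_mean                         # no signal about others: pure prior
    print(label,
          f"estimation error {est_self.mean() - scores.mean():+5.1f}",
          f"placement {est_self.mean() - est_others:+5.1f} (true difference: 0.0)")
```

On the hard task the estimation error is positive (overestimation) while placement is negative (underplacement); on the easy task both signs flip, matching Moore and Healy's account.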


The Cognitive Science of Excessive Certainty

The experimental investigation of overconfidence accelerated through the 1970s and 1980s as part of the broader heuristics-and-biases research program initiated by Amos Tversky and Daniel Kahneman. Their 1974 paper in Science, "Judgment Under Uncertainty: Heuristics and Biases," had established the conceptual framework: the mind uses efficient cognitive shortcuts that are often adaptive but that introduce systematic, predictable errors. Overconfidence fit naturally into this framework as a form of anchoring — people anchor on their favored hypothesis and adjust insufficiently toward uncertainty.

Lichtenstein, Fischhoff, and Phillips's 1982 calibration synthesis, published as a chapter in the foundational Kahneman, Slovic, and Tversky volume Judgment Under Uncertainty: Heuristics and Biases (Cambridge University Press), documented the empirical landscape of the calibration gap: the consistent pattern whereby expressed confidence exceeds observed accuracy, with the gap largest when confidence is highest. Their work identified several structural features of the effect. First, it generalizes across question domains but is not domain-invariant — questions involving common knowledge showed smaller gaps than questions involving specialized or obscure facts. Second, it is resistant to simple instruction: telling people about overconfidence before they complete calibration tasks reduces the gap modestly but does not eliminate it. Third, it is not fully explained by the difficulty of questions — even on questions where the subject genuinely knows the domain, overconfidence in confidence intervals persists.

Baruch Fischhoff, writing in Journal of Experimental Psychology: Human Perception and Performance in 1977, documented another mechanism contributing to overconfidence: hindsight contamination of foresight. Once people learn the outcome of an event, they revise upward their belief that they would have predicted it — and this retrospective distortion bleeds into their calibration of future predictions. If you always remember yourself as having been more right than you were, you have no accurate data on which to base your current confidence estimates. The felt sense of knowing, which drives confidence ratings, is partly constructed from these corrupted retrospective memories.

Fischhoff also documented the "knew-it-all-along" effect in 1975 in Journal of Experimental Psychology: Human Perception and Performance, showing that outcome knowledge dramatically changed participants' reported predictions of those outcomes — a finding that made clear that the feedback loops people rely on to calibrate their own judgment are systematically distorted. Overconfidence is not simply a failure to consult evidence; it is partly a failure of the evidence-storage system that feeds back into calibration.

The underlying cognitive mechanism most directly implicated in overprecision is insufficient adjustment from an anchor. When a person generates an answer to a factual question, they begin from their best estimate and should theoretically expand outward into uncertainty. The research suggests people expand too little — their uncertainty bounds are too tight relative to their actual accuracy. This is consistent with Tversky and Kahneman's anchoring-and-adjustment work, which showed that adjustments from any initial anchor are systematically insufficient.


Intellectual Lineage

The overconfidence literature draws from at least three distinct intellectual traditions that converged during the latter half of the twentieth century.

The first is the calibration and subjective probability tradition, which traces to the work of statistician Leonard Jimmie Savage on subjective expected utility (1954) and the subsequent development of methods for eliciting and evaluating personal probability assessments. Savage's Bayesian framework implied that well-reasoned agents should have internally consistent and empirically calibrated probability beliefs. The subsequent discovery that human beings systematically violate this calibration requirement was, from this perspective, a refutation of a normative claim about rationality rather than simply an interesting psychological curiosity.

The second tradition is the psychological heuristics-and-biases program. Tversky and Kahneman's work at Hebrew University established the productive research paradigm within which most overconfidence research has been conducted. Their work influenced Lichtenstein, Fischhoff, and Phillips directly — Decision Research in Eugene, Oregon, where Lichtenstein and Fischhoff worked, was a major node in the heuristics-and-biases network. The framing of overconfidence as a cognitive bias rather than a motivational distortion (people are not confident because it feels good; they are confident because of how the mind processes information) is largely a product of this tradition.

The third tradition is the economics of forecasting and expertise. Researchers studying whether experts — economists, political scientists, physicians, weather forecasters — actually make accurate predictions developed measurement methodologies that allowed calibration questions to be posed in high-stakes, naturalistic contexts. This tradition runs from the early work on economic forecasting accuracy through Philip Tetlock's long-term study of political expert judgment, begun in 1987. Tetlock's work, synthesized in Expert Political Judgment (2005, Princeton University Press) and extended through the Good Judgment Project (described in Superforecasting, 2015, Crown), brought the calibration research program into contact with the practical world of intelligence analysis and geopolitical prediction. His collaborations with economists and forecasters outside the psychology department helped establish overconfidence as a multi-disciplinary problem with practical institutional consequences.

The behavioral economics tradition integrated all three, beginning with the foundational work of Kahneman and Tversky and extending through the applied work of scholars like Richard Thaler, who linked cognitive biases to market outcomes. Richard Roll's 1986 "hubris hypothesis" paper in the Journal of Business brought overconfidence directly into the corporate finance literature. The subsequent work of Malmendier, Tate, and others transformed the hubris hypothesis from a theoretical proposal into an empirically demonstrated mechanism, connecting individual-level psychological research to aggregate economic outcomes.


Empirical Research

The Svenson driving study. Ola Svenson's 1981 paper "Are We All Less Risky and More Skillful Than Our Fellow Drivers?" in Acta Psychologica (Vol. 47, No. 2) asked samples of American and Swedish drivers to rate their driving skill and safety relative to the other drivers in the study. Approximately 93 percent of the American sample and 69 percent of the Swedish sample rated themselves above the median in driving skill. Since no more than 50 percent of any group can be above its median, the overplacement was stark, large, and consistent across cultures — though notably less extreme among the Swedes, a finding that has generated subsequent cross-cultural research on the relationship between individualism and overplacement.

The Dunning-Kruger findings. Justin Kruger and David Dunning's 1999 paper "Unskilled and Unaware of It: How Difficulties in Recognizing One's Own Incompetence Lead to Inflated Self-Assessments," published in the Journal of Personality and Social Psychology (Vol. 77, No. 6), documented a specific pattern of overplacement: individuals in the bottom quartile of performance on logical reasoning, grammar, and humor-judgment tasks estimated their performance as well above the median. The mechanism proposed was metacognitive: the skills required to perform well on a task are the same skills required to evaluate whether one is performing well. People who lack the skills to succeed also lack the skills to recognize they are failing. High performers, by contrast, tended to slightly underestimate their relative standing — they assumed others performed similarly to themselves. The Dunning-Kruger effect has since been critiqued (primarily on statistical grounds, by Gignac and Zajenkowski in 2020 in Intelligence), but the general pattern of disproportionate overplacement among low performers survives the critiques in a less extreme form.

Tetlock's forecasting tournament. Philip Tetlock collected 82,361 probability forecasts from 284 political experts between 1987 and 2003. The experts — political scientists, economists, area specialists — were asked to assign probabilities to specific geopolitical outcomes. Experts expressed substantial confidence in their predictions. As a group, they performed barely better than chance on medium-to-long-range predictions, and significantly worse than simple statistical extrapolation rules. The confidence-accuracy gap was systematic. Tetlock's follow-up work through the IARPA-funded Good Judgment Project demonstrated that carefully selected and trained "superforecasters" — non-experts with good calibration habits — could outperform professional intelligence analysts, and that team aggregation of forecasts substantially improved calibration further.
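
Forecast accuracy in this literature is typically scored with the Brier score, the mean squared difference between probability forecasts and realized outcomes (0 is perfect; an uninformative 50/50 forecaster scores 0.25 on binary questions). A minimal sketch with invented forecasts, showing how a confident but miscalibrated forecaster can lose to a crude base-rate rule:

```python
def brier(forecasts, outcomes):
    """Mean squared error of probability forecasts against 0/1 outcomes."""
    return sum((f - o) ** 2 for f, o in zip(forecasts, outcomes)) / len(outcomes)

outcomes  = [1, 0, 0, 1, 0]                  # invented question resolutions
expert    = [0.95, 0.9, 0.85, 0.1, 0.8]      # confident but miscalibrated
base_rate = [0.4] * 5                        # crude extrapolation-style rule
print(brier(expert, outcomes))      # 0.597: worse than guessing 50/50
print(brier(base_rate, outcomes))   # 0.240: the dull rule wins
```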

Camerer and Lovallo's excess entry experiment. Colin Camerer and Dan Lovallo's 1999 paper "Overconfidence and Excess Entry: An Experimental Approach," published in the American Economic Review (Vol. 89, No. 1), examined business entry decisions in a controlled laboratory setting. Subjects were told that payoffs in a competitive market depended on relative skill rankings and that only the top performers would earn profits; those outside the top tier would lose their entry fee. When entry decisions required skill-based performance, participants systematically over-entered — more than the profit-maximizing number entered the market, and returns were consequently negative for most. Crucially, when entry was random rather than skill-based, excess entry disappeared. The overconfidence about relative skill (overplacement) was driving the economic irrationality. Camerer and Lovallo linked this finding to the real-world observation that new businesses fail at rates substantially higher than their founders predict, and that this failure rate has been relatively stable despite widespread availability of information about base rates of business failure.
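
The structure of the entry game can be sketched in a few lines. The payoff rule mirrors the design Camerer and Lovallo describe (top-ranked entrants split a prize, all entrants pay a fee), but the parameter values are invented, and overplacement is modeled simply as each agent believing their skill percentile is higher than it is:

```python
import numpy as np

rng = np.random.default_rng(1)
n_players, capacity, prize, fee = 16, 4, 50.0, 10.0
n_rounds = 2_000

def play(percentile_boost):
    entrants_total, profit_total = 0, 0.0
    for _ in range(n_rounds):
        skill = rng.uniform(size=n_players)                   # true skill percentiles
        belief = np.clip(skill + percentile_boost, 0.0, 1.0)  # possibly inflated self-view
        entrants = np.where(belief > 1 - capacity / n_players)[0]
        if len(entrants) == 0:
            continue
        winners = entrants[np.argsort(-skill[entrants])][:capacity]
        profit_total += len(winners) * (prize / capacity) - len(entrants) * fee
        entrants_total += len(entrants)
    return entrants_total / n_rounds, profit_total / n_rounds

for label, boost in [("calibrated", 0.0), ("overplaced", 0.15)]:
    avg_entry, avg_profit = play(boost)
    print(f"{label}: avg entrants {avg_entry:.1f} (capacity {capacity}), "
          f"avg total entrant profit {avg_profit:+.1f}")
```

With calibrated beliefs, roughly as many agents enter as the market can reward; with inflated beliefs, entry overshoots capacity and aggregate entrant profit turns negative, reproducing the excess-entry result in miniature.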

Malmendier and Tate on CEO overconfidence. Ulrike Malmendier and Geoffrey Tate operationalized CEO overconfidence using stock option exercise behavior — specifically, whether CEOs held options to expiration rather than exercising them when standard financial theory predicted exercise. This revealed-preference measure captured genuine, consequential overconfidence in the firm's future performance rather than survey self-reports. Their 2005 paper in the Journal of Finance (Vol. 60, No. 6) showed overconfident CEOs overinvested when their firms had abundant internal cash. Their 2008 paper in the Journal of Financial Economics (Vol. 89, No. 1) showed overconfident CEOs made more acquisitions, at higher premiums, generating more negative market reactions at announcement. The effect on value creation was economically substantial and robust to controls for endogeneity.
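
In spirit, the revealed-preference measure is a classification rule over option-exercise records. The sketch below is a loose paraphrase rather than Malmendier and Tate's exact construction (their papers specify precise vesting, moneyness, and timing conditions that are simplified away here); the 40% moneyness threshold is an assumption for illustration.

```python
from dataclasses import dataclass

@dataclass
class OptionRecord:
    years_to_expiration: int   # remaining option life when observed
    vested: bool
    moneyness: float           # (stock price - strike) / strike

def longholder_flag(records, moneyness_threshold=0.40):
    """Illustrative 'Longholder'-style rule: flag a CEO who held a vested,
    deeply in-the-money option into its final year. The 40% threshold is
    an assumption for illustration, not the papers' calibrated cutoff."""
    return any(
        r.vested and r.years_to_expiration <= 1 and r.moneyness >= moneyness_threshold
        for r in records
    )

ceo_history = [OptionRecord(years_to_expiration=1, vested=True, moneyness=0.8)]
print(longholder_flag(ceo_history))  # True: held a deep in-the-money option to the end
```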

Noise traders and financial markets. J. Bradford De Long, Andrei Shleifer, Lawrence Summers, and Robert Waldmann's 1991 paper "The Survival of Noise Traders in Financial Markets," published in the Journal of Business (Vol. 64, No. 1), provided a theoretical framework within which overconfident traders — called "noise traders" — not only survive in competitive markets but can actually earn higher expected returns than rational traders under certain conditions. Overconfident traders take more risk than is optimal given their information; in markets where risk-bearing is rewarded, they are sometimes compensated for that excessive risk. This model explained why natural selection within financial markets does not necessarily eliminate overconfident behavior — the overconfident sometimes prosper precisely because their risk tolerance generates exposure to positive outcomes that rational actors would have avoided.
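
The survival mechanism can be illustrated with a toy market far simpler than De Long et al.'s overlapping-generations model. Under the assumed parameters below, a trader who overweights a risky asset carrying a positive premium earns higher average terminal wealth than a more cautious trader, while bearing a much larger probability of ruin:

```python
import numpy as np

rng = np.random.default_rng(2)
n_paths, n_years = 50_000, 20
mu, sigma, riskfree = 0.06, 0.25, 0.01   # assumed risky return and volatility

def terminal_wealth(risky_share):
    returns = rng.normal(mu, sigma, (n_paths, n_years))
    growth = 1 + riskfree + risky_share * (returns - riskfree)
    return np.prod(np.clip(growth, 0.0, None), axis=1)  # wealth cannot go below zero

rational = terminal_wealth(0.6)   # a cautious allocation
noise = terminal_wealth(1.5)      # leveraged, overconfident allocation
print(f"mean terminal wealth: rational {rational.mean():.2f}, noise {noise.mean():.2f}")
print(f"share ending below 0.2x: rational {np.mean(rational < 0.2):.1%}, "
      f"noise {np.mean(noise < 0.2):.1%}")
```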


Limits, Critiques, and Nuances

The overconfidence literature, while extensive and well-replicated, has faced sustained and important criticism. The most consequential critique comes from Gerd Gigerenzer and colleagues, who have argued across a series of papers — including Gigerenzer, Hoffrage, and Kleinbölting's 1991 paper in Psychological Review (Vol. 98, No. 4) and Gigerenzer's 2002 book Calculated Risks (Simon & Schuster) — that the apparent overconfidence in calibration studies is an artifact of how the tasks are designed and analyzed, not a robust feature of human judgment.

Gigerenzer's central argument involves what he calls "ecological rationality." When calibration tasks are constructed using random samples of questions from a reference class, people are often well-calibrated. The overconfidence apparent in almanac-style studies arises partly because the questions are selected to be difficult or surprising — they are not representative samples of the environment in which people actually form their beliefs. More importantly, Gigerenzer showed that when probability questions are expressed in natural frequency formats rather than as percentages or probabilities, apparent overconfidence substantially decreases or disappears. People who cannot correctly reason about "a 15 percent probability of X" often reason correctly about "15 out of 100 similar cases result in X." If the overconfidence effect were a deep feature of human cognition, it should not be this sensitive to surface format. The frequency-format finding has been extensively replicated and constitutes a genuine challenge to strong versions of the overconfidence hypothesis.

Gigerenzer's ecological rationality framework argues that what looks like bias in laboratory tasks often reflects cognition adapted to the statistical structure of the natural environment. Human confidence judgments may be well-calibrated to the environments in which they evolved and typically operate, and appear miscalibrated only in the artificial reference classes constructed by psychologists. This is not a minor methodological quibble; it is a fundamentally different theoretical picture of what overconfidence is and where it lives.

The hard-easy effect constitutes a related complication. Griffin and Tversky's 1992 paper in Cognitive Psychology documented that overconfidence in one's answers is strong on difficult items and shrinks or reverses on easy ones — people can be underconfident on easy tasks, expressing less certainty than their accuracy warrants. This means "overconfidence" as a summary label is too coarse: the effect is substantially modulated by task difficulty, and an intervention that reduces overconfidence on hard tasks might actually worsen underconfidence on easy tasks.

Hal Arkes's 2001 chapter "Overconfidence in Judgmental Forecasting," published in J. Scott Armstrong's Principles of Forecasting: A Handbook for Researchers and Practitioners (Kluwer Academic), synthesized the debiasing literature and reached sobering conclusions. Procedural debiasing — telling people about overconfidence, asking them to consider alternative hypotheses — works modestly in laboratory settings and poorly outside them. The most effective interventions are structural rather than psychological: reference class forecasting (proposed by Kahneman and Tversky; elaborated by Bent Flyvbjerg for infrastructure projects) bypasses individual overconfidence by anchoring predictions on the empirical distribution of similar past cases. The premortem technique, attributed to Gary Klein and popularized by Kahneman, asks decision-makers to assume a project has already failed and to generate reasons why — reversing the cognitive dynamic in which overconfidence suppresses consideration of failure modes. Both techniques work not by fixing the mind but by changing the information environment in which the mind operates.
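
Reference class forecasting is mechanical enough to sketch in code. The numbers below are hypothetical; the procedure, following the Flyvbjerg-style uplift approach, is to take the distribution of cost-overrun ratios from comparable past projects, choose an acceptable risk of overrun, and read the required budget uplift off the empirical quantiles.

```python
import numpy as np

# Hypothetical overrun ratios (actual cost / estimated cost) from comparable projects.
past_overruns = np.array([0.95, 1.05, 1.1, 1.2, 1.25, 1.4, 1.5, 1.8, 2.1, 2.6])

def reference_class_budget(inside_view_estimate, acceptable_overrun_risk=0.2):
    """Uplift the inside-view estimate so that, judged by the reference class,
    the budget is exceeded only `acceptable_overrun_risk` of the time."""
    uplift = np.quantile(past_overruns, 1 - acceptable_overrun_risk)
    return inside_view_estimate * uplift

print(reference_class_budget(100.0))  # ~186: budget at the 80th percentile of overruns
```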

The question of when confidence is epistemically warranted — not a bias but an accurate representation of genuine expertise — is addressed most directly in Daniel Kahneman and Gary Klein's work on the conditions for intuitive expertise, and by philosophers working on the epistemics of expertise. Expert weather forecasters are well-calibrated. Experienced clinical psychologists assessing violence risk from structured instruments are reasonably well-calibrated. The overconfidence effect is not universal; it is contingent on the availability of rapid, unambiguous feedback, the representative sampling of experience, and the complexity of the domain. This specificity is important: overconfidence is not the permanent condition of all human judgment, but the characteristic error of judgment in domains where feedback is slow, ambiguous, or absent.


References

  1. Alpert, M., & Raiffa, H. (1982). A progress report on the training of probability assessors. In D. Kahneman, P. Slovic, & A. Tversky (Eds.), Judgment under uncertainty: Heuristics and biases (pp. 294-305). Cambridge University Press.

  2. Arkes, H. R. (2001). Overconfidence in judgmental forecasting. In J. S. Armstrong (Ed.), Principles of Forecasting: A Handbook for Researchers and Practitioners (pp. 495-515). Kluwer Academic.

  3. Camerer, C., & Lovallo, D. (1999). Overconfidence and excess entry: An experimental approach. American Economic Review, 89(1), 306-318.

  4. De Long, J. B., Shleifer, A., Summers, L. H., & Waldmann, R. J. (1991). The survival of noise traders in financial markets. Journal of Business, 64(1), 1-19.

  5. Gigerenzer, G., Hoffrage, U., & Kleinbölting, H. (1991). Probabilistic mental models: A Brunswikian theory of confidence. Psychological Review, 98(4), 506-528.

  6. Griffin, D., & Tversky, A. (1992). The weighing of evidence and the determinants of confidence. Cognitive Psychology, 24(3), 411-435.

  7. Kruger, J., & Dunning, D. (1999). Unskilled and unaware of it: How difficulties in recognizing one's own incompetence lead to inflated self-assessments. Journal of Personality and Social Psychology, 77(6), 1121-1134.

  8. Lichtenstein, S., Fischhoff, B., & Phillips, L. D. (1982). Calibration of probabilities: The state of the art to 1980. In D. Kahneman, P. Slovic, & A. Tversky (Eds.), Judgment under uncertainty: Heuristics and biases (pp. 306-334). Cambridge University Press.

  9. Malmendier, U., & Tate, G. (2005). CEO overconfidence and corporate investment. Journal of Finance, 60(6), 2661-2700.

  10. Malmendier, U., & Tate, G. (2008). Who makes acquisitions? CEO overconfidence and the market's reaction. Journal of Financial Economics, 89(1), 20-43.

  11. Moore, D. A., & Healy, P. J. (2008). The trouble with overconfidence. Psychological Review, 115(2), 502-517.

  12. Svenson, O. (1981). Are we all less risky and more skillful than our fellow drivers? Acta Psychologica, 47(2), 143-148.

  13. Tetlock, P. E. (2005). Expert political judgment: How good is it? How can we know? Princeton University Press.

Frequently Asked Questions

What is the overconfidence effect?

The overconfidence effect is the systematic tendency for people's confidence in their judgments to exceed the accuracy of those judgments. In calibration research, subjects who say they are 90% confident in an answer are correct only about 70-75% of the time. Work synthesized by Lichtenstein, Fischhoff, and Phillips (1982) found that confidence intervals people believed would contain the true answer 98% of the time actually captured it only about 60% of the time. Moore and Healy (2008) distinguished three distinct types: overestimation (believing you perform better than you do), overplacement (believing you outperform others more than you do), and overprecision (being too confident in the accuracy of your beliefs).

How is the overconfidence effect different from the Dunning-Kruger effect?

The Dunning-Kruger effect specifically concerns low-skill performers who lack the metacognitive capacity to recognize their incompetence. The overconfidence effect is broader and more universal — it applies across the skill distribution, including to experts and high performers. LTCM's Nobel laureates were not unskilled; they were overconfident in the completeness and accuracy of their models. Dunning-Kruger is about incompetence failing to recognize itself; the overconfidence effect is about competent people systematically misjudging the precision of their knowledge.

What happened to Long-Term Capital Management?

Long-Term Capital Management (LTCM), founded in 1994 by John Meriwether and staffed by Nobel laureates Myron Scholes and Robert Merton, employed sophisticated quantitative models that had produced extraordinary returns. Their models estimated the probability of catastrophic loss as effectively zero based on historical data. The 1998 Russian debt crisis triggered a cascade of correlated failures the models had not considered possible. LTCM lost $4.6 billion in under four months and required a $3.6 billion emergency bailout coordinated by the Federal Reserve. The mechanism of failure was precisely their confidence in models that systematically excluded tail risks.

Who is well-calibrated and who is poorly calibrated?

Calibration research consistently finds that weather forecasters and bridge players are among the best-calibrated expert groups — both receive immediate, specific, unambiguous feedback on their predictions. Physicians, lawyers, financial analysts, and political forecasters tend to be poorly calibrated, showing systematic overconfidence. Philip Tetlock's forecasting tournaments found that most expert political forecasters performed only marginally better than chance. His 'superforecasters' — selected for probabilistic thinking and calibration — significantly outperformed intelligence analysts with classified information.

Is overconfidence ever adaptive?

Johnson and Fowler (2011) in Nature argued that overconfidence can be evolutionarily stable when the benefits of boldness in competition outweigh the costs of misjudging one's chances. Anderson et al. (2012) found that overconfident individuals attained higher social status in groups, independent of actual competence — suggesting overconfidence signals dominance that others respond to. The bias becomes maladaptive when it leads to inadequate preparation for downside scenarios, failure to seek disconfirming information, or dismissal of expert warning systems.