In the late 1960s, a psychologist named Daniel Kahneman was invited to address a group of Israeli Air Force flight instructors on the psychology of training and feedback. He had been explaining the well-established principle that reward works better than punishment — that positive reinforcement shapes behavior more reliably than criticism. The instructors listened with growing impatience.
Finally, one of the senior instructors broke in. He spoke with the quiet authority of a man who had watched hundreds of cadets learn to fly, who had seen with his own eyes what worked and what did not. "With respect," he said, in the account Kahneman later reconstructed in Thinking, Fast and Slow, "I've often praised cadets for clean executions of maneuvers, and the very next time they would do worse. On the other hand, I would criticize pilots after sloppy execution, and they would almost always improve the next time. So please don't tell me that reward works and punishment doesn't. My experience contradicts that completely."
The other instructors nodded. This was not one man's prejudice. This was the accumulated, hard-won professional wisdom of an entire corps.
Kahneman had a sudden flash of clarity. He realized, standing in that room, that the instructors were not wrong about what they had observed. They were wrong only about what had caused it. The pilots who had performed brilliantly on a given run were, statistically, somewhat lucky — their exceptional performance was partly skill, partly noise. On the next run, the noise would not be so favorable, and their performance would fall toward their true average. The pilots who had performed terribly were similarly unlucky. Their next performance would, on average, move toward the mean. Neither the praise nor the punishment had caused anything. The instructors were observing regression to the mean — the statistical tendency for extreme measurements to be followed by less extreme ones — and they were constructing a vivid, confident, entirely false causal story to explain it.
The consequences of that false story were not trivial. An entire training culture had been built around the belief that punishment was the superior instructional tool, that praise was counterproductive. Cadets were systematically criticized and rarely praised. And the data that "confirmed" this belief was itself generated by the very phenomenon the instructors failed to understand.
"Regression to the mean is the most basic statistical phenomenon, yet it is almost universally ignored — leading to systematic errors in understanding cause and effect." — Daniel Kahneman, 2011
What Regression to the Mean Actually Is
Regression to the mean is the statistical phenomenon whereby, if a variable is extreme on its first measurement, it will tend to be closer to the average on a subsequent measurement, and vice versa. It occurs whenever measurements are imperfect — that is, whenever observed performance is a combination of true underlying ability and random noise. Because noise fluctuates and true ability does not change instantly, extreme observations are more likely to have captured an unusually high or low noise component than an equally extreme shift in true ability.
This is not a law of physics. It is a mathematical inevitability, a consequence of correlation being less than perfect. When two measurements of the same thing are correlated but not perfectly correlated, extreme values on one measure will, on average, correspond to less extreme values on the other.
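The mechanism is easy to demonstrate in a few lines of simulation (a sketch with arbitrary parameters, not a model of any real dataset): give every individual a fixed true ability, add fresh noise to each measurement, and then select the extremes of the first measurement only.

```python
import random
import statistics

random.seed(42)

# Each observation = stable true ability + fresh noise on every measurement.
N = 10_000
ability = [random.gauss(0, 1) for _ in range(N)]
first = [a + random.gauss(0, 1) for a in ability]
second = [a + random.gauss(0, 1) for a in ability]

# Select the top decile of the FIRST measurement only.
top = sorted(range(N), key=lambda i: first[i], reverse=True)[: N // 10]

mean_first = statistics.mean(first[i] for i in top)
mean_second = statistics.mean(second[i] for i in top)

print(f"top decile, first measurement:   {mean_first:+.2f}")
print(f"same people, second measurement: {mean_second:+.2f}")
# The second mean sits well below the first: the favorable noise does not repeat.
```

No ability changed between the two measurements; the drop toward the mean is produced entirely by selecting on a noisy extreme.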
The phenomenon is frequently confused with related but distinct cognitive and statistical errors:
| Concept | Definition | When It Fires | Characteristic Error |
|---|---|---|---|
| Regression to the Mean | Extreme measurements statistically tend toward average on remeasurement | Whenever two measurements are imperfectly correlated | Attributing the natural pull toward average to an intervention |
| Gambler's Fallacy | Belief that past random events influence future independent random events | After streaks in truly independent sequences (coin flips, roulette) | Expecting a "correction" in memoryless systems that cannot have one |
| Hot Hand Fallacy | Belief that a person in a "hot streak" has elevated ability for future performance | During performance streaks in sports or skill tasks | Treating temporary noise as a stable elevated state of skill |
| Reversion to Mediocrity | Colloquial (often incorrect) synonym for regression to the mean | Often misapplied to mean that excellence cannot persist | Confusing statistical regression with a ceiling on achievement |
The gambler's fallacy and hot hand fallacy are, in a sense, opposites — one predicts future reversal where none should exist, the other predicts future continuation where the evidence does not support it. Regression to the mean is different from both: it is not a belief but a mathematical description, and its error is not in prediction but in causal attribution. We do not merely expect regression to the mean; we observe it constantly, everywhere. The error is in deciding what caused it.
The Cognitive Science of Why We Get This Wrong
The story of how humans came to understand regression to the mean is inseparable from the story of Francis Galton, the Victorian polymath who discovered it by accident while studying inheritance.
In 1886, Galton published a paper titled "Regression towards mediocrity in hereditary stature" in the Journal of the Anthropological Institute. He had been investigating the heights of parents and children across nearly a thousand families, and he noticed something that initially seemed paradoxical. Tall parents tended to have children who were shorter than they were. Short parents tended to have children who were taller. The children's heights were always pulled, as if by gravity, toward the population mean. Galton had noticed the same effect earlier in a study of sweet pea seeds: seeds selected for unusual size produced offspring closer to average size than their parents.
He called this "reversion" — later "regression" — toward the mean or toward mediocrity. He understood that it was not a biological force so much as a statistical artifact of imperfect transmission. Tall parents are tall partly because they have genes for height and partly because various developmental factors happened to run in their favor. Their children inherit the genes but not the luck. The result is a pull toward average.
Galton's discovery was formalized mathematically by his protégé Karl Pearson, who developed the correlation coefficient — the very statistic that quantifies how strongly two measurements are related. The mathematics makes the logic airtight: if the correlation between two measurements is r, and a first measurement is z standard deviations above the mean, the expected value of the second measurement is r × z standard deviations above the mean. When r is less than 1, the expected second measurement is always closer to the mean. When r is 0 — pure noise — the best prediction for the second measurement is the mean itself, regardless of the first.
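The r × z rule can be verified numerically. The sketch below (an arbitrary correlation of 0.6, nothing from any real study) draws standardized pairs with known correlation, conditions on a first measurement near z = 2, and compares the observed second measurement with the prediction:

```python
import random
import statistics

random.seed(0)
r = 0.6
N = 200_000

# Draw standardized pairs with correlation r: z2 = r*z1 + sqrt(1 - r^2) * noise.
pairs = []
for _ in range(N):
    z1 = random.gauss(0, 1)
    z2 = r * z1 + (1 - r**2) ** 0.5 * random.gauss(0, 1)
    pairs.append((z1, z2))

# Condition on a first measurement near z = 2.0 standard deviations.
z = 2.0
selected = [z2 for z1, z2 in pairs if abs(z1 - z) < 0.1]
predicted = r * z                       # the r × z rule
observed = statistics.mean(selected)

print(f"predicted E[z2 | z1 = {z}]: {predicted:.2f}")
print(f"observed:                  {observed:.2f}")
```

The observed conditional mean lands near 1.2, not 2.0 — an extreme first measurement predicts a less extreme second one, exactly as the formula says.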
Why, then, do we not intuitively grasp this? The answer lies in what the writer Nassim Nicholas Taleb has called the narrative fallacy: the compulsive human tendency to construct causal stories from sequences of events. When we observe an event, we almost immediately reach for a cause. When the event is followed by a different kind of event, we reach for a cause of the change. The idea that the change might require no cause at all — that it was simply the expected statistical behavior of noisy data — does not feel like an explanation. It feels like a non-answer.
Kahneman and his longtime collaborator Amos Tversky documented this tendency in a series of landmark studies beginning in the early 1970s. Their work on heuristics and biases showed that humans are not natural statisticians. We are natural storytellers. We are exquisitely sensitive to sequences, patterns, and apparent causality. We are largely blind to base rates, variance, and the behavior of imperfectly correlated variables. Regression to the mean sits precisely in the blind spot: it is a statistical regularity that mimics the signature of causation without containing any.
The specific mechanism behind our misreading is causal attribution bias: we consistently attribute change to the immediately preceding event. If a student scores very high on a test, then lower on the next, we look for what changed — did they study less, were they distracted, did something happen at home? The correct answer — that extreme test scores contain more noise than typical ones, and that subsequent scores naturally drift toward the student's true ability — is available to us, but we do not reach for it because it does not feel like an explanation at all.
Named Historical Case Studies
Case Study 1: Kahneman's Israeli Air Force Instructors (Late 1960s)
The instructors' case is documented in detail in Chapter 17 of Kahneman's Thinking, Fast and Slow (2011). What makes it exemplary is not just the misidentification of regression as a causal effect, but the direction of the error. By believing that punishment improved performance and praise degraded it, the instructors had inadvertently arrived at a conclusion that directly inverted the known psychology of reinforcement. The statistical reality was simple: cadets who performed at the extremes — very well or very poorly — were going to move toward their own average regardless of what anyone said to them. The praise and the criticism were both being credited or blamed for a process that was purely mathematical. The consequences included the systematic neglect of positive reinforcement in a high-stakes military training environment, with implications for morale, attrition, and the psychological wellbeing of trainees over decades.
Case Study 2: The Sports Illustrated Cover Jinx
Every sports fan knows the Sports Illustrated jinx: athletes who appear on the cover of Sports Illustrated magazine subsequently underperform. The jinx has been discussed as everything from a psychological burden (the cover appearance creates distraction and pressure) to outright superstition. The statistical explanation, published by Schall and Morris in 1993, is considerably less dramatic. Athletes appear on the cover of Sports Illustrated because they have just performed at an exceptional level — they have had a career-best game, a record-breaking season, a transcendent playoff run. These performances are, by definition, extreme. They contain extraordinary skill and extraordinary noise. The subsequent regression toward the athlete's true (lower) average is not caused by the magazine cover. It is caused by the noise component of the exceptional performance dissipating. Schall and Morris examined the empirical record and found that post-cover performance declines were consistent with what regression alone would predict, with no need to invoke psychological pressure. The jinx is not a jinx. It is a law of statistics wearing a sports jersey.
Case Study 3: Spontaneous Improvement and Medical Research
In clinical medicine, a persistent puzzle is why patients in control groups — those receiving no active treatment — frequently improve. A major review by Barnett, van der Pols, and Dobson, published in the International Journal of Epidemiology in 2004, examined this phenomenon systematically. Their analysis found that regression to the mean is responsible for a substantial portion of the apparent improvement seen in placebo arms of clinical trials, and also in observational studies where sick patients are followed over time without intervention.
The mechanism is straightforward. People typically seek medical care when their symptoms are at their worst — when pain is most severe, blood pressure is highest, anxiety is most acute. This is precisely the moment when their condition includes the most noise on the unfavorable side. As time passes, noise reverts and symptoms naturally improve, whether or not anything has been done. The researchers estimated that in some trials, particularly those involving subjective symptoms like pain or depression, regression to the mean could account for half or more of the observed improvement in control groups. The implication is profound: clinical trials that lack proper control arms may be measuring not the efficacy of treatment but the mathematics of extreme values.
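A small simulation makes the enrolment effect concrete (the numbers below — a 0-100 symptom scale, an entry threshold of 70 — are purely illustrative): patients join a trial only when their current score is extreme, and their follow-up scores fall with no treatment applied at all.

```python
import random
import statistics

random.seed(1)

# Each patient has a stable chronic severity; daily scores add fresh noise.
patients = [random.gauss(50, 10) for _ in range(20_000)]   # true severity
today = [p + random.gauss(0, 10) for p in patients]
later = [p + random.gauss(0, 10) for p in patients]        # no treatment given

# Enrolment rule: only patients whose score TODAY exceeds 70 join the trial.
enrolled = [i for i, score in enumerate(today) if score > 70]
at_entry = statistics.mean(today[i] for i in enrolled)
at_followup = statistics.mean(later[i] for i in enrolled)

print(f"mean score at entry:     {at_entry:.1f}")
print(f"mean score at follow-up: {at_followup:.1f}")
# Scores improve substantially with no intervention: regression to the mean.
```

This is why a single-arm study of patients recruited at their symptomatic worst will always appear to show improvement, and why the placebo arm of a randomized trial is the only honest baseline.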
Case Study 4: The Exceptional Quarter Trap in Business
In 2003, a detailed study of sales team performance at a large financial services firm documented a pattern that has since been replicated across industries. Sales teams that had achieved exceptional results in one quarter — significantly above their historical average — almost uniformly showed lower performance in the subsequent quarter. Management responded predictably: consultants were hired, processes were reviewed, motivational initiatives were launched. When performance rebounded in the quarter after that, the initiatives were credited.
The statistics told a different story. The teams' long-run averages were largely unchanged. What varied was their variance — the degree to which individual quarters fluctuated around those averages. Teams with high variance showed the most dramatic regression effects. Teams with low variance showed little regression. The business case illustrates how regression to the mean interacts with organizational decision-making in potentially expensive ways: interventions launched in response to statistical noise consume resources and generate false lessons about what drives performance. Companies may become convinced that certain managerial approaches "work" for precisely the same reason the Israeli Air Force instructors were convinced that punishment works.
Applying Regression Thinking Across Domains
In medicine, correct understanding of regression to the mean is essential for evaluating any treatment that is applied when patients are at their symptomatic worst. The practical implication is simple but frequently ignored: always ask whether the measured improvement could be explained by regression alone. The gold standard — the randomized controlled trial with a placebo arm — was designed precisely to answer this question. Regression affects both arms of a properly randomized trial equally, so any difference between them cannot be explained by regression.
In sports analytics, regression to the mean has become a foundational tool. Advanced analysts now routinely discount single-season statistics that deviate sharply from a player's career average, recognizing that extreme seasons contain more noise than typical ones. The discipline of DIPS theory in baseball (defense-independent pitching statistics) emerged partly from recognizing that certain pitching metrics fluctuated with far more noise than others, and that year-to-year regression was predictable and enormous.
In management and human resources, the implications are perhaps the least well-absorbed. Performance review systems that heavily weight single exceptional or terrible periods systematically misread regression as management effect. The employee who had a brilliant year is penalized by expectations, and when their next year is merely good, it looks like underperformance. The employee who had a terrible year triggers intervention, and their subsequent improvement — which would have happened regardless — confirms the intervention's value.
In investing, the concept of mean reversion is central to entire trading strategies. Value investors from Benjamin Graham onward have argued that stocks trading far above or below their fundamental value will tend to revert toward it. This is partially justified by genuine economic mechanisms (competition erodes excess returns, crisis conditions eventually abate), but analysts must be careful to distinguish genuine mean reversion driven by economic logic from the simple statistical regression of noisy price series.
In educational testing, regression to the mean has significant policy implications. Schools whose students score extremely high or low one year will tend, on average, to score less extremely the next — regardless of any change in teaching quality. Accountability systems that reward or punish schools based on year-over-year changes without accounting for regression will systematically punish schools that had an anomalously good year and reward schools that had an anomalously bad one.
The Intellectual Lineage
Francis Galton (1822-1911) stumbled onto regression while investigating heredity. His 1886 paper is the founding document. Galton did not immediately grasp that he had discovered a universal statistical phenomenon; he initially understood it as a biological mechanism of hereditary inheritance. But as he pressed further, he realized the effect appeared wherever two imperfectly correlated variables were measured.
Karl Pearson (1857-1936), inheriting and extending Galton's work, formalized the mathematics. The Pearson correlation coefficient — still the standard measure of linear association — is the direct quantification of how much regression to expect between two variables. Pearson's work in the 1890s and early 1900s placed regression to the mean within a rigorous statistical framework that persists essentially unchanged today.
Daniel Kahneman and Amos Tversky, beginning with their 1974 Science paper "Judgment under Uncertainty: Heuristics and Biases," brought regression to the mean into cognitive psychology. Their insight was that the failure to intuitively appreciate regression was not a failure of education but a deep feature of human cognition — a systematic bias arising from the way we represent and reason about probability. Their Israeli Air Force example, first presented as a teaching case in the 1970s and published in Thinking, Fast and Slow in 2011, remains the most cited and vivid illustration of how regression effects are misread as causal interventions.
Kahneman's 2011 book devoted an entire chapter — Chapter 17, "Regression to the Mean" — to the subject, calling it "the most important statistical concept that people consistently fail to intuit." The chapter is, in essence, an extended argument that our failure to understand this single phenomenon has distorted medicine, education, management, and public policy for as long as we have been making decisions about interventions.
The Research Evidence
The evidence base for regression to the mean as both a statistical fact and a cognitive blind spot is substantial and spans nearly 140 years of research.
Galton's 1886 data on familial heights, analyzed with his characteristic Victorian thoroughness, showed a regression slope of approximately 0.64 — meaning that for every inch of parental height above the mean, children's height deviated from the mean by only about 0.64 inches. This figure was robust across his sample and has been replicated in modern genetic studies with refinements but broadly similar values.
Kahneman and Tversky's work on heuristics and biases, synthesized in their landmark 1974 Science paper, established that human subjects consistently fail to predict regression effects, instead assuming that extreme performance will be followed by similarly extreme performance. Their subjects were not naive laypeople but graduate students and researchers — people who had been trained in statistical thinking and still defaulted to non-statistical intuitions.
The Sports Illustrated jinx analysis by Schall and Morris (1993), published in Chance, examined cover appearances from 1954 through the early 1990s and quantified the regression effect. Their finding — that statistical regression alone predicted the observed performance declines — was not widely publicized, but it remains a clean empirical demonstration of how regression masquerades as a genuine effect.
Barnett, van der Pols, and Dobson's 2004 systematic review in the International Journal of Epidemiology examined regression to the mean in clinical research specifically, providing quantitative estimates of the bias it introduces into studies that fail to account for it. They concluded that the phenomenon was "ubiquitous" in clinical research and frequently underestimated.
The hot hand fallacy, closely related to our intuitions about regression, was studied in the foundational paper by Gilovich, Vallone, and Tversky (1985), published in Cognitive Psychology. Their analysis of NBA shooting data found no statistical evidence of the hot hand — that is, no evidence that players who had just made several shots in a row had an elevated probability of making the next one, beyond what their overall shooting percentage would predict. This finding has been partially revised by subsequent research using more granular data, but the original paper remains a landmark in behavioral statistics.
Limits and Failure Modes
Regression to the mean is real and ubiquitous, but it is not the explanation for everything. Over-applying it is as dangerous as failing to apply it at all.
The critical question is: is the variation we observe primarily noise, or does it reflect genuine change in the underlying process? This question cannot be answered by intuition alone; it requires data.
Genuine trends do exist. A company that has grown its earnings for fifteen consecutive years may be doing so because of structural competitive advantages, not because of improbable luck. A patient whose pain scores decline steadily over twelve months of treatment may be genuinely improving, not merely regressing. The key diagnostic question is persistence: noise does not persist systematically. If observed deviations from average show a pattern — particularly a directional pattern over time — regression to the mean becomes a less plausible explanation.
There is also a technical point about selection effects. Regression to the mean operates within a stable population. If the population changes — if a school improves its teaching staff significantly, if an athlete genuinely increases their conditioning, if a company fundamentally transforms its operations — regression toward the old mean is no longer the right expectation. The new baseline has shifted.
The danger of over-applying regression thinking is real in policy and science. A researcher who assumes that all improvement is regression may dismiss genuine treatment effects. A manager who assumes that all exceptional performance is noise may fail to identify and nurture genuinely exceptional talent. A statistician who applies regression corrections too aggressively may remove real signal from data. The discipline required is not a reflexive application of regression thinking but an honest assessment, in each case, of how much variance is noise and how much reflects something real.
The best safeguard is the one that has always been at the foundation of good empirical science: comparison. Compare the treated group to an untreated control. Compare the period before an intervention to the period after, with an eye to what the counterfactual would have looked like. Ask, always, whether the observed change is consistent with what regression alone would predict — and if it is larger than that prediction, investigate further.
References
Galton, F. (1886). Regression towards mediocrity in hereditary stature. Journal of the Anthropological Institute of Great Britain and Ireland, 15, 246-263.
Galton, F. (1877). Typical laws of heredity. Proceedings of the Royal Institution of Great Britain, 8, 282-301.
Pearson, K. (1896). Mathematical contributions to the theory of evolution. III. Regression, heredity, and panmixia. Philosophical Transactions of the Royal Society of London, Series A, 187, 253-318.
Tversky, A., & Kahneman, D. (1974). Judgment under uncertainty: Heuristics and biases. Science, 185(4157), 1124-1131.
Gilovich, T., Vallone, R., & Tversky, A. (1985). The hot hand in basketball: On the misperception of random sequences. Cognitive Psychology, 17(3), 295-314.
Schall, T., & Morris, G. (1993). The jinx of the cover of Sports Illustrated: A statistical consideration. Chance, 6(2), 29-30.
Barnett, A. G., van der Pols, J. C., & Dobson, A. J. (2004). Regression to the mean: What it is and how to deal with it. International Journal of Epidemiology, 34(1), 215-220.
Kahneman, D. (2011). Thinking, Fast and Slow. Farrar, Straus and Giroux.
Tversky, A., & Kahneman, D. (1981). The framing of decisions and the psychology of choice. Science, 211(4481), 453-458.
Taleb, N. N. (2007). The Black Swan: The Impact of the Highly Improbable. Random House.
Miller, J. B., & Sanjurjo, A. (2018). Surprised by the hot hand fallacy? A truth in the law of small numbers. Econometrica, 86(6), 2019-2047.
Stigler, S. M. (1986). The History of Statistics: The Measurement of Uncertainty before 1900. Harvard University Press.
Frequently Asked Questions
What is regression to the mean?
Regression to the mean is the statistical phenomenon whereby, if a variable is extreme on its first measurement, it will tend to be closer to the average on a subsequent measurement. It occurs because observed performance combines true ability with random noise — and noise fluctuates. When a measurement is extreme, it likely captured an unusually favorable or unfavorable noise component, which will not persist.
Who discovered regression to the mean?
Francis Galton discovered it in 1886 while studying inherited height across nearly a thousand British families. He noticed that tall parents consistently had children shorter than themselves, and short parents had children taller — both groups regressing toward the population average. He published the finding as "Regression towards mediocrity in hereditary stature." Karl Pearson later formalized the mathematics through the correlation coefficient.
Why do Israeli Air Force instructors illustrate regression to the mean?
Daniel Kahneman documented that Israeli Air Force flight instructors believed punishment worked better than praise because praised pilots flew worse next time while criticized pilots improved. This was pure regression to the mean: exceptional performances contained elevated noise, which dissipated next flight; poor performances contained unfavorable noise, which also dissipated. The praise and criticism caused nothing — the instructors were attributing a mathematical inevitability to their feedback.
Is the Sports Illustrated cover jinx real?
No — it is a regression to the mean artifact. Athletes appear on the cover because they just performed exceptionally. Exceptional performances contain both high skill and high positive noise. Subsequent performances regress toward the athlete's true average as the noise component dissipates. Schall and Morris (1993) examined the historical record and found post-cover performance declines consistent with statistical regression alone, without any need to invoke psychological pressure from the cover appearance.
How does regression to the mean affect clinical trials?
Patients enter trials when symptoms are at their worst — a point containing maximum unfavorable noise. As time passes, symptoms improve partly because noise dissipates, regardless of treatment. Barnett et al. (2004) found regression to the mean accounts for a substantial portion of apparent improvement in placebo control groups. Properly randomized trials with control groups are designed to neutralize this effect: regression affects both arms equally, so genuine treatment effects are the only explanation for between-group differences.
How is regression to the mean different from the gambler's fallacy?
The gambler's fallacy wrongly expects future correction in genuinely independent random events (coin flips, roulette spins) where no such correction exists. Regression to the mean is a real statistical phenomenon that occurs whenever two measurements are imperfectly correlated — it describes what actually happens with repeated measurement of noisy real-world quantities. The error of regression to the mean is not in the prediction but in the causal attribution: observing regression and wrongly crediting it to an intervention.
When does regression to the mean NOT apply?
When observed variation reflects genuine change rather than noise. If a company grows earnings for fifteen consecutive years, structural competitive advantages are more plausible than sustained luck. If a patient improves steadily over twelve months, genuine treatment effect is more plausible than noise dissipating. The key diagnostic is persistence: noise does not persist systematically in one direction. Also, if the underlying population changes — new coaching, new strategy, genuine learning — the old mean is no longer the right reference point.