How Goodhart's Law Breaks Metrics: When Measures Become Targets
In the 2000s, hospitals in England faced criticism for long wait times. The government introduced a performance target: no patient should wait more than 18 weeks from referral to treatment. Hospitals missing the target faced consequences: funding cuts, public shaming, leadership changes.
The metric worked. Wait times dropped dramatically. Success!
Or was it? Closer inspection revealed disturbing patterns:
- Selective referrals: Doctors delayed officially referring patients, keeping them in limbo to avoid starting the 18-week clock
- Creative scheduling: Patients received "clock-stopping" procedures (minor, often unnecessary interventions) resetting the timer before major treatment
- Queue manipulation: Easy cases prioritized to hit targets; complex cases delayed indefinitely
- Data gaming: Administrative tricks reclassified waits, making them invisible to metrics
- Perverse outcomes: Some patients waited longer than before because resources were redirected to "target patients" at the expense of others
The metric improved. Real patient care deteriorated.
This phenomenon has a name: Goodhart's Law. British economist Charles Goodhart observed in 1975 that "any observed statistical regularity will tend to collapse once pressure is placed upon it for control purposes." Anthropologist Marilyn Strathern later distilled this into the popular formulation: "When a measure becomes a target, it ceases to be a good measure."
The mechanism is deceptively simple yet profoundly important. Metrics are proxies—imperfect representations of what we actually care about. When metrics carry consequences (rewards, punishments, status, resources), people optimize for the metric rather than the underlying goal. The metric diverges from its purpose.
Goodhart's Law is everywhere: education (teaching to tests), healthcare (avoiding risky patients), business (vanity metrics), government (statistical manipulation), technology (engagement metrics undermining wellbeing), research (citation gaming). Any domain using metrics faces this problem.
This article explains Goodhart's Law comprehensively: the mechanism behind it, why it's nearly inevitable, classic examples across domains, the psychology of metric gaming, how to detect it, strategies for designing more robust metrics, when metrics should and shouldn't be targets, and the fundamental tension between measurement and management.
Understanding Goodhart's Law: The Core Mechanism
Before examining solutions, understand precisely why metrics break when targeted.
The Proxy Problem
Metrics are rarely the actual goal—they're indicators of goals we care about.
| Actual Goal | Metric Proxy | Gap |
|---|---|---|
| Student understanding | Test scores | Tests measure narrow slice of understanding |
| Customer satisfaction | Survey ratings | Surveys sample opinions, not full experience |
| Employee productivity | Hours worked | Hours ≠ valuable output |
| Hospital quality | Mortality rate | Sickest patients avoided to protect metric |
| Code quality | Lines of code | More lines often means worse quality |
| Website value | Page views | Views without engagement or value |
The gap between goal and proxy creates opportunity for gaming. When the proxy becomes the target, rational actors maximize the proxy even when doing so undermines the goal.
The Optimization Dynamic
Step 1: Metric introduced as indicator of performance
Initially, metric correlates with goal. High-performing entities naturally score well; low performers score poorly. Metric provides useful information.
Step 2: Metric becomes target with consequences
Organizations set targets. Achieving targets brings rewards (bonuses, promotions, funding, reputation); failing brings punishments (loss of resources, shame, job loss).
Step 3: Actors optimize for metric, not goal
Rational response: Maximize metric. This includes:
- Legitimate improvement: Actually getting better at real goal (best outcome)
- Focus shifting: Prioritizing measurable aspects while neglecting unmeasured ones
- Gaming: Finding ways to boost metric without improving (or while harming) real goal
- Manipulation: Distorting data, exploiting loopholes, outright cheating
Step 4: Metric-goal divergence widens
As gaming increases, correlation between metric and goal weakens. Eventually metric becomes meaningless or actively harmful indicator of true performance.
Step 5: Metric loses informational value
The measure that once provided insight now obscures reality. Organizations are "hitting targets but missing the point."
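The five-step dynamic can be sketched as a toy simulation (all numbers invented for illustration): before targeting, the metric is a noisy but honest proxy of the goal; once targets arrive, actors add gaming effort that inflates the metric without improving the goal, and the metric-goal correlation collapses.

```python
import random
import statistics

def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

random.seed(0)
n = 500
true_quality = [random.gauss(50, 10) for _ in range(n)]

# Steps 1-2: the metric is a noisy but honest proxy of the goal.
metric_before = [q + random.gauss(0, 5) for q in true_quality]

# Steps 3-4: targets arrive; actors add gaming effort that inflates
# the metric and (in this sketch) slightly harms the real goal.
gaming = [random.uniform(0, 30) for _ in range(n)]
metric_after = [m + g for m, g in zip(metric_before, gaming)]
goal_after = [q - 0.2 * g for q, g in zip(true_quality, gaming)]

# Step 5: the metric's informational value degrades.
print(f"correlation before targets: {pearson(metric_before, true_quality):.2f}")
print(f"correlation after targets:  {pearson(metric_after, goal_after):.2f}")
```

Nothing about the agents changed except the incentive, yet the same measurement now says much less about the thing it was supposed to track.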
Why It's Nearly Inevitable
Goodhart's Law isn't about bad people—it's about rational responses to incentives in complex systems.
Reason 1: No perfect metrics exist
Every metric has gaps. Perfect measures would require capturing all dimensions of complex goals—which is either impossible or so burdensome it prevents action.
Reason 2: Gaming is easier than genuinely improving
Often, manipulating metrics is cheaper and faster than actual improvement. When people face pressure to hit targets, they take the path of least resistance.
Example: Improving actual teaching quality requires expertise, time, resources. Teaching specific test content requires less. Under pressure, teachers rationally focus on tests.
Reason 3: Unintended consequences emerge in complex systems
Organizations are complex adaptive systems. Interventions create ripple effects. Metric-driven optimization in one area creates problems elsewhere—problems often invisible to the metric.
Reason 4: Metrics change behavior they measure
A kind of observer effect for social systems: measurement itself alters what is being measured. People respond to being measured, and these responses often undermine the measurement's validity.
Classic Examples of Goodhart's Law Across Domains
Understanding how Goodhart's Law manifests in different contexts reveals patterns.
Education: Teaching to the Test
Goal: Student learning, critical thinking, knowledge application
Metric: Standardized test scores
What happened:
- Teachers focus curriculum narrowly on tested content
- "Test prep" replaces deeper learning
- Creative subjects (art, music, physical education) reduced
- Students learn test-taking strategies, not subjects
- Cheating scandals (Atlanta, Washington DC) where teachers altered student answers
- Schools discourage low-performing students from taking tests (to protect school averages)
Metric improved, goal undermined: Test scores rose while actual learning and educational breadth declined.
Healthcare: The Mortality Metric
Goal: High-quality patient care, health outcomes
Metric: Hospital mortality rates, readmission rates
What happened:
- Surgeons avoid high-risk patients (who need surgery most) to protect statistics
- Patients discharged prematurely to avoid "dying in hospital"
- Readmissions prevented through aggressive follow-up that doesn't improve health
- "Upcoding" diagnoses to make patient populations appear sicker (making outcomes look better by comparison)
- Resources diverted from unmeasured aspects of care (patient experience, preventive care)
Consequence: Some patients who most need intervention are turned away; others receive suboptimal timing of care.
Business: The Wells Fargo Scandal
Goal: Customer satisfaction, sustainable growth, ethical banking
Metric: Number of products per customer (cross-selling ratio)
What happened:
- Employees given aggressive sales targets (8+ products per customer)
- Unable to legitimately meet targets, employees created fake accounts
- 3.5 million fraudulent accounts opened without customer knowledge
- Customers charged fees for accounts they didn't authorize
- Employees who resisted or reported were fired
Result: $3 billion in fines, irreparable reputational damage, CEO resignation, criminal charges. Metric maximization destroyed the company's actual goals.
Technology: Social Media Engagement
Goal: Meaningful connection, informative content, user wellbeing
Metric: Engagement (time spent, clicks, shares, comments)
What happened:
- Algorithms optimize for engagement, not quality or wellbeing
- Outrage, controversy, and misinformation generate high engagement
- Platforms amplify divisive content (it's engaging!)
- "Doomscrolling," addiction, polarization, anxiety
- Meaningful connection paradoxically declines while "engagement" soars
Observation: Facebook's own internal research indicated Instagram harms teen mental health, but engagement metrics told a different story—so the platform prioritized metrics over wellbeing.
Government: Soviet Nail Factory
Classic example from economic planning:
Scenario 1: Factory given target measured by total weight of nails produced
Result: Factory manufactures enormous, useless nails (maximizes weight per nail)
Scenario 2: Target changed to number of nails
Result: Factory manufactures tiny, useless nails (maximizes count)
Neither metric captured actual goal: Producing nails of appropriate sizes for construction needs. Optimizing for metric produced absurd outcomes.
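A toy model makes the nail-factory pathology concrete (all numbers invented): with a fixed labor budget, a small per-nail setup cost, and extra labor per kilogram, a weight target is maximized by the biggest nails and a count target by the smallest.

```python
HOURS = 1000.0    # labor budget (hypothetical)
SETUP_H = 0.01    # fixed labor per nail, regardless of size
PER_KG_H = 0.05   # additional labor per kilogram of nail

def nails(size_kg):
    """How many nails of a given size the labor budget allows."""
    return HOURS / (SETUP_H + PER_KG_H * size_kg)

def total_weight(size_kg):
    """Total kilograms of nails produced at a given size."""
    return nails(size_kg) * size_kg

sizes = [0.001, 0.01, 0.1, 1.0, 10.0]      # kg per nail
by_count = max(sizes, key=nails)           # count target: tiny nails win
by_weight = max(sizes, key=total_weight)   # weight target: giant nails win
print(by_count, by_weight)  # 0.001 10.0
```

Neither optimum is a usable nail; only a metric tied to construction demand would select one.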
Research: Citation Gaming
Goal: Impactful scientific contribution, knowledge advancement
Metric: Citation counts, h-index, impact factor
What happened:
- "Citation cartels": Groups of researchers cite each other excessively
- Self-citation inflation
- Editors pressure authors to cite journal's other papers (to boost journal metrics)
- "Salami slicing": Breaking research into smallest publishable units (more papers = more citations)
- Choosing "hot" but incremental topics over important but risky research
Result: Citation counts rise while research quality and originality face pressures.
Policing: Compstat and Crime Statistics
Goal: Public safety, crime reduction
Metric: Reported crime rates
What happened:
- Pressure to show declining crime rates
- Officers discourage victims from filing reports
- Crimes downgraded to lesser offenses (felony → misdemeanor)
- Manipulation of crime classification data
- Stops, searches, and arrests increase (measurable actions) while actual crime solving decreases
Multiple police departments caught manipulating crime data to hit Compstat targets while real safety declined or stagnated.
The Psychology of Metric Gaming
Why do people game metrics even when they know it undermines real goals?
Mechanism 1: Rational Response to Incentives
People respond to actual incentives, not stated goals.
If rewards/punishments attach to metrics, the metric becomes the goal in practice—regardless of rhetoric about "real objectives."
Example: Teacher who genuinely cares about student learning but faces job loss if test scores don't improve. Teaching to test becomes survival, not malfeasance.
Mechanism 2: Diffused Responsibility
"I'm just optimizing for what I'm told to optimize for."
When leadership sets metric targets, individuals feel absolved of responsibility for negative consequences. Gaming feels like "doing your job," not undermining organizational mission.
Mechanism 3: Short-Term Pressure
Genuine improvement takes time. Metric manipulation produces immediate results. Under pressure for quick wins, gaming becomes attractive.
Mechanism 4: Competitive Dynamics
If others are gaming metrics, you're punished for not gaming. Honest actors lose to gamers when only metrics matter.
A tragedy of the commons: individual gaming is rational, but collective gaming destroys the metric's value for everyone.
Mechanism 5: Metric Fixation
Targets become psychologically real.
Once metrics are entrenched, people genuinely start believing hitting the target = success, even when evidence suggests otherwise. Metric becomes substitute for thinking about real goals.
Mechanism 6: Unintended Blindness
Often people gaming metrics don't consciously realize they're undermining goals. They see metric improvement as goal achievement. Ethical fading: Moral dimensions disappear from view when framed as "meeting targets."
Detecting Goodhart's Law in Action
How do you recognize when metrics are being gamed?
Warning Sign 1: Metrics Improve While Real Performance Declines
Most telltale sign: Numbers go up, but qualitative observation suggests things are getting worse.
Examples:
- Test scores rise but employers complain graduates lack skills
- Hospital mortality rates fall but patient complaints increase
- Employee productivity metrics improve but customer satisfaction drops
Action: Always pair quantitative metrics with qualitative assessment. Talk to frontline workers, customers, actual stakeholders.
Warning Sign 2: Creative Compliance Emerges
People find technically compliant ways to hit targets while violating spirit of goal.
Examples:
- "Clock-stopping" procedures in hospitals
- Schools encouraging weak students to skip test day
- Companies booking revenue in current quarter then reversing it later
Pattern: If compliance feels like exploiting loopholes rather than achieving goals, Goodhart's Law is operating.
Warning Sign 3: Focus Narrows to Measured Aspects
Unmeasured dimensions of performance receive less attention, even when important.
Example: Sales team measured on volume closes many low-value deals; high-value strategic deals ignored (harder, longer sales cycles, less immediately measurable).
Warning Sign 4: Metrics Stabilize at "Just Meeting Target"
When many actors cluster just above target threshold, suggests optimization for target rather than genuine performance improvement.
Statistical signature: Genuine performance typically produces a smooth distribution around the target; a spike of results clustered just above the threshold indicates strategic gaming.
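A minimal detector for this signature (data, target, and band width all hypothetical): compare the mass in narrow bands just above and just below the target. A smooth distribution puts similar mass in both; gamed data shows a large excess above.

```python
import random

def threshold_excess(values, target, band):
    """Ratio of observations just above vs just below a target.

    An un-gamed, smooth distribution puts similar mass in the two
    narrow bands; a large ratio suggests results are being nudged
    over the line.
    """
    above = sum(1 for v in values if target <= v < target + band)
    below = sum(1 for v in values if target - band <= v < target)
    return above / max(below, 1)

random.seed(1)
honest = [random.gauss(90, 5) for _ in range(1000)]
# Gaming: scores just short of the 90 target get "helped" over it.
gamed = [90.5 if 88 <= v < 90 else v for v in honest]

print(threshold_excess(honest, target=90, band=2))  # near 1.0
print(threshold_excess(gamed, target=90, band=2))   # far above 1.0
```

In practice the band width and the baseline ratio would need calibration against historical, pre-target data.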
Warning Sign 5: Resistance to Metric Changes
If proposals to change or supplement metrics meet strong resistance, often because current metrics are being gamed—changes would expose gaming or require actual improvement.
Warning Sign 6: Data Integrity Issues
Anomalies, inconsistencies, or irregularities in reported data suggest manipulation.
Examples: Sudden discontinuous jumps, data too good to be true, lack of variance, reporting delays around target deadlines.
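One cheap integrity check along these lines (series invented for illustration): flag period-to-period jumps that are large relative to the series' typical movement, as happens when figures are adjusted to clear a deadline.

```python
import statistics

def jump_score(series):
    """Largest period-to-period change, in units of typical change.

    A score far above the usual range suggests a discontinuous jump
    worth auditing, e.g. a figure adjusted to hit a deadline.
    """
    diffs = [abs(b - a) for a, b in zip(series, series[1:])]
    typical = statistics.median(diffs) or 1.0  # guard against flat series
    return max(diffs) / typical

# Hypothetical monthly figures: a gradual trend, then a sudden
# jump right when a target deadline lands.
organic = [50, 51, 53, 52, 54, 55, 57, 56, 58, 59]
suspect = [50, 51, 53, 52, 54, 55, 57, 56, 58, 75]

print(jump_score(organic))
print(jump_score(suspect))
```

A high score is not proof of manipulation, only a prompt for closer verification.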
Designing Metrics That Resist Gaming
Can metrics be designed to minimize Goodhart's Law effects?
Strategy 1: Use Multiple Complementary Metrics
Single metrics are easily gamed. Multiple metrics covering different dimensions make gaming harder—optimizing one often makes others worse.
Example: Hospital quality
Instead of just mortality rate, measure:
- Mortality rate (outcome)
- Readmission rate (outcome)
- Patient experience scores (process)
- Complication rates (outcome)
- Average treatment cost (efficiency)
- Staff satisfaction (leading indicator)
Gaming one metric (e.g., avoiding risky patients to reduce mortality) would harm others (reduce revenue, worsen staff satisfaction from turning away patients).
Principle: Make gaming harder than genuinely improving.
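A sketch of the idea (weights and scores invented, not real hospital data): with complementary metrics combined into one scorecard, gaming mortality by refusing risky patients drags down the dimensions it damages, so the composite falls rather than rises.

```python
# Hypothetical hospital scorecard; metric names and weights illustrative.
WEIGHTS = {"survival": 0.4, "readmission_avoided": 0.2,
           "patient_experience": 0.2, "staff_satisfaction": 0.2}

def composite(scores):
    """Weighted average of normalized (0-1) metric scores."""
    return sum(WEIGHTS[k] * scores[k] for k in WEIGHTS)

honest = {"survival": 0.90, "readmission_avoided": 0.85,
          "patient_experience": 0.80, "staff_satisfaction": 0.75}

# Gaming: turn away high-risk patients. Survival ticks up, but
# patient experience and staff morale fall as patients are refused.
gamed = {"survival": 0.95, "readmission_avoided": 0.85,
         "patient_experience": 0.70, "staff_satisfaction": 0.60}

print(composite(honest))
print(composite(gamed))
```

The design question becomes choosing dimensions whose honest improvements reinforce each other while common gaming moves trade off against each other.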
Strategy 2: Measure Outcomes, Not Outputs
Outputs (things produced) are easier to game than outcomes (actual results achieved).
| Output (Gameable) | Outcome (Less Gameable) |
|---|---|
| Number of arrests | Crime reduction, public safety |
| Lines of code written | Software quality, user satisfaction |
| Hours worked | Project completion, business impact |
| Number of leads | Revenue, customer lifetime value |
| Publications | Scientific impact, citation by others over time |
Outcomes are harder to fake because they depend on external validation, not just internal measurement.
Strategy 3: Include Balancing Metrics
Pair metrics with "counterbalances" that catch common gaming strategies.
Examples:
- Sales volume + customer retention rate (catches churning customers)
- Production speed + defect rate (catches quality shortcuts)
- Cost reduction + employee satisfaction (catches morale-destroying cuts)
- Growth rate + customer acquisition cost (catches unsustainable growth)
Strategy 4: Use Relative Rather Than Absolute Targets
Absolute targets (must reach X) create binary pressure and gaming.
Relative targets (improve by Y% or rank in top Z) reduce pressure for extreme gaming.
Even better: Avoid fixed targets entirely. Use metrics for information and improvement, not rigid pass/fail thresholds.
Strategy 5: Change Metrics Periodically
Static metrics get gamed over time as people learn loopholes.
Rotating metrics makes gaming harder—actors can't invest in sophisticated gaming strategies if metrics change.
Balance: Don't change so frequently you lose ability to track progress, but don't let metrics become entrenched.
Strategy 6: Include Qualitative Assessment
Quantitative metrics alone are insufficient. Combine with qualitative judgment from people close to real work.
Example: Teacher evaluation
Not just: Test scores (quantitative)
But also: Peer observations, student feedback, principal evaluation, curriculum contributions (qualitative)
Makes gaming harder: Can't fake all dimensions simultaneously.
Strategy 7: Measure Gaming Directly
Include metrics that detect gaming behavior itself.
Examples:
- Variance in reported data (low variance suspicious—suggests manipulation)
- Distribution around targets (clustering suspicious)
- Audit random sample with intensive verification
- Whistleblower reports or anonymous surveys about gaming
Strategy 8: Reward Improvement, Not Levels
Targeting specific levels (must reach 90%) creates gaming pressure.
Rewarding improvement (reward those who improve most) reduces pressure for absolute gaming and encourages everyone to get better from their baseline.
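The contrast can be sketched directly (teams and scores hypothetical): a level target pays only the team that was already near the threshold, while an improvement rule pays the teams that actually got better from their own baselines.

```python
def reward_by_level(scores, target):
    """Pass/fail bonus for crossing a fixed level."""
    return {team: 1.0 if now >= target else 0.0
            for team, (prev, now) in scores.items()}

def reward_by_improvement(scores):
    """Bonus proportional to each team's gain over its own baseline."""
    return {team: max(now - prev, 0)
            for team, (prev, now) in scores.items()}

# (baseline, current) scores for three hypothetical teams
scores = {"A": (55, 70), "B": (88, 91), "C": (40, 62)}

print(reward_by_level(scores, target=90))   # only B is rewarded
print(reward_by_improvement(scores))        # C, the biggest improver, earns most
```

Improvement rules have their own gaming risk (sandbagging the baseline), which is one more reason to pair them with balancing metrics.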
When Metrics Should and Shouldn't Be Targets
Not all metrics suffer equally from Goodhart's Law. Context matters.
Metrics That Work as Targets
Characteristics:
- Simple, unambiguous, hard to game
- Directly under actors' control
- Low negative externalities from optimization
- Short feedback loops (consequences of gaming become apparent quickly)
Examples:
- Safety metrics: "Zero accidents" as a target generally works when incidents are independently verified; aside from underreporting, there are few ways to game safety that don't improve actual safety
- Efficiency metrics in constrained systems: "Reduce energy consumption by 10%" with fixed output—limited gaming options
- Binary outcomes: "Complete project by deadline"—either it's done or not
Metrics That Fail as Targets
Characteristics:
- Complex, multi-dimensional goals
- Imperfect proxies for what you care about
- Long feedback loops (gaming effects delayed)
- Competing stakeholders or quality dimensions
Examples:
- Quality metrics: Test scores, patient outcomes, customer satisfaction—always have gaps between metric and true quality
- Innovation metrics: Patents filed, R&D spending—easy to game, poor proxies for real innovation
- Culture metrics: Engagement scores—easily manipulated by fear or pressure
The Management vs. Measurement Tension
A maxim often attributed to Peter Drucker holds: "What gets measured gets managed."
Goodhart's Law adds: "What gets managed gets gamed."
The tension: Metrics are useful for understanding performance but problematic for managing performance through rigid targets.
Resolution strategies:
Use metrics as information, not as rigid targets
- Monitor metrics to understand patterns
- Investigate when metrics change
- Use as conversation starters, not conversation enders
Maintain human judgment
- Don't let metrics override qualitative assessment
- Empower people to do right thing even when metrics look bad
- Reward long-term thinking over metric optimization
Create psychological safety
- Don't punish people for bringing bad metrics if they're honestly working toward goals
- Celebrate those who resist gaming even when costly personally
- Reward whistleblowing about metric manipulation
The Philosophy of Metrics: Maps vs. Territory
Goodhart's Law reveals fundamental philosophical tension.
The Map Is Not the Territory
Alfred Korzybski's principle: Representations (maps) are not the things they represent (territory). Maps are useful simplifications but always incomplete.
Metrics are maps: Simplified representations of complex reality.
Problem: Organizations treat metrics as territory—as if the metric is the thing they care about rather than indicator of it.
Solution: Remember metrics are tools for understanding reality, not substitutes for reality.
The McNamara Fallacy
Named after Robert McNamara, US Secretary of Defense during Vietnam War, who relied heavily on quantitative metrics (body counts, bombs dropped) while ignoring unquantifiable strategic factors.
The fallacy in four steps:
Step 1: Measure what's easily measurable
Step 2: Disregard what can't be measured easily
Step 3: Assume what can't be measured isn't important
Step 4: Conclude what can't be measured doesn't exist
Result: Optimization for measurable metrics while ignoring unmeasurable factors that determine actual success.
Lesson: The most important things are often difficult to measure. Don't let measurability determine importance.
Campbell's Law
Sociologist Donald Campbell formulated related principle: "The more any quantitative social indicator is used for social decision-making, the more subject it will be to corruption pressures and the more apt it will be to distort and corrupt the social processes it is intended to monitor."
Campbell's Law is Goodhart's Law specifically for social systems: measurement for high-stakes decisions corrupts the measurement.
Living with Goodhart's Law: Practical Wisdom
Since Goodhart's Law is inevitable, how should organizations respond?
Principle 1: Accept Imperfection
No perfect metric system exists. Stop seeking it. Design for robustness to gaming, not immunity.
Principle 2: Maintain Metric Humility
Metrics provide information, not truth. Always ask: "What is this metric not capturing? What could make this metric misleading?"
Principle 3: Invest in Judgment
Don't let metrics substitute for thinking. Develop people's ability to reason about goals, context, and appropriate actions even when metrics point elsewhere.
Principle 4: Create Feedback Loops
Monitor for metric-goal divergence. When metrics improve but qualitative assessment suggests problems, investigate aggressively.
Principle 5: Reward Goal Achievement, Not Metric Achievement
Distinguish hitting targets from achieving goals. Reward those who achieve real goals even when metrics don't fully capture it; don't reward pure metric gaming.
Principle 6: Make Gaming Illegitimate
Cultural norm matters. Organizations where gaming is winked at versus condemned have different outcomes. Make clear that gaming is unacceptable, even if "technically" meeting targets.
Principle 7: Design for Resilience
Assume metrics will be gamed. How would gaming manifest? What would it look like? Design metrics and processes anticipating gaming attempts.
Principle 8: Remember Why You Measure
Constantly reconnect metrics to underlying goals. Metrics are means, not ends. When metrics no longer serve goals, change metrics.
Conclusion: Metrics as Tools, Not Gods
British hospitals improved wait-time metrics while harming patients. The metric became the mission, displacing actual healing. This is Goodhart's Law at scale—and it's preventable with wisdom.
The key insights:
1. Goodhart's Law is inevitable—when measures become targets with meaningful consequences, rational actors optimize for measures rather than goals. This isn't moral failure; it's predictable response to incentives.
2. The core problem is proxy-goal gap—metrics are imperfect indicators of what we care about. Optimization exploits gaps between indicator and goal. Perfect metrics don't exist; all metrics have exploitable weaknesses.
3. Gaming is often easier than genuine improvement—manipulating metrics requires less effort, time, and resources than actual improvement. Under pressure, people take path of least resistance. Expect gaming; design for it.
4. Multiple examples show consistent patterns—education, healthcare, business, technology, government, research all suffer identical dynamics. Teaching to tests, avoiding risky patients, fake accounts, engagement optimization, data manipulation—same mechanism, different domains.
5. Psychology drives gaming—rational incentives, diffused responsibility, short-term pressure, competitive dynamics, metric fixation, and unintended blindness combine to make gaming nearly irresistible in target-driven systems.
6. Detection requires vigilance—metrics improving while real performance declines, creative compliance, narrow focus, clustering at targets, resistance to change, data integrity issues all signal Goodhart's Law in action.
7. Mitigation strategies exist but aren't perfect—multiple complementary metrics, focusing on outcomes over outputs, balancing metrics, changing metrics periodically, including qualitative judgment, measuring gaming directly, rewarding improvement over levels. These reduce but don't eliminate gaming.
8. Context determines appropriateness—some metrics work as targets (simple, unambiguous, low externalities); others fail catastrophically (complex, proxy-heavy, long feedback loops). Match metrics to context.
9. The fundamental tension is management vs. measurement—metrics are valuable for understanding; problematic for rigid target-based management. Use metrics as information and conversation starters, not as substitutes for judgment.
10. Philosophical wisdom is essential—remember metrics are maps not territory, avoid McNamara Fallacy of valuing only what's measurable, recognize Campbell's Law that high-stakes measurement corrupts itself, maintain humility about metric limitations.
As historian Jerry Muller argues in The Tyranny of Metrics, metric fixation produces incentivized gaming and goal displacement. The solution isn't abandoning measurement—it's measured use of metrics.
Use metrics as tools, not gods. Measure to understand, not to mechanically control. Combine quantitative metrics with qualitative judgment. Remember the ultimate goal isn't hitting targets—it's achieving meaningful outcomes in the real world.
Goodhart's Law will never disappear. What can change is how organizations respond: with wisdom about metric limitations, humility about measurement, investment in judgment, and cultural emphasis on goals over gaming.
The test isn't whether your metrics can be gamed—they can. The test is whether your organization maintains focus on real goals even when metrics point elsewhere. That's where excellence lives: beyond the numbers, in commitment to mission that metrics imperfectly represent but never fully capture.
References
Campbell, D. T. (1979). Assessing the impact of planned social change. Evaluation and Program Planning, 2(1), 67–90. https://doi.org/10.1016/0149-7189(79)90048-X
Chrystal, K. A., & Mizen, P. D. (2003). Goodhart's law: Its origins, meaning and implications for monetary policy. In P. Mizen (Ed.), Central banking, monetary theory and practice: Essays in honour of Charles Goodhart (pp. 221–243). Edward Elgar Publishing.
Ewell, P. T. (1987). Establishing a campus-based assessment program. New Directions for Higher Education, 1987(59), 9–24. https://doi.org/10.1002/he.36919875903
Goodhart, C. A. E. (1984). Monetary theory and practice: The UK experience. Macmillan.
Kerr, S. (1975). On the folly of rewarding A, while hoping for B. Academy of Management Journal, 18(4), 769–783. https://doi.org/10.5465/255378
Korzybski, A. (1933). Science and sanity: An introduction to non-Aristotelian systems and general semantics. Institute of General Semantics.
McNamara, R. S., & VanDeMark, B. (1995). In retrospect: The tragedy and lessons of Vietnam. Times Books.
Muller, J. Z. (2018). The tyranny of metrics. Princeton University Press. https://doi.org/10.1515/9780691191263
Ridgway, V. F. (1956). Dysfunctional consequences of performance measurements. Administrative Science Quarterly, 1(2), 240–247. https://doi.org/10.2307/2390989
Rothstein, R. (2008). Holding accountability to account: How scholarship and experience in other fields inform exploration of performance incentives in education. National Center on Performance Incentives, Vanderbilt University.
Strathern, M. (1997). 'Improving ratings': Audit in the British university system. European Review, 5(3), 305–321. https://doi.org/10.1002/(SICI)1234-981X(199707)5:3<305::AID-EURO184>3.0.CO;2-4
U.S. Senate Committee on Banking, Housing, and Urban Affairs. (2016). An examination of Wells Fargo's unauthorized accounts and the regulatory response. U.S. Government Publishing Office.