How Goodhart's Law Breaks Metrics: When Measures Become Targets

In the 2000s, Britain's National Health Service faced criticism for long wait times. The government introduced a performance metric: no patient should wait more than 18 weeks from referral to treatment. Hospitals failing this target faced consequences: funding cuts, public shaming, leadership changes.

The metric worked. Wait times dropped dramatically. Success!

Or was it? Closer inspection revealed disturbing patterns:

  • Selective referrals: Doctors delayed officially referring patients, keeping them in limbo to avoid starting the 18-week clock
  • Creative scheduling: Patients received "clock-stopping" procedures (minor, often unnecessary interventions) resetting the timer before major treatment
  • Queue manipulation: Easy cases prioritized to hit targets; complex cases delayed indefinitely
  • Data gaming: Administrative tricks reclassified waits, making them invisible to metrics
  • Perverse outcomes: Some patients waited longer than before because resources redirected to "target patients" at expense of others

The metric improved. Real patient care deteriorated.

This phenomenon has a name: Goodhart's Law, after British economist Charles Goodhart, who stated the underlying principle in 1975. Its best-known phrasing is anthropologist Marilyn Strathern's: "When a measure becomes a target, it ceases to be a good measure."

The mechanism is deceptively simple yet profoundly important. Metrics are proxies—imperfect representations of what we actually care about. When metrics carry consequences (rewards, punishments, status, resources), people optimize for the metric rather than the underlying goal. The metric diverges from its purpose.

Goodhart's Law is everywhere: education (teaching to tests), healthcare (avoiding risky patients), business (vanity metrics), government (statistical manipulation), technology (engagement metrics undermining wellbeing), research (citation gaming). Any domain using metrics faces this problem.

This article explains Goodhart's Law comprehensively: the mechanism behind it, why it's nearly inevitable, classic examples across domains, the psychology of metric gaming, how to detect it, strategies for designing more robust metrics, when metrics should and shouldn't be targets, and the fundamental tension between measurement and management.


Understanding Goodhart's Law: The Core Mechanism

Before examining solutions, understand precisely why metrics break when targeted.

The Proxy Problem

Metrics are rarely the actual goal—they're indicators of goals we care about.

Actual goal → metric proxy, and the gap each pairing leaves open:

  • Student understanding → test scores: tests measure a narrow slice of understanding
  • Customer satisfaction → survey ratings: surveys sample opinions, not the full experience
  • Employee productivity → hours worked: hours ≠ valuable output
  • Hospital quality → mortality rate: the sickest patients can be avoided to protect the metric
  • Code quality → lines of code: more lines often means worse quality
  • Website value → page views: views without engagement or value

The gap between goal and proxy creates opportunity for gaming. When the proxy becomes the target, rational actors maximize the proxy even when doing so undermines the goal.

The Optimization Dynamic

Step 1: Metric introduced as indicator of performance

Initially, metric correlates with goal. High-performing entities naturally score well; low performers score poorly. Metric provides useful information.

Step 2: Metric becomes target with consequences

Organizations set targets. Achieving targets brings rewards (bonuses, promotions, funding, reputation); failing brings punishments (loss of resources, shame, job loss).

Step 3: Actors optimize for metric, not goal

Rational response: Maximize metric. This includes:

  • Legitimate improvement: Actually getting better at real goal (best outcome)
  • Focus shifting: Prioritizing measurable aspects while neglecting unmeasured ones
  • Gaming: Finding ways to boost metric without improving (or while harming) real goal
  • Manipulation: Distorting data, exploiting loopholes, outright cheating

Step 4: Metric-goal divergence widens

As gaming increases, correlation between metric and goal weakens. Eventually metric becomes meaningless or actively harmful indicator of true performance.

Step 5: Metric loses informational value

The measure that once provided insight now obscures reality. Organizations are "hitting targets but missing the point."
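The five steps above can be sketched as a toy simulation (all numbers and the model itself are illustrative assumptions): actors split effort between genuine improvement and gaming, and as target pressure rises, the correlation between metric and goal collapses.

```python
import random

def correlation(xs, ys):
    """Pearson correlation, computed from scratch to stay dependency-free."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

def metric_goal_correlation(pressure, n=1000, seed=0):
    """Toy model: each actor splits effort between genuine improvement
    (raising both goal and metric) and gaming (raising only the metric,
    at double the payoff per unit of effort). Higher target pressure
    shifts effort toward gaming."""
    rng = random.Random(seed)
    metrics, goals = [], []
    for _ in range(n):
        skill = rng.random()
        gaming = min(1.0, pressure * rng.random())  # share of effort spent gaming
        goal = skill * (1.0 - gaming)               # real performance
        metric = goal + 2.0 * gaming                # gaming inflates the metric only
        goals.append(goal)
        metrics.append(metric)
    return correlation(metrics, goals)

for p in (0.0, 0.5, 1.0):
    print(f"pressure={p:.1f}  metric-goal correlation={metric_goal_correlation(p):+.2f}")
```

With no target pressure the metric tracks the goal almost perfectly; as pressure rises, the correlation decays and can even turn negative: Step 4's divergence in miniature.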

Why It's Nearly Inevitable

Goodhart's Law isn't about bad people—it's about rational responses to incentives in complex systems.

Reason 1: No perfect metrics exist

Every metric has gaps. Perfect measures would require capturing all dimensions of complex goals—which is either impossible or so burdensome it prevents action.

Reason 2: Gaming is easier than genuinely improving

Often, manipulating metrics is cheaper and faster than actual improvement. When people face pressure to hit targets, they take the path of least resistance.

Example: Improving actual teaching quality requires expertise, time, resources. Teaching specific test content requires less. Under pressure, teachers rationally focus on tests.

Reason 3: Unintended consequences emerge in complex systems

Organizations are complex adaptive systems. Interventions create ripple effects. Metric-driven optimization in one area creates problems elsewhere—problems often invisible to the metric.

Reason 4: Metrics change behavior they measure

An observer effect for social systems: measurement itself alters what's being measured. People respond to being measured, and those responses often undermine the measurement's validity.


Classic Examples of Goodhart's Law Across Domains

Understanding how Goodhart's Law manifests in different contexts reveals patterns.

Education: Teaching to the Test

Goal: Student learning, critical thinking, knowledge application

Metric: Standardized test scores

What happened:

  • Teachers focus curriculum narrowly on tested content
  • "Test prep" replaces deeper learning
  • Creative subjects (art, music, physical education) reduced
  • Students learn test-taking strategies, not subjects
  • Cheating scandals (Atlanta, Washington DC) where teachers altered student answers
  • Schools discourage low-performing students from taking tests (to protect school averages)

Metric improved, goal undermined: Test scores rose while actual learning and educational breadth declined.

Healthcare: The Mortality Metric

Goal: High-quality patient care, health outcomes

Metric: Hospital mortality rates, readmission rates

What happened:

  • Surgeons avoid high-risk patients (who need surgery most) to protect statistics
  • Patients discharged prematurely to avoid "dying in hospital"
  • Readmissions prevented through aggressive follow-up that doesn't improve health
  • "Upcoding" diagnoses to make patient populations appear sicker (making outcomes look better by comparison)
  • Resources diverted from unmeasured aspects of care (patient experience, preventive care)

Consequence: Some patients who most need intervention are turned away; others receive suboptimal timing of care.

Business: The Wells Fargo Scandal

Goal: Customer satisfaction, sustainable growth, ethical banking

Metric: Number of products per customer (cross-selling ratio)

What happened:

  • Employees given aggressive sales targets (8+ products per customer)
  • Unable to legitimately meet targets, employees created fake accounts
  • 3.5 million fraudulent accounts opened without customer knowledge
  • Customers charged fees for accounts they didn't authorize
  • Employees who resisted or reported were fired

Result: $3 billion in fines, irreparable reputational damage, CEO resignation, criminal charges. Metric maximization destroyed the company's actual goals.

Technology: Social Media Engagement

Goal: Meaningful connection, informative content, user wellbeing

Metric: Engagement (time spent, clicks, shares, comments)

What happened:

  • Algorithms optimize for engagement, not quality or wellbeing
  • Outrage, controversy, and misinformation generate high engagement
  • Platforms amplify divisive content (it's engaging!)
  • "Doomscrolling," addiction, polarization, anxiety
  • Meaningful connection paradoxically declines while "engagement" soars

Observation: Facebook's own research showed Instagram harms teen mental health, but engagement metrics told a different story, so the platform prioritized metrics over wellbeing.

Government: Soviet Nail Factory

Classic example from economic planning:

Scenario 1: Factory given target measured by total weight of nails produced

Result: Factory manufactures enormous, useless nails (maximizes weight per nail)

Scenario 2: Target changed to number of nails

Result: Factory manufactures tiny, useless nails (maximizes count)

Neither metric captured actual goal: Producing nails of appropriate sizes for construction needs. Optimizing for metric produced absurd outcomes.
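The nail factory can be written as a toy optimization (the production model and every number are invented for illustration): whichever single-number target the planner picks, the metric-optimal nail size lands at an absurd extreme.

```python
def nails_made(size_cm, hours=100, setup_min=1.0, min_per_cm=0.5):
    """Toy production model (assumed numbers): each nail costs a fixed
    setup time plus forging time proportional to its length."""
    minutes_per_nail = setup_min + min_per_cm * size_cm
    return (hours * 60) / minutes_per_nail

def metric_optimal_size(metric, sizes=range(1, 31)):
    """Which single nail size (in cm) maximizes the planner's metric?"""
    if metric == "count":
        return max(sizes, key=nails_made)
    if metric == "weight":  # weight of output grows with count * length
        return max(sizes, key=lambda s: nails_made(s) * s)
    raise ValueError(metric)

print(metric_optimal_size("count"))   # the smallest nail in the range
print(metric_optimal_size("weight"))  # the largest nail in the range
```

Neither extreme matches what builders actually need; the optimum chases the metric, not the use.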

Research: Citation Gaming

Goal: Impactful scientific contribution, knowledge advancement

Metric: Citation counts, h-index, impact factor

What happened:

  • "Citation cartels": Groups of researchers cite each other excessively
  • Self-citation inflation
  • Editors pressure authors to cite journal's other papers (to boost journal metrics)
  • "Salami slicing": Breaking research into smallest publishable units (more papers = more citations)
  • Choosing "hot" but incremental topics over important but risky research

Result: Citation counts rise while research quality and originality face pressures.

Policing: Compstat and Crime Statistics

Goal: Public safety, crime reduction

Metric: Reported crime rates

What happened:

  • Pressure to show declining crime rates
  • Officers discourage victims from filing reports
  • Crimes downgraded to lesser offenses (felony → misdemeanor)
  • Manipulation of crime classification data
  • Stops, searches, and arrests increase (measurable actions) while actual crime solving decreases

Multiple police departments caught manipulating crime data to hit Compstat targets while real safety declined or stagnated.


The Psychology of Metric Gaming

Why do people game metrics even when they know it undermines real goals?

Mechanism 1: Rational Response to Incentives

People respond to actual incentives, not stated goals.

If rewards/punishments attach to metrics, the metric becomes the goal in practice—regardless of rhetoric about "real objectives."

Example: Teacher who genuinely cares about student learning but faces job loss if test scores don't improve. Teaching to test becomes survival, not malfeasance.

Mechanism 2: Diffused Responsibility

"I'm just optimizing for what I'm told to optimize for."

When leadership sets metric targets, individuals feel absolved of responsibility for negative consequences. Gaming feels like "doing your job," not undermining organizational mission.

Mechanism 3: Short-Term Pressure

Genuine improvement takes time. Metric manipulation produces immediate results. Under pressure for quick wins, gaming becomes attractive.

Mechanism 4: Competitive Dynamics

If others are gaming metrics, you're punished for not gaming. Honest actors lose to gamers when only metrics matter.

Tragedy of the commons: Individual gaming is rational; collective gaming destroys metric's value for everyone.
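That commons structure is a prisoner's dilemma. A sketch with assumed payoff numbers:

```python
def payoff(i_game, others_game):
    """Illustrative payoffs (invented numbers) with a prisoner's-dilemma
    structure: gaming beats honesty whatever others do, yet mutual
    gaming corrupts the metric and leaves everyone worse off."""
    if i_game and not others_game:
        return 15   # gamer outshines honest peers
    if not i_game and not others_game:
        return 10   # mutual honesty: the metric stays informative
    if i_game and others_game:
        return 2    # metric corrupted; everyone pays
    return 1        # honest actor loses out to gamers

assert payoff(True, False) > payoff(False, False)  # gaming pays against honest peers
assert payoff(True, True) > payoff(False, True)    # gaming pays against gaming peers
assert payoff(True, True) < payoff(False, False)   # but all-gaming < all-honest
```

Gaming is the dominant individual strategy under these payoffs, which is exactly why culture and enforcement, not individual virtue, determine the outcome.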

Mechanism 5: Metric Fixation

Targets become psychologically real.

Once metrics are entrenched, people genuinely start believing hitting the target = success, even when evidence suggests otherwise. Metric becomes substitute for thinking about real goals.

Mechanism 6: Unintended Blindness

Often people gaming metrics don't consciously realize they're undermining goals. They see metric improvement as goal achievement. Ethical fading: Moral dimensions disappear from view when framed as "meeting targets."


Detecting Goodhart's Law in Action

How do you recognize when metrics are being gamed?

Warning Sign 1: Metrics Improve While Real Performance Declines

Most telltale sign: Numbers go up, but qualitative observation suggests things are getting worse.

Examples:

  • Test scores rise but employers complain graduates lack skills
  • Hospital mortality rates fall but patient complaints increase
  • Employee productivity metrics improve but customer satisfaction drops

Action: Always pair quantitative metrics with qualitative assessment. Talk to frontline workers, customers, actual stakeholders.

Warning Sign 2: Creative Compliance Emerges

People find technically compliant ways to hit targets while violating spirit of goal.

Examples:

  • "Clock-stopping" procedures in hospitals
  • Schools encouraging weak students to skip test day
  • Companies booking revenue in current quarter then reversing it later

Pattern: If compliance feels like exploiting loopholes rather than achieving goals, Goodhart's Law is operating.

Warning Sign 3: Focus Narrows to Measured Aspects

Unmeasured dimensions of performance receive less attention, even when important.

Example: Sales team measured on volume closes many low-value deals; high-value strategic deals ignored (harder, longer sales cycles, less immediately measurable).

Warning Sign 4: Metrics Stabilize at "Just Meeting Target"

When many actors cluster just above target threshold, suggests optimization for target rather than genuine performance improvement.

Statistical signature: Under natural variation you'd expect a smooth distribution across the threshold. A pile-up just above the target indicates strategic gaming.
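One way to operationalize this check (a sketch; the band width and example data are assumptions): compare how many reported values sit just above the target with how many sit just below. Under smooth variation the two bands should hold roughly equal mass.

```python
def clustering_ratio(values, target, band=2.0):
    """Mass just above vs. just below the target. Under smooth natural
    variation the two bands hold roughly equal mass; a ratio far above
    1 suggests results are being nudged over the threshold."""
    above = sum(1 for v in values if target <= v < target + band)
    below = sum(1 for v in values if target - band <= v < target)
    return above / below if below else float("inf")

# Honest-looking scores: spread smoothly around a target of 90.
honest = [85, 87, 88, 89, 90, 91, 92, 93, 95, 97]
# Gamed-looking scores: results pile up just over the line.
gamed = [89, 90, 90.1, 90.3, 90.5, 90.8, 91, 91.2, 91.5, 92]

print(clustering_ratio(honest, target=90))  # 1.0
print(clustering_ratio(gamed, target=90))   # 8.0
```

A high ratio is a prompt for investigation, not proof: sometimes clustering reflects legitimate effort concentrated near a meaningful threshold.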

Warning Sign 5: Resistance to Metric Changes

If proposals to change or supplement metrics meet strong resistance, often because current metrics are being gamed—changes would expose gaming or require actual improvement.

Warning Sign 6: Data Integrity Issues

Anomalies, inconsistencies, or irregularities in reported data suggest manipulation.

Examples: Sudden discontinuous jumps, data too good to be true, lack of variance, reporting delays around target deadlines.
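A couple of these screens are easy to automate. In this sketch the thresholds are illustrative, and flags are prompts for audit, not proof of manipulation:

```python
def integrity_flags(series, jump_factor=3.0, min_cv=0.01):
    """Cheap screens for suspicious reporting (thresholds are assumptions):
    - a step much larger than the typical period-to-period change
    - variance implausibly low relative to the mean ("too smooth")"""
    diffs = sorted(abs(b - a) for a, b in zip(series, series[1:]))
    typical = diffs[len(diffs) // 2]  # median step size
    flags = []
    if typical and diffs[-1] > jump_factor * typical:
        flags.append("discontinuous jump")
    mean = sum(series) / len(series)
    var = sum((x - mean) ** 2 for x in series) / len(series)
    if mean and (var ** 0.5) / mean < min_cv:
        flags.append("implausibly low variance")
    return flags

print(integrity_flags([100.0, 100.1, 100.0, 99.9, 100.1, 100.0]))  # too smooth
print(integrity_flags([50, 52, 51, 53, 52, 90]))                   # sudden jump
```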


Designing Metrics That Resist Gaming

Can metrics be designed to minimize Goodhart's Law effects?

Strategy 1: Use Multiple Complementary Metrics

Single metrics are easily gamed. Multiple metrics covering different dimensions make gaming harder—optimizing one often makes others worse.

Example: Hospital quality

Instead of just mortality rate, measure:

  • Mortality rate (outcome)
  • Readmission rate (outcome)
  • Patient experience scores (process)
  • Complication rates (outcome)
  • Average treatment cost (efficiency)
  • Staff satisfaction (leading indicator)

Gaming one metric (e.g., avoiding risky patients to reduce mortality) would harm others (reduce revenue, worsen staff satisfaction from turning away patients).

Principle: Make gaming harder than genuinely improving.
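One way to combine complementary metrics so that gaming one dimension can't carry the whole score is a geometric rather than arithmetic mean. A sketch, with illustrative metric names and values:

```python
def arithmetic_mean(scores):
    return sum(scores.values()) / len(scores)

def composite_score(scores):
    """Geometric mean of normalized metrics (each in (0, 1]). Unlike an
    arithmetic mean, it punishes collapsing any single dimension, so
    trading one metric off hard against another drags the composite down."""
    product = 1.0
    for s in scores.values():
        product *= s
    return product ** (1.0 / len(scores))

balanced = {"mortality": 0.80, "readmission": 0.80, "experience": 0.80}
# Gamed: mortality polished by avoiding risky patients, at the cost
# of patient experience. The plain average can't tell the difference.
gamed = {"mortality": 1.00, "readmission": 0.95, "experience": 0.45}

print(arithmetic_mean(balanced), arithmetic_mean(gamed))  # both ~0.80
print(composite_score(balanced), composite_score(gamed))  # ~0.80 vs ~0.75
```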

Strategy 2: Measure Outcomes, Not Outputs

Outputs (things produced) are easier to game than outcomes (actual results achieved).

Output (gameable) → outcome (less gameable):

  • Number of arrests → crime reduction, public safety
  • Lines of code written → software quality, user satisfaction
  • Hours worked → project completion, business impact
  • Number of leads → revenue, customer lifetime value
  • Publications → scientific impact, citation by others over time

Outcomes are harder to fake because they depend on external validation, not just internal measurement.

Strategy 3: Include Balancing Metrics

Pair metrics with "counterbalances" that catch common gaming strategies.

Examples:

  • Sales volume + customer retention rate (catches churning customers)
  • Production speed + defect rate (catches quality shortcuts)
  • Cost reduction + employee satisfaction (catches morale-destroying cuts)
  • Growth rate + customer acquisition cost (catches unsustainable growth)

Strategy 4: Use Relative Rather Than Absolute Targets

Absolute targets (must reach X) create binary pressure and gaming.

Relative targets (improve by Y% or rank in top Z) reduce pressure for extreme gaming.

Even better: Avoid fixed targets entirely. Use metrics for information and improvement, not rigid pass/fail thresholds.

Strategy 5: Change Metrics Periodically

Static metrics get gamed over time as people learn loopholes.

Rotating metrics makes gaming harder—actors can't invest in sophisticated gaming strategies if metrics change.

Balance: Don't change so frequently you lose ability to track progress, but don't let metrics become entrenched.

Strategy 6: Include Qualitative Assessment

Quantitative metrics alone are insufficient. Combine with qualitative judgment from people close to real work.

Example: Teacher evaluation

Not just: Test scores (quantitative)

But also: Peer observations, student feedback, principal evaluation, curriculum contributions (qualitative)

Makes gaming harder: Can't fake all dimensions simultaneously.

Strategy 7: Measure Gaming Directly

Include metrics that detect gaming behavior itself.

Examples:

  • Variance in reported data (low variance suspicious—suggests manipulation)
  • Distribution around targets (clustering suspicious)
  • Audit random sample with intensive verification
  • Whistleblower reports or anonymous surveys about gaming

Strategy 8: Reward Improvement, Not Levels

Targeting specific levels (must reach 90%) creates gaming pressure.

Rewarding improvement (reward those who improve most) reduces pressure for absolute gaming and encourages everyone to get better from their baseline.
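A minimal sketch of improvement-based ranking (names and numbers invented):

```python
def improvement_ranking(baseline, current):
    """Rank actors by relative improvement over their own baseline,
    rather than by absolute level (which rewards a favorable starting
    point as much as genuine progress)."""
    gains = {k: (current[k] - baseline[k]) / baseline[k] for k in baseline}
    return sorted(gains, key=gains.get, reverse=True)

baseline = {"school_A": 60, "school_B": 85, "school_C": 70}
current  = {"school_A": 72, "school_B": 86, "school_C": 77}

# By absolute level, school_B leads; by improvement, school_A does.
print(improvement_ranking(baseline, current))
```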


When Metrics Should and Shouldn't Be Targets

Not all metrics suffer equally from Goodhart's Law. Context matters.

Metrics That Work as Targets

Characteristics:

  • Simple, unambiguous, hard to game
  • Directly under actors' control
  • Low negative externalities from optimization
  • Short feedback loops (consequences of gaming become apparent quickly)

Examples:

  • Safety metrics: "Zero accidents" as a target can work when incidents are independently verified; the classic way to game it (underreporting) is detectable through audits
  • Efficiency metrics in constrained systems: "Reduce energy consumption by 10%" with fixed output—limited gaming options
  • Binary outcomes: "Complete project by deadline"—either it's done or not

Metrics That Fail as Targets

Characteristics:

  • Complex, multi-dimensional goals
  • Imperfect proxies for what you care about
  • Long feedback loops (gaming effects delayed)
  • Competing stakeholders or quality dimensions

Examples:

  • Quality metrics: Test scores, patient outcomes, customer satisfaction—always have gaps between metric and true quality
  • Innovation metrics: Patents filed, R&D spending—easy to game, poor proxies for real innovation
  • Culture metrics: Engagement scores—easily manipulated by fear or pressure

The Management vs. Measurement Tension

A maxim often attributed to Peter Drucker: "What gets measured gets managed."

Goodhart's Law adds: "What gets managed gets gamed."

The tension: Metrics are useful for understanding performance but problematic for managing performance through rigid targets.

Resolution strategies:

Use metrics as information, not as rigid targets

  • Monitor metrics to understand patterns
  • Investigate when metrics change
  • Use as conversation starters, not conversation enders

Maintain human judgment

  • Don't let metrics override qualitative assessment
  • Empower people to do right thing even when metrics look bad
  • Reward long-term thinking over metric optimization

Create psychological safety

  • Don't punish people for bringing bad metrics if they're honestly working toward goals
  • Celebrate those who resist gaming even when costly personally
  • Reward whistleblowing about metric manipulation

The Philosophy of Metrics: Maps vs. Territory

Goodhart's Law reveals fundamental philosophical tension.

The Map Is Not the Territory

Alfred Korzybski's principle: Representations (maps) are not the things they represent (territory). Maps are useful simplifications but always incomplete.

Metrics are maps: Simplified representations of complex reality.

Problem: Organizations treat metrics as territory—as if the metric is the thing they care about rather than indicator of it.

Solution: Remember metrics are tools for understanding reality, not substitutes for reality.

The McNamara Fallacy

Named after Robert McNamara, US Secretary of Defense during Vietnam War, who relied heavily on quantitative metrics (body counts, bombs dropped) while ignoring unquantifiable strategic factors.

The fallacy in four steps:

Step 1: Measure what's easily measurable

Step 2: Disregard what can't be measured easily

Step 3: Assume what can't be measured isn't important

Step 4: Conclude what can't be measured doesn't exist

Result: Optimization for measurable metrics while ignoring unmeasurable factors that determine actual success.

Lesson: Most important things are difficult to measure. Don't let measurability determine importance.

Campbell's Law

Sociologist Donald Campbell formulated related principle: "The more any quantitative social indicator is used for social decision-making, the more subject it will be to corruption pressures and the more apt it will be to distort and corrupt the social processes it is intended to monitor."

Campbell's Law is Goodhart's Law specifically for social systems: measurement for high-stakes decisions corrupts the measurement.


Living with Goodhart's Law: Practical Wisdom

Since Goodhart's Law is inevitable, how should organizations respond?

Principle 1: Accept Imperfection

No perfect metric system exists. Stop seeking it. Design for robustness to gaming, not immunity.

Principle 2: Maintain Metric Humility

Metrics provide information, not truth. Always ask: "What is this metric not capturing? What could make this metric misleading?"

Principle 3: Invest in Judgment

Don't let metrics substitute for thinking. Develop people's ability to reason about goals, context, and appropriate actions even when metrics point elsewhere.

Principle 4: Create Feedback Loops

Monitor for metric-goal divergence. When metrics improve but qualitative assessment suggests problems, investigate aggressively.

Principle 5: Reward Goal Achievement, Not Metric Achievement

Distinguish hitting targets from achieving goals. Reward those who achieve real goals even when metrics don't fully capture it; don't reward pure metric gaming.

Principle 6: Make Gaming Illegitimate

Cultural norm matters. Organizations where gaming is winked at versus condemned have different outcomes. Make clear that gaming is unacceptable, even if "technically" meeting targets.

Principle 7: Design for Resilience

Assume metrics will be gamed. How would gaming manifest? What would it look like? Design metrics and processes anticipating gaming attempts.

Principle 8: Remember Why You Measure

Constantly reconnect metrics to underlying goals. Metrics are means, not ends. When metrics no longer serve goals, change metrics.


Conclusion: Metrics as Tools, Not Gods

British hospitals improved wait-time metrics while harming patients. The metric became the mission, displacing actual healing. This is Goodhart's Law at scale—and it's manageable with wisdom.

The key insights:

1. Goodhart's Law is inevitable—when measures become targets with meaningful consequences, rational actors optimize for measures rather than goals. This isn't moral failure; it's predictable response to incentives.

2. The core problem is proxy-goal gap—metrics are imperfect indicators of what we care about. Optimization exploits gaps between indicator and goal. Perfect metrics don't exist; all metrics have exploitable weaknesses.

3. Gaming is often easier than genuine improvement—manipulating metrics requires less effort, time, and resources than actual improvement. Under pressure, people take path of least resistance. Expect gaming; design for it.

4. Multiple examples show consistent patterns—education, healthcare, business, technology, government, research all suffer identical dynamics. Teaching to tests, avoiding risky patients, fake accounts, engagement optimization, data manipulation—same mechanism, different domains.

5. Psychology drives gaming—rational incentives, diffused responsibility, short-term pressure, competitive dynamics, metric fixation, and unintended blindness combine to make gaming nearly irresistible in target-driven systems.

6. Detection requires vigilance—metrics improving while real performance declines, creative compliance, narrow focus, clustering at targets, resistance to change, data integrity issues all signal Goodhart's Law in action.

7. Mitigation strategies exist but aren't perfect—multiple complementary metrics, focusing on outcomes over outputs, balancing metrics, changing metrics periodically, including qualitative judgment, measuring gaming directly, rewarding improvement over levels. These reduce but don't eliminate gaming.

8. Context determines appropriateness—some metrics work as targets (simple, unambiguous, low externalities); others fail catastrophically (complex, proxy-heavy, long feedback loops). Match metrics to context.

9. The fundamental tension is management vs. measurement—metrics are valuable for understanding; problematic for rigid target-based management. Use metrics as information and conversation starters, not as substitutes for judgment.

10. Philosophical wisdom is essential—remember metrics are maps not territory, avoid McNamara Fallacy of valuing only what's measurable, recognize Campbell's Law that high-stakes measurement corrupts itself, maintain humility about metric limitations.

As historian Jerry Muller argues in The Tyranny of Metrics, metric fixation produces incentivized gaming and goal displacement. The solution isn't abandoning measurement—it's measured use of metrics.

Use metrics as tools, not gods. Measure to understand, not to mechanically control. Combine quantitative metrics with qualitative judgment. Remember the ultimate goal isn't hitting targets—it's achieving meaningful outcomes in the real world.

Goodhart's Law will never disappear. What can change is how organizations respond: with wisdom about metric limitations, humility about measurement, investment in judgment, and cultural emphasis on goals over gaming.

The test isn't whether your metrics can be gamed—they can. The test is whether your organization maintains focus on real goals even when metrics point elsewhere. That's where excellence lives: beyond the numbers, in commitment to mission that metrics imperfectly represent but never fully capture.


References

Campbell, D. T. (1979). Assessing the impact of planned social change. Evaluation and Program Planning, 2(1), 67–90. https://doi.org/10.1016/0149-7189(79)90048-X

Chrystal, K. A., & Mizen, P. D. (2003). Goodhart's law: Its origins, meaning and implications for monetary policy. In P. Mizen (Ed.), Central banking, monetary theory and practice: Essays in honour of Charles Goodhart (pp. 221–243). Edward Elgar Publishing.

Ewell, P. T. (1987). Establishing a campus-based assessment program. New Directions for Higher Education, 1987(59), 9–24. https://doi.org/10.1002/he.36919875903

Goodhart, C. A. E. (1984). Monetary theory and practice: The UK experience. Macmillan.

Kerr, S. (1975). On the folly of rewarding A, while hoping for B. Academy of Management Journal, 18(4), 769–783. https://doi.org/10.5465/255378

Korzybski, A. (1933). Science and sanity: An introduction to non-Aristotelian systems and general semantics. Institute of General Semantics.

McNamara, R. S., & VanDeMark, B. (1995). In retrospect: The tragedy and lessons of Vietnam. Times Books.

Muller, J. Z. (2018). The tyranny of metrics. Princeton University Press. https://doi.org/10.1515/9780691191263

Ridgway, V. F. (1956). Dysfunctional consequences of performance measurements. Administrative Science Quarterly, 1(2), 240–247. https://doi.org/10.2307/2390989

Rothstein, R. (2008). Holding accountability to account: How scholarship and experience in other fields inform exploration of performance incentives in education. National Center on Performance Incentives, Vanderbilt University.

Strathern, M. (1997). 'Improving ratings': Audit in the British university system. European Review, 5(3), 305–321. https://doi.org/10.1002/(SICI)1234-981X(199707)5:3<305::AID-EURO184>3.0.CO;2-4

U.S. Senate Committee on Banking, Housing, and Urban Affairs. (2016). An examination of Wells Fargo's unauthorized accounts and the regulatory response. U.S. Government Publishing Office.

