Why Metrics Often Mislead

Your dashboard shows success: conversion rate up 15%, user engagement climbing, revenue per customer increasing. Every metric green. The board presentation looks excellent. Yet three months later, the company is struggling—churn accelerating, support costs exploding, product quality complaints surging. How did all the metrics look great while the business deteriorated?

Metrics mislead not because they lie (though manipulation happens), but because they tell partial truths easily mistaken for complete pictures. A metric shows one number from one angle at one point in time, and organizations treat it as comprehensive reality. The map becomes the territory. The proxy becomes the goal. The measurement becomes divorced from what it was meant to measure.

Understanding how and why metrics mislead—through gaming, misinterpretation, misalignment, and the systemic pressures that corrupt good metrics once they become targets—is essential for using measurement effectively without being deceived.


The Core Mechanisms of Misleading

Mechanism 1: Goodhart's Law

Statement: "When a measure becomes a target, it ceases to be a good measure."

Why it happens (see the sketch after these steps):

  1. Metric chosen because it correlates with goal
  2. Metric becomes target
  3. People optimize for metric
  4. Correlation between metric and goal breaks down
  5. Metric improves while goal performance deteriorates
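
A minimal sketch in Python, using made-up effort and noise values, shows what this breakdown looks like in numbers: while effort goes to the real goal, the proxy tracks it; once effort shifts to inflating the proxy directly, the proxy keeps climbing while the goal sinks. This is illustrative only, not a model of any real organization.

```python
import random

random.seed(42)

def simulate_quarter(effort_on_goal: float, effort_on_gaming: float) -> tuple[float, float]:
    """Return (goal_value, proxy_value) for one quarter.

    The proxy tracks the goal plus whatever gaming effort adds on top,
    so gaming inflates the proxy without moving the goal.
    """
    goal = effort_on_goal + random.gauss(0, 0.5)
    proxy = goal + effort_on_gaming + random.gauss(0, 0.5)
    return goal, proxy

# Before the proxy becomes a target: all effort goes to the real goal.
before = [simulate_quarter(effort_on_goal=5, effort_on_gaming=0) for _ in range(8)]

# After the proxy becomes a target: effort shifts toward gaming the proxy.
after = [simulate_quarter(effort_on_goal=2, effort_on_gaming=6) for _ in range(8)]

for label, quarters in [("before target", before), ("after target", after)]:
    avg_goal = sum(g for g, _ in quarters) / len(quarters)
    avg_proxy = sum(p for _, p in quarters) / len(quarters)
    print(f"{label}: avg goal={avg_goal:.1f}, avg proxy={avg_proxy:.1f}")
# Typical output: the proxy rises after it becomes a target, while the goal falls.
```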

Example: Soviet Nail Factory

Goal: Produce useful nails

Metric: Weight of nails produced (tons)

Result:

  • Factory produces huge, heavy, useless nails
  • Metric (tonnage) maximized
  • Goal (useful nails) sacrificed

Alternative metric: Number of nails produced

Result:

  • Factory produces tiny, useless nails
  • Metric (count) maximized
  • Goal still sacrificed

Lesson: Optimizing a proxy metric destroys its relationship to the underlying goal.


Example: Wells Fargo Account Openings

Goal: Grow customer relationships

Metric: New accounts opened per employee

Intent: More accounts = deeper customer relationships

Result:

  • Employees opened millions of unauthorized accounts
  • Customers didn't want or use accounts
  • Metric (accounts) soared
  • Goal (real customer relationships) harmed
  • Reputation destroyed, billions in fines

The metric-as-target corrupted the system.


Mechanism 2: Gaming and Manipulation

Gaming: Achieving metric targets without improving (or while degrading) actual performance.

Common gaming tactics:

  • Cherry-picking: Report only favorable data (e.g., select the best time period, exclude weak segments)
  • Reclassification: Change definitions to improve the numbers (e.g., reclassify customers to hide churn)
  • Threshold gaming: Bunch activity to just meet targets (e.g., close deals at month-end to hit quota)
  • Sandbagging: Delay good results to the next period (e.g., hold deals if quota is already met)
  • Output shifting: Hit the metric by sacrificing unmeasured quality (e.g., fast support resolution that leaves problems unresolved)
  • Measurement manipulation: Change how you measure (e.g., adjust survey timing or question wording)

Example: British Ambulance Response Times

Target: Respond to emergencies within 8 minutes

Gaming tactics:

  • Stop clock when ambulance dispatched, not when it arrives
  • Send fast motorcycle paramedic first (hits 8-minute target), ambulance later
  • Reclassify emergencies to less urgent categories (looser targets)

Result: Response time metrics improved, but actual emergency care quality questionable.


Example: Teacher Test Score Gaming

Metric: Student test scores

Gaming:

  • Narrow curriculum to tested subjects
  • Teach test-taking strategies, not deep learning
  • Exclude low-performers from test day
  • In extreme cases: Change student answers, give answers during test

Result: Scores rise, actual educational outcomes unclear or worse.


Mechanism 3: Partial Visibility

Metrics show what they measure and hide everything else.

The illumination problem:

  • Metric illuminates measured aspect
  • Makes unmeasured aspects darker by comparison
  • The "streetlight effect": searching for keys under the streetlight because that is where the light is, not where the keys were dropped

Example: Page Views

What it shows: Traffic volume

What it hides:

  • Traffic quality (bots vs. humans? Engaged vs. bounce?)
  • Traffic intent (ready to buy vs. random visitor?)
  • Traffic outcome (converted? Got value?)

Risk: Optimize for traffic volume, get useless traffic.


Example: Employee Productivity Metrics

What they show: Hours worked, tasks completed, output produced

What they hide:

  • Quality of work
  • Collaboration and helping others
  • Innovation and creative thinking
  • Institutional knowledge building

Risk: Optimize for measured output, destroy unmeasured but critical factors.


Mechanism 4: Misinterpretation

Metrics are misinterpreted when:

  • Correlation confused with causation
  • Context ignored
  • Statistical significance misunderstood
  • Metric definition unclear

Example: Ice Cream and Drowning

Observation: Ice cream sales correlate with drowning deaths

Misinterpretation: Ice cream causes drowning

Reality: Both caused by warm weather (confounding variable)

Lesson: Correlation ≠ causation
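
A short simulation with invented numbers shows how the confounder manufactures the correlation: temperature drives both series, yet neither causes the other.

```python
import random
import statistics

random.seed(0)

# Hypothetical daily data: temperature drives both ice cream sales and swimming,
# and swimming (not ice cream) drives drowning risk.
temps = [random.uniform(5, 35) for _ in range(365)]
ice_cream_sales = [10 * t + random.gauss(0, 40) for t in temps]
drownings = [0.05 * t + random.gauss(0, 0.3) for t in temps]

def correlation(xs, ys):
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

print("ice cream vs drownings:", round(correlation(ice_cream_sales, drownings), 2))
# Clear positive correlation, even though neither variable causes the other:
# temperature is the confounder driving both.
```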


Example: Simpson's Paradox

Phenomenon: Trend appears in subgroups but reverses when combined.

UC Berkeley admissions (1973):

  • Overall: Men admitted at higher rate than women (appears discriminatory)
  • By department: Women admitted at higher rates in most departments
  • Explanation: Women applied to more competitive departments

Lesson: Aggregated metrics can mislead. Segmentation reveals reality.
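
The reversal is easy to reproduce with a small worked example. The figures below are invented for illustration (they are not the actual Berkeley data), but they have the same structure: women are admitted at a higher rate in each department yet at a lower rate overall, because most women applied to the harder department.

```python
# Illustrative (made-up) admissions data in the spirit of the Berkeley case.
# department: (men applied, men admitted, women applied, women admitted)
departments = {
    "Dept A (less competitive)": (800, 480, 100, 65),
    "Dept B (more competitive)": (200, 40, 900, 200),
}

totals = {"men_applied": 0, "men_admitted": 0, "women_applied": 0, "women_admitted": 0}

for dept, (ma, mad, wa, wad) in departments.items():
    print(f"{dept}: men {mad / ma:.0%}, women {wad / wa:.0%}")
    totals["men_applied"] += ma
    totals["men_admitted"] += mad
    totals["women_applied"] += wa
    totals["women_admitted"] += wad

print(f"Overall: men {totals['men_admitted'] / totals['men_applied']:.0%}, "
      f"women {totals['women_admitted'] / totals['women_applied']:.0%}")
# Each department admits women at a higher rate, yet the overall rate for women
# is lower, because most women applied to the more competitive department.
```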


Example: Survivorship Bias

Phenomenon: Analyzing survivors without considering those who didn't survive.

WWII aircraft armor:

  • Observe bullet holes on returning planes
  • Temptation: Reinforce areas with holes
  • Reality: Reinforce areas without holes (planes hit there didn't return)

Lesson: Metrics based on survivors miss critical information from non-survivors.
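
The same effect appears in any dataset filtered by survival. A sketch with hypothetical fund returns: averaging only the funds that survived overstates typical performance, because the worst performers were liquidated and dropped out of the sample.

```python
import random
import statistics

random.seed(1)

# Hypothetical 10-year outcomes for 1,000 funds: each year a fund either
# earns a noisy return or shuts down if it performs badly enough.
all_final_values = []
survivor_values = []
for _ in range(1000):
    value, alive = 1.0, True
    for _ in range(10):
        value *= 1 + random.gauss(0.05, 0.20)
        if value < 0.5:        # poor performers are liquidated and disappear
            alive = False
            break
    all_final_values.append(value)
    if alive:
        survivor_values.append(value)

print("mean outcome, all funds:      ", round(statistics.mean(all_final_values), 2))
print("mean outcome, survivors only: ", round(statistics.mean(survivor_values), 2))
# Measuring only the funds still around overstates typical performance,
# exactly like reinforcing the bullet holes you can see on returning planes.
```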


Mechanism 5: Metric Decay

Even good metrics degrade over time.

Decay process:

  1. Metric initially correlates with goal
  2. Metric becomes target
  3. People learn to game it
  4. Metric-goal correlation weakens
  5. Eventually: Metric decoupled from goal

Example: Citation Counts in Academia

Original use: Citations as proxy for research impact

Early days: Reasonable correlation

As target:

  • Strategic citation networks
  • Self-citation
  • Citation rings (we cite each other)
  • Incremental publishing (more papers = more citations)

Result: Citation counts inflated, relationship to actual impact weakened.


Types of Misleading Metrics

Type 1: Vanity Metrics

Definition: Metrics that look impressive but don't correlate with meaningful outcomes.

Characteristics:

  • Easy to increase artificially
  • Make you feel good
  • Don't inform decisions
  • Don't predict business success

Examples:

  • Total page views: doesn't mean engagement or value. Meaningful alternative: conversion rate, engagement rate
  • Social media followers: many are inactive and don't convert. Meaningful alternative: engagement rate, conversions from social
  • Registered users: most never activate. Meaningful alternative: activated users, retained users
  • App downloads: most are never opened. Meaningful alternative: Day-7 retention, activated users
  • Email list size: many subscribers are unengaged. Meaningful alternative: open rate, click rate, engaged subscribers

Danger: Celebrate vanity metrics, miss real performance.
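
A small sketch of what the meaningful alternatives in the list above look like in practice, using a hypothetical per-user record layout (the field names are assumptions, not a real schema): the raw download count is one number, while activation and Day-7 retention tell you whether those downloads turned into users.

```python
from datetime import date

# Hypothetical per-user records: install date, whether the user ever completed
# onboarding ("activated"), and the last day they were active.
users = [
    {"installed": date(2024, 3, 1), "activated": True,  "last_active": date(2024, 3, 20)},
    {"installed": date(2024, 3, 1), "activated": False, "last_active": date(2024, 3, 1)},
    {"installed": date(2024, 3, 2), "activated": True,  "last_active": date(2024, 3, 5)},
    {"installed": date(2024, 3, 3), "activated": False, "last_active": date(2024, 3, 3)},
]

downloads = len(users)  # the vanity number
activation_rate = sum(u["activated"] for u in users) / downloads
day7_retention = sum(
    (u["last_active"] - u["installed"]).days >= 7 for u in users
) / downloads

print(f"downloads: {downloads}")
print(f"activation rate: {activation_rate:.0%}")
print(f"day-7 retention: {day7_retention:.0%}")
# Four downloads looks like growth; 50% activation and 25% day-7 retention
# show how much of that "growth" became real usage.
```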


Type 2: Proxy Metrics

Definition: Metrics that represent something else, assumed to correlate with goals.

Problem: Proxies degrade when they become targets.


Example: Hospital Readmission Rates

Proxy for: Quality of care

Logic: Better care → fewer readmissions

Gaming:

  • Extend initial hospital stays (no "readmission" if never discharged)
  • Discourage readmissions (treat in ER, don't formally admit)
  • Select healthier patients

Result: Readmission rates improve, actual care quality unclear.


Example: Employee Satisfaction Surveys

Proxy for: Workplace health, retention risk

Logic: Satisfied employees stay, perform better

Gaming:

  • Survey timing (avoid stressful periods)
  • Implicit pressure to rate highly
  • Nonresponse bias (survey fatigue means only the most engaged respond)

Result: Scores rise, underlying issues persist.


Type 3: Ratio Distortion

Problem: Ratios can be improved by manipulating numerator or denominator, sometimes perversely.


Example: Acceptance Rate (College Rankings)

Metric: % of applicants accepted

Desired interpretation: Selectivity indicates quality

Gaming:

  • Encourage unqualified students to apply (increases applications, lowers acceptance rate)
  • Reject more students
  • Accept students "off waitlist" (not counted in initial acceptance rate)

Result: Acceptance rate drops, doesn't mean quality increased.


Example: Conversion Rate

Metric: Conversions / Visitors

Gaming options:

  • Inflate the numerator: Deep discounts or loose qualification criteria produce more conversions, but lower-value ones
  • Shrink the denominator: Cut broad top-of-funnel traffic so only the most qualified visitors are counted; the rate rises even as total conversions fall

Result: Conversion rate improves, revenue may decline.
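
A worked example with invented figures makes the distortion explicit: cutting traffic and discounting heavily pushes the ratio up while total revenue falls.

```python
# Hypothetical before/after: traffic is cut to the most qualified visitors
# and prices are slashed, so the ratio improves while revenue drops.
before = {"visitors": 10_000, "conversions": 300, "avg_order_value": 120}
after  = {"visitors": 4_000,  "conversions": 180, "avg_order_value": 70}

for label, p in [("before", before), ("after", after)]:
    rate = p["conversions"] / p["visitors"]
    revenue = p["conversions"] * p["avg_order_value"]
    print(f"{label}: conversion rate {rate:.1%}, revenue ${revenue:,.0f}")
# before: conversion rate 3.0%, revenue $36,000
# after:  conversion rate 4.5%, revenue $12,600
```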


Type 4: Threshold Effects

Problem: Behavior clusters around metric thresholds, creating distortions.


Example: Standardized Test Cutoffs

Metric: % of students scoring above threshold

Gaming:

  • Focus resources on "bubble students" (just below threshold)
  • Neglect high-performers (already above threshold)
  • Neglect low-performers (unlikely to reach threshold)

Result: More students hit threshold, but resource allocation becomes perverse.


Example: Sales Quotas

Threshold: Monthly revenue target

Distortion:

  • End-of-month scramble
  • Discounts to close marginal deals
  • Sandbagging (delay deals if quota met)
  • Revenue pulled forward (future months suffer)

Result: Monthly target hit, but annual performance and customer relationships suffer.


Domain-Specific Misleading Examples

Software Development

Misleading metric: Lines of code written

Problem:

  • Incentivizes verbosity
  • Discourages refactoring (reduces lines)
  • Conflates activity with value

Alternative: Features delivered and adopted, bug rates, code maintainability.


Misleading metric: Story points completed

Problem:

  • Story point inflation
  • Focus on volume, not value
  • Gaming estimation process

Alternative: User value delivered, cycle time, customer satisfaction.


Sales

Misleading metric: Pipeline value

Problem:

  • Easy to inflate by adding low-quality leads
  • Doesn't account for close probability
  • Creates false confidence

Alternative: Weighted pipeline (probability-adjusted), win rate, actual closed revenue.
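
A minimal sketch of the probability-adjusted alternative, using assumed stage probabilities and invented deal values: the raw sum rewards piling in long shots, while the weighted sum discounts them.

```python
# Hypothetical open deals with assumed stage-based close probabilities.
stage_win_probability = {"prospect": 0.05, "demo": 0.20, "proposal": 0.45, "contract": 0.80}

pipeline = [
    {"name": "Acme",    "value": 200_000, "stage": "prospect"},
    {"name": "Globex",  "value": 80_000,  "stage": "proposal"},
    {"name": "Initech", "value": 50_000,  "stage": "contract"},
]

raw_pipeline = sum(d["value"] for d in pipeline)
weighted_pipeline = sum(d["value"] * stage_win_probability[d["stage"]] for d in pipeline)

print(f"raw pipeline:      ${raw_pipeline:,.0f}")       # $330,000 - easy to inflate with long shots
print(f"weighted pipeline: ${weighted_pipeline:,.0f}")  # $86,000  - discounts low-probability deals
```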


Misleading metric: Number of calls/meetings

Problem:

  • Activity, not outcome
  • Incentivizes quantity over quality
  • Doesn't predict revenue

Alternative: Conversion rates at each stage, deal velocity, revenue per rep.


Customer Support

Misleading metric: Tickets closed per hour

Problem:

  • Incentivizes quick closure, not resolution
  • Encourages closing without solving problem
  • Degrades customer experience

Alternative: First-contact resolution, customer satisfaction, issue recurrence rate.


Misleading metric: Average handle time

Problem:

  • Rushes complex issues
  • Discourages thoroughness
  • Reduces help quality

Alternative: Resolution rate, customer satisfaction, issue escalation rate.


Healthcare

Misleading metric: Patient satisfaction scores

Problem:

  • Can be gamed (avoid difficult conversations, over-prescribe pain meds)
  • Doesn't correlate strongly with health outcomes
  • May incentivize patient appeasement over best medical practice

Alternative: Health outcomes, evidence-based care adherence, patient safety indicators.


Misleading metric: Length of stay

Problem:

  • Pressure to discharge quickly
  • May compromise recovery
  • Readmission risk increases

Alternative: Readmission rates, recovery outcomes, patient safety, patient readiness for discharge.


Education

Misleading metric: Graduation rates

Problem:

  • Pressure to pass unprepared students
  • Grade inflation
  • Reduced academic standards

Alternative: Actual learning assessments, post-graduation outcomes, job placement rates.


Misleading metric: Test score averages

Problem:

  • Teaching to test
  • Narrow curriculum
  • Doesn't capture deep learning

Alternative: Critical thinking assessments, project quality, long-term learning retention.


The Measurement-Target Problem

Campbell's Law

Statement: "The more any quantitative social indicator is used for social decision-making, the more subject it will be to corruption pressures and the more apt it will be to distort and corrupt the social processes it is intended to monitor."

Translation: Using a metric as a target corrupts it.


The Lifecycle of Metric Corruption

Stage 1: Valid Proxy

  • Metric correlates with goal
  • Useful for monitoring

Stage 2: Increased Attention

  • Metric reported prominently
  • Discussed in meetings
  • Used for evaluation

Stage 3: Becoming Target

  • Consequences attached to metric
  • Bonuses, promotions, reputation depend on it
  • Metric now high-stakes

Stage 4: Gaming Emerges

  • People discover how to improve metric without improving goal
  • Early gaming subtle
  • Metric-goal correlation weakens

Stage 5: Institutionalized Gaming

  • Gaming becomes normal practice
  • "Everyone does it"
  • Metric fully decoupled from goal

Stage 6: Metric Crisis

  • Obvious that metric no longer represents reality
  • Metric changed or abandoned
  • Cycle begins again with new metric

Example: British Healthcare Waiting Times

Goal: Reduce patient wait times for treatment

Metric: % of patients treated within target time (4 hours in emergency, 18 weeks for elective surgery)

Stage-by-stage corruption:

Stage 1-2: Valid proxy

  • Tracks real wait times
  • Identifies problem areas

Stage 3: High stakes

  • Hospital funding tied to hitting targets
  • Managers' careers depend on metrics

Stage 4-5: Gaming emerges and spreads

  • Ambulances wait outside ER until 4-hour window achievable
  • Patients reclassified to categories with longer targets
  • Elective surgeries scheduled just under 18-week deadline
  • Patients "pause" on waiting list (clock stops, not counted)

Stage 6: Crisis

  • Obvious gaming, public outcry
  • Metric no longer trusted
  • Actual care quality questionable despite hitting targets

Why Organizations Keep Using Misleading Metrics

Reason 1: Metrics Look Objective

Appeal: Numbers feel scientific, unbiased, fair

Reality: Metric choice is subjective, measurement contains biases, interpretation requires judgment

Result: False confidence in flawed metrics


Reason 2: Alternatives Are Harder

Qualitative assessment:

  • Requires judgment
  • Time-intensive
  • Harder to scale
  • Less "defensible" (no single number)

Metrics:

  • Quick, scalable
  • Easy to compare
  • Simple to report

Result: Organizations default to metrics even when misleading, because alternatives require more effort.


Reason 3: Accountability Pressure

Managers need to demonstrate results.

Metrics provide:

  • "Proof" of performance
  • Comparability (vs. goals, peers, past)
  • Defensibility in evaluations

Without metrics: "We improved" sounds vague

With metrics: "We improved X by 23%" sounds concrete

Problem: Even misleading metrics provide cover.


Reason 4: Gaming Is Incremental

Gaming doesn't announce itself.

Evolution:

  • Start: Slight optimization (reasonable)
  • Middle: Aggressive optimization (questionable)
  • End: Full gaming (clear corruption)

At each step, individuals rationalize:

  • "I'm just being efficient"
  • "Everyone does this"
  • "The metric is the goal"

Result: Gaming normalized before anyone notices.


Reason 5: Inertia and Path Dependence

Once established:

  • Historical data accumulated
  • Comparisons over time matter
  • Changing metric feels like admitting past measurement was wrong
  • Political cost to change

Result: Broken metrics persist long after problems obvious.


Detecting Misleading Metrics

Red Flag 1: Metric Improves, Reality Doesn't

Test: Does improving the metric correspond to actual goal achievement?

Example:

  • Customer satisfaction scores rising
  • Yet churn increasing, complaints up
  • Red flag: Metric decoupled from reality
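
One cheap check is to compare the trend of the reported metric against the trend of an outcome it is supposed to predict. The sketch below uses invented quarterly figures and a simple least-squares slope; if satisfaction and churn are both trending upward, the metric has likely decoupled from reality.

```python
# Hypothetical quarterly series: reported satisfaction scores vs. actual churn.
csat  = [4.1, 4.2, 4.4, 4.5, 4.6, 4.7]   # survey score, trending up
churn = [2.0, 2.1, 2.4, 2.8, 3.3, 3.9]   # % of customers lost, also trending up

def trend(series):
    """Simple least-squares slope of the series against its index 0..n-1."""
    n = len(series)
    mean_x, mean_y = (n - 1) / 2, sum(series) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(series))
    den = sum((x - mean_x) ** 2 for x in range(n))
    return num / den

if trend(csat) > 0 and trend(churn) > 0:
    print("Red flag: satisfaction 'improving' while churn worsens - "
          "the metric may have decoupled from reality.")
```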

Red Flag 2: Everyone Hits Targets Easily

If targets consistently achieved:

  • Targets too easy, OR
  • Widespread gaming

Healthy: Some hit targets, some miss (indicates stretch goals and honesty)

Suspicious: Everyone always hits targets (indicates gaming or sandbagging)


Red Flag 3: Unmeasured Aspects Deteriorating

If measured areas improve while unmeasured areas degrade:

  • Tunnel vision
  • Resources shifted from unmeasured to measured

Example:

  • Metric: Feature velocity (features shipped per sprint)
  • Reality: Code quality declining, technical debt rising, bugs increasing

Red Flag 4: Metric Behavior Clusters Around Thresholds

If results bunch just above threshold:

  • Indicates gaming to hit target
  • Natural distributions don't cluster at arbitrary thresholds

Example: Test scores clustering just above passing threshold suggests teaching narrowly to threshold.
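
A rough way to screen for this is to compare how many results land just above versus just below the cutoff. The sketch below uses invented exam scores and an arbitrary window and ratio cutoff; it is a screening heuristic, not proof of gaming.

```python
def clustering_ratio(scores, threshold, window=2):
    """Compare how many results sit just above vs. just below a cutoff.

    A smooth, ungamed distribution puts roughly similar mass on either side of an
    arbitrary threshold; a large ratio suggests results are being nudged over the line.
    """
    just_above = sum(1 for s in scores if threshold <= s < threshold + window)
    just_below = sum(1 for s in scores if threshold - window <= s < threshold)
    return just_above / max(just_below, 1)

# Hypothetical exam scores with a pass mark of 50.
scores = [38, 41, 44, 47, 50, 50, 51, 51, 51, 52, 52, 55, 61, 68, 74]
ratio = clustering_ratio(scores, threshold=50)
print(f"just-above / just-below ratio: {ratio:.1f}")
if ratio > 2:
    print("Red flag: results pile up just over the threshold.")
```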


Red Flag 5: People Can't Explain How Metric Connects to Goal

Ask: "How does improving this metric advance our actual goals?"

If answers are:

  • Vague
  • Circular ("We measure it because it's important")
  • Inconsistent across people

Red flag: Metric has become ritualized without clear purpose.


Preventing Metric Misleading

Strategy 1: Measure Outcomes, Not Just Proxies

Closer to actual goal = harder to game.

Hierarchy:

  • Worst: Activity metrics (calls made, features shipped)
  • Better: Output metrics (deals closed, features adopted)
  • Best: Outcome metrics (revenue, customer retention, mission impact)

Strategy 2: Use Multiple Complementary Metrics

Single metrics get gamed. Balanced scorecards resist gaming.

Example: Balanced customer support metrics

  • Speed (response time)
  • Quality (customer satisfaction)
  • Effectiveness (first-contact resolution)
  • Efficiency (cost per ticket)

Can't optimize all simultaneously without real improvement.


Strategy 3: Include Qualitative Assessment

Don't rely on metrics alone.

Balanced approach:

  • Metrics for scale, trends, patterns
  • Qualitative input (conversations, observations, stories) for context, gaming detection, meaning

Strategy 4: Separate Measurement from Evaluation

When metrics are used for:

  • Learning: people report honestly and focus on solving problems
  • Punishment: people game the numbers and hide problems

Approach:

  • Measure for learning and improvement (formative)
  • Supplement with periodic evaluation (summative) that's harder to game

Strategy 5: Rotate Metrics

If metric becomes corrupted:

  • Change or retire it
  • Introduce new metric
  • Forces people to refocus on goal, not metric

Strategy 6: Audit for Gaming

Regularly check:

  • Are there suspicious patterns? (clustering at thresholds, sudden changes)
  • Do metric improvements correspond to real outcomes?
  • What are people doing to hit metrics?

If gaming detected, address root causes (incentives, consequences), not just symptoms.


Conclusion: Metrics as Tools, Not Truth

Metrics are not reality. They are models of reality—simplified, partial, distorted.

The map is not the territory.

Metrics mislead when:

  • They become targets (Goodhart's Law)
  • People game them
  • They're misinterpreted
  • They show partial picture (hide important factors)
  • They decay over time as gaming evolves

Despite risks, metrics are useful:

  • Enable scale (can't qualitatively assess millions)
  • Identify patterns
  • Track trends
  • Focus attention

The path forward:

  • Use metrics (don't abandon measurement)
  • Don't trust metrics blindly (supplement with qualitative understanding)
  • Measure outcomes (not just proxies)
  • Use multiple metrics (resist gaming)
  • Monitor for corruption (metrics degrade over time)
  • Remember the goal (metric is tool, not objective)

Good measurement requires:

  • Humility (metrics are flawed tools)
  • Vigilance (watch for gaming and distortion)
  • Balance (metrics + qualitative understanding)
  • Purpose (remember why you're measuring)

"What gets measured gets managed"—sometimes in ways that help, often in ways that hurt.

Measure thoughtfully. Interpret carefully. Act wisely.


References

  1. Goodhart, C. (1975). "Problems of Monetary Management: The U.K. Experience." Papers in Monetary Economics (Reserve Bank of Australia).

  2. Campbell, D. T. (1979). "Assessing the Impact of Planned Social Change." Evaluation and Program Planning, 2(1), 67–90.

  3. Muller, J. Z. (2018). The Tyranny of Metrics. Princeton University Press.

  4. Kerr, S. (1975). "On the Folly of Rewarding A, While Hoping for B." Academy of Management Journal, 18(4), 769–783.

  5. Ridgway, V. F. (1956). "Dysfunctional Consequences of Performance Measurements." Administrative Science Quarterly, 1(2), 240–247.

  6. Austin, R. D. (1996). Measuring and Managing Performance in Organizations. Dorset House.

  7. Strathern, M. (1997). "'Improving Ratings': Audit in the British University System." European Review, 5(3), 305–321.

  8. Power, M. (1997). The Audit Society: Rituals of Verification. Oxford University Press.

  9. Levitt, S. D., & Dubner, S. J. (2005). Freakonomics: A Rogue Economist Explores the Hidden Side of Everything. William Morrow.

  10. Bevan, G., & Hood, C. (2006). "What's Measured Is What Matters: Targets and Gaming in the English Public Health Care System." Public Administration, 84(3), 517–538.

  11. de Bruijn, H. (2007). Managing Performance in the Public Sector (2nd ed.). Routledge.

  12. Hood, C. (2006). "Gaming in Targetworld: The Targets Approach to Managing British Public Services." Public Administration Review, 66(4), 515–521.

  13. Kahneman, D., & Tversky, A. (1973). "On the Psychology of Prediction." Psychological Review, 80(4), 237–251.

  14. Croll, A., & Yoskovitz, B. (2013). Lean Analytics: Use Data to Build a Better Startup Faster. O'Reilly Media.

  15. Seddon, J. (2008). Systems Thinking in the Public Sector: The Failure of the Reform Regime...and a Manifesto for a Better Way. Triarchy Press.


About This Series: This article is part of a larger exploration of measurement, metrics, and evaluation. For related concepts, see [Goodhart's Law Breaks Metrics], [Why Measurement Changes Behavior], [Vanity Metrics vs Meaningful Metrics], and [Designing Useful Measurement Systems].