Why Metrics Often Mislead
Your dashboard shows success: conversion rate up 15%, user engagement climbing, revenue per customer increasing. Every metric green. The board presentation looks excellent. Yet three months later, the company is struggling—churn accelerating, support costs exploding, product quality complaints surging. How did all the metrics look great while the business deteriorated?
Metrics mislead not because they lie (though manipulation happens), but because they tell partial truths easily mistaken for complete pictures. A metric shows one number from one angle at one point in time, and organizations treat it as comprehensive reality. The map becomes the territory. The proxy becomes the goal. The measurement becomes divorced from what it was meant to measure.
Understanding how and why metrics mislead—through gaming, misinterpretation, misalignment, and the systemic pressures that corrupt good metrics once they become targets—is essential for using measurement effectively without being deceived.
The Core Mechanisms of Misleading
Mechanism 1: Goodhart's Law
Statement: "When a measure becomes a target, it ceases to be a good measure."
Why it happens:
- Metric chosen because it correlates with goal
- Metric becomes target
- People optimize for metric
- Correlation between metric and goal breaks down
- Metric improves while goal performance deteriorates
Example: Soviet Nail Factory
Goal: Produce useful nails
Metric: Weight of nails produced (tons)
Result:
- Factory produces huge, heavy, useless nails
- Metric (tonnage) maximized
- Goal (useful nails) sacrificed
Alternative metric: Number of nails produced
Result:
- Factory produces tiny, useless nails
- Metric (count) maximized
- Goal still sacrificed
Lesson: Optimizing a proxy metric destroys its relationship to the underlying goal.
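To make this concrete, here is a minimal toy simulation (every number invented): a factory picks one nail size, usefulness is assumed to peak at a mid-sized nail, and optimizing either the tonnage proxy or the count proxy drives production away from that peak.

```python
# Toy model of Goodhart's Law; all numbers here are invented.
# A factory picks one nail size. Each nail costs fixed setup time plus
# size-proportional forging time out of a fixed daily time budget.

def daily_count(size_g: float, budget: float = 1000.0) -> float:
    """Nails per day: budget divided by (setup + size-dependent time)."""
    return budget / (1.0 + 0.1 * size_g)

def usefulness(size_g: float) -> float:
    """Assumed usefulness per nail, peaking at a 10 g nail."""
    return max(0.0, 1.0 - abs(size_g - 10.0) / 10.0)

sizes = [s / 10 for s in range(1, 201)]  # candidate sizes: 0.1 g .. 20.0 g

best_tonnage = max(sizes, key=lambda s: daily_count(s) * s)  # weight metric
best_count = max(sizes, key=daily_count)                     # count metric
best_useful = max(sizes, key=lambda s: daily_count(s) * usefulness(s))

print(f"maximizing tonnage -> {best_tonnage:.1f} g nails "
      f"(usefulness {usefulness(best_tonnage):.2f})")   # 20.0 g, useless
print(f"maximizing count   -> {best_count:.1f} g nails "
      f"(usefulness {usefulness(best_count):.2f})")     # 0.1 g, useless
print(f"maximizing useful output -> {best_useful:.1f} g nails")  # 10.0 g
```

Both proxies land at a useless extreme; only optimizing the (usually unmeasurable) goal itself finds the useful size.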
Example: Wells Fargo Account Openings
Goal: Grow customer relationships
Metric: New accounts opened per employee
Intent: More accounts = deeper customer relationships
Result:
- Employees opened millions of unauthorized accounts
- Customers didn't want or use accounts
- Metric (accounts) soared
- Goal (real customer relationships) harmed
- Reputation destroyed, billions in fines
The metric-as-target corrupted the system.
Mechanism 2: Gaming and Manipulation
Gaming: Achieving metric targets without improving (or while degrading) actual performance.
Common gaming tactics:
| Tactic | Description | Example |
|---|---|---|
| Cherry-picking | Report only favorable data | Select best time period, exclude bad segments |
| Reclassification | Change definitions to improve numbers | Reclassify customers to hide churn |
| Threshold gaming | Bunch activity to just meet targets | Close deals at month-end to hit quota |
| Sandbagging | Delay good results to next period | Hold deals if quota already met |
| Output shifting | Hit metric by sacrificing unmeasured quality | Fast support resolution, problems unresolved |
| Measurement manipulation | Change how you measure | Adjust survey timing, question wording |
Example: British Ambulance Response Times
Target: Respond to emergencies within 8 minutes
Gaming tactics:
- Stop clock when ambulance dispatched, not when it arrives
- Send fast motorcycle paramedic first (hits 8-minute target), ambulance later
- Reclassify emergencies to less urgent categories (looser targets)
Result: Response time metrics improved, but actual emergency care quality questionable.
Example: Teacher Test Score Gaming
Metric: Student test scores
Gaming:
- Narrow curriculum to tested subjects
- Teach test-taking strategies, not deep learning
- Exclude low-performers from test day
- In extreme cases: Change student answers, give answers during test
Result: Scores rise, actual educational outcomes unclear or worse.
Mechanism 3: Partial Visibility
Metrics show what they measure and hide everything else.
The illumination problem:
- Metric illuminates measured aspect
- Makes unmeasured aspects darker by comparison
- "Drunk searching for keys under streetlight" problem
Example: Page Views
What it shows: Traffic volume
What it hides:
- Traffic quality (bots or humans? Engaged visitors or bounces?)
- Traffic intent (ready to buy vs. random visitor?)
- Traffic outcome (converted? Got value?)
Risk: Optimize for traffic volume, get useless traffic.
Example: Employee Productivity Metrics
What it shows: Hours worked, tasks completed, output produced
What it hides:
- Quality of work
- Collaboration and helping others
- Innovation and creative thinking
- Institutional knowledge building
Risk: Optimize for measured output, destroy unmeasured but critical factors.
Mechanism 4: Misinterpretation
Metrics are misinterpreted when:
- Correlation confused with causation
- Context ignored
- Statistical significance misunderstood
- Metric definition unclear
Example: Ice Cream and Drowning
Observation: Ice cream sales correlate with drowning deaths
Misinterpretation: Ice cream causes drowning
Reality: Both caused by warm weather (confounding variable)
Lesson: Correlation ≠ causation
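A short simulation (with made-up data) shows how a confounder manufactures correlation: temperature drives both series, neither causes the other, yet they correlate strongly.

```python
# Confounding demo with invented data: warm weather drives both
# ice cream sales and swimming (hence drownings); the two series
# correlate even though neither causes the other.
import random
from statistics import correlation  # requires Python 3.10+

random.seed(42)
temps = [random.uniform(0, 35) for _ in range(365)]        # daily temp, deg C
ice_cream = [20 * t + random.gauss(0, 50) for t in temps]  # daily sales
drownings = [0.3 * t + random.gauss(0, 2) for t in temps]  # daily incidents

print(f"corr(ice cream, drownings) = {correlation(ice_cream, drownings):.2f}")
print(f"corr(temp, ice cream)      = {correlation(temps, ice_cream):.2f}")
print(f"corr(temp, drownings)      = {correlation(temps, drownings):.2f}")
```

The first correlation is strong despite zero causal link between the two series; conditioning on temperature would make it largely disappear.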
Example: Simpson's Paradox
Phenomenon: Trend appears in subgroups but reverses when combined.
UC Berkeley admissions (1973):
- Overall: Men admitted at higher rate than women (appears discriminatory)
- By department: Women admitted at higher rates in most departments
- Explanation: Women applied to more competitive departments
Lesson: Aggregated metrics can mislead. Segmentation reveals reality.
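A small sketch with hypothetical numbers shaped like the Berkeley case makes the reversal easy to see: women out-admit men in each department yet trail in the aggregate, because they apply mostly to the competitive department.

```python
# Simpson's paradox with hypothetical admissions data:
# (applicants, admitted) per department and group.
apps = {
    "Dept A (easy)": {"men": (800, 500), "women": (100, 70)},
    "Dept B (hard)": {"men": (200, 30), "women": (900, 180)},
}

totals = {"men": [0, 0], "women": [0, 0]}
for dept, groups in apps.items():
    for sex, (applied, admitted) in groups.items():
        totals[sex][0] += applied
        totals[sex][1] += admitted
        print(f"{dept} {sex:5}: {admitted / applied:.0%} admitted")

for sex, (applied, admitted) in totals.items():
    print(f"overall {sex:5}: {admitted / applied:.0%} admitted")
# Women lead in both departments (70% vs 62%, 20% vs 15%),
# yet trail overall (25% vs 53%).
```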
Example: Survivorship Bias
Phenomenon: Analyzing survivors without considering those who didn't survive.
WWII aircraft armor:
- Observe bullet holes on returning planes
- Temptation: Reinforce areas with holes
- Reality: Reinforce areas without holes (planes hit there didn't return)
Lesson: Metrics based on survivors miss critical information from non-survivors.
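The mechanism is easy to reproduce in a toy simulation (the survival probabilities below are invented): hits land uniformly across sections, but the fatal section shows the fewest holes among returners.

```python
# Survivorship-bias demo with invented numbers: hits are uniform across
# three sections, but engine hits usually down the plane. Counting holes
# only on returning planes makes the engine look safest.
import random

random.seed(0)
SECTIONS = ["fuselage", "wings", "engine"]
SURVIVAL = {"fuselage": 0.95, "wings": 0.90, "engine": 0.30}  # per hit

observed = {s: 0 for s in SECTIONS}
for _ in range(10_000):                  # one hit per sortie, uniform
    hit = random.choice(SECTIONS)
    if random.random() < SURVIVAL[hit]:  # plane returns and gets counted
        observed[hit] += 1

for section, holes in observed.items():
    print(f"holes seen on returners, {section}: {holes}")
# Fewest holes appear on the engine precisely because engine hits rarely
# come home: reinforce where the survivors are NOT hit.
```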
Mechanism 5: Metric Decay
Even good metrics degrade over time.
Decay process:
- Metric initially correlates with goal
- Metric becomes target
- People learn to game it
- Metric-goal correlation weakens
- Eventually: Metric decoupled from goal
Example: Citation Counts in Academia
Original use: Citations as proxy for research impact
Early days: Reasonable correlation
As target:
- Strategic citation networks
- Self-citation
- Citation rings (we cite each other)
- Incremental publishing (more papers = more citations)
Result: Citation counts inflated, relationship to actual impact weakened.
Types of Misleading Metrics
Type 1: Vanity Metrics
Definition: Metrics that look impressive but don't correlate with meaningful outcomes.
Characteristics:
- Easy to increase artificially
- Make you feel good
- Don't inform decisions
- Don't predict business success
Examples:
| Vanity Metric | Why It Misleads | Meaningful Alternative |
|---|---|---|
| Total page views | Doesn't mean engagement or value | Conversion rate, engagement rate |
| Social media followers | Many inactive, don't convert | Engagement rate, conversion from social |
| Registered users | Most never activate | Activated users, retained users |
| App downloads | Most never opened | Day-7 retention, activated users |
| Email list size | Many unengaged | Open rate, click rate, engaged subscribers |
Danger: Celebrate vanity metrics, miss real performance.
Type 2: Proxy Metrics
Definition: Metrics that represent something else, assumed to correlate with goals.
Problem: Proxies degrade when they become targets.
Example: Hospital Readmission Rates
Proxy for: Quality of care
Logic: Better care → fewer readmissions
Gaming:
- Extend initial hospital stays (no "readmission" if never discharged)
- Discourage readmissions (treat in ER, don't formally admit)
- Select healthier patients
Result: Readmission rates improve, actual care quality unclear.
Example: Employee Satisfaction Surveys
Proxy for: Workplace health, retention risk
Logic: Satisfied employees stay, perform better
Gaming:
- Survey timing (avoid stressful periods)
- Implicit pressure to rate highly
- Survey fatigue (only most engaged respond)
Result: Scores rise, underlying issues persist.
Type 3: Ratio Distortion
Problem: Ratios can be improved by manipulating numerator or denominator, sometimes perversely.
Example: Acceptance Rate (College Rankings)
Metric: % of applicants accepted
Desired interpretation: Selectivity indicates quality
Gaming:
- Encourage unqualified students to apply (increases applications, lowers acceptance rate)
- Reject more students
- Accept students "off waitlist" (not counted in initial acceptance rate)
Result: Acceptance rate drops, doesn't mean quality increased.
Example: Conversion Rate
Metric: Conversions / Visitors
Gaming options:
- Increase numerator: Cut prices or loosen qualification (more conversions, but low-value ones)
- Decrease denominator: Cut low-intent traffic sources (fewer visitors and a higher rate, but less total business)
Result: Conversion rate improves, revenue may decline.
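A hypothetical before/after scenario makes the distortion concrete: cut a low-converting but still profitable traffic source and the rate rises while revenue falls.

```python
# Ratio distortion with invented numbers: dropping a high-volume,
# low-rate channel raises conversion rate while revenue declines.

def report(label, visitors, conversions, revenue):
    print(f"{label}: rate {conversions / visitors:.1%}, revenue ${revenue:,}")

# Before: broad traffic, including a 60k-visitor channel that adds
# 600 conversions and $60k in revenue at a low rate.
report("before", visitors=100_000, conversions=2_000, revenue=200_000)

# After: that channel is cut to flatter the ratio.
report("after ", visitors=40_000, conversions=1_400, revenue=140_000)
# Output: rate improves from 2.0% to 3.5%; revenue drops by $60,000.
```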
Type 4: Threshold Effects
Problem: Behavior clusters around metric thresholds, creating distortions.
Example: Standardized Test Cutoffs
Metric: % of students scoring above threshold
Gaming:
- Focus resources on "bubble students" (just below threshold)
- Neglect high-performers (already above threshold)
- Neglect low-performers (unlikely to reach threshold)
Result: More students hit threshold, but resource allocation becomes perverse.
Example: Sales Quotas
Threshold: Monthly revenue target
Distortion:
- End-of-month scramble
- Discounts to close marginal deals
- Sandbagging (delay deals if quota met)
- Revenue pulled forward (future months suffer)
Result: Monthly target hit, but annual performance and customer relationships suffer.
Domain-Specific Misleading Examples
Software Development
Misleading metric: Lines of code written
Problem:
- Incentivizes verbosity
- Discourages refactoring (reduces lines)
- Conflates activity with value
Alternative: Features delivered and adopted, bug rates, code maintainability.
Misleading metric: Story points completed
Problem:
- Story point inflation
- Focus on volume, not value
- Gaming estimation process
Alternative: User value delivered, cycle time, customer satisfaction.
Sales
Misleading metric: Pipeline value
Problem:
- Easy to inflate by adding low-quality leads
- Doesn't account for close probability
- Creates false confidence
Alternative: Weighted pipeline (probability-adjusted), win rate, actual closed revenue.
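A minimal sketch of the weighted alternative, using hypothetical deals and stage probabilities: long-shot leads inflate the raw number but barely move the probability-adjusted one.

```python
# Raw vs. probability-weighted pipeline, with invented deals and
# assumed close probabilities per stage.

STAGE_PROB = {"lead": 0.05, "qualified": 0.20, "proposal": 0.50, "verbal": 0.80}

deals = [
    ("Acme", 120_000, "lead"),
    ("Globex", 80_000, "proposal"),
    ("Initech", 50_000, "verbal"),
    ("Umbrella", 300_000, "lead"),  # long shot padding the raw number
]

raw = sum(value for _, value, _ in deals)
weighted = sum(value * STAGE_PROB[stage] for _, value, stage in deals)

print(f"raw pipeline:      ${raw:,}")          # $550,000
print(f"weighted pipeline: ${weighted:,.0f}")  # $101,000
```

Adding another $300k long shot would swell the raw pipeline by 55% but the weighted pipeline by only $15k, which is why weighting resists inflation.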
Misleading metric: Number of calls/meetings
Problem:
- Activity, not outcome
- Incentivizes quantity over quality
- Doesn't predict revenue
Alternative: Conversion rates at each stage, deal velocity, revenue per rep.
Customer Support
Misleading metric: Tickets closed per hour
Problem:
- Incentivizes quick closure, not resolution
- Encourages closing without solving problem
- Degrades customer experience
Alternative: First-contact resolution, customer satisfaction, issue recurrence rate.
Misleading metric: Average handle time
Problem:
- Rushes complex issues
- Discourages thoroughness
- Reduces help quality
Alternative: Resolution rate, customer satisfaction, issue escalation rate.
Healthcare
Misleading metric: Patient satisfaction scores
Problem:
- Can be gamed (avoid difficult conversations, over-prescribe pain meds)
- Doesn't correlate strongly with health outcomes
- May incentivize patient appeasement over best medical practice
Alternative: Health outcomes, evidence-based care adherence, patient safety indicators.
Misleading metric: Length of stay
Problem:
- Pressure to discharge quickly
- May compromise recovery
- Readmission risk increases
Alternative: Readmission rates, recovery outcomes, patient safety, patient readiness for discharge.
Education
Misleading metric: Graduation rates
Problem:
- Pressure to pass unprepared students
- Grade inflation
- Reduced academic standards
Alternative: Actual learning assessments, post-graduation outcomes, job placement rates.
Misleading metric: Test score averages
Problem:
- Teaching to test
- Narrow curriculum
- Doesn't capture deep learning
Alternative: Critical thinking assessments, project quality, long-term learning retention.
The Measurement-Target Problem
Campbell's Law
Statement: "The more any quantitative social indicator is used for social decision-making, the more subject it will be to corruption pressures and the more apt it will be to distort and corrupt the social processes it is intended to monitor."
Translation: Using a metric as a target corrupts it.
The Lifecycle of Metric Corruption
Stage 1: Valid Proxy
- Metric correlates with goal
- Useful for monitoring
Stage 2: Increased Attention
- Metric reported prominently
- Discussed in meetings
- Used for evaluation
Stage 3: Becoming Target
- Consequences attached to metric
- Bonuses, promotions, reputation depend on it
- Metric now high-stakes
Stage 4: Gaming Emerges
- People discover how to improve metric without improving goal
- Early gaming subtle
- Metric-goal correlation weakens
Stage 5: Institutionalized Gaming
- Gaming becomes normal practice
- "Everyone does it"
- Metric fully decoupled from goal
Stage 6: Metric Crisis
- Obvious that metric no longer represents reality
- Metric changed or abandoned
- Cycle begins again with new metric
Example: British Healthcare Waiting Times
Goal: Reduce patient wait times for treatment
Metric: % of patients treated within target time (4 hours in emergency, 18 weeks for elective surgery)
Stage-by-stage corruption:
Stage 1-2: Valid proxy
- Tracks real wait times
- Identifies problem areas
Stage 3: High stakes
- Hospital funding tied to hitting targets
- Managers' careers depend on metrics
Stage 4-5: Gaming emerges and spreads
- Ambulances wait outside ER until 4-hour window achievable
- Patients reclassified to categories with longer targets
- Elective surgeries scheduled just under 18-week deadline
- Patients "pause" on waiting list (clock stops, not counted)
Stage 6: Crisis
- Obvious gaming, public outcry
- Metric no longer trusted
- Actual care quality questionable despite hitting targets
Why Organizations Keep Using Misleading Metrics
Reason 1: Metrics Look Objective
Appeal: Numbers feel scientific, unbiased, fair
Reality: Metric choice is subjective, measurement contains biases, interpretation requires judgment
Result: False confidence in flawed metrics
Reason 2: Alternatives Are Harder
Qualitative assessment:
- Requires judgment
- Time-intensive
- Harder to scale
- Less "defensible" (no single number)
Metrics:
- Quick, scalable
- Easy to compare
- Simple to report
Result: Organizations default to metrics even when misleading, because alternatives require more effort.
Reason 3: Accountability Pressure
Managers need to demonstrate results.
Metrics provide:
- "Proof" of performance
- Comparability (vs. goals, peers, past)
- Defensibility in evaluations
Without metrics: "We improved" sounds vague
With metrics: "We improved X by 23%" sounds concrete
Problem: Even misleading metrics provide cover.
Reason 4: Gaming Is Incremental
Gaming doesn't announce itself.
Evolution:
- Start: Slight optimization (reasonable)
- Middle: Aggressive optimization (questionable)
- End: Full gaming (clear corruption)
At each step, individuals rationalize:
- "I'm just being efficient"
- "Everyone does this"
- "The metric is the goal"
Result: Gaming normalized before anyone notices.
Reason 5: Inertia and Path Dependence
Once established:
- Historical data accumulated
- Comparisons over time matter
- Changing metric feels like admitting past measurement was wrong
- Political cost to change
Result: Broken metrics persist long after problems obvious.
Detecting Misleading Metrics
Red Flag 1: Metric Improves, Reality Doesn't
Test: Does improving the metric correspond to actual goal achievement?
Example:
- Customer satisfaction scores rising
- Yet churn increasing, complaints up
- Red flag: Metric decoupled from reality
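One way to test for this decoupling is to correlate the metric against the outcome it is supposed to track, period by period. The sketch below uses invented monthly data and Python 3.10's statistics.correlation.

```python
# Decoupling check with invented monthly data: satisfaction scores vs.
# the retention they are supposed to predict, early vs. late period.
from statistics import correlation  # requires Python 3.10+

csat = [7.1, 7.2, 7.4, 7.3, 7.6, 7.8, 8.0, 8.2, 8.4, 8.6, 8.8, 9.0]
retention = [0.91, 0.92, 0.93, 0.92, 0.94, 0.95,
             0.93, 0.91, 0.89, 0.87, 0.85, 0.83]

early = correlation(csat[:6], retention[:6])
late = correlation(csat[6:], retention[6:])
print(f"early corr: {early:+.2f}, late corr: {late:+.2f}")
# A flip from strongly positive to strongly negative is a red flag:
# the satisfaction score has decoupled from the retention it proxied.
```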
Red Flag 2: Everyone Hits Targets Easily
If targets consistently achieved:
- Targets too easy, OR
- Widespread gaming
Healthy: Some hit targets, some miss (indicates stretch goals and honesty)
Suspicious: Everyone always hits targets (indicates gaming or sandbagging)
Red Flag 3: Unmeasured Aspects Deteriorating
If measured areas improve while unmeasured areas degrade:
- Tunnel vision
- Resources shifted from unmeasured to measured
Example:
- Metric: Feature velocity (features shipped per sprint)
- Reality: Code quality declining, technical debt rising, bugs increasing
Red Flag 4: Metric Behavior Clusters Around Thresholds
If results bunch just above threshold:
- Indicates gaming to hit target
- Natural distributions don't cluster at arbitrary thresholds
Example: Test scores clustering just above passing threshold suggests teaching narrowly to threshold.
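A simple bunching check, with hypothetical scores and cutoff: in a smooth distribution, the counts just below and just above a threshold should be similar, so a large jump is suspicious.

```python
# Threshold-bunching check: compare counts in a narrow band just below
# the cutoff against the band just above it. Scores are hypothetical.

def bunching_ratio(scores, threshold, window=2):
    below = sum(1 for s in scores if threshold - window <= s < threshold)
    above = sum(1 for s in scores if threshold <= s < threshold + window)
    return above / max(below, 1)  # avoid division by zero

scores = [58, 59, 60, 60, 60, 61, 61, 61, 62, 55, 70, 60, 61, 57, 63]
ratio = bunching_ratio(scores, threshold=60)
print(f"above/below ratio near cutoff: {ratio:.1f}")  # 4.0
# Ratios far above ~1 (here, many 60s and 61s, few 58s and 59s)
# are consistent with results being nudged over the line.
```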
Red Flag 5: People Can't Explain How Metric Connects to Goal
Ask: "How does improving this metric advance our actual goals?"
If answers are:
- Vague
- Circular ("We measure it because it's important")
- Inconsistent across people
Red flag: Metric has become ritualized without clear purpose.
Preventing Metric Misleading
Strategy 1: Measure Outcomes, Not Just Proxies
Closer to actual goal = harder to game.
Hierarchy:
- Worst: Activity metrics (calls made, features shipped)
- Better: Output metrics (deals closed, features adopted)
- Best: Outcome metrics (revenue, customer retention, mission impact)
Strategy 2: Use Multiple Complementary Metrics
Single metrics get gamed. Balanced scorecards resist gaming.
Example: Balanced customer support metrics
- Speed (response time)
- Quality (customer satisfaction)
- Effectiveness (first-contact resolution)
- Efficiency (cost per ticket)
Gaming one metric tends to hurt another, so lifting all four together requires real improvement.
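A minimal sketch of scorecard review, with hypothetical quarter-over-quarter support metrics: flag any metric that degrades while others improve, rather than celebrating one number in isolation.

```python
# Balanced-scorecard sketch with hypothetical support metrics: flag
# degradation on any dimension instead of reporting a single number.

HIGHER_IS_BETTER = {"response_time_min": False, "csat": True,
                    "first_contact_resolution": True, "cost_per_ticket": False}

last_q = {"response_time_min": 45, "csat": 4.4,
          "first_contact_resolution": 0.72, "cost_per_ticket": 8.10}
this_q = {"response_time_min": 22, "csat": 4.0,
          "first_contact_resolution": 0.55, "cost_per_ticket": 7.40}

for name, better_high in HIGHER_IS_BETTER.items():
    improved = (this_q[name] > last_q[name]) == better_high
    print(f"{name:26} {'improved' if improved else 'DEGRADED'} "
          f"({last_q[name]} -> {this_q[name]})")
# Faster and cheaper, but satisfaction and resolution fell: the speed
# gain was likely bought by closing tickets without solving problems.
```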
Strategy 3: Include Qualitative Assessment
Don't rely on metrics alone.
Balanced approach:
- Metrics for scale, trends, patterns
- Qualitative methods (conversations, observations, stories) for context, gaming detection, meaning
Strategy 4: Separate Measurement from Evaluation
When metrics are used for:
- Learning: people report honestly and focus on solving problems
- Punishment: people game numbers and hide problems
Approach:
- Measure for learning and improvement (formative)
- Supplement with periodic evaluation (summative) that's harder to game
Strategy 5: Rotate Metrics
If metric becomes corrupted:
- Change or retire it
- Introduce new metric
- Forces people to refocus on goal, not metric
Strategy 6: Audit for Gaming
Regularly check:
- Are there suspicious patterns? (clustering at thresholds, sudden changes)
- Do metric improvements correspond to real outcomes?
- What are people doing to hit metrics?
If gaming detected, address root causes (incentives, consequences), not just symptoms.
Conclusion: Metrics as Tools, Not Truth
Metrics are not reality. They are models of reality—simplified, partial, distorted.
The map is not the territory.
Metrics mislead when:
- They become targets (Goodhart's Law)
- People game them
- They're misinterpreted
- They show partial picture (hide important factors)
- They decay over time as gaming evolves
Despite risks, metrics are useful:
- Enable scale (can't qualitatively assess millions)
- Identify patterns
- Track trends
- Focus attention
The path forward:
- Use metrics (don't abandon measurement)
- Don't trust metrics blindly (supplement with qualitative understanding)
- Measure outcomes (not just proxies)
- Use multiple metrics (resist gaming)
- Monitor for corruption (metrics degrade over time)
- Remember the goal (metric is tool, not objective)
Good measurement requires:
- Humility (metrics are flawed tools)
- Vigilance (watch for gaming and distortion)
- Balance (metrics + qualitative understanding)
- Purpose (remember why you're measuring)
"What gets measured gets managed"—sometimes in ways that help, often in ways that hurt.
Measure thoughtfully. Interpret carefully. Act wisely.
References
Goodhart, C. (1975). "Problems of Monetary Management: The U.K. Experience." Papers in Monetary Economics (Reserve Bank of Australia).
Campbell, D. T. (1979). "Assessing the Impact of Planned Social Change." Evaluation and Program Planning, 2(1), 67–90.
Muller, J. Z. (2018). The Tyranny of Metrics. Princeton University Press.
Kerr, S. (1975). "On the Folly of Rewarding A, While Hoping for B." Academy of Management Journal, 18(4), 769–783.
Ridgway, V. F. (1956). "Dysfunctional Consequences of Performance Measurements." Administrative Science Quarterly, 1(2), 240–247.
Austin, R. D. (1996). Measuring and Managing Performance in Organizations. Dorset House.
Strathern, M. (1997). "'Improving Ratings': Audit in the British University System." European Review, 5(3), 305–321.
Power, M. (1997). The Audit Society: Rituals of Verification. Oxford University Press.
Levitt, S. D., & Dubner, S. J. (2005). Freakonomics: A Rogue Economist Explores the Hidden Side of Everything. William Morrow.
Bevan, G., & Hood, C. (2006). "What's Measured Is What Matters: Targets and Gaming in the English Public Health Care System." Public Administration, 84(3), 517–538.
de Bruijn, H. (2007). Managing Performance in the Public Sector (2nd ed.). Routledge.
Hood, C. (2006). "Gaming in Targetworld: The Targets Approach to Managing British Public Services." Public Administration Review, 66(4), 515–521.
Kahneman, D., & Tversky, A. (1973). "On the Psychology of Prediction." Psychological Review, 80(4), 237–251.
Croll, A., & Yoskovitz, B. (2013). Lean Analytics: Use Data to Build a Better Startup Faster. O'Reilly Media.
Seddon, J. (2008). Systems Thinking in the Public Sector: The Failure of the Reform Regime...and a Manifesto for a Better Way. Triarchy Press.
About This Series: This article is part of a larger exploration of measurement, metrics, and evaluation. For related concepts, see [Goodhart's Law Breaks Metrics], [Why Measurement Changes Behavior], [Vanity Metrics vs Meaningful Metrics], and [Designing Useful Measurement Systems].