Why Metrics Vocabulary Matters

"Not everything that can be counted counts, and not everything that counts can be counted." — William Bruce Cameron (often attributed to Einstein)

A startup CEO proudly reports: "We have 100,000 users!" (Vanity metric—says nothing about revenue, engagement, or retention)

A product manager tracks: "Page views are up 40%!" (Misleading—could be from bots, confusion, or users repeatedly failing to find what they need)

A marketer celebrates: "We're measuring everything!" (No—you're collecting everything, not measuring what matters)

Imprecise metrics and measurement language leads to measuring the wrong things, optimizing for the wrong goals, and basing decisions on meaningless numbers.

Measurement and metrics terminology comes from statistics, business analytics, operations research, and management science. Each term has specific meaning that distinguishes useful metrics from useless ones, predictive indicators from historical records, and actionable data from feel-good numbers.

Peter Drucker is widely credited with "What gets measured gets managed," though there is no evidence he actually said it (Muller, 2018). In practice, what gets measured gets gamed—which is precisely why precise metrics terminology matters.

Understanding these distinctions helps you:

  • Design metrics that actually drive behavior
  • Distinguish signal from noise
  • Avoid Goodhart's Law (optimizing metrics instead of goals)
  • Communicate clearly about performance

This is the vocabulary that separates data-driven decisions from data-decorated guesses.

Core Metrics Concepts

Metric vs. KPI

Metric:

  • Definition: Any quantifiable measure of performance, behavior, or outcomes
  • Scope: Broad—anything you can count, rate, or measure
  • Quantity: Organizations track hundreds or thousands
  • Purpose: Monitor, understand, diagnose

Examples: Page views, support tickets, response time, lines of code, coffee consumption

KPI (Key Performance Indicator):

  • Definition: Specific metrics most critical to achieving strategic goals
  • Scope: Narrow—the vital few that matter most
  • Quantity: Organizations focus on 3-7 primary KPIs per team/level
  • Purpose: Drive strategy, make high-stakes decisions, track progress toward goals

Examples: Monthly Recurring Revenue (MRR), Net Promoter Score (NPS), Customer Acquisition Cost (CAC), Gross Margin

The Relationship

All KPIs are metrics, but not all metrics are KPIs.

Visual hierarchy:

All Measurements
  └─ Metrics (quantifiable, tracked regularly)
      └─ KPIs (strategic, tied to goals)

Selection criteria for KPIs:

  • Strategic alignment: Directly reflects progress toward goal
  • Actionability: You can influence it through decisions
  • Measurability: Can be accurately quantified
  • Clarity: Everyone understands what it means
  • Timeliness: Updates frequently enough to inform decisions

Common mistake: Calling everything a "KPI" dilutes focus. If you have 50 KPIs, you have 0 KPIs—you have metrics.

Example - SaaS company:

Metrics tracked (dozens):

  • Website visitors, trial signups, activation rate, feature usage, support tickets, page load time, server uptime, team velocity, bug count, NPS, referrals...

KPIs (3-5 primary):

  • Monthly Recurring Revenue (MRR)
  • Customer Churn Rate
  • Customer Acquisition Cost (CAC)
  • Lifetime Value (LTV)

Application: Track many metrics, but focus leadership attention on the few KPIs that determine success or failure.
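
The distinction can be made explicit in code. A minimal sketch (the class shape and the seven-KPI cap are illustrative assumptions, not a standard API): tag the vital few, and treat an oversized KPI list as a design error rather than a reporting style.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Metric:
    name: str
    is_kpi: bool = False   # strategic, tied to a goal
    goal: str = ""         # the goal a KPI tracks, if any

def kpis(metrics: list[Metric], max_kpis: int = 7) -> list[Metric]:
    """Return the KPI subset; refuse diluted focus."""
    selected = [m for m in metrics if m.is_kpi]
    if len(selected) > max_kpis:
        raise ValueError(
            f"{len(selected)} KPIs is more than {max_kpis}: "
            "if everything is a KPI, nothing is."
        )
    return selected

tracked = [
    Metric("website_visitors"),
    Metric("trial_signups"),
    Metric("mrr", is_kpi=True, goal="grow recurring revenue"),
    Metric("churn_rate", is_kpi=True, goal="retain customers"),
    Metric("cac", is_kpi=True, goal="acquire customers efficiently"),
]

for kpi in kpis(tracked):
    print(f"KPI: {kpi.name} -> {kpi.goal}")
```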

"What you measure is what you get." — Robert Kaplan

Leading vs. Lagging Indicators

Lagging Indicators

Definition: Metrics that measure past results—outcomes that have already occurred.

Characteristics:

  • Historical: Tell you what happened
  • Easy to measure: Usually clear, objective
  • Hard to influence: Past can't be changed
  • Definitive: Actual outcomes, not predictions

Examples:

  • Revenue (result of past sales)
  • Profit (result of past operations)
  • Customer churn (already left)
  • Graduation rates (already completed)
  • Accidents (already occurred)

Value: Definitive assessment of whether you succeeded.

Limitation: By the time you know, it's too late to change.

Leading Indicators

Definition: Metrics that predict future performance—early signals of likely outcomes.

Characteristics:

  • Predictive: Indicate what will happen
  • Harder to measure: Often require inference
  • Actionable: You can still influence outcome
  • Imperfect: Probabilistic, not certain

Examples:

  • Sales pipeline (predicts future revenue)
  • Employee engagement (predicts retention)
  • Product trial rate (predicts conversions)
  • Student attendance (predicts graduation)
  • Near-miss incidents (predict future accidents)

Value: Early warning system—alerts you to problems before they materialize.

Limitation: Correlation isn't perfect; leading indicators can be wrong.

Why Both Matter

Lagging indicators tell you if you achieved goals (accountability, scorekeeping).

Leading indicators tell you if you're on track to achieve goals (management, course correction).

Aspect | Lagging Indicator | Leading Indicator
------ | ----------------- | -----------------
Timing | Past results | Future predictions
Certainty | Definitive | Probabilistic
Actionability | Low (too late) | High (can intervene)
Ease | Easy to measure | Harder to measure
Example | Revenue earned | Sales calls made

Best practice: Pair leading and lagging indicators.

Example - Weight loss:

  • Lagging: Weight on scale (definitive but delayed)
  • Leading: Daily calorie intake, exercise minutes (predict future weight)

Application: Don't just track outcomes (lagging). Identify and track the activities/behaviors (leading) that drive those outcomes.
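
A sketch of what pairing looks like in practice, using hypothetical monthly figures: check whether the leading indicator actually predicts its lagging partner some months later. If the lagged correlation is weak, the pair is mislabeled. (Uses statistics.correlation, available in Python 3.10+.)

```python
from statistics import correlation

# Hypothetical monthly figures, in chronological order.
calls   = [120, 150, 140, 180, 200, 210, 190, 230]  # leading: sales calls made
revenue = [ 40,  48,  55,  52,  66,  74,  78,  70]  # lagging: revenue earned ($k)

def lagged_correlation(leading, lagging, lag=1):
    """Correlate this month's leading indicator with the lagging
    outcome `lag` months later. Strong positive: the pair works
    for course correction. Weak: the 'leading' label is suspect."""
    return correlation(leading[:-lag], lagging[lag:])

print(f"1-month lagged correlation: {lagged_correlation(calls, revenue):.2f}")
```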

"In God we trust; all others must bring data." — W. Edwards Deming

Vanity vs. Actionable Metrics

Vanity Metrics

Definition (Eric Ries, Lean Startup): Vanity metrics look impressive but don't correlate with business success or inform decisions.

Characteristics:

  • Make you feel good (big numbers)
  • Don't predict revenue or retention
  • Don't suggest specific actions
  • Easy to manipulate or game
  • Lack context (absolute numbers without rates)

Common vanity metrics:

Metric | Why It's Vanity | Better Alternative
------ | --------------- | ------------------
Total registered users | Includes inactive, churned users | Monthly Active Users (MAU)
Total page views | Could be confusion, bots, or the same user | Unique engaged users
Total downloads | Says nothing about usage | Daily Active Users (DAU)
Social media followers | Many are bots or inactive | Engagement rate
Total revenue | Ignores costs and growth rate | Net profit, MRR growth rate

Why they're dangerous:

  • Create false sense of progress
  • Distract from metrics that matter
  • Enable self-deception
  • Waste resources optimizing wrong things

Example:

  • Vanity: "We have 1 million app downloads!"
  • Reality: 95% used it once and never returned. Company is dying.

Actionable Metrics

Definition: Metrics that inform specific decisions and suggest clear actions.

Characteristics:

  • Tie to business outcomes
  • Segment and context-rich
  • Lead to specific interventions
  • Hard to game without real improvement

Transformation - Vanity to Actionable:

Vanity Metric | Actionable Version | What It Tells You
------------- | ------------------ | -----------------
Total users | Weekly Active Users / Total Signups | Activation rate: how many become engaged
Page views | Time to task completion | Whether users find what they need efficiently
Revenue | Revenue per user segment | Which segments are profitable
Followers | Engagement rate by content type | What content resonates

Test for actionability: Ask "If this metric changes, what do I do differently?"

  • If answer is clear → Actionable
  • If answer is "feel good" or "nothing" → Vanity

Application: Audit your dashboard. For each metric, ask: "Does this inform a decision or just make me feel good?" Remove vanity metrics ruthlessly.
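
The transformation is often a one-line change of framing. A small illustration with hypothetical numbers: the same raw data yields either a vanity headline or an actionable rate.

```python
def activation_rate(weekly_active_users: int, total_signups: int) -> float:
    """Actionable version of 'total users': the fraction of signups
    that became engaged. If this drops, the lever is onboarding;
    a raw user count suggests no action at all."""
    return weekly_active_users / total_signups if total_signups else 0.0

total_signups = 100_000   # vanity headline: "We have 100,000 users!"
weekly_active = 4_200     # hypothetical engaged count

print(f"Activation rate: {activation_rate(weekly_active, total_signups):.1%}")
# 4.2% tells a very different story from the headline count.
```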

Proxy Metrics and Goodhart's Law

Proxy Metrics

Definition: Measurable substitutes that approximate something hard or impossible to measure directly.

Why needed: Some important outcomes are:

  • Too delayed (long-term health)
  • Too expensive (full user satisfaction survey)
  • Too abstract (happiness, understanding)
  • Too rare (prevent catastrophic failures)

Common proxies:

True Goal (Hard to Measure) | Proxy Metric (Easier to Measure)
--------------------------- | --------------------------------
Long-term health | Blood pressure, cholesterol, BMI
Customer satisfaction | Net Promoter Score (NPS)
Learning | Test scores
Software quality | Bug count, test coverage
Economic health | GDP, unemployment rate
Employee happiness | Retention rate, engagement surveys

The problem with proxies: They're imperfect. Optimizing the proxy doesn't guarantee optimizing the goal.

Example - Education:

  • Goal: Deep understanding, critical thinking, creativity
  • Proxy: Standardized test scores
  • Result: Schools teach to the test (proxy improves, but goal may not)

Goodhart's Law

"When a measure becomes a target, it ceases to be a good measure." — Goodhart's Law (Charles Goodhart)

Definition (Charles Goodhart, 1975): "When a measure becomes a target, it ceases to be a good measure."

Expanded (Marilyn Strathern): "When a measure becomes a target, people optimize for the measure rather than the underlying goal."

Why it happens:

  1. Metric is imperfect proxy for goal
  2. Metric becomes target (measured, rewarded, tracked)
  3. People game the metric (consciously or unconsciously)
  4. Metric-goal correlation breaks (metric improves without real progress)

Classic examples:

Domain | Target Metric | Gaming Behavior | Actual Outcome
------ | ------------- | --------------- | --------------
Soviet factories | Nail production (by weight) | Made fewer, heavier nails | Unusable products
Cobra bounty (India) | Dead cobras turned in | Bred cobras for the bounty | More cobras after the program ended
Hospital wait times | % seen within 4 hours | Patients held in ambulances outside until they could be seen quickly | Metric improved; actual waits did not
Software engineering | Lines of code written | Wrote verbose, redundant code | More code, worse quality
Academia | Publication count | Minimal publishable units, quantity over quality | Citation inflation, replication crisis

Modern examples:

Social media metrics:

  • Target: Engagement (likes, shares, comments)
  • Gaming: Outrage content, clickbait, sensationalism
  • Result: Engagement up, discourse quality down

Wells Fargo (2016):

  • Target: Accounts opened per employee
  • Gaming: Opened fake accounts without customer knowledge
  • Result: Scandal, fines, reputation damage

COVID-19 testing:

  • Target: Positivity rate
  • Gaming: Some locations limited testing to obvious cases
  • Result: Lower positivity rate, worse outbreak detection

Defending Against Goodhart's Law

Strategies:

1. Use multiple metrics (no single metric captures everything)

  • Don't just measure accounts opened; measure legitimate accounts used
  • Don't just measure engagement; measure user satisfaction

2. Monitor for gaming (check if metric-goal correlation holds)

  • Rising test scores + declining real performance? Gaming likely

3. Rotate metrics (prevents long-term optimization)

  • Change what you measure periodically

4. Balance competing metrics (creates trade-offs)

  • Revenue AND customer satisfaction
  • Speed AND quality
  • Quantity AND accuracy

5. Focus on outcomes, not outputs

  • Not "lines of code" but "working features deployed"
  • Not "sales calls made" but "revenue generated"

6. Tie metrics to actual goals

  • Regularly ask: "Is optimizing this metric actually achieving our goal?"

Application: When designing metrics, assume people will game them. How could they optimize the metric without achieving the goal? Design defenses accordingly.
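
Strategy 2 in particular can be partially automated. A rough sketch with hypothetical quarterly figures echoing the fake-accounts pattern: track a rolling correlation between the target metric and the goal it is supposed to represent, and treat decay toward zero as a gaming alarm. (Uses statistics.correlation, Python 3.10+.)

```python
from statistics import correlation

def correlation_drift(metric, outcome, window=6):
    """Rolling correlation between a target metric and its goal.
    A slide toward zero while the metric keeps climbing is the
    classic signature of gaming."""
    return [
        correlation(metric[i:i + window], outcome[i:i + window])
        for i in range(len(metric) - window + 1)
    ]

# Hypothetical quarters: accounts opened keeps climbing after the
# target is introduced, while accounts actually used stalls.
accounts_opened = [100, 110, 125, 160, 210, 280, 360, 450]
accounts_used   = [ 90,  99, 112, 130, 135, 134, 131, 128]

for i, r in enumerate(correlation_drift(accounts_opened, accounts_used)):
    print(f"window {i}: r = {r:.2f}")   # r falls as gaming takes over
```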

Validity and Reliability

Validity

Definition: Does the metric actually measure what you think it's measuring?

Types:

Face validity: Appears to measure the construct

  • Example: Asking "Are you satisfied?" seems to measure satisfaction

Construct validity: Actually captures the theoretical concept

  • Example: Does IQ test actually measure intelligence, or just test-taking ability?

Predictive validity: Predicts outcomes it should predict

  • Example: Do SAT scores predict college success?

Content validity: Covers all aspects of what you're measuring

  • Example: Does customer satisfaction survey cover all satisfaction dimensions?

Threats to validity:

  • Measuring wrong thing (test scores ≠ learning)
  • Missing important dimensions (revenue growth without profitability)
  • Confounding variables (correlation without causation)

Example - Validity problem:

  • Goal: Measure employee productivity
  • Metric: Hours worked
  • Problem: Invalid—hours ≠ output. Measures time, not productivity.

Reliability

Definition: Does the metric produce consistent results under consistent conditions?

Characteristics:

  • Repeatability: Same measurement process yields same result
  • Precision: Low random error
  • Consistency: Different measurers get same result

Types:

Test-retest reliability: Measure same thing twice, get same result

Inter-rater reliability: Different people measuring get same result

Internal consistency: Multiple items measuring same construct correlate

Threats to reliability:

  • Measurement error (inconsistent instruments)
  • Subjective judgment (different raters, different results)
  • Environmental variation (conditions change between measurements)

Example - Reliability problem:

  • Metric: "Employee engagement" rated by managers
  • Problem: Unreliable—different managers rate differently, same manager rates differently at different times

The Relationship

Validity: Are you measuring the right thing?
Reliability: Are you measuring consistently?

Ideal: High validity AND high reliability (measuring right thing consistently)

Possible problems:

  • High reliability, low validity: Consistently measuring the wrong thing
  • High validity, low reliability: Measuring right thing inconsistently (random noise)
  • Low both: Useless metric

Validity | Reliability | Example
-------- | ----------- | -------
✅ High | ✅ High | Blood pressure reading with a calibrated instrument (measures cardiovascular health consistently)
✅ High | ❌ Low | Customer satisfaction via unstructured interviews (relevant but inconsistent)
❌ Low | ✅ High | Hours worked as a productivity measure (consistent but doesn't capture actual output)
❌ Low | ❌ Low | Random number generator (measures nothing, and inconsistently at that)

Application: When designing metrics, ask:

  1. Validity: "Does this actually measure what matters?"
  2. Reliability: "Will I get consistent results?"

Both are necessary. Neither alone is sufficient.
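
Both questions can be checked with the same statistical tool. A minimal sketch with hypothetical data that reproduces the "high reliability, low validity" row from the table above: hours worked agrees with itself across two measurement passes, yet tracks actual output poorly. (Python 3.10+ for statistics.correlation.)

```python
from statistics import correlation

# Hypothetical data for ten employees.
hours_week1 = [42, 45, 40, 50, 38, 44, 47, 41, 39, 46]  # hours worked, pass 1
hours_week2 = [41, 46, 40, 49, 39, 43, 48, 42, 38, 45]  # hours worked, pass 2
output      = [30, 22, 35, 18, 33, 25, 20, 31, 34, 24]  # features shipped

# Reliability: does repeated measurement agree with itself?
print(f"test-retest reliability: r = {correlation(hours_week1, hours_week2):.2f}")

# Validity: does the metric track the outcome it claims to measure?
print(f"validity vs. output:     r = {correlation(hours_week1, output):.2f}")
# High reliability plus low (here negative) validity: consistently
# measuring the wrong thing -- time, not productivity.
```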

Advanced Metrics Concepts

Composite Metrics

Definition: Single metric combining multiple sub-metrics, weighted to reflect priorities.

Examples:

  • Credit score: Payment history + debt-to-income + credit age + types of credit...
  • Happiness index: GDP per capita + social support + life expectancy + freedom + generosity + corruption perception
  • Developer productivity: Code quality + velocity + bug rate + collaboration

Advantages:

  • Captures multi-dimensional concepts
  • Provides single scoreboard number
  • Allows weighting of priorities

Risks:

  • Obscures underlying components (score drops—but why?)
  • Weighting is subjective (what's "right" weight?)
  • More complex to understand and trust
  • Easier to game (optimize easier components, ignore hard ones)

Best practice: Show composite score AND underlying components.
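
A sketch of that best practice (component names and weights are illustrative assumptions): compute the weighted score, but always report the parts next to it so a drop can be diagnosed.

```python
def composite_score(components: dict[str, float],
                    weights: dict[str, float]) -> float:
    """Weighted composite of sub-metrics normalized to 0-1.
    The weights encode priorities and are a subjective call,
    which is exactly why the components must stay visible."""
    assert abs(sum(weights.values()) - 1.0) < 1e-9, "weights must sum to 1"
    return sum(components[k] * weights[k] for k in weights)

# Hypothetical developer-productivity components, pre-scaled to 0-1.
components = {"code_quality": 0.8, "velocity": 0.6, "low_bug_rate": 0.9}
weights    = {"code_quality": 0.5, "velocity": 0.3, "low_bug_rate": 0.2}

print(f"composite: {composite_score(components, weights):.2f}")
for name, value in components.items():   # the parts, not just the score
    print(f"  {name}: {value:.2f}")
```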

Ratio Metrics

Definition: Metrics expressed as ratios rather than absolute numbers, providing context.

Why ratios matter: Absolute numbers lack context.

Examples:

Absolute (Context-free) | Ratio (With Context) | Why Ratio Is Better
----------------------- | -------------------- | -------------------
100 sales | 100 sales / 1,000 leads = 10% conversion | Shows efficiency, not just volume
$1M revenue | $1M revenue / $500K costs = 2x revenue-to-cost | Shows profitability, not just size
50 bugs | 50 bugs / 10,000 lines of code = 0.5% bug rate | Normalizes by codebase size
1,000 complaints | 1,000 complaints / 100,000 customers = 1% | Shows the proportion affected

Key ratios:

  • Rates: Events per time period (churn rate, growth rate)
  • Proportions: Part-to-whole (market share, conversion rate)
  • Efficiency: Output per input (revenue per employee, profit margin)

Application: Convert absolute metrics to ratios for meaningful comparison across time, teams, or companies.
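
The arithmetic is trivial, which is part of the point: the hard work is choosing the right denominator, not computing the ratio. The numbers below mirror the table.

```python
def ratio(numerator: float, denominator: float) -> float:
    """Absolute number -> ratio with context."""
    return numerator / denominator

print(f"conversion rate: {ratio(100, 1_000):.1%}")           # 100 sales / 1,000 leads
print(f"bug rate:        {ratio(50, 10_000):.2%} per line")  # 50 bugs / 10,000 LOC
print(f"complaint rate:  {ratio(1_000, 100_000):.1%}")       # 1,000 / 100,000 customers
```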

Cohort Metrics

Definition: Metrics segmented by groups that share common characteristics or experience at same time.

Why cohorts matter: Aggregated metrics hide important patterns.

Common cohort types:

  • Time-based: Users acquired in January vs. February
  • Channel-based: Users from Google vs. Facebook
  • Feature-based: Users who tried Feature X vs. didn't
  • Demographic: Age groups, locations, segments

Example - User retention:

Aggregated: "80% retention rate"
Problem: Mixes old users (high retention) with new users (low retention)

Cohort analysis:

  • January cohort: 90% retention
  • February cohort: 85% retention
  • March cohort: 70% retention

Insight: Retention is declining for new users—something changed. Aggregated metric hid this.

Application: When metrics seem stable but you suspect problems, segment by cohort to reveal hidden patterns.
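
A minimal sketch of the retention example above, with hypothetical cohort data: the aggregate looks healthy at roughly 80% while the per-cohort view exposes the decline.

```python
# Hypothetical signup-month cohorts: users still active after 30 days.
cohorts = {
    "2024-01": {"signups": 1_000, "retained": 900},
    "2024-02": {"signups": 1_200, "retained": 1_020},
    "2024-03": {"signups": 1_500, "retained": 1_050},
}

total_signups  = sum(c["signups"]  for c in cohorts.values())
total_retained = sum(c["retained"] for c in cohorts.values())
print(f"aggregate retention: {total_retained / total_signups:.0%}")  # looks fine

for month, c in sorted(cohorts.items()):   # the cohort view reveals the trend
    print(f"{month}: {c['retained'] / c['signups']:.0%}")
```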

Practical Application

Designing a Metrics System

Framework:

1. Define goals clearly

  • What are you trying to achieve? (Strategy)
  • What does success look like? (Outcomes)

2. Identify KPIs (3-7 per level/team)

  • What metrics best reflect progress toward goals?
  • Leading indicators (predict future)
  • Lagging indicators (confirm results)

3. Add supporting metrics (dashboard context)

  • Metrics that explain KPI movements
  • Diagnostic metrics for troubleshooting

4. Test validity and reliability

  • Does each metric measure what you think?
  • Are measurements consistent?

5. Check for gaming potential

  • How could people optimize metrics without achieving goals?
  • Add balancing metrics

6. Review and iterate

  • Do metrics still align with goals?
  • Are you measuring what matters?

Metric Red Flags

Warning signs of bad metrics:

1. Vanity symptoms:

  • You can't explain what action to take if it changes
  • It always goes up (no failure mode)
  • You celebrate the number but business isn't improving

2. Goodhart's Law symptoms:

  • Metric improves but underlying goal doesn't
  • Obvious gaming behavior emerges
  • People optimize metric instead of customer value

3. Validity problems:

  • Metric doesn't correlate with business outcomes
  • Proxy has drifted from goal
  • Measuring wrong thing entirely

4. Reliability problems:

  • Results vary wildly without real change
  • Different teams report different numbers for same thing
  • Can't reproduce measurements

5. Complexity problems:

  • No one understands how it's calculated
  • Requires PhD to interpret
  • Changes for unknown reasons

Application: Audit existing metrics against these red flags. If metric fails multiple tests, replace it.
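
One way to operationalize the audit is a checklist pass over each dashboard metric. A sketch (the two-flag replacement threshold is an illustrative assumption):

```python
RED_FLAGS = [
    "no clear action if it changes (vanity)",
    "metric improves but the goal does not (Goodhart)",
    "no correlation with business outcomes (validity)",
    "numbers vary without real change (reliability)",
    "no one can explain the calculation (complexity)",
]

def audit_metric(name: str, applies: list[bool]) -> None:
    """applies[i] is True if RED_FLAGS[i] fits this metric."""
    hits = [flag for flag, hit in zip(RED_FLAGS, applies) if hit]
    verdict = "REPLACE" if len(hits) >= 2 else "keep"
    print(f"{name}: {len(hits)} red flag(s) -> {verdict}")
    for flag in hits:
        print(f"  - {flag}")

audit_metric("total page views",   [True, True, True, False, False])
audit_metric("monthly churn rate", [False, False, False, False, False])
```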

The Meta-Principle

Metrics are tools, not goals. They help you understand reality, not replace it.

Peter Drucker (actual quote): "There is surely nothing quite so useless as doing with great efficiency what should not be done at all."

Translation: Measuring the wrong things precisely is worse than not measuring at all—it directs effort toward worthless goals.

The vocabulary of metrics exists to help you:

  • Distinguish signal from noise (actionable vs. vanity)
  • Predict future from past (leading vs. lagging)
  • Measure what matters (validity)
  • Measure consistently (reliability)
  • Avoid gaming (Goodhart's Law awareness)

Use metrics vocabulary precisely because imprecise language leads to imprecise measurement, which leads to imprecise decisions.

Measure what matters. Ignore what doesn't. Know the difference.

"The goal is to turn data into information, and information into insight." — Carly Fiorina


What Research Shows About Metrics Terminology and Misuse

The vocabulary of measurement is not merely semantic -- precise terminology reflects genuine conceptual distinctions that have material consequences for how organizations make decisions. Several researchers have documented the practical stakes of imprecise metrics language.

Charles Goodhart's 1975 paper introduced the concept that would bear his name, but the terminological precision matters: Goodhart did not say "metrics are bad" or "targets are dangerous." He said that measures used as targets cease to be good measures. The distinction between a metric (information about the world) and a target (a goal expressed numerically) is the conceptual hinge on which Goodhart's Law turns. Organizations that conflate these two categories -- treating every metric as implicitly a target, or treating targets as though they still provide neutral information -- will predictably experience metric corruption.

Marilyn Strathern's 1997 reformulation of Goodhart's Law, made in the context of British university auditing, gave it its now-standard phrasing: "When a measure becomes a target, it ceases to be a good measure." Her version makes explicit that the problem is the optimization behavior that targets induce, not the measurement itself. A metric that is tracked for information does not necessarily produce gaming. A metric that determines funding, employment, or reputation reliably does.

Donald Campbell's 1979 paper introduced the related concept of indicator corruption, distinguishing it from simple measurement error. Corruption is systematic distortion in a predictable direction driven by incentives. This is terminologically important because "error" implies randomness that averages out, while "corruption" implies systematic bias that accumulates. Organizations that treat gaming of metrics as "measurement error" will look for statistical solutions (larger samples, better instruments) when the actual problem is incentive design.

Jerry Muller's The Tyranny of Metrics (2018) contributed the term "metric fixation" to describe the institutional pathology of substituting quantitative metrics for substantive judgment. Muller distinguishes this from measurement, which he endorses as a genuinely useful tool. Metric fixation is the belief that everything important can and should be reduced to numbers, and that numerical indicators can replace the judgment of experienced practitioners. This distinction -- between measurement as a tool that supports judgment and metric fixation as an ideology that replaces judgment -- is one of the most important conceptual lines in the literature.

W. Edwards Deming introduced terminological distinctions in quality management that remain foundational. His distinction between common cause variation (random fluctuation within a stable system) and special cause variation (signals that the system has changed) is critical for interpreting any metric over time. These are not just statistical concepts -- they imply different management responses. Responding to common cause variation as though it were a special cause (Deming called this "tampering") increases rather than decreases variation. Measurement systems that do not build in this distinction will consistently produce overreaction to noise.
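
The common/special cause distinction is operationalized by the Shewhart control chart that Deming's argument builds on. A minimal sketch assuming the conventional three-sigma rule: values inside the control limits are noise to leave alone; values outside are signals to investigate.

```python
from statistics import mean, stdev

def classify_variation(history, new_value, sigma_limit=3.0):
    """Shewhart-style control check. Inside the limits: common
    cause variation (do not tamper). Outside: a special cause
    signal (investigate the system change)."""
    mu, sd = mean(history), stdev(history)
    lower, upper = mu - sigma_limit * sd, mu + sigma_limit * sd
    if lower <= new_value <= upper:
        return "common cause: noise, leave the system alone"
    return "special cause: investigate"

# Hypothetical weekly defect counts from a stable process.
history = [12, 15, 11, 14, 13, 16, 12, 14, 13, 15]
print(classify_variation(history, 14))   # within limits
print(classify_variation(history, 30))   # far outside limits
```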

Deming also distinguished between "inspection" (measuring outputs to find defects) and "quality at the source" (building measurement into the process so that defects are detected and prevented where they occur). This distinction has become foundational in lean manufacturing and software development: shift measurement from final inspection to process monitoring.

Robert Kaplan and David Norton contributed the distinction between leading and lagging indicators as a formal framework for measurement system design. While the general idea that some metrics predict outcomes while others record them is intuitive, Kaplan and Norton operationalized it through the Balanced Scorecard architecture: financial metrics are lagging (recording outcomes of decisions already made); customer, internal process, and learning metrics are leading (predicting future financial outcomes). This framework made explicit that organizations need both types, and that exclusive focus on either produces predictable failures -- exclusive lagging measurement means you always know when you have failed but never in time to prevent it; exclusive leading measurement means you track proxies without confirming they actually predict the outcomes you care about.


Real-World Cases Where Terminology Mattered

Wells Fargo and the metric-target confusion. The Wells Fargo unauthorized accounts scandal (2002-2016) illustrates the consequences of treating metrics as targets. The metric -- accounts per customer -- was designed to measure the depth of customer relationships. It was a valid proxy metric when used informatively: higher cross-sell ratios indicated that customers were using Wells Fargo as their primary financial institution. When it became a performance target with compensation implications, it stopped functioning as a measure of relationship depth and started functioning as a number to hit by whatever means available. The terminological confusion between metric (information about customer relationships) and target (goal for employees to achieve) directly produced 3.5 million unauthorized accounts and $3 billion in settlements.

NHS waiting times and the leading/lagging conflation. The UK's National Health Service waiting time targets conflated several terminological distinctions. The 4-hour emergency department target was intended as a lagging indicator of access quality -- a measure of how well the system was actually providing timely emergency care. But when it became a target with funding consequences, it was gamed in ways that improved the metric while not improving (and sometimes degrading) actual access quality. The error was treating a metric (measurement of access quality) as a target (performance goal), without understanding that the target-making transformation activates Goodhart dynamics. Subsequent NHS measurement framework development has been more careful to distinguish monitoring metrics from performance targets, and to use independent patient surveys as a check on the target-based metrics.

Soviet production quotas and proxy metric failures. Soviet central planning assigned quantitative targets to factories as the primary mechanism for coordinating production. These targets were metrics in one sense (quantified descriptions of production) but were designed and used as targets (goals with consequences for managers). The terminological slippage -- treating targets as though they were neutral information -- prevented planners from updating when the targets stopped tracking the actual goals (useful production). When nail weight targets produced heavy useless nails, the planning system's response was typically to adjust the target formula rather than to recognize that the target-making process itself was the problem. No terminological framework that distinguished "metric" (information about production) from "target" (production goal) existed within the planning system, which made systematic learning about metric corruption essentially impossible.

Intel's OKR and the objective-key result distinction. Andy Grove's development of OKRs at Intel introduced a terminological distinction that has become influential in organizational management: objectives (what you want to achieve, qualitative, directional) versus key results (how you will measure whether you achieved it, quantitative, specific). This distinction has practical consequences. Objectives can be ambitious and inspirational without being gameable, because they are not directly measured. Key results are measured and therefore susceptible to Goodhart dynamics -- but because they are explicitly positioned as measurements of objective achievement rather than the objective itself, the conceptual separation is maintained. When a key result is gamed (achieved without achieving the objective), the OKR framework makes this visible as a failure: the key result was achieved but the objective was not, which reveals that the key result was not a valid indicator of the objective.

Google Flu Trends and construct validity. Google's Google Flu Trends project failed partly because of a construct validity problem, but the error was enabled by imprecise terminology. The project used search query volume as a proxy for flu prevalence -- a defensible methodological choice. The failure was treating the proxy as though it measured the construct directly. When the proxy drifted from the construct it was supposed to measure (because of media-induced search behavior rather than illness-induced search behavior), the system continued reporting confidently because it lacked the terminological framework to distinguish "search volume about flu" (what was measured) from "flu prevalence" (what was claimed to be measured). Construct validity -- the question of whether a measurement actually captures the theoretical construct it is supposed to represent -- was not systematically evaluated.


Evidence-Based Principles for Using Metrics Terminology Precisely

Principle 1: Always distinguish metrics from targets. The single most consequential terminological distinction in the measurement literature is between a metric (a number that provides information about the world) and a target (a goal expressed numerically). Organizations that conflate these two categories will predictably experience Goodhart dynamics: the metric will be optimized at the expense of the goal it was supposed to represent. Effective metrics practice requires treating this distinction as a design constraint: before any metric is tied to consequences (compensation, funding, career), explicitly evaluate whether it is sufficiently resistant to gaming to function as a target without corrupting the information it provides.

Principle 2: Specify the causal relationship between proxy metrics and actual goals. Every organizational metric is a proxy for something that matters more directly. Customer satisfaction scores are proxies for customer behavior (retention, recommendation, expansion). Net Promoter Score is a proxy for word-of-mouth growth. Employee engagement surveys are proxies for retention and performance. The proxy relationship requires explicit documentation: this metric is a valid proxy for this outcome because of this mechanism. Without this documentation, proxy drift goes undetected -- the metric continues to be used as though it represents the goal even after the relationship has broken down.

Principle 3: Use the leading/lagging distinction explicitly in measurement system design. Kaplan and Norton's research shows that organizations with measurement systems that include only lagging indicators consistently underinvest in the drivers of future performance. The leading/lagging distinction should be part of the explicit design vocabulary of any measurement system: for each strategic goal, what are the lagging indicators that will confirm achievement, and what are the leading indicators that will predict achievement in time to adjust course? The design discipline of requiring both types forces organizations to think about causal mechanisms, not just outcome tracking.

Principle 4: Acknowledge the reliability-validity distinction and its consequences. A reliable metric that is not valid (consistently measuring the wrong thing) is more dangerous than an unreliable metric, because it produces confident wrong conclusions. Hours worked is a highly reliable metric for effort measurement but of questionable validity as a measure of output or value. Customer satisfaction scores are moderately reliable but of contested validity as predictors of financial outcomes. The terminological precision of distinguishing "are we measuring consistently" (reliability) from "are we measuring the right thing" (validity) enables more sophisticated metric evaluation than treating measurement quality as a single dimension.


Essential Readings

Metrics Fundamentals:

  • Ries, E. (2011). The Lean Startup. New York: Crown Business. [Vanity vs. actionable metrics]
  • Croll, A., & Yoskovitz, B. (2013). Lean Analytics. Sebastopol, CA: O'Reilly. [Metrics for startups]
  • Kaplan, R. S., & Norton, D. P. (1996). The Balanced Scorecard. Boston: Harvard Business School Press. [Strategic measurement framework]

Goodhart's Law and Gaming:

  • Goodhart, C. A. E. (1984). "Problems of Monetary Management: The UK Experience." In Monetary Theory and Practice (pp. 91-121). London: Macmillan.
  • Muller, J. Z. (2018). The Tyranny of Metrics. Princeton: Princeton University Press. [Comprehensive critique of metric fixation]
  • Campbell, D. T. (1979). "Assessing the Impact of Planned Social Change." Evaluation and Program Planning, 2(1), 67-90. [Campbell's Law]

Measurement Theory:

  • Shadish, W. R., Cook, T. D., & Campbell, D. T. (2002). Experimental and Quasi-Experimental Designs for Generalized Causal Inference. Boston: Houghton Mifflin. [Validity types]
  • Stevens, S. S. (1946). "On the Theory of Scales of Measurement." Science, 103(2684), 677-680. [Measurement scales]
  • Carmines, E. G., & Zeller, R. A. (1979). Reliability and Validity Assessment. Beverly Hills: Sage. [Accessible treatment]

Leading and Lagging Indicators:

  • Parmenter, D. (2015). Key Performance Indicators (3rd ed.). Hoboken, NJ: Wiley. [KPI design and implementation]
  • Marr, B. (2012). Key Performance Indicators: The 75 Measures Every Manager Needs to Know. Harlow: Pearson.

Business Metrics:

  • Farris, P. W., Bendle, N. T., Pfeifer, P. E., & Reibstein, D. J. (2010). Marketing Metrics (2nd ed.). Upper Saddle River, NJ: Wharton School Publishing.
  • Davenport, T. H., & Harris, J. (2017). Competing on Analytics (Updated ed.). Boston: Harvard Business Review Press.

Practical Application:

  • Redman, T. C. (2013). Data Driven: Profiting from Your Most Important Business Asset. Boston: Harvard Business Review Press.
  • Provost, F., & Fawcett, T. (2013). Data Science for Business. Sebastopol, CA: O'Reilly.

Frequently Asked Questions

What's the difference between metrics and KPIs?

All KPIs are metrics, but not all metrics are KPIs. KPIs are the specific metrics most critical to your goals and strategy.

What are leading vs lagging indicators?

Leading indicators predict future performance; lagging indicators measure past results. Both are needed for complete insight.

What are vanity metrics?

Vanity metrics look impressive but don't correlate with business outcomes or inform decisions—they feel good but aren't useful.

What does it mean for a metric to be actionable?

An actionable metric shows you what to do differently. If it changes, you know what actions to take to improve it.

What is a proxy metric?

A proxy metric stands in for something hard to measure directly, approximating the real target through a measurable substitute.

What is Goodhart's Law?

When a measure becomes a target, it ceases to be a good measure—people optimize for the metric rather than the underlying goal.

Why does measurement terminology matter?

Precise language prevents metric confusion, helps you design better measurement systems, and improves communication about performance.