Step-by-Step: Designing Effective Metrics

"What gets measured gets managed" is one of the most quoted principles in business, attributed (probably incorrectly) to Peter Drucker. But the full truth is more sobering: what gets measured gets managed, whether or not it is the right thing to manage, and whether or not "managing" it actually produces the outcomes you want. Metrics are among the most powerful forces shaping behavior in any organization, and their power operates regardless of whether the metrics are well-designed or poorly designed. A good metric focuses attention on what matters, reveals problems early, guides improvement, and aligns individual effort with organizational goals. A bad metric focuses attention on what is easy to count, conceals real problems behind impressive numbers, drives gaming and distortion, and misaligns individual effort with organizational needs.

The difference between organizations that thrive and organizations that drown in measurement is not whether they measure things. Everyone measures things. The difference is whether they measure the right things, in the right way, with the right understanding of what metrics can and cannot do. Designing effective metrics is not primarily a technical skill (although some technical knowledge helps); it is primarily a thinking skill: the ability to reason clearly about what you are actually trying to achieve, what would tell you whether you are achieving it, and what behaviors your chosen measurements will create.

This guide provides a systematic process for designing metrics that actually improve performance, from identifying what to measure through selecting metrics, defining them precisely, preventing gaming, and maintaining them over time. It addresses the most common and most damaging metric design errors, including Goodhart's Law, the McNamara Fallacy, surrogate endpoint errors, and the pervasive problem of measuring what is easy rather than what is important.


How Do I Choose Which Metrics to Track?

Metric selection is the most consequential decision in the entire measurement design process. Choose the right metrics and they become a lens that reveals the system's true performance. Choose the wrong metrics and they become a distorting mirror that shows a flattering image while the system deteriorates behind the reflection.

Start with Strategic Goals

Every metric should be derivable from a strategic goal through a clear chain of logic. The chain runs from a strategic goal (what outcome do we want to achieve?) to a success criterion (how would we know if we achieved it?) to a measurable indicator (what observable quantity tracks whether we are meeting the success criterion?).

For example:

  • Strategic goal: Become the preferred provider for enterprise customers.
  • Success criterion: Enterprise customers choose us over competitors, stay with us for multiple years, and expand their usage over time.
  • Measurable indicators: Enterprise customer acquisition rate, enterprise customer retention rate, enterprise account revenue growth rate.

This derivation process ensures that every metric connects to something the organization actually cares about. Metrics that cannot be traced back to a strategic goal through a clear chain of reasoning should be questioned: why are we measuring this? What goal does it serve? If the answer is "we've always measured it" or "because the data is available," the metric is a candidate for elimination.

Identify Leading Indicators

Once you have identified the outcome metrics (the measurable indicators of strategic goal achievement), look for leading indicators: metrics that predict future changes in the outcome metrics before those changes are visible. Leading indicators are more actionable than outcome metrics because they provide advance warning that allows course correction before the outcome deteriorates.

The relationship between leading and lagging indicators is analogous to the relationship between steering a car and reading the speedometer. The speedometer (a lagging indicator) tells you your current speed, but by the time it shows you are going too fast, you have already been speeding for some time. The view through the windshield (a leading indicator) shows a curve ahead, allowing you to slow down before the speedometer would have told you there was a problem.

In a customer-focused business, customer satisfaction scores are a leading indicator of customer retention (a lagging indicator). Satisfied customers tend to stay; dissatisfied customers tend to leave, but there is a delay between dissatisfaction and departure. If you measure only retention, you won't see the problem until customers have already left. If you measure satisfaction, you can detect the problem while there is still time to address it.

In a software development context, code complexity metrics and test coverage are leading indicators of defect rates (a lagging indicator). Increasing complexity and decreasing test coverage predict future increases in defects, but the defects won't appear until the code is released. Measuring the leading indicators allows the team to address the root causes before the symptoms appear.

Keep the Set Small: 3-5 Key Metrics

How do I choose which metrics to track? The most important principle is restraint. The natural temptation is to measure everything that might be relevant, which produces a measurement system with dozens of metrics that overwhelms decision-makers, dilutes attention, and obscures the few signals that actually matter in a sea of noise.

Research on human attention and decision-making consistently shows that people can effectively track and respond to a small number of metrics simultaneously, typically 3 to 5. Beyond that, each additional metric reduces the attention given to all metrics, including the most important ones. A dashboard with 30 metrics receives less total attention per metric than a dashboard with 5, which means the most important metrics get less attention in the 30-metric dashboard even though they are "included."

The discipline of selecting only 3-5 key metrics forces you to prioritize ruthlessly, which is one of the most valuable exercises in the entire metric design process. If you can track only 5 metrics, which 5 would you choose? This question cuts through the measurement bloat that afflicts most organizations and focuses attention on the handful of indicators that truly matter for strategic goal achievement.

This does not mean that the organization should only have 5 metrics. Different functions, teams, and levels of the organization will have their own metric sets tailored to their specific responsibilities. But at each level, the key metric set should be small enough to hold in working memory and focused enough to guide action.


What Makes a Good Metric?

Not all metrics are created equal. The difference between metrics that drive genuine improvement and metrics that drive gaming and distortion lies in several design properties that are worth understanding in detail.

Clearly Defined

A good metric has a precise, unambiguous definition that leaves no room for disagreement over interpretation. "Customer satisfaction" is not a metric; it is a concept. "Customer satisfaction as measured by the mean score on a 5-point Likert scale survey administered to a random sample of 200 customers per month, excluding customers who have been with us less than 30 days" is a metric. The definition should specify: what exactly is being measured, how it is measured (methodology), who is measured (population), when it is measured (frequency and timing), and what counts as an observation (inclusion and exclusion criteria).
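
One practical test of precision is whether the definition can be written down as a structured record with every required field filled in. A minimal sketch in Python, where the field names and values are illustrative assumptions rather than recommendations:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class MetricDefinition:
    """A precise, reviewable definition of a single metric."""
    name: str          # what exactly is being measured
    methodology: str   # how it is measured
    population: str    # who is measured
    frequency: str     # when and how often it is measured
    inclusions: str    # what counts as an observation
    exclusions: str    # what is excluded and why
    owner: str         # who is accountable for keeping the definition current

# Hypothetical example mirroring the customer-satisfaction definition above.
csat = MetricDefinition(
    name="Customer satisfaction (CSAT)",
    methodology="Mean score on a 5-point Likert-scale survey",
    population="Random sample of 200 customers per month",
    frequency="Monthly, administered during the first week of the month",
    inclusions="Customers with an active paid subscription",
    exclusions="Customers with less than 30 days of tenure",
    owner="Customer research team",
)

print(csat.name, "-", csat.methodology)
```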

Definitional ambiguity is one of the most common sources of metric dysfunction. When people disagree about whether a metric has improved or deteriorated, the disagreement is often not about the data but about what the metric means. The sales team counts a deal as "closed" when the contract is signed; the finance team counts it as "closed" when payment is received. Both definitions are reasonable, but they produce different numbers, and the disagreement about which number is "right" is really a disagreement about the definition.

Directly Tied to Goals

A good metric measures something that is causally connected to the goal it serves, not merely correlated with it. The connection should be strong enough that improving the metric reliably improves progress toward the goal, and deterioration in the metric reliably signals deterioration in goal progress.

This property is harder to achieve than it sounds, because many metrics that seem connected to goals are actually connected only weakly or indirectly. "Number of features shipped" seems connected to the goal of "build a great product," but the connection is weak: you can ship many features and still have a terrible product if the features are the wrong ones, or if they are poorly implemented, or if they create complexity that degrades the user experience. "Customer retention rate" is much more directly connected to "build a great product" because customers stay when the product meets their needs and leave when it doesn't.

Actionable

A good metric provides information that the people who receive it can act upon. If a metric declines, there should be specific, feasible actions that the relevant people can take to improve it. If the people who see the metric have no ability to influence it, the metric is informational but not actionable, and non-actionable metrics tend to breed frustration and learned helplessness rather than improvement.

Actionability depends on who receives the metric and what authority they have. A "customer churn rate" metric is actionable for a customer success team that has the authority to change onboarding processes, adjust pricing, or improve product features. The same metric is not actionable for a front-line employee who has no influence over these decisions. Design the metric distribution so that each metric reaches the people who can actually do something about it.

Resistant to Gaming

How do I prevent Goodhart's Law problems? This is one of the central challenges of metric design. Goodhart's Law (originally formulated by economist Charles Goodhart and popularized by anthropologist Marilyn Strathern as "when a measure becomes a target, it ceases to be a good measure") describes the universal tendency for people to optimize the metric rather than the underlying performance the metric represents.

Designing metrics that are resistant to gaming requires understanding how gaming works. Gaming occurs through several mechanisms:

  • Redefining the measurement: Changing what gets counted to inflate the number. A school that "counsels out" low-performing students before standardized tests improves its average score without improving its education.
  • Selecting the measured population: Choosing which cases are measured to produce favorable numbers. A hospital that transfers the sickest patients to other facilities before they die improves its mortality rate without improving its care.
  • Shifting activity to measured dimensions: Concentrating effort on what is measured at the expense of what is not. A call center agent who rushes through complex issues to handle more calls per hour improves the call volume metric while destroying customer satisfaction.
  • Temporal manipulation: Timing activities to coincide with measurement periods. A sales team that front-loads deals into the current quarter (through discounts, pressure, or accounting tricks) improves the quarterly number while borrowing from the future.

To resist gaming, use multiple complementary metrics that create checks and balances. If you measure both the quantity and quality of output, gaming quantity (by sacrificing quality) will show up in the quality metric, and gaming quality (by reducing quantity) will show up in the quantity metric. The metrics should be designed so that the only way to improve all of them simultaneously is to actually improve the underlying performance.

Additionally, periodically verify that metric improvements correspond to real improvements in the outcomes the metrics are supposed to represent. If your customer satisfaction score is rising but customer retention is falling, something is wrong with either the satisfaction measurement or the interpretation of the score.
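
That verification step can be partly automated by comparing the trend of each metric against the trend of the outcome it is supposed to represent and flagging divergence. A minimal sketch, using invented quarterly figures for satisfaction and retention:

```python
def trend(series):
    """Crude trend: difference between the last value and the first."""
    return series[-1] - series[0]

def divergence_alert(metric_name, metric_series, outcome_name, outcome_series):
    """Flag cases where the metric improves while its outcome deteriorates."""
    if trend(metric_series) > 0 and trend(outcome_series) < 0:
        return (f"WARNING: {metric_name} is rising while {outcome_name} is falling; "
                f"check for gaming or a broken measurement.")
    return f"OK: {metric_name} and {outcome_name} are not diverging."

# Invented quarterly data: satisfaction rises while retention falls.
satisfaction = [4.1, 4.2, 4.3, 4.5]      # mean survey score
retention = [0.93, 0.91, 0.90, 0.88]     # annualized retention rate

print(divergence_alert("customer satisfaction", satisfaction,
                       "customer retention", retention))
```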

Leading Rather Than Lagging When Possible

As discussed above, leading indicators provide advance warning of problems, which makes them more actionable than lagging indicators. A good metric set includes both: leading indicators for operational guidance (what should we do now?) and lagging indicators for outcome verification (did our actions produce the intended results?).

Metric quality       | Strong metric                                              | Weak metric
Definition           | Precise, unambiguous, consistently measurable              | Vague, subject to interpretation, inconsistently measured
Goal connection      | Causally linked to strategic goal                          | Correlated or unrelated to strategic goal
Actionability        | Recipients can act on the information                      | Recipients have no influence over the metric
Gaming resistance    | Multiple complementary metrics, verified against outcomes  | Single metric, easy to game without detection
Temporal orientation | Mix of leading (predictive) and lagging (confirmatory)     | Only lagging (reactive) indicators

Should Metrics Be Leading or Lagging Indicators?

The answer is unequivocally both, but for different purposes. This distinction is fundamental to effective metric design and is worth exploring in depth.

Lagging Indicators: Verifying Results

Lagging indicators measure outcomes: revenue, profit, customer retention, employee turnover, defect rates, market share. They tell you whether you achieved what you set out to achieve. Their strength is that they measure what you actually care about. Their weakness is that by the time a lagging indicator shows a problem, the problem has already occurred: customers have already left, defects have already shipped, revenue has already been missed. Lagging indicators are like looking in the rearview mirror: they show you clearly what has already happened, but they cannot prevent what is about to happen.

Leading Indicators: Enabling Course Correction

Leading indicators measure activities, behaviors, and conditions that predict future changes in lagging indicators. Sales pipeline size is a leading indicator of future revenue. Employee engagement is a leading indicator of future turnover. Customer complaint volume is a leading indicator of future churn. Code review thoroughness is a leading indicator of future defect rates.

Leading indicators' strength is their timeliness: they signal emerging problems before those problems are visible in outcomes, which provides a window for correction. Their weakness is their indirectness: they measure something that predicts outcomes rather than measuring outcomes directly, and the prediction may be wrong if the assumed causal relationship between the leading indicator and the outcome is not as strong as believed.

Balancing Predictive and Confirmatory Measurement

The most effective metric systems use leading indicators for operational management (What should we do this week to stay on track?) and lagging indicators for strategic assessment (Are our efforts producing the intended results?). The leading indicators guide day-to-day decisions; the lagging indicators validate (or invalidate) the assumption that the leading indicators are actually connected to the outcomes.

For example, a software team might track code review coverage and test pass rates (leading indicators) for daily operational guidance, while tracking defect escape rate and customer-reported bugs (lagging indicators) for monthly strategic assessment. If the leading indicators improve but the lagging indicators do not, it suggests that code reviews and tests are not catching the types of defects that reach customers, which is valuable diagnostic information that neither type of indicator alone would provide.


How Often Should Metrics Be Reviewed?

Metric review frequency is a critical design decision that significantly affects the metric system's effectiveness. Reviewing too frequently creates noise and overreaction; reviewing too infrequently delays problem detection and correction.

Match Review Frequency to the Metric's Nature

Operational metrics (server uptime, transaction error rates, production line output) should be reviewed in real-time or daily because the system they monitor operates on a short cycle and problems can escalate rapidly. A server outage that goes undetected for an hour can affect thousands of users; the feedback system must be fast enough to detect and respond before the impact grows.

Tactical metrics (weekly sales, sprint velocity, customer satisfaction trends) should be reviewed weekly or bi-weekly because the activities they monitor require days to weeks to show meaningful changes. Reviewing these metrics daily would create noise (normal day-to-day variation) that obscures the real trends visible at weekly resolution.

Strategic metrics (quarterly revenue, annual market share, year-over-year customer retention) should be reviewed monthly or quarterly because the outcomes they measure take months to materialize and the decisions they inform (strategic direction, investment priorities, organizational changes) take months to implement. Monthly review provides enough frequency for course correction without creating the illusion that strategic outcomes can be managed on a weekly basis.

Too Frequent Creates Noise

When metrics are reviewed more frequently than the system's natural cycle, the reviews capture random variation rather than meaningful change. A sales team that reviews daily sales numbers will see wild variation that reflects the randomness of deal timing rather than any meaningful change in sales performance. Reacting to this random variation, perhaps by panicking on low-sales days and celebrating on high-sales days, wastes energy and attention without improving outcomes.

W. Edwards Deming called this pattern "tampering": reacting to common-cause variation (the normal randomness inherent in any process) as if it were special-cause variation (a meaningful change requiring intervention). Tampering increases variation rather than reducing it, which is the exact opposite of the improvement the metric system is supposed to achieve.
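
The standard tool for telling common-cause from special-cause variation is a process behaviour chart such as the XmR chart described in Wheeler's Understanding Variation (see the references): points inside the natural process limits are treated as routine noise, points outside them as signals worth investigating. A minimal sketch with invented daily sales figures:

```python
def xmr_limits(values):
    """Natural process limits for an XmR (individuals / moving range) chart."""
    mean = sum(values) / len(values)
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    mr_bar = sum(moving_ranges) / len(moving_ranges)
    # 2.66 is the standard XmR constant for limits on individual values.
    return mean - 2.66 * mr_bar, mean + 2.66 * mr_bar

# Invented daily sales; in practice the limits would come from a stable baseline period.
daily_sales = [102, 97, 110, 95, 104, 99, 101, 93, 108, 38, 100, 96]
lower, upper = xmr_limits(daily_sales)
for day, value in enumerate(daily_sales, start=1):
    label = "routine variation" if lower <= value <= upper else "signal: investigate"
    print(f"day {day:2d}: {value:4d}  {label}")
```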

Too Rare Prevents Correction

When metrics are reviewed less frequently than problems develop, the review cycle cannot catch problems while they are still small and correctable. An annual employee engagement survey cannot detect the team morale crisis that developed in March and led to three resignations in June, because the survey results will not be available until the following January. A quarterly financial review cannot catch the spending overrun that began in the first week of the quarter.

The right review frequency is the one that provides enough data points to distinguish signal from noise while being frequent enough to catch problems before they become crises. As a practical guideline: if you have identified the metric as important enough to track, ask yourself "What is the longest I would be comfortable going without checking this metric?" That duration is approximately the right review frequency.


What If Important Things Are Hard to Measure?

This is one of the most important questions in metric design, and the wrong answer to it, which unfortunately is also the most common answer, undermines metric systems across every industry and sector.

The wrong answer is: "If it's hard to measure, don't measure it." This answer is wrong because it creates a systematic bias toward measuring easy things (activity counts, financial transactions, time durations) while ignoring difficult-to-measure things (quality, innovation, trust, learning, resilience, strategic positioning) that are often more important than the easy-to-measure things.

Robert McNamara made this error famous during the Vietnam War, where he insisted on measuring war progress through quantifiable metrics (enemy body counts, territory held, missions flown) while ignoring difficult-to-measure factors (political legitimacy, popular support, guerrilla capability, strategic coherence) that ultimately determined the war's outcome. The measurable metrics showed "progress" even as the unmeasurable factors deteriorated to the point of defeat. This pattern, now called the McNamara Fallacy, appears with depressing regularity in business, education, healthcare, and public policy: organizations measure what is easy to count, ignore what is hard to count but more important, and then express surprise when the easy-to-count metrics improve while actual performance deteriorates.

Use Proxies Carefully

When direct measurement is not feasible, proxy metrics can be used to approximate the measurement, but with full awareness that proxies are approximations that may diverge from the underlying reality. Customer complaints per thousand customers is a proxy for customer satisfaction (complaints track dissatisfaction, but most dissatisfied customers never complain). Employee voluntary turnover rate is a proxy for employee engagement (disengaged employees are more likely to leave, but some engaged employees also leave for other reasons). Patent filings are a proxy for innovation (innovative companies tend to file patents, but patents do not measure all forms of innovation and some companies file patents that are not genuinely innovative).

Proxies are useful when they are treated as what they are: imperfect approximations that need to be validated against the underlying reality. Proxies become dangerous when they are treated as equivalent to the underlying reality, because any divergence between the proxy and the reality creates an opportunity for gaming (improving the proxy without improving the reality).
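
One simple way to keep a proxy honest is to check periodically how well it tracks the reality it stands in for, for example by correlating the proxy with a later direct measurement of the outcome. A minimal sketch with invented monthly data, assuming complaints are meant to anticipate churn one month later:

```python
def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    norm_x = sum((x - mean_x) ** 2 for x in xs) ** 0.5
    norm_y = sum((y - mean_y) ** 2 for y in ys) ** 0.5
    return cov / (norm_x * norm_y)

# Invented monthly data: complaints per 1,000 customers, and churn (%) one month later.
complaints = [3.1, 3.4, 2.9, 4.2, 4.8, 5.1, 4.9, 5.6]
churn_next = [1.1, 1.2, 1.0, 1.4, 1.6, 1.7, 1.6, 1.9]

r = pearson(complaints, churn_next)
print(f"proxy-to-outcome correlation: r = {r:.2f}")
if abs(r) < 0.5:  # the 0.5 threshold is an arbitrary illustration, not a standard
    print("The proxy is drifting away from the outcome it stands in for; revisit it.")
```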

Use Qualitative Assessment

For some important dimensions, quantitative measurement is not the right tool. The quality of an organization's strategic thinking, the depth of trust between team members, the creativity of a design team's work, the wisdom of a leader's decisions: these are important dimensions that resist quantification. Attempting to quantify them produces false precision that obscures rather than reveals.

For these dimensions, qualitative assessment is more appropriate: structured evaluations by knowledgeable people, using rubrics that define what "excellent," "adequate," and "poor" look like for the dimension in question. Qualitative assessment is less precise than quantitative measurement, but less precise measurement of the right thing is far more valuable than precise measurement of the wrong thing.

Use Mixed Methods

The most robust approach for hard-to-measure dimensions is mixed methods: combining quantitative metrics with qualitative assessment. Measure what you can quantify, but supplement the quantitative data with qualitative evaluation that captures the dimensions that quantification misses. A product team might track quantitative metrics (user engagement, feature adoption, NPS scores) alongside qualitative assessment (user interviews, usability testing, expert reviews) to build a comprehensive picture that neither approach alone would provide.

The key insight is that some imprecision in measuring the right things is vastly preferable to precision in measuring the wrong things. An imprecise measure of innovation is more useful than a precise measure of patent filings. An imprecise measure of employee engagement is more useful than a precise measure of hours worked. An imprecise measure of learning is more useful than a precise measure of test scores. The McNamara Fallacy is avoided not by achieving perfect measurement but by accepting imperfect measurement of what matters rather than substituting perfect measurement of what is easy.


Common Metric Design Patterns and Anti-Patterns

Understanding the most common metric design mistakes helps you avoid them. Each anti-pattern has a corresponding positive pattern that produces better results.

Anti-Pattern: Vanity Metrics

Vanity metrics are numbers that look impressive but do not provide actionable information about performance. Total page views, total registered users, total downloads, total followers: these numbers almost always go up over time, which makes them psychologically gratifying but analytically useless. They tell you that the thing you are measuring is getting bigger (which is usually true of any cumulative count), but they do not tell you whether performance is improving, stable, or deteriorating.

Better pattern: Rate and ratio metrics. Instead of total page views (which always increases), measure page views per active user per week (which can increase or decrease and tells you whether engagement is strengthening or weakening). Instead of total registered users (which always increases), measure the percentage of registered users who are active in any given month (which reveals whether users find lasting value). Rate and ratio metrics normalize for scale effects and provide genuine signals about performance.
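
The arithmetic behind the shift from totals to rates is simple but worth making explicit. The sketch below computes the two ratio metrics just described from raw counts, using invented figures:

```python
def per_active_user_per_week(page_views_this_week, active_users_this_week):
    """Engagement rate: page views per active user per week."""
    return page_views_this_week / active_users_this_week

def monthly_active_percentage(active_this_month, total_registered):
    """Share of all registered users who were active this month."""
    return 100 * active_this_month / total_registered

# Invented figures: the totals grow, but both ratios reveal weakening engagement.
print(per_active_user_per_week(420_000, 60_000))   # this week: 7.0 views per user
print(per_active_user_per_week(450_000, 75_000))   # next week: 6.0 views per user
print(monthly_active_percentage(30_000, 200_000))  # this month: 15.0 %
print(monthly_active_percentage(31_000, 230_000))  # next month: ~13.5 %
```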

Anti-Pattern: Single-Metric Obsession

When an organization fixates on a single metric to the exclusion of all others, that metric inevitably becomes gamed, and the organization's actual performance deteriorates in the dimensions not captured by the single metric. Classic examples include a sales organization that measures only revenue, which leads to predatory sales practices, customer churn, and long-term revenue decline, and a software team that measures only velocity (story points per sprint), which leads to story point inflation, quality degradation, and mounting technical debt.

Better pattern: Balanced metric sets. Design a small set (3-5) of complementary metrics that collectively represent the full picture of performance. The Balanced Scorecard framework, developed by Robert Kaplan and David Norton, recommends metrics across four perspectives: financial, customer, internal process, and learning/growth. The specific metrics vary by organization, but the principle of measuring multiple complementary dimensions is universal.

Anti-Pattern: Input Metrics Without Output Verification

Measuring inputs (hours worked, money spent, activities completed) without verifying that those inputs produce the desired outputs creates the illusion of progress without the reality. A marketing team that measures "campaigns launched" without measuring "leads generated" or "revenue attributed" cannot know whether their campaigns are effective. A training department that measures "training hours delivered" without measuring "skill improvement" or "performance change" cannot know whether their training works.

Better pattern: Linked input-output metrics. For every input metric, identify the corresponding output metric and track both. This linkage allows you to assess efficiency (how much output per unit of input?) and effectiveness (does more input actually produce more output, or is the relationship broken?).
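
One lightweight way to enforce the linkage is to store every input metric together with the output metric it is supposed to produce and report the ratio whenever either is updated. A minimal sketch with hypothetical names and figures:

```python
# Each input metric is stored with the output metric it is supposed to produce.
# All names and figures are invented for illustration.
linked_metrics = [
    {"input_name": "campaigns launched",      "input": 12,
     "output_name": "qualified leads",        "output": 480},
    {"input_name": "training hours delivered", "input": 350,
     "output_name": "employees certified",     "output": 28},
]

for pair in linked_metrics:
    efficiency = pair["output"] / pair["input"]
    print(f"{pair['input_name']}: {pair['input']}  ->  "
          f"{pair['output_name']}: {pair['output']}  "
          f"({efficiency:.2f} per unit of input)")
```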

Anti-Pattern: One-Time Design, Permanent Use

Many metric systems are designed once and then used indefinitely without review, even as the organization's strategy, environment, and capabilities evolve. Metrics that were well-aligned with the organization's goals three years ago may be irrelevant or counterproductive today, but because "we've always tracked this," they persist, consuming attention and effort while providing decreasing value.

Better pattern: Periodic metric reviews. Schedule an annual or semi-annual review of the metric system that asks: Are these metrics still aligned with our current strategy? Are they driving the behaviors we want? Has gaming eroded their validity? Are there important dimensions we are not measuring? The metric system should evolve with the organization, not remain frozen in the strategic context of its original design.


A Worked Example: Designing Metrics for a Product Team

To illustrate the complete metric design process, consider a product team responsible for a B2B SaaS platform that needs to design its core metric set.

Step 1: Identify Strategic Goals

The team identifies three strategic goals: grow the customer base, increase customer value (revenue per customer), and reduce churn. These goals come from the company's annual strategy and are the team's primary accountability.

Step 2: Derive Candidate Metrics

For each goal, the team brainstorms potential metrics:

  • Grow customer base: New customer acquisition rate, trial-to-paid conversion rate, referral rate, website-to-trial conversion rate, sales pipeline size.
  • Increase customer value: Average revenue per account, feature adoption rate, upsell/cross-sell rate, usage depth.
  • Reduce churn: Monthly churn rate, customer satisfaction score, customer health score, time-to-first-value for new customers, support ticket volume trend.

Step 3: Select Key Metrics (3-5)

From the candidates, the team selects five key metrics:

  1. Trial-to-paid conversion rate (leading indicator of customer growth)
  2. Monthly active usage rate (leading indicator of both value and retention)
  3. Net revenue retention (lagging indicator combining upsell, cross-sell, and churn; a worked calculation follows this list)
  4. Customer satisfaction score (leading indicator of retention)
  5. Time-to-first-value (leading indicator of onboarding effectiveness, which drives conversion and retention)
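
Net revenue retention, the third metric above, rolls expansion, contraction, and churn into one figure. One common formulation divides the recurring revenue that the period's starting customers generate at the end of the period by what they generated at the start; a minimal worked sketch with invented figures:

```python
def net_revenue_retention(starting_mrr, expansion, contraction, churned):
    """NRR over a period, computed only on customers present at the start.

    starting_mrr: monthly recurring revenue from those customers at period start
    expansion:    upsell/cross-sell revenue they added during the period
    contraction:  revenue lost to downgrades
    churned:      revenue lost to cancellations
    """
    return (starting_mrr + expansion - contraction - churned) / starting_mrr

# Invented example: $100k starting MRR, $12k expansion, $3k downgrades, $5k churn.
nrr = net_revenue_retention(100_000, 12_000, 3_000, 5_000)
print(f"Net revenue retention: {nrr:.0%}")  # 104%: growth from existing customers alone
```

A value above 100 percent means the existing customer base is growing before any new customers are counted; a value below 100 percent means churn and downgrades are outpacing expansion.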

Step 4: Define Each Metric Precisely

For each metric, the team writes a precise definition. For example:

Monthly active usage rate: The percentage of paid accounts that have at least one user who logged in and performed at least one core workflow action during the calendar month. Excludes accounts in their first 30 days (which are measured separately by time-to-first-value). Measured on the first business day following the end of each month. Target: 80% or higher.
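
Translating such a definition directly into code is a useful test of its precision: if it cannot be implemented without guessing, it is not precise enough. A minimal sketch against hypothetical in-memory records (in practice this would be a query against the product's event data), with the field names and the interpretation of the 30-day exclusion treated as assumptions:

```python
from datetime import date

# Hypothetical records; field names and the 30-day cutoff reading are assumptions.
accounts = [
    {"id": "a1", "paid": True,  "start_date": date(2024, 1, 10)},
    {"id": "a2", "paid": True,  "start_date": date(2024, 6, 20)},   # still in first 30 days
    {"id": "a3", "paid": True,  "start_date": date(2023, 11, 2)},
    {"id": "a4", "paid": False, "start_date": date(2024, 2, 14)},   # trial account: excluded
]
# Core-workflow actions only; logins by themselves do not count.
core_events = [
    {"account_id": "a1", "date": date(2024, 6, 12)},
    {"account_id": "a3", "date": date(2024, 5, 30)},  # outside the measured month: ignored
]

def monthly_active_usage_rate(accounts, core_events, year, month):
    """Percent of eligible paid accounts with >= 1 core workflow action in the month."""
    month_start = date(year, month, 1)
    # Eligibility assessed at the start of the month (one reading of the 30-day rule).
    eligible = [a for a in accounts
                if a["paid"] and (month_start - a["start_date"]).days >= 30]
    active_ids = {e["account_id"] for e in core_events
                  if e["date"].year == year and e["date"].month == month}
    active = [a for a in eligible if a["id"] in active_ids]
    return 100 * len(active) / len(eligible)

print(f"{monthly_active_usage_rate(accounts, core_events, 2024, 6):.0f}%")  # 50%
```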

Step 5: Identify Gaming Risks and Countermeasures

The team anticipates gaming risks for each metric:

  • Trial-to-paid conversion: Sales could lower the bar for what counts as a "paid" customer (e.g., offering deeply discounted trials). Countermeasure: track 90-day retention of converted customers as a companion metric.
  • Monthly active usage: Product could gamify login behavior (e.g., daily email prompts that inflate logins without increasing genuine usage). Countermeasure: require a core workflow action, not just a login.
  • Net revenue retention: Sales could push unnecessary upgrades that customers later cancel. Countermeasure: track customer satisfaction alongside revenue retention.

Step 6: Set Review Cadences

  • Trial-to-paid conversion and time-to-first-value: Weekly review (these are operational metrics with short feedback cycles).
  • Monthly active usage and customer satisfaction: Monthly review (these reflect trends that emerge over weeks, not days).
  • Net revenue retention: Quarterly review (this reflects longer-term customer behavior).

Step 7: Design the Dashboard

The team creates a single-page dashboard that shows all five metrics with current values, targets, trends (13-week sparklines), and status indicators (green/yellow/red). The dashboard is reviewed at the weekly product team meeting (with focus on the weekly metrics) and at the monthly product review (with focus on all five metrics). The quarterly business review includes net revenue retention analysis with cohort breakdowns that reveal whether the retention pattern is improving or deteriorating across successive customer cohorts.
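
The green/yellow/red status logic deserves to be defined as explicitly as the metrics themselves, so that colors are assigned by rule rather than by eye. A minimal sketch, with thresholds that are purely illustrative assumptions:

```python
def status(current, target, warn_fraction=0.9):
    """Green at/above target, yellow within warn_fraction of target, red below that.

    Assumes higher is better; metrics like time-to-first-value would need the inverse rule.
    """
    if current >= target:
        return "green"
    if current >= warn_fraction * target:
        return "yellow"
    return "red"

# Invented dashboard rows: (metric name, current value, target).
rows = [
    ("Trial-to-paid conversion (%)", 18.0, 20.0),
    ("Monthly active usage (%)",     83.0, 80.0),
    ("Net revenue retention (%)",   101.0, 110.0),
]
for name, current, target in rows:
    print(f"{name:32s} {current:6.1f} vs target {target:6.1f}  -> {status(current, target)}")
```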


Building a Metric Culture

Metrics do not exist in a vacuum. They exist within organizational cultures that determine how metrics are interpreted, how they are used, and whether they drive improvement or gaming. Building a healthy metric culture is as important as designing good metrics, and it requires attention to several dynamics.

Metrics as Learning Tools, Not Punishment Tools

When metrics are used primarily to identify and punish poor performers, the organizational response is predictable: people game the metrics, hide information, and focus on looking good rather than doing well. When metrics are used primarily as learning tools, to understand what is working, what is not, and what to try differently, the organizational response is very different: people engage with the metrics honestly, share information openly, and focus on improving rather than appearing to improve.

The difference is not in the metrics themselves but in how leadership uses them. A metric review meeting where the leader asks "Why is this number down? Whose fault is it?" creates a punishment culture. A metric review meeting where the leader asks "What is this metric telling us about our system? What should we try differently?" creates a learning culture. Same metric, same data, radically different organizational behavior.

Transparency About Limitations

A healthy metric culture acknowledges that all metrics are imperfect, that metric improvement does not always correspond to real improvement, and that some important things resist measurement. This transparency protects against the two extremes of metric dysfunction: metric worship (treating metrics as infallible) and metric cynicism (dismissing all metrics as meaningless). The healthy middle ground is metric pragmatism: using metrics as useful but imperfect tools that inform rather than dictate decisions.

Regular Metric Audits

Just as financial systems require periodic audits, metric systems benefit from periodic reviews that assess whether the metrics are still measuring what they are supposed to measure, whether gaming has eroded their validity, whether the metrics are still aligned with current strategic goals, and whether important dimensions are going unmeasured. An annual "metric audit" that examines each key metric against these criteria can prevent the gradual degradation that turns initially useful metrics into actively harmful ones.

Celebrating Insight, Not Just Improvement

A healthy metric culture celebrates not only when metrics improve but also when metrics reveal something unexpected or uncomfortable. When a metric shows a problem that nobody knew existed, the appropriate response is gratitude (the metric is doing its job) rather than blame (someone must have caused this). When a metric reveals that a popular initiative is not working, the appropriate response is curiosity (why isn't it working? what should we try instead?) rather than defensiveness (the metric must be wrong).

This cultural orientation, where bad news from metrics is treated as valuable information rather than as an occasion for punishment, is what separates organizations that genuinely learn from their metrics from organizations that use metrics primarily for performance theater.


Metrics Across Different Organizational Contexts

Different organizational contexts create different metric design challenges, and understanding these context-specific challenges helps you design metrics that are appropriate for your situation.

Startup Metrics

Startups face a distinctive metric challenge: they are operating in conditions of extreme uncertainty where the fundamental business model is still being validated. In this context, traditional business metrics (revenue growth, profitability, market share) may be misleading because they are measuring the performance of a business model that may itself be wrong.

Startup metrics should focus on learning velocity: how quickly the organization is discovering what works. Metrics like experiment cycle time (how quickly can we test a hypothesis?), validated learning rate (how many of our hypotheses have been confirmed or rejected?), and customer problem-solution fit (do customers actually have the problem we think they have? does our solution address it?) are more useful than revenue metrics during the early stages when the primary goal is learning, not scaling.

Enterprise Metrics

Large enterprises face the opposite challenge: they have extensive measurement infrastructure and decades of historical data, but their metric systems often suffer from metric proliferation (too many metrics), metric inertia (metrics that persist long after the strategic context that justified them has changed), and metric silos (each department measures its own performance without considering cross-departmental effects).

Enterprise metric design should focus on metric rationalization (reducing the number of metrics to the critical few), metric alignment (ensuring that departmental metrics are consistent with enterprise-level goals), and cross-functional metrics (metrics that measure outcomes that depend on collaboration between departments, which counteract the silo effect).

Non-Profit and Public Sector Metrics

Non-profit and public sector organizations face perhaps the most difficult metric design challenge of all: their most important outcomes (poverty reduction, educational attainment, community health, environmental quality, public safety) are inherently difficult to measure, slow to change, and influenced by many factors beyond the organization's control.

The temptation in these contexts is to fall back on activity metrics (number of people served, number of programs delivered, money spent) that are easy to count but that do not capture whether the organization's work is actually producing the intended outcomes. The better approach, though more difficult, is to design outcome-oriented metrics that measure genuine changes in the conditions the organization is trying to improve, supplemented by qualitative assessment where quantitative measurement is not feasible, and with explicit acknowledgment that the organization's contribution to the outcome is only one of many factors.

Effective metrics in non-profit and public sector contexts also need to account for attribution complexity: the fact that measured outcomes are influenced by many factors beyond the organization's programs. A youth development organization cannot take sole credit for improved graduation rates in its target community, but it can measure whether youth who participate in its programs have better outcomes than comparable youth who do not, which provides evidence of the program's contribution to the outcome.


The Discipline of Good Measurement

Designing effective metrics is ultimately a discipline of clear thinking about what matters, what can be observed, and how measurement shapes behavior. The organizations that measure most effectively are not those with the most sophisticated analytics tools or the most extensive data collection; they are the organizations that think most clearly about what they are trying to achieve, what would tell them whether they are achieving it, and what unintended behaviors their measurements might create.

This discipline requires the courage to measure what is important even when it is difficult, the restraint to measure only what is needed even when more data is available, the honesty to acknowledge when metrics are being gamed even when the gamed numbers look impressive, and the wisdom to treat metrics as useful tools rather than as objective truth. Organizations that cultivate this discipline build measurement systems that genuinely drive improvement. Organizations that lack it build measurement systems that generate impressive dashboards and deteriorating performance.


References and Further Reading

  1. Goodhart, C. A. E. (1984). Problems of monetary management: The U.K. experience. In Monetary Theory and Practice. Palgrave. https://doi.org/10.1007/978-1-349-17295-5_4

  2. Muller, J. Z. (2018). The Tyranny of Metrics. Princeton University Press. https://press.princeton.edu/books/hardcover/9780691174952/the-tyranny-of-metrics

  3. Deming, W. E. (1993). The New Economics for Industry, Government, Education. MIT Press. https://mitpress.mit.edu/9780262541169/the-new-economics-for-industry-government-education/

  4. Kaplan, R. S. & Norton, D. P. (1996). The Balanced Scorecard: Translating Strategy into Action. Harvard Business School Press. https://store.hbr.org/product/the-balanced-scorecard-translating-strategy-into-action/8028

  5. Doerr, J. (2018). Measure What Matters: How Google, Bono, and the Gates Foundation Rock the World with OKRs. Portfolio/Penguin. https://www.penguinrandomhouse.com/books/556189/measure-what-matters-by-john-doerr/

  6. Hubbard, D. W. (2010). How to Measure Anything: Finding the Value of Intangibles in Business (2nd edition). Wiley. https://www.wiley.com/en-us/How+to+Measure+Anything-p-9780470539392

  7. Kerr, S. (1975). On the folly of rewarding A, while hoping for B. Academy of Management Journal, 18(4), 769-783. https://doi.org/10.5465/255378

  8. Wheeler, D. J. (2000). Understanding Variation: The Key to Managing Chaos. SPC Press. https://www.spcpress.com/book_understanding_variation.php

  9. Parmenter, D. (2015). Key Performance Indicators: Developing, Implementing, and Using Winning KPIs (3rd edition). Wiley. https://www.wiley.com/en-us/Key+Performance+Indicators-p-9781118925102

  10. Ariely, D. (2010). You are what you measure. Harvard Business Review, 88(6), 38. https://hbr.org/2010/06/column-you-are-what-you-measure

  11. Strathern, M. (1997). "Improving ratings": Audit in the British university system. European Review, 5(3), 305-321. https://doi.org/10.1002/(SICI)1234-981X(199707)5:3%3C305::AID-EURO184%3E3.0.CO;2-4

  12. Caulkin, S. (2008). The rule is simple: be careful what you measure. The Observer. https://www.theguardian.com/business/2008/feb/10/businesscomment1

  13. McNamara, R. S. (1995). In Retrospect: The Tragedy and Lessons of Vietnam. Times Books. https://www.penguinrandomhouse.com/books/108898/in-retrospect-by-robert-s-mcnamara/