What Are Metrics?
Metrics are quantitative measures used to track progress, evaluate performance, and inform decisions. In theory, they're objective signals that tell you what's working and what isn't. In practice, they're proxies: imperfect representations of complex realities.
The origins of modern performance measurement trace back to Frederick Winslow Taylor's scientific management in the early 1900s, which introduced systematic measurement to industrial work. Taylor's time-and-motion studies established the principle that "what gets measured gets managed," a truth that has since revealed both its power and its perils. Later, Peter Drucker refined management measurement with his emphasis on Management by Objectives (MBO) in the 1950s, arguing that organizations need clear, measurable goals to function effectively.
The fundamental challenge of measurement, articulated by statistician George Box in his famous aphorism "all models are wrong, but some are useful," is that anything worth measuring is too complex to capture with a single number. Yet you need numbers to make informed decisions at scale. So you choose proxies. Revenue represents business health (but ignores sustainability and customer satisfaction). Daily active users represent engagement (but ignore depth of value and quality of experience). Lines of code represent productivity (but reward bloat over quality and maintainability).
W. Edwards Deming, the statistician who revolutionized quality management, warned against the tyranny of visible figures: "The most important things cannot be measured." His 14 Points for Management emphasized that overreliance on numerical quotas destroys quality and innovation. Deming observed that organizations become what they measure: if you measure only efficiency, you optimize for speed at the expense of quality; if you measure only output, you sacrifice thoughtfulness for volume.
The systems thinking perspective, championed by researchers like Jay Forrester and Donella Meadows, reveals that metrics exist within larger systems with feedback loops, delays, and unintended consequences. Meadows' concept of "leverage points" shows that where you measure matters as much as what you measure; intervening at the wrong point in a system, even with accurate metrics, produces minimal impact or backfires entirely.
Good metrics illuminate truth. Bad metrics distort it. The difference between the two isn't always obvious, and that's the problem. As economist Kenneth Boulding noted, "What is measurable may not be important, and what is important may not be measurable." The art of measurement lies in recognizing this tension and navigating it thoughtfully.
Key Insight: Metrics aren't reality. They're mental models: simplified representations of reality. The moment you forget this distinction, your metrics stop being useful and start being dangerous. As Alfred Korzybski famously said, "the map is not the territory." Your metrics are the map; don't mistake them for the territory itself.
Why Metrics Matter (and Fail)
Organizations run on metrics. You can't manage what you can't measure, as Peter Drucker famously argued. Goals need quantification. Strategy requires accountability. Teams need alignment on what success looks like. All true. Research by Edwin Locke and Gary Latham on goal-setting theory demonstrates that specific, measurable goals improve performance significantly: in their meta-analysis of 35 years of research, they found that challenging goals improve performance 90% of the time compared to vague "do your best" goals.
Yet the same measurement systems that enable organizational coordination also create predictable dysfunctions. Robert Austin's book Measuring and Managing Performance in Organizations (1996) systematically analyzed why measurement systems fail, identifying a fundamental paradox: the more you rely on metrics for evaluation and incentives, the less reliable those metrics become as indicators of actual performance.
Metrics fail in predictable ways:
- Gaming: When metrics become targets, people optimize for the metric rather than the goal. Hit the number, miss the point. Economist Charles Goodhart formalized this in Goodhart's Law, while Marilyn Strathern generalized it: "When a measure becomes a target, it ceases to be a good measure." Research by Maurice Schweitzer and colleagues found that specific, challenging goals increased unethical behavior by 57% compared to vague goals when people fell short.
- Narrow focus: Metrics create tunnel vision. You see what you measure and miss everything else. The unmeasured atrophies. Psychologists call this inattentional blindness: when attention is focused on one metric, other important information becomes invisible. Daniel Kahneman and Amos Tversky's research on cognitive biases shows that what's easily measured dominates decisions, even when other factors are more important.
- Complexity reduction: Important things are often multidimensional and qualitative. Metrics force them into single numbers, losing nuance. James C. Scott's book Seeing Like a State documents how top-down measurement schemes fail by ignoring local knowledge and contextual complexity; what Scott calls "legibility" comes at the cost of understanding.
- Lagging indicators: By the time most metrics show a problem, it's too late to prevent it. You're measuring outcomes, not causes. Balanced Scorecard pioneer Robert Kaplan emphasizes the critical distinction between outcome measures (what happened) and performance drivers (what causes outcomes); most organizations over-index on the former and under-invest in the latter.
- Motivation displacement: Extrinsic metrics undermine intrinsic motivation. Psychologists Edward Deci and Richard Ryan's Self-Determination Theory shows that controlling reward systems (including metric-based incentives) reduce creativity, persistence, and quality. When people work for the metric rather than the mission, performance quality suffers even as measured performance improves.
The best metrics systems acknowledge these limitations explicitly. They combine quantitative and qualitative measures. They watch for gaming through second-order thinking. They use metrics to inform decisions, not make them. As Deming emphasized, management requires judgment that transcends numbers: metrics provide data points, but decisions require wisdom, context, and understanding of the unmeasurable.
Goodhart's Law: When Measures Become Targets
Core principle: When a measure becomes a target, it ceases to be a good measure.
Named after British economist Charles Goodhart, who formulated it in a 1975 article while serving as an advisor to the Bank of England, this principle explains why so many metrics systems fail. Goodhart observed that any observed statistical regularity breaks down once pressure is placed on it for control purposes; monetary targets that worked as predictive indicators became useless once adopted as policy targets.
Anthropologist Marilyn Strathern later generalized the principle: "When a measure becomes a target, it ceases to be a good measure." The moment you tell people they'll be evaluated on a metric, they start optimizing for that metric, often in ways that hit the number while undermining the original goal. Related to this is Campbell's Law, articulated by psychologist Donald Campbell in 1976: "The more any quantitative social indicator is used for social decision-making, the more subject it will be to corruption pressures and the more apt it will be to distort and corrupt the social processes it is intended to monitor."
Classic examples:
- Soviet nail factory: The canonical example from Soviet central planning. Measured by weight, factories produced heavy, useless nails. Measured by quantity, they produced tiny, useless nails. Measured by assortment, they produced exactly the specified distribution regardless of actual demand. The metric became the target; utility was ignored. Economic historian Alec Nove documented these perverse outcomes extensively in The Soviet Economic System, showing how centralized measurement divorced from market feedback created systematic dysfunction.
- Wells Fargo scandal: From 2011 to 2016, Wells Fargo measured branch employees by the number of accounts opened per customer (the "cross-sell" metric, with targets of 8 accounts per customer). Employees opened over 3.5 million fraudulent accounts to hit targets. The bank paid $3 billion in fines. The metric was gamed; customers were harmed. CEO John Stumpf resigned. This case study appears in business ethics courses as a textbook example of metric-induced misconduct.
- Cobra effect: During British colonial rule in India, the government offered bounties for dead cobras to reduce the snake population in Delhi. Enterprising locals started breeding cobras for bounty income. When the government discovered this and ended the program, breeders released their now-worthless snakes, making the problem worse than before. The incentive backfired completely. This perverse incentive pattern appears across domains; similar stories emerged with rat bounties in Hanoi and bounties for wild boar tusks in Fort Benning, Georgia.
- No Child Left Behind: U.S. educational policy tied school funding to standardized test scores. Result: widespread teaching to the test, narrowed curriculum (cutting arts, physical education, social studies), and documented cheating scandals in Atlanta, Washington D.C., and other cities where administrators altered student answer sheets. Scores improved; education quality didn't. Campbell's Law in action.
- Soviet tractor production: Measured by total weight, factories produced tractors so heavy they sank in the fields they were meant to plow. Measured by number produced, they made miniature tractors that couldn't pull equipment. Quality was ignored because it wasn't measured, or wasn't weighted as heavily as quantity.
Economist Steven Levitt and journalist Stephen Dubner explored metric gaming extensively in Freakonomics, showing how teachers cheat on standardized tests, sumo wrestlers rig matches, and real estate agents manipulate metrics. Their research reveals that gaming isn't an aberration; it's the rational response to poorly designed measurement systems.
Goodhart's Law isn't a reason to abandon metrics. It's a warning: design your metrics assuming people will game them, because they will. Behavioral economist Dan Ariely's research shows that even honest people engage in "creative accounting" when metrics determine rewards; the issue isn't moral failure, it's a predictable human response to incentive structures.
Practical defense: Use paired metrics. Track quality alongside quantity. Measure outcomes alongside outputs. Include constraints: "increase conversion rate without increasing bounce rate." Watch for unintended consequences through systems thinking. Rotate metrics periodically. As management theorist Russell Ackoff noted, "The more complex a system, the more measurement can mislead." Design for complexity, not simplicity.
Leading vs Lagging Indicators
Understanding the difference between leading and lagging indicators is fundamental to building useful measurement systems. This distinction became central to management practice through Robert Kaplan and David Norton's Balanced Scorecard framework (1992), which argued that organizations need to track both outcome measures and performance drivers across four perspectives: financial, customer, internal processes, and learning and growth.
Lagging Indicators
What they are: Outcomes that measure results after they occur. Revenue, profit, customer churn, market share, conversion rate, return on investment. Kaplan and Norton call these "outcome measures"; they tell you whether your strategy succeeded.
Strengths: Easy to measure objectively, directly tied to business results, validate whether your strategy worked, provide clear accountability. Financial lagging indicators are typically audited and reliable: revenue is revenue, profit is profit. Research by Christopher Ittner and David Larcker found that firms using comprehensive measurement systems including lagging financial indicators showed 5% higher return on assets than those using financial metrics alone.
Weaknesses: Come too late to change the outcome (by the time you know revenue dropped, the quarter is over), tell you what happened but not why (churn went up, but what caused it?), can't drive day-to-day decisions (checking revenue daily doesn't help salespeople sell). As Clayton Christensen noted in The Innovator's Dilemma, overreliance on lagging financial indicators blinds companies to disruptive threats; current profitability looks great right up until your business model becomes obsolete.
Leading Indicators
What they are: Predictive measures that forecast future outcomes. Sales pipeline, feature usage depth, Net Promoter Score (NPS), engagement metrics, trial-to-paid conversion rate, employee satisfaction. Kaplan and Norton call these "performance drivers"; they predict future success and can be influenced today.
Strengths: Actionable in real time (if activation rate drops, you can intervene immediately), help you course-correct before problems compound (warning signs appear early), drive daily behavior (clear metrics for what to optimize today), reveal causality (show what actions lead to outcomes). Research by Zeynep Ton at MIT showed that retailers who invested in leading indicators like employee training and inventory management had 8-10% higher sales than competitors, even though these investments hurt short-term profitability metrics.
Weaknesses: Harder to identify (what actually predicts outcomes? correlation is not causation), noisier signals (daily fluctuations create false alarms), indirect relationship to business results (NPS doesn't directly measure revenue), easy to game (sales teams can inflate pipeline by adding unqualified leads). Research by Ittner, Larcker, and Meyer found that 70% of companies implementing the Balanced Scorecard failed to improve performance because they chose poor leading indicators that didn't actually predict their lagging outcomes.
The Right Balance
Effective measurement systems use both. Lagging indicators to validate results. Leading indicators to drive action. The key is identifying which leading indicators actually predict the lagging outcomes you care about, and that requires experimentation, data, and honest feedback loops.
Andrew Ehrenberg and Byron Sharp's marketing science research revealed that many assumed leading indicators (brand preference, loyalty metrics) correlate weakly with actual purchasing behavior, the lagging outcome that matters. Their work in How Brands Grow shows the danger of optimizing leading indicators that don't predict the results you care about.
The discipline of operations research, pioneered by George Dantzig and others, provides statistical methods for identifying predictive relationships: regression analysis, correlation studies, time-series forecasting. The question isn't "what could predict success?" but "what does predict success, based on our actual data?"
Example SaaS Business:
- Lagging: Monthly Recurring Revenue (MRR), annual churn rate, customer lifetime value (LTV), net revenue retention
- Leading: Trial signup rate, activation rate (users who complete setup), feature adoption (users who engage with core feature in first week), usage frequency (weekly active users), Net Promoter Score
You track MRR to know if you're winning. You optimize activation and usage frequency to make yourself win. But the connection must be validated: does improving activation actually increase retention and MRR? Product analytics research shows that many companies optimize vanity leading indicators (signups, downloads) that don't predict revenue; true leading indicators predict outcomes, not just correlate with them.
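To make that validation concrete, here is a minimal sketch (in Python, with illustrative cohort numbers rather than real data) of checking whether a candidate leading indicator such as activation rate actually moves with a lagging outcome such as six-month retention:

```python
# A minimal sketch: does the candidate leading indicator (activation rate)
# actually track the lagging outcome (6-month retention)? The cohort values
# below are illustrative, not real data.
from statistics import correlation  # Python 3.10+

# One row per monthly signup cohort (hypothetical values).
activation_rate = [0.41, 0.44, 0.39, 0.47, 0.52, 0.50, 0.55, 0.58]
retention_6mo = [0.22, 0.24, 0.21, 0.26, 0.29, 0.28, 0.31, 0.33]

r = correlation(activation_rate, retention_6mo)
print(f"Pearson r between activation and 6-month retention: {r:.2f}")

# Rough reading: a strong positive r supports treating activation as a
# leading indicator; a weak r suggests you may be optimizing a vanity metric.
if r < 0.5:
    print("Weak relationship - revisit whether activation predicts retention.")
```

Correlation alone doesn't prove causation, but a weak correlation is usually enough to disqualify a candidate leading indicator.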
North Star Metrics: Finding Your One Number
A North Star Metric is the single metric that best captures the core value you deliver to customers. It's the one number that, if it goes up consistently, your business grows. The concept was popularized by growth teams at companies like Intercom and Amplitude as a way to align entire organizations around a single definition of success.
Why it matters: Organizations drown in metrics. Dashboards with 50 KPIs create diffusion of focus, what Herbert Simon called "attention scarcity." Research by Don Moore and colleagues shows that information overload reduces decision quality by 20-30%; more data doesn't help if people don't know what matters most. A North Star cuts through noise and aligns everyone on what success looks like, providing what organizational theorists call "strategic clarity."
Examples of North Star Metrics
- Airbnb: Nights booked (captures value for both hosts and guests, directly correlates with revenue). Airbnb's data team chose this over alternatives like "bookings" because it better represented realized value: a booking means nothing if it's canceled; a night stayed means someone experienced the service.
- Facebook: Daily active users (DAU), a proxy for engagement and network value. Facebook discovered through analysis that DAU predicted ad revenue better than page views or time spent. The metric evolved as the company grew to "daily active people" across all Facebook properties (Instagram, WhatsApp, Messenger) to capture cross-platform value.
- Slack: Messages sent by teams (indicates actual usage, not just signups). Slack's growth team found that teams sending 2,000+ messages had 93% retention, making "messages sent" the key predictor of longterm value.
- Amazon: Purchases per customer per month (repeat purchases = satisfaction and habit formation). Jeff Bezos famously focused on repeat purchase rate over acquisition, arguing that the customer experience (measured by repeat behavior) was the ultimate metric of success.
- Spotify: Time spent listening (not just subscribers; listening time predicts retention and willingness to pay). Spotify's research showed that users who listened 5+ hours per week in their first month had 10x higher 1-year retention.
- Medium: Total time reading (early metrics were "posts published," but this created low-quality content; shifting to "time reading" optimized for quality). Medium's product team discovered that time reading correlated with both reader satisfaction and writer engagement better than claps or views.
How to Identify Your North Star
- What value do customers get? Not what you provide, but what they experience. What's the "aha" moment? Product researcher Casey Winters calls this finding the "activation moment": the point where users experience your core value proposition. For Dropbox, it's "store a file from one device, access it on another." For Yelp, it's "find a good restaurant based on reviews."
- What behavior indicates they're getting value? What do satisfied customers actually do? Sean Ellis's product-market fit framework asks: what behavior predicts that users would be "very disappointed" if they could no longer use your product? That behavior is your North Star.
- Is this leading or lagging? Leading indicators predict growth. Lagging indicators confirm it after the fact. Your North Star should be leading: if it goes up, revenue follows. Research by growth advisor Brian Balfour shows effective North Stars are "leading indicators of sustainable growth, not vanity metrics or lagging indicators."
- Does this correlate with revenue? Your North Star should predict business outcomes, not just user activity. Run correlation analysis: does improving this metric actually predict revenue growth 1-6 months later? Andrew Chen warns against "fake North Stars" that look impressive but don't drive business results.
- Is this sustainable? Can you grow this metric indefinitely without destroying value? Twitter's early focus on daily active users created pressure for engagement-maximizing algorithms that degraded user experience. A sustainable North Star measures value creation, not just value extraction.
Watch out for: Choosing a metric that's easy to game (signups, when most never activate), purely lagging (revenue doesn't tell you what to optimize), or disconnected from customer value (page views don't mean value was delivered). Your North Star should be hard to fake and clearly tied to the value exchange. As product leader Lenny Rachitsky emphasizes, "Your North Star should be something that matters to both your business and your customer, where their success equals your success."
The concept connects to systems thinking: your North Star is a leverage point in the system, a place where focused effort produces disproportionate results. Choosing the right one requires understanding your business model's causal structure: what drives what? This is where first principles thinking helps: strip away assumptions and ask what really creates value here.
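One hedged way to run the correlation analysis suggested above is to lag the candidate North Star against revenue by one to six months and see where, if anywhere, the relationship is strongest. The sketch below uses made-up monthly series and only the standard library:

```python
# Minimal sketch of a lagged-correlation check: does this month's candidate
# North Star predict revenue 1-6 months later? Both monthly series are
# hypothetical.
from statistics import correlation  # Python 3.10+

north_star = [120, 135, 150, 148, 170, 185, 200, 210, 230, 245, 260, 275]  # e.g. weekly-active teams
revenue = [50, 52, 55, 60, 59, 66, 72, 78, 80, 88, 95, 101]                # e.g. MRR in $k

def lagged_r(leading, lagging, lag):
    """Correlate leading[t] with lagging[t + lag]."""
    return correlation(leading[:-lag], lagging[lag:])

for lag in range(1, 7):
    print(f"lag {lag} month(s): r = {lagged_r(north_star, revenue, lag):.2f}")

# If the correlation is strong at some lag and stays strong, the candidate
# behaves like a leading indicator of revenue; if it is flat or weak, it may
# be a fake North Star.
```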
Vanity Metrics vs Actionable Metrics
Not all metrics are created equal. Vanity metrics make you feel good but don't inform decisions. Actionable metrics tell you what to do differently. This distinction was popularized by Eric Ries in The Lean Startup (2011) and expanded by Alistair Croll and Benjamin Yoskovitz in Lean Analytics (2013).
Vanity Metrics
Characteristics: They go up over time regardless of business health (cumulative numbers always increase), aggregate totals without segmentation (hiding crucial differences between cohorts), can't be influenced directly (too distant from actionable levers), and make good headlines but poor decisions (impressive for press releases, useless for operators).
Examples: Total registered users (includes dead accounts from years ago; Facebook reports 2+ billion "active users," but how many are truly engaged?), page views (doesn't indicate value delivered; content farms maximize page views with low-quality clickbait), app downloads (research shows 25% of apps are never opened after download, per Adjust's mobile benchmarks), social media followers (engagement rate matters more; many celebrities have millions of followers but minimal influence, as documented in HBR's research on social influence), gross revenue (ignores customer acquisition cost, churn, and profitability; WeWork's revenue growth looked impressive until you examined the unit economics).
Why they're seductive: They're always going up (cumulative totals can't decrease). They sound impressive in presentations. They're easy to report. But they don't help you decide what to do Monday morning. As Ries notes, "vanity metrics wreak havoc because they prey on your weakness for good news" and create an illusion of progress.
Actionable Metrics
Characteristics: Segmented by relevant dimensions (cohorts, channels, customer types), tied to specific behaviors you can influence (clear cause and effect), indicative of cause-and-effect relationships (they help you understand why, not just what), and revealing of insights that change decisions (they inform specific actions).
Examples: Conversion rate by traffic source (tells you where to focus acquisition spending; KISSmetrics research shows conversion rates vary 5-10x by channel), activation rate by signup source (reveals which channels bring quality users who actually use the product), churn rate by cohort (shows whether you're improving retention over time; essential for SaaS businesses, where reducing monthly churn from 5% to 3% can double customer lifetime value), revenue per customer segment (identifies valuable customers to focus on; the Pareto principle applies: typically 20% of customers generate 80% of profit), cohort retention curves (show whether product improvements stick; month-6 retention should be higher for recent cohorts than for older ones).
Investment researcher Michael Mauboussin distinguishes between "results" (outcomes you care about) and "process" (actions you control). Vanity metrics measure results without connecting to process. Actionable metrics connect the two: "If we improve X behavior, Y outcome will improve." His book The Success Equation emphasizes that sustainable performance requires tracking the activities and processes that produce results, not just the results themselves.
The Litmus Test
Ask yourself: If this metric doubled tomorrow, would I know what action to take?
If yes, it's actionable. If no, it's vanity. Focus ruthlessly on the former. Ignore the latter no matter how impressive they look on a slide.
Additional tests from Lean Analytics:
- Comparative: Can you compare this metric across time periods, groups, or competitors? "Signups increased" is vanity. "Signups from organic search increased 15% while paid decreased 5%" is actionable.
- Understandable: Can everyone in your organization understand what this metric means and why it matters? Complex derived metrics that require explanation are less actionable.
- Ratio or rate: Ratios and rates are better than absolute numbers: they're normalized and reveal efficiency. "100 signups" means nothing without knowing traffic. "5% conversion rate" is comparable and actionable.
- Behavior-changing: Does this metric drive different behavior? If a metric goes up or down, do you know what to do differently? If not, it's decoration.
The framework connects to critical thinking: question what you're measuring and why. It also links to decision-making: metrics exist to improve decisions, not to create reports. As management consultant Tom DeMarco wrote, "You can't control what you can't measure" became perverted into "You must control everything you measure," leading organizations to measure everything and control nothing.
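As a small illustration of the comparative and ratio tests, the sketch below (with hypothetical channel names and counts) turns raw totals into segmented conversion rates, the kind of number you can actually act on:

```python
# Minimal sketch of turning an absolute number into comparable, segmented
# ratios, in the spirit of the Lean Analytics tests. Channel names and
# counts are hypothetical.
visits_by_channel = {"organic": 12_000, "paid": 8_000, "referral": 1_500}
signups_by_channel = {"organic": 600, "paid": 160, "referral": 120}

for channel, visits in visits_by_channel.items():
    signups = signups_by_channel[channel]
    rate = signups / visits
    print(f"{channel:>8}: {signups:4d} signups / {visits:5d} visits = {rate:.1%}")

# "880 signups" on its own is a vanity number; "organic converts at 5.0%,
# paid at 2.0%, referral at 8.0%" tells you where to shift spend.
```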
Proxy Metrics and Surrogates: Measuring the Unmeasurable
Many things that matter can't be measured directly. Customer satisfaction, product quality, innovation, organizational culture, employee morale: these are multidimensional, qualitative, and context-dependent constructs. So we use proxies: indirect measures that correlate with what we actually care about. In research methodology, this is called construct validity: does your measure actually capture the underlying construct you care about?
The Proxy Problem
Proxies are useful but dangerous. They work when the proxy correlates strongly with the underlying goal. Psychologist Lee Cronbach and sociologist Donald Campbell established the framework for evaluating proxies in their work on validity in measurement. They identified that proxies fail when:
- The correlation breaks: What predicted success in the past stops predicting it due to market shifts, scale effects, or environmental changes. Nassim Taleb's work on black swans and antifragility shows that correlations valid in normal times break down in extremes; proxies that worked in stability fail in volatility. The 2008 financial crisis exemplified this: risk models that used historical correlations as proxies for future risk failed catastrophically when correlations changed.
- You optimize the proxy: Goodhart's Law strikes. You hit the proxy metric while missing the actual goal. Campbell called this "teaching to the test": when test scores become the goal, education suffers even as scores rise. Economist Thomas Sowell documented this extensively in education research, showing how standardized test scores as proxies for learning quality broke down once schools optimized for scores rather than learning.
- You forget it's a proxy: The metric becomes the goal itself. You mistake the map for the territory. Psychologist Daniel Kahneman calls this "attribute substitution": when faced with a hard question (is the product high quality?), we unconsciously substitute an easier question (does it have good ratings?) and answer that instead, forgetting that ratings are a proxy, not the thing itself.
- Construct drift: The underlying construct you're trying to measure changes over time, but your proxy doesn't adapt. Intelligence testing pioneer Alfred Binet warned that IQ tests were proxies for specific cognitive abilities in specific contexts, not measures of general intelligence; yet they were treated as if they measured an immutable trait.
Using Proxies Well
- Acknowledge they're proxies. Never pretend a proxy is the real thing. Customer survey scores aren't satisfaction; they're signals that correlate with satisfaction. Statistician John Tukey said, "Far better an approximate answer to the right question than an exact answer to the wrong question." Know which question you're actually answering.
- Use multiple proxies. Triangulate. If all your proxies point in the same direction, you can be more confident. Campbell and Donald Fiske's "multitrait-multimethod matrix" (1959) established that valid measurement requires convergence across multiple methods; if different proxies disagree, you don't know what's really happening.
- Validate regularly. Check that your proxy still correlates with the underlying outcome. Correlations drift over time as contexts change. Run periodic concurrent validity studies: does improving the proxy actually improve the outcome you care about? Research by Harris and Tayler in HBR found that 60% of corporate metrics had weak or negative correlation with the business outcomes they were meant to predict.
- Combine with qualitative data. Talk to customers. Observe behavior. Numbers reveal patterns; conversations reveal causes. Anthropologist Clifford Geertz's concept of "thick description" emphasizes that understanding requires context, meaning, and narrative: things quantitative proxies can't capture. Use qualitative research to understand what your quantitative proxies mean.
- Test proxy validity empirically. Don't assume a proxy works; validate it. If you use NPS as a proxy for customer retention, check the correlation: do customers with high NPS actually renew at higher rates? Predictive validity is testable, not assumed.
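A proxy-validity check of the kind described in the last point can be a few lines of code. The sketch below uses hypothetical survey and renewal records, with illustrative field names, to compare renewal rates for promoters and detractors:

```python
# Minimal sketch of an empirical proxy-validity check: do customers with
# high NPS actually renew at higher rates? The records are hypothetical
# (score: 0-10 NPS response; renewed: did they renew at contract end).
customers = [
    {"score": 9, "renewed": True}, {"score": 10, "renewed": True},
    {"score": 8, "renewed": True}, {"score": 6, "renewed": False},
    {"score": 9, "renewed": False}, {"score": 3, "renewed": False},
    {"score": 10, "renewed": True}, {"score": 5, "renewed": True},
]

def renewal_rate(group):
    # Share of customers in the group who renewed.
    return sum(c["renewed"] for c in group) / len(group) if group else float("nan")

promoters = [c for c in customers if c["score"] >= 9]
detractors = [c for c in customers if c["score"] <= 6]

print(f"Promoter renewal rate:  {renewal_rate(promoters):.0%}")
print(f"Detractor renewal rate: {renewal_rate(detractors):.0%}")
# If the gap is small, NPS is a weak proxy for retention in your business,
# whatever the industry folklore says.
```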
Example Measuring Code Quality:
You can't measure "code quality" directly it's a complex construct including maintainability, reliability, performance, security, and readability. So you use proxies:
- Code coverage (higher = more tested, presumably better but 100% coverage doesn't guarantee quality, and you can game coverage metrics)
- Cyclomatic complexity (lower = simpler, presumably better developed by Thomas McCabe in 1976)
- Bug reports per release (fewer = higher quality but this assumes bugs are reported and tracked consistently)
- Time to onboard new developers (faster = more readable and documented a lagging indicator of maintainability)
- Technical debt tracking (fewer shortcuts taken = better longterm quality)
None perfectly captures quality. Together, they give you a picture. And you still need code review by experienced developers qualitative assessment matters. As software engineering researcher Steve McConnell notes in Code Complete, "The jury is still out on whether any objective measure of software quality exists." Proxies help, but judgment matters.
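To show how mechanical one of these proxies is, here is a simplified sketch that approximates McCabe-style cyclomatic complexity by counting decision points in a Python function's syntax tree. Dedicated static-analysis tools are far more thorough; this only illustrates that the proxy is a branch count, not a direct measure of quality:

```python
# A simplified sketch: approximate cyclomatic complexity as 1 plus the
# number of branch/decision nodes in the code's AST. This is an
# illustration of the proxy, not a production metric.
import ast

DECISION_NODES = (ast.If, ast.For, ast.While, ast.ExceptHandler,
                  ast.BoolOp, ast.IfExp)

def approx_complexity(source: str) -> int:
    """Complexity ~= 1 + number of decision points found in the source."""
    tree = ast.parse(source)
    return 1 + sum(isinstance(node, DECISION_NODES) for node in ast.walk(tree))

sample = """
def classify(x):
    if x < 0:
        return "negative"
    for _ in range(3):
        if x > 10 and x % 2 == 0:
            return "big even"
    return "other"
"""
print(approx_complexity(sample))  # higher numbers suggest harder-to-test code
```

Even here, a low number only means few branches; it says nothing about naming, duplication, or design, which is exactly why the proxy needs human review alongside it.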
This connects to systems thinking: proxies exist within larger measurement systems where they interact with incentives, behaviors, and feedback loops. Understanding these dynamics, not just the proxies themselves, determines whether your measurement system illuminates or distorts reality.
The McNamara Fallacy: When Metrics Miss What Matters
The McNamara Fallacy is the error of making decisions based solely on quantitative metrics while ignoring qualitative factors that can't be measured. It's named after Robert McNamara, U.S. Defense Secretary during the Vietnam War (1961-1968), whose reliance on quantitative analysis epitomized this dysfunction.
The Vietnam War Example
McNamara, a former Ford Motor Company executive and Harvard Business School professor, brought operations research and statistical analysis to the Pentagon. He measured war progress with quantifiable metrics: enemy body counts ("kill ratios"), weapons captured, villages secured, bombing tonnage delivered, "pacification" percentages. His "Whiz Kids" produced detailed reports showing steady progress. All the numbers showed success. The war was unwinnable.
Why? The metrics missed what mattered: political will of the North Vietnamese, local population support (or lack thereof), effectiveness of guerrilla tactics against conventional forces, cultural factors driving resistance, strategic patience versus American pressure for measurable progress, morale and commitment (ours versus theirs). These were qualitative, hard to measure, so they were ignored. Historian David Halberstam documented this in The Best and the Brightest (1972), showing how brilliant quantitative thinkers created strategic failure by measuring what was measurable rather than what was important.
Body counts exemplify the problem. The metric assumed: more enemy dead = closer to victory. Reality: Viet Cong and North Vietnamese forces could sustain 100,000+ casualties per year and continue fighting; their commitment was political and ideological, not a rational calculus in which more deaths meant defeat. Worse, the metric was systematically gamed: body counts were inflated (counting civilians, double-counting, pressure on units to report high numbers), creating false signals that operations were succeeding. The result: strategic failure despite tactical "success."
The Four Fallacies
Social scientist Daniel Yankelovich formalized these errors as the McNamara Fallacy in Corporate Priorities: A continuing study of the new demands on business (1972):
- Measure whatever can be easily measured (convenience over correctness): prioritize available data over relevant data
- Disregard what can't be measured (if you can't count it, it doesn't count): omit qualitative factors from analysis
- Assume what can't be measured isn't important (unmeasured = unimportant): devalue intangibles like morale, culture, trust
- Conclude that what can't be measured doesn't exist (reality = what we can quantify): deny the existence of unmeasurable factors
This progression represents increasing epistemic arrogance: from practical measurement choices to philosophical denial of unmeasurable reality. Philosopher C. West Churchman warned against this "quantification bias" in The Systems Approach (1968), arguing that "the hard problems are the ones we can't measure, and ignoring them doesn't make them go away; it makes them dangerous."
Modern Examples
- Financial crisis (2008): Risk models focused on quantifiable volatility and historical correlations while ignoring systemic risk, correlation breakdown during stress, and qualitative factors like perverse incentives and moral hazard. Quants measured portfolio risk with Value at Risk (VaR) models that failed catastrophically. Nassim Taleb's critique in The Black Swan showed how quantitative risk management created false confidence.
- Healthcare quality: Measuring hospital performance by readmission rates and surgery volumes while ignoring patient outcomes, quality of life, and appropriateness of care. Surgeon Atul Gawande's essays document how measurement-driven healthcare can optimize metrics while degrading care quality.
- Education policy: No Child Left Behind optimized test scores while narrowing curriculum, reducing arts and enrichment, and teaching to tests rather than fostering learning. Psychologist Howard Gardner's work on multiple intelligences demonstrates that standardized tests capture narrow cognitive abilities, ignoring creativity, social intelligence, and practical skills.
- Police performance: CompStat systems measuring arrests and tickets created pressure to "make numbers," leading to the problems documented in The Crime Numbers Game: crime statistics manipulated, wrongful arrests, community trust degraded.
How to Avoid It
- Acknowledge what you can't measure. Explicitly list the important factors that resist quantification: culture, morale, trust, long-term sustainability, strategic optionality. Don't pretend they don't exist. Decision theorist Howard Raiffa advocated creating "shadow prices" for intangibles: rough estimates that keep unmeasurable factors in the decision calculus.
- Combine quantitative and qualitative. Use metrics for patterns, conversations for understanding. Both matter. Organizational researcher Karl Weick argues for "disciplined imagination": quantitative data constrained by qualitative understanding of context and meaning.
- Watch for what metrics miss. If the numbers look great but something feels wrong, investigate. Trust your qualitative judgment. Psychologist Gary Klein's research on naturalistic decision-making shows that experienced decision-makers use pattern recognition and intuition to catch what metrics miss.
- Use judgment alongside data. Metrics inform decisions; they don't make them. Context, experience, and wisdom matter. As statistician George Box famously said, "All models are wrong, but some are useful"; know the difference between useful approximation and dangerous oversimplification.
- Practice qualitative rigor. Qualitative doesn't mean unstructured. Use structured methods for qualitative assessment: systematic observation, interviews, case studies, ethnographic research. Sociologist Robert Merton's work on focused interviews provides frameworks for rigorous qualitative inquiry.
"Not everything that counts can be counted, and not everything that can be counted counts."
Often attributed to Einstein (probably apocryphal, but captures the truth). Sociologist William Bruce Cameron actually said it in 1963: "It would be nice if all of the data which sociologists require could be enumerated because then we could run them through IBM machines and draw charts as the economists do. However, not everything that can be counted counts, and not everything that counts can be counted."
Input Metrics vs Output Metrics
Another critical distinction: are you measuring activity or results? This dichotomy appears across management literature under various names: effort versus achievement, process versus outcome, leading versus lagging. Investment strategist Michael Mauboussin frames it as "process versus outcome": good processes sometimes yield bad outcomes (due to luck), and bad processes sometimes yield good outcomes (also luck), so judging quality requires separating the two.
Input Metrics (Activity)
What they measure: Effort, activity, things you do. Hours worked, features shipped, calls made, content published, experiments run, meetings held, emails sent.
Strengths: Within your direct control (you decide what activities to do), can be measured immediately (no lag time), drive consistent effort (clear daily expectations), easy to hold people accountable (did you do the thing?), provide operational visibility (know what people are working on).
Weaknesses: Don't measure results (activity is not impact), reward busyness over effectiveness (presenteeism and "productivity theater"), create a false sense of progress (we shipped 50 features, but do users care?), can optimize inputs while outcomes deteriorate (more calls, fewer conversions). Management scholar Henry Mintzberg warns that "measurement is not management"; tracking inputs without connecting them to outputs creates bureaucracy without performance.
The "broken windows" problem in input metrics: if you measure "tickets closed" (an input), engineers close easy tickets and avoid hard ones. If you measure "lines of code written" (an input), you get bloated, verbose code. If you measure "sales calls made" (an input), you get low-quality, spray-and-pray outreach. Inputs divorced from outputs incentivize gaming.
Output Metrics (Results)
What they measure: Outcomes, results, impact. Revenue growth, customer satisfaction scores, market share gained, retention rate, user engagement, quality improvements, problem resolution time.
Strengths: Actually measure success (outcomes matter, not effort), harder to game (though not impossible; see Goodhart's Law), align with business goals (revenue, retention, and satisfaction are what you ultimately care about), focus attention on what matters (not just activity but achievement).
Weaknesses: Lag behind effort (you do the work now and see results weeks or months later, which creates attribution problems), influenced by external factors (market conditions, competition, and luck make causality unclear), harder to attribute to specific actions (did sales training improve close rates, or was it better leads?), can be out of an individual's control (a team member contributes but can't control the final outcome). Statistician W. Edwards Deming cautioned against holding individuals accountable for outcomes they don't fully control; it creates fear and distorts behavior.
The Right Mix
You need both. Input metrics drive daily behavior and provide early signals. Output metrics validate whether that behavior produces results. Problems arise when you optimize inputs without checking outputs (activity without impact), or when you hold people accountable for outputs without clarifying which inputs matter (accountability without guidance).
Andy Grove, former Intel CEO, advocated for "dual measurement" in High Output Management: track both inputs (activities) and outputs (results), looking for misalignment. If inputs are high but outputs are low, your process is broken. If inputs are low but outputs are high, you're either lucky or you've found leverage; figure out which and why.
The Balanced Scorecard framework explicitly combines input and output metrics across four perspectives. The "internal processes" quadrant tracks inputs (process efficiency, quality metrics), while "financial" and "customer" quadrants track outputs (revenue, satisfaction). The framework's value is forcing organizations to connect the two: which process inputs predict which outcome outputs?
Research by organizational psychologist Adam Grant on "giver" versus "matcher" productivity shows that input metrics (hours worked, tasks completed) correlate poorly with output metrics (impact delivered, value created) for knowledge workers. High performers work differently, not just harder; measuring only inputs misses this entirely.
Example Sales Team:
- Input metrics: Calls made (50 per week), emails sent (100 per week), meetings booked (10 per week), proposals sent (5 per week)
- Output metrics: Deals closed (revenue generated), customer retention after 12 months, average deal size, time to close, customer satisfaction scores
Track inputs to ensure consistent activity and identify who's doing the work. Measure outputs to see if that activity works and delivers results. If calls are up but deals are down, your process is broken; you need to change what you're doing, not just do more of it. If someone closes deals with fewer calls, study their approach; they've found leverage. Quality beats quantity, but you need both input and output metrics to see the difference.
Sales researcher Mark Roberge, former CRO of HubSpot, found that top performers made fewer calls but higher-quality calls: they researched prospects, personalized outreach, and focused on qualified leads. Input metrics alone would have rated them poorly; output metrics showed they were the best. The insight: inputs matter only insofar as they produce outputs.
This connects to leverage: the goal isn't maximum activity (inputs) but maximum impact per unit of effort (outputs/inputs). The 80/20 rule suggests that 20% of inputs produce 80% of outputs; measuring both helps you identify which inputs have leverage and which are low-value busywork.
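A small sketch of that input/output pairing for a sales team, with hypothetical reps and numbers, shows how the output-per-input ratio surfaces leverage that raw activity counts hide:

```python
# Minimal sketch of looking at inputs and outputs together: calls made
# (input) versus deals closed (output) per rep, plus the ratio that reveals
# leverage. Names and numbers are hypothetical.
reps = {
    "Ana": {"calls": 220, "deals": 11},
    "Bo": {"calls": 310, "deals": 9},
    "Chika": {"calls": 140, "deals": 12},
}

# Sort by deals per call, the output-per-input ratio.
ranked = sorted(reps.items(), key=lambda kv: kv[1]["deals"] / kv[1]["calls"], reverse=True)

for name, m in ranked:
    rate = m["deals"] / m["calls"]
    print(f"{name:6}: {m['calls']:3d} calls -> {m['deals']:2d} deals ({rate:.1%} per call)")

# Input metrics alone would rank Bo first (most calls); the output-per-input
# ratio shows Chika has found leverage worth studying.
```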
Choosing the Right Metrics: A Framework
How do you decide what to measure? With infinite possible metrics and finite attention, choice matters. Research by psychologist Barry Schwartz on the paradox of choice shows that too many options reduce decision quality; the same applies to metrics. Use these filters:
1. What decision will this metric inform?
If the answer is "none," don't track it. Metrics exist to inform action, not satisfy curiosity. Peter Drucker asked, "What is the one key measurement that will tell us whether we are achieving our objectives?" Every metric should answer: if this changes, what decision changes? If a metric won't change what you do, it's wasting attention what organizational researcher Karl Weick calls "cosmetic measurement," metrics for appearances rather than action.
2. Is this leading or lagging?
You need both. Lagging indicators validate results (did it work?). Leading indicators drive daily behavior (what should I do today?). Skew toward leading indicators for operational metrics that guide action. Research by Kaplan and Norton found that companies using only lagging financial metrics underperformed those balancing leading and lagging indicators across multiple perspectives by 15-20% in shareholder returns.
3. Can this metric be gamed?
If yes (and the answer is always yes, per Goodhart's Law), pair it with constraints. Don't measure signups without measuring activation (are they real users?). Don't track features shipped without measuring feature usage (does anyone care?). Economist Steven Levitt's research shows that any single metric will be gamed; use metric pairs where gaming one hurts the other.
4. Does this measure what I care about, or just what's easy to measure?
Convenience bias is real and pernicious. The easiest things to measure (page views, signups, hours worked) are often the least meaningful. Don't confuse "easy to count" with "important." Psychologist Daniel Kahneman's work on attribute substitution shows we unconsciously substitute hard questions (is this product valuable?) with easy ones (how many downloads does it have?) and answer the easy question instead. Fight this tendency: measure what matters, even if it's hard.
5. Will I act differently based on this number?
Another version of question 1. If the metric doubled or halved, would you change your behavior? If not, why are you measuring it? This is the "so what?" test: when you see the number, can you say "therefore I should do X"? If not, it's noise.
6. Is this metric aligned with longterm value?
Short-term metrics can destroy long-term value. Clayton Christensen's research on disruptive innovation shows that firms optimizing short-term profitability metrics miss disruptive threats. Ask: if we maximize this metric, what gets sacrificed? Customer trust? Employee morale? Product quality? Long-term positioning? Make the trade-offs explicit.
The AARRR Framework (Pirate Metrics)
For product and growth teams, entrepreneur Dave McClure's AARRR framework (2007) provides a useful structure for thinking about the customer journey:
- Acquisition: How do people find you? Metrics: traffic sources, cost per acquisition (CPA), conversion rate from visit to signup. Focus: which channels deliver qualified prospects efficiently?
- Activation: Do they have a good first experience? Metrics: signup-to-value time, activation rate (users who complete the core action), time to "aha moment." Focus: remove friction from getting value. Research by Reforge shows that users who activate in the first session have 3-5x higher retention.
- Retention: Do they come back? Metrics: DAU/MAU ratio (daily active users / monthly active users), cohort retention curves, churn rate by segment. Focus: retention is king; growth consultant Andrew Chen shows that a 5% improvement in retention has more impact than a 5% improvement in acquisition.
- Revenue: How do you monetize? Metrics: ARPU (average revenue per user), conversion-to-paid rate, LTV (customer lifetime value), LTV:CAC ratio. Focus: unit economics must work; you need to make more from customers than it costs to acquire them.
- Referral: Do users bring others? Metrics: Net Promoter Score, viral coefficient (K-factor), referral rate, time from signup to first referral. Focus: word-of-mouth is the most efficient growth channel. Growth researcher Josh Elman shows that products with K>1 (each user brings more than one new user) achieve exponential organic growth.
Each stage needs its own metrics. Optimize them in order: no point optimizing revenue if nobody activates, no point optimizing acquisition if you can't retain users. As product strategist Casey Accidental notes, "Leaks in your funnel compound; fix retention before scaling acquisition, or you're pouring water into a leaky bucket."
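As a rough illustration, the sketch below computes each AARRR stage as a simple ratio from hypothetical counts, plus a basic unit-economics check; a real implementation would pull these from event and billing data:

```python
# Minimal sketch of computing the AARRR funnel stage by stage from simple
# counts. All counts and dollar figures are hypothetical.
visitors, signups, activated, retained_m1, paying, referrers = 50_000, 2_500, 1_100, 600, 240, 90
monthly_revenue, marketing_spend = 12_000, 9_000

funnel = {
    "Acquisition (visit -> signup)": signups / visitors,
    "Activation (signup -> setup)": activated / signups,
    "Retention (active in month 1)": retained_m1 / activated,
    "Revenue (retained -> paying)": paying / retained_m1,
    "Referral (paying who refer)": referrers / paying,
}
for stage, rate in funnel.items():
    print(f"{stage:32} {rate:.1%}")

cac = marketing_spend / signups      # cost per acquired signup
arpu = monthly_revenue / paying      # average revenue per paying user
print(f"CAC ${cac:.2f} vs monthly ARPU ${arpu:.2f} -> unit economics check")
# Fix the worst upstream leak before spending to widen the top of the funnel.
```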
Additional Frameworks
- Google HEART Framework: Happiness (satisfaction, NPS), Engagement (usage frequency, depth), Adoption (new user activation), Retention (repeat usage), Task Success (completion rate, time to complete). Developed by Google's research team for measuring user experience quality.
- Balanced Scorecard: Four perspectives (financial, customer, internal process, learning & growth) force you to balance short- and long-term metrics, lagging and leading indicators. Most comprehensive for established organizations.
- Lean Startup Metrics: Focus on validated learning: measure what you learned, not just what you built. Build-Measure-Learn feedback loops with actionable metrics at each stage.
This connects to strategic thinking: choosing metrics is choosing what you optimize for, which determines your strategy. It also connects to systems thinking: metrics don't exist in isolation; they interact within larger systems with feedback loops and unintended consequences.
Preventing Gaming: Designing Robust Metrics
Metrics will be gamed. This isn't a character flaw; it's a rational response to incentives. Mechanism design, pioneered by economists Leonid Hurwicz, Eric Maskin, and Roger Myerson (Nobel Prize 2007), studies how to design systems where self-interested behavior produces socially optimal outcomes. The lesson: design for gaming; don't just hope it won't happen.
1. Use Paired Metrics
Track quality alongside quantity. Measure speed alongside accuracy. Monitor costs alongside revenue. When one metric improves, check that others don't degrade. This is multi-objective optimization: force trade-offs to be explicit rather than hidden.
Examples:
- Customer support: Response time AND customer satisfaction (fast responses that don't solve problems game the system)
- Sales: Revenue AND customer retention (high-pressure sales hit revenue targets but create churn)
- Engineering: Features shipped AND bug rate (shipping fast but buggy code games velocity)
- Content: Articles published AND time-on-page (high volume but low engagement indicates clickbait)
Research by Michael Harris and Bill Tayler in HBR found that companies using paired metrics had 40% less gaming behavior than those using single metrics, and 25% better longterm performance.
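A paired-metric review can be automated crudely: flag any "win" on a primary metric whose paired guardrail degraded beyond tolerance. The sketch below uses hypothetical quarter-over-quarter changes and thresholds:

```python
# Minimal sketch of a paired-metric check: a win on the primary metric only
# counts if its guardrail stayed within tolerance. All numbers are
# illustrative quarter-over-quarter changes.
pairs = [
    # (primary metric, its change, guardrail metric, guardrail change, worst acceptable change)
    ("Response time improvement", +0.30, "Customer satisfaction", -0.08, -0.02),
    ("Features shipped", +0.25, "Bug-free releases", -0.01, -0.05),
    ("Revenue growth", +0.15, "12-month retention", -0.06, -0.03),
]

for primary, p_delta, guardrail, g_delta, floor in pairs:
    verdict = "ok" if g_delta >= floor else "INVESTIGATE: likely gaming or hidden trade-off"
    print(f"{primary:27} {p_delta:+.0%} | {guardrail:22} {g_delta:+.0%} -> {verdict}")
```

The point is not the arithmetic but the habit: every celebrated number arrives with its guardrail attached.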
2. Measure What Matters, Not What's Easy
Hard-to-game metrics are often better proxies, even if they're harder to collect. It's easy to inflate page views (click refresh, use bots); it's hard to fake genuine engagement (time spent, return visits, actions taken). Choose metrics that resist manipulation. Cryptographer Bruce Schneier's work on security design applies here: assume adversarial actors will try to game your system, and design accordingly.
3. Include Constraints
Don't just set a target. Add boundaries that prevent gaming. "Increase conversion rate by 20%" invites lowering prices unsustainably or targeting unqualified users who churn immediately. Better: "Increase conversion rate by 20% without increasing bounce rate or decreasing average order value."
This mirrors constraint satisfaction problems in computer science: multiple constraints force solutions into acceptable regions of the solution space. Management theorist Eliyahu Goldratt's Theory of Constraints emphasizes that system performance requires managing constraints, not just optimizing individual metrics.
4. Monitor for Unintended Consequences
When one metric improves dramatically, check everything else. What trade-offs were made? What got sacrificed? Gaming often shows up as distortions elsewhere in the system; second-order effects reveal gaming that first-order metrics miss.
System dynamics researcher John Sterman emphasizes that interventions in complex systems create side effects through feedback loops; optimizing one variable often degrades others. His work at MIT shows that unanticipated consequences aren't aberrations; they're how systems work. Build surveillance for them.
5. Rotate Metrics Periodically
Don't use the same metrics forever. Rotate them to prevent long-term gaming strategies and to prevent local optimization at the expense of broader goals. If people know metrics will change, they're less likely to optimize narrowly. This is moving-target defense from cybersecurity, applied to organizational measurement.
However, don't change so frequently that people can't learn from feedback. Reinforcement learning research shows that learning requires stable reward signals; if you change metrics weekly, people can't connect actions to outcomes. Balance stability (for learning) with rotation (to prevent gaming). Typical cadence: core metrics stable for 6-12 months, supporting metrics reviewed quarterly.
6. Focus on Outcomes Over Outputs
Measure results, not just activity. It's harder to game "customer retention rate" (an outcome) than "customer calls made" (an output). Outputs can be inflated without delivering value; outcomes reveal whether value was actually created.
Management consultant W. Edwards Deming warned against "management by results" without understanding the processes that produce results. His insight: measure processes (what produces outcomes) and outcomes (what you care about), not just outputs (what's easy to count). The PDSA cycle (Plan-Do-Study-Act) embeds this; the Study step connects actions to outcomes, enabling learning rather than just measurement.
7. Use Gaming Explicitly as a Design Tool
Before implementing a metric, run a "premortem" on how it could be gamed. Psychologist Gary Klein's premortem technique works here: "Imagine this metric has been gamed catastrophically. How did it happen?" Generate gaming scenarios, then design defenses. This is "red team" thinking from security: assume adversarial actors and test your defenses.
8. Audit and Investigate Outliers
When performance looks too good to be true, it often is. Statistical outliers warrant investigation: are they legitimate high performers, or are they gaming the system? Benford's Law and other statistical forensics can detect fabricated data. Auditing creates deterrence: if people know anomalies will be investigated, gaming becomes riskier.
The Freakonomics analysis of teacher cheating on standardized tests used statistical pattern detection: classrooms where too many students improved suspiciously on specific questions suggested answer-changing. Similar techniques apply to detecting gaming in any metric system.
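Here is a minimal sketch of a Benford-style first-digit screen over a set of hypothetical reported figures; large deviations are prompts for an audit, not proof of gaming:

```python
# Minimal sketch of a Benford's Law screen: compare the first-digit
# distribution of reported numbers against the expected log distribution.
# The values below are hypothetical expense figures.
import math
from collections import Counter

reported = [1243, 1387, 1190, 2210, 1045, 1902, 1156, 3340, 1480, 1267,
            1530, 2875, 1099, 1744, 4120, 1310, 1625, 2044, 1211, 1999]

counts = Counter(int(str(abs(n))[0]) for n in reported if n != 0)
total = sum(counts.values())

print("digit  observed  expected (Benford)")
for d in range(1, 10):
    expected = math.log10(1 + 1 / d)
    observed = counts.get(d, 0) / total
    flag = "  <-- check" if abs(observed - expected) > 0.15 else ""
    print(f"  {d}      {observed:5.1%}     {expected:5.1%}{flag}")
# Deviations flag where to look; small samples and legitimately constrained
# data (e.g., prices near a fixed threshold) also break Benford's pattern.
```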
9. Separate Measurement from Rewards
Campbell's Law states that corruption increases as metrics are used for decision-making. One solution: measure for learning, not evaluation. When metrics aren't tied to rewards or punishments, gaming pressure drops. Deming advocated "drive out fear": when people fear measurement, they game it; when they trust it's for improvement, they use it honestly.
This creates a tension: metrics without consequences don't drive behavior, but metrics with high-stakes consequences get gamed. The solution isn't binary; it's calibration. Use metrics to inform conversations and coaching, not as automatic determinants of rewards. As Deming said, "The most important figures that one needs for management are unknown or unknowable."
Building Effective Dashboards That People Actually Use
Most dashboards are graveyards of metrics: 50 numbers, no insights, nobody looks at them. Research by Stephen Few, author of Information Dashboard Design, found that most corporate dashboards fail because they prioritize comprehensiveness over usability: they show everything instead of what matters. Effective dashboards are different.
Start With Decisions
What decisions will this dashboard inform? Work backward from there. Every metric on the dashboard should answer a decisionrelevant question. Edward Tufte, pioneer of information design, argues in The Visual Display of Quantitative Information that excellence in statistical graphics consists of complex ideas communicated with clarity, precision, and efficiency. Start with the ideas (decisions), then design the display.
Business intelligence expert Wayne Eckerson identifies three dashboard types by decision context:
- Operational dashboards: Real-time monitoring for immediate action (call center wait times, server uptime, manufacturing defects). Update frequency: seconds to minutes. Focus: exceptions and alerts.
- Tactical dashboards: Performance tracking for shortterm optimization (weekly sales, marketing campaign ROI, support ticket resolution). Update frequency: daily to weekly. Focus: trends and comparisons.
- Strategic dashboards: Highlevel indicators for longterm planning (market share, customer lifetime value, net promoter score trends). Update frequency: monthly to quarterly. Focus: patterns and strategic implications.
Mixing these creates confusion: tactical users get overwhelmed by strategic context, and strategic users get distracted by operational noise.
Show Trends Over Time
Single numbers lack context. Is 10,000 users good? Depends is it growing or shrinking? Show direction, velocity, and acceleration. Trends reveal patterns that snapshots miss. John Tukey's exploratory data analysis emphasizes that understanding data requires seeing patterns, not just summary statistics Anscombe's quartet demonstrates how identical statistics can represent wildly different patterns.
Use sparklines (small, inline charts) to show recent trends next to current values. Tufte introduced sparklines as "data-intense, design-simple, word-sized graphics"; they pack maximum information into minimum space. Example: "Revenue: $2.4M [sparkline showing upward trend] ↑12% vs last month."
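If your dashboard tooling doesn't support sparklines natively, a text approximation is easy to generate. The sketch below uses Unicode block characters and invented revenue figures; it is a rough stand-in for Tufte's word-sized graphics, not a faithful implementation:

```python
# Minimal Unicode sparkline: each point maps to a block character whose
# height scales with the value's position between the series min and max.
BLOCKS = "▁▂▃▄▅▆▇█"

def sparkline(series):
    lo, hi = min(series), max(series)
    span = (hi - lo) or 1  # avoid division by zero on a flat series
    return "".join(BLOCKS[int((v - lo) / span * (len(BLOCKS) - 1))] for v in series)

revenue = [1.8, 1.9, 2.0, 2.1, 2.2, 2.4]  # monthly revenue in $M (illustrative)
change = (revenue[-1] - revenue[-2]) / revenue[-2] * 100
print(f"Revenue: ${revenue[-1]}M {sparkline(revenue)} {change:+.0f}% vs last month")
```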
Use Visual Hierarchy
Most important metric at the top, biggest. Supporting metrics below, smaller. Use size, position, and color to indicate importance. Don't treat all metrics equally; they're not equal. Graphic designer Robin Williams' principles of visual design (contrast, repetition, alignment, proximity) apply: create a clear visual hierarchy so viewers instantly know where to look first.
Psychologist George Miller's research on working memory limits (7 ± 2 chunks) suggests humans can process 5-9 distinct elements at once. Dashboards exceeding this overwhelm cognitive capacity: viewers see chaos, not patterns. Limit top-level metrics to 5-7; group related metrics into clusters to reduce cognitive load.
Include Thresholds
What number triggers action? Make it visible. Use colors, annotations, alerts. If a churn rate above 5% means "investigate immediately," show that threshold clearly: red zone above 5%, yellow for 3-5%, green below 3%. This is management by exception: highlight anomalies that require attention, and let normal ranges fade into the background.
Walter Shewhart's control charts pioneered this in manufacturing: show acceptable variation (control limits) versus signals that require intervention (out-of-control points). The same principle applies to business dashboards: show the normal range and flag deviations.
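A minimal sketch of that idea, assuming individual daily values and a plain mean ± 3σ band rather than a formal XmR chart (the churn figures are invented):

```python
import statistics

def control_limits(history, sigmas=3):
    """Rough Shewhart-style limits from historical values.

    Simplification: real control charts derive limits from subgroup ranges
    or moving ranges, not a plain standard deviation.
    """
    center = statistics.mean(history)
    spread = statistics.stdev(history)
    return center - sigmas * spread, center, center + sigmas * spread

daily_churn = [3.1, 2.8, 3.4, 3.0, 2.9, 3.2, 3.3, 2.7, 3.1, 3.0]  # % per day
lower, center, upper = control_limits(daily_churn)

today = 4.9
if today > upper or today < lower:
    print(f"Churn {today}% is outside {lower:.2f}-{upper:.2f}%: investigate")
else:
    print(f"Churn {today}% is within normal variation around {center:.2f}%")
```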
Segment Your Data
Averages hide insights. Show cohorts, channels, customer segments, time periods. "Conversion rate = 3%" is useless. "Conversion rate: Organic 5%, Paid 2%, Referral 8%" is actionable: now you know where to focus. Simpson's paradox demonstrates how aggregated data can show the opposite trend from disaggregated data; segmentation reveals hidden patterns.
Data analyst Cole Nussbaumer Knaflic, author of Storytelling with Data, emphasizes that context transforms data into insight. Segmentation provides that context: not just "sales are down" but "enterprise sales are down 20% while SMB sales are up 15%." Now you know what to investigate.
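As a small illustration, the sketch below recomputes the conversion-rate example by channel from a raw visit log; the channel names and counts are invented to roughly match the numbers above:

```python
from collections import defaultdict

# Hypothetical visit log: (channel, converted?) pairs
visits = (
    [("organic", True)] * 50 + [("organic", False)] * 950 +
    [("paid", True)] * 20 + [("paid", False)] * 980 +
    [("referral", True)] * 8 + [("referral", False)] * 92
)

totals, conversions = defaultdict(int), defaultdict(int)
for channel, converted in visits:
    totals[channel] += 1
    conversions[channel] += converted

overall = sum(conversions.values()) / len(visits)
print(f"Overall conversion: {overall:.1%}")   # the average that hides the story
for channel in totals:
    rate = conversions[channel] / totals[channel]
    print(f"  {channel:>8}: {rate:.1%}")      # the segmented view you can act on
```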
Keep It Simple
Limit dashboards to 5-7 key metrics. More than that creates cognitive overload. If you need more metrics, create multiple dashboards for different audiences or decisions. Tufte's principle: maximize the data-ink ratio; every pixel should convey information. Remove chart junk, 3D effects, and unnecessary decoration. Let the data speak.
The 80/20 rule applies: 20% of metrics provide 80% of decision value. Ruthlessly cut the low-value 80%. As French writer Antoine de Saint-Exupéry wrote, "Perfection is achieved not when there is nothing more to add, but when there is nothing left to take away."
Match Refresh Rate to Decision Cadence
Don't update daily metrics every minute; it creates anxiety and noise. Don't refresh strategic metrics only monthly if you review them weekly; that creates staleness. Match data refresh to how often you can actually change course based on what you learn. Information-overload research shows that too-frequent updates reduce decision quality: people can't distinguish signal from noise.
Enable DrillDown Investigation
A dashboard shows a high-level summary; users need the ability to drill into details when something looks wrong. Interactive dashboards should follow Ben Shneiderman's visual information-seeking mantra: "Overview first, zoom and filter, then details on demand." Start broad, then enable progressive disclosure.
Choose the Right Visualization
Not all charts are created equal. Cleveland and McGill's research on graphical perception ranked visualization elements by accuracy: position on a common scale (best) > position on non-aligned scales > length > angle > area > volume > color (worst). Implications:
- Comparing values: Bar charts (length comparison) better than pie charts (angle comparison)
- Trends over time: Line charts (position on aligned scale)
- Distributions: Histograms or box plots (show spread and outliers)
- Relationships: Scatter plots (position on two scales)
- Part-to-whole: Stacked bar charts better than pie charts for precision
Avoid 3D charts, dual axes (they confuse comparisons), and pie charts with more than 5 segments (impossible to compare accurately).
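For example, a sorted horizontal bar chart encodes the segmented conversion rates as lengths on a common scale, which viewers compare far more accurately than pie-chart angles. A minimal matplotlib sketch, using the illustrative numbers from the segmentation example:

```python
import matplotlib.pyplot as plt

# Illustrative conversion rates by channel (percent)
channels = ["Referral", "Organic", "Paid"]
rates_pct = [8, 5, 2]

fig, ax = plt.subplots(figsize=(5, 2.5))
ax.barh(channels, rates_pct)          # length on a common scale
ax.set_xlabel("Conversion rate (%)")
ax.invert_yaxis()                     # largest value on top
for spine in ("top", "right"):
    ax.spines[spine].set_visible(False)  # strip chart junk
fig.tight_layout()
plt.show()
```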
The Litmus Test: Show your dashboard to someone unfamiliar with the business. Can they identify within 30 seconds: 1) What's most important? 2) Whether things are getting better or worse? 3) Where to focus attention? If not, simplify. The data-visualization firm Darkhorse Analytics demonstrates that less is more: remove everything that doesn't add insight.
This connects to information architecture: organizing information for findability and usability. It also connects to cognitive load theory: minimize extraneous load so users can focus on germane processing (understanding the data, not decoding the interface).
Building a Measurement System That Works
Putting it all together: how do you build measurement systems that inform better decisions without creating dysfunction? This requires integrating insights from management science, behavioral economics, systems thinking, and organizational psychology. W. Edwards Deming emphasized that a "system of profound knowledge" requires understanding variation, psychology, theory of knowledge, and systems; measurement is useless without this broader context.
Core Principles
- Metrics are tools, not goals. They inform decisions; they don't make them. Judgment still matters. As Russell Ackoff wrote, "Managers who don't know how to measure what they want settle for wanting what they can measure." Don't let the tail wag the dog.
- Assume gaming. Design metrics assuming people will optimize for them in unexpected ways, per Goodhart's Law and Campbell's Law. Use paired metrics, constraints, and regular audits. Mechanism design teaches that incentive-compatible systems require careful engineering; wishful thinking isn't enough.
- Combine quantitative and qualitative. Not everything that matters can be measured. Use metrics for scale and patterns; use conversations, observations, and case studies for depth and understanding. Anthropologist Clifford Geertz's "thick description" reminds us that meaning requires context that numbers can't capture.
- Focus on leading indicators. Lagging indicators tell you what happened (too late to change). Leading indicators let you change what will happen (actionable in real time). Research by Kaplan and Norton shows that balanced use of leading and lagging indicators across multiple perspectives improves performance 15-20% over financial metrics alone.
- Validate your proxies. Check regularly that your proxy metrics still correlate with the outcomes you care about (a minimal validation sketch follows this list). Correlations drift as contexts change. Nassim Taleb's work on fragility shows that what worked in past environments breaks in new ones; continuous validation is essential.
- Keep it simple. Fewer metrics, deeper understanding. Five well-chosen metrics beat 50 poorly chosen ones. Psychologist George Miller's research on cognitive limits (7 ± 2 items in working memory) suggests anything beyond 5-9 metrics overwhelms human processing capacity.
- Make it visible. If metrics aren't visible, they don't influence behavior: what gets measured and reported gets managed. But if they're too visible with high stakes, they'll be gamed, per Campbell's Law. Balance transparency with trust. As Deming said, "drive out fear": people game metrics they fear; they improve metrics they trust.
- Enable learning over evaluation. Metrics for learning improve systems; metrics for punishment create fear and gaming. Chris Argyris's work on double-loop learning shows that organizations improve when they question assumptions (including which metrics matter), not just optimize within existing frameworks.
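Here is the validation sketch referenced above: a lagged Pearson correlation between a leading proxy and the lagging outcome it is supposed to predict. The monthly figures, the one-month lag, and the 0.3 cutoff are all invented for illustration:

```python
import math
import statistics

def correlation(xs, ys):
    """Plain Pearson correlation between two equal-length series."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical monthly series: a proxy (feature adoption %) and the outcome
# it is supposed to predict (retention % one month later).
adoption  = [22, 25, 27, 30, 31, 29, 33, 36, 38, 40, 41, 43]
retention = [70, 71, 73, 74, 74, 73, 76, 78, 79, 80, 81, 82]

lag = 1  # proxy this month vs outcome next month
r = correlation(adoption[:-lag], retention[lag:])
print(f"Lagged correlation: {r:.2f}")
if abs(r) < 0.3:
    print("Proxy no longer tracks the outcome: revisit the metric")
```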
The Implementation Process
- Identify objectives. What are you trying to achieve? Be specific. Use SMART criteria (Specific, Measurable, Achievable, Relevant, Time-bound) but recognize their limits: not everything important is SMART-compatible. Management researcher Henry Mintzberg warns that over-formalization kills strategic thinking.
- Choose leading and lagging indicators. What predicts success (leading)? What confirms it (lagging)? Use the Balanced Scorecard's four perspectives (financial, customer, internal process, learning & growth) to ensure comprehensive coverage. Each quadrant needs both leading and lagging metrics.
- Define how you'll measure. Where does the data come from? How often do you check? Who's responsible for collection and analysis? Is measurement automated or manual? Cassie Kozyrkov, Google's Chief Decision Scientist, emphasizes that measurement requires clear operational definitions: "customer satisfaction" means nothing until you define how it's measured. (A minimal metric-definition sketch follows this list.)
- Set thresholds and targets. What number means "success"? What triggers investigation or intervention? Use control charts (Shewhart) to distinguish signal from noise; not every fluctuation requires action. Set targets based on capability analysis (what's achievable), not just aspiration (what we want).
- Build dashboards. Make metrics visible to the right people at the right cadence. Follow Tufte's principles: maximize the data-ink ratio, show comparisons and context, integrate text and graphics. Use Stephen Few's dashboard design patterns for different audiences (operational, tactical, strategic).
- Monitor for gaming and unintended consequences. Watch for unexpected patterns suggesting gaming. Use statistical outlier detection. Investigate anomalies. Run periodic construct-validity studies: do improved metrics actually correlate with improved outcomes? Systems thinking helps identify feedback loops and side effects.
- Review and refine. Every quarter, ask: Are these still the right metrics? Are they still working? Have contexts changed? Are we learning what we need to learn? Treat your measurement system as a hypothesis to be tested, not scripture to be followed. Action research methodology (Kurt Lewin) embeds this continuous improvement cycle.
- Connect metrics to narrative. Numbers without stories are sterile; stories without numbers are unverifiable. Storytelling with data (Cole Nussbaumer Knaflic) combines quantitative rigor with qualitative meaning. Ask: what story do these numbers tell? What actions do they suggest? What questions do they raise?
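Here is the metric-definition sketch referenced in step 3: declaring each metric as structured data forces the operational definition, owner, cadence, and thresholds into the open. The fields and the example values are hypothetical, not a standard schema:

```python
from dataclasses import dataclass

@dataclass
class MetricDefinition:
    """Everything someone needs to collect, read, and challenge a metric."""
    name: str
    operational_definition: str   # exactly how the number is computed
    source: str                   # where the data comes from
    owner: str                    # who is accountable for collection and review
    kind: str                     # "leading" or "lagging"
    cadence: str                  # how often it is refreshed and reviewed
    target: float                 # what counts as success
    investigate_above: float      # threshold that triggers a closer look

churn = MetricDefinition(
    name="Monthly logo churn",
    operational_definition="Customers cancelling in month / customers active at month start",
    source="billing system export",
    owner="Head of Customer Success",
    kind="lagging",
    cadence="monthly",
    target=0.03,
    investigate_above=0.05,
)
```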
Common Failure Modes
- Metric fixation: Confusing metrics with goals (Jerry Muller's The Tyranny of Metrics). Solution: regularly ask "why does this metric matter?", connecting means to ends.
- Gaming without consequences: Detecting gaming but not addressing it. Solution: audits plus accountability; gaming must have consequences or it becomes the norm.
- Analysis paralysis: Measuring everything, deciding nothing. Solution: limit metrics ruthlessly to 5-7 key metrics with clear decision rules.
- Ignoring qualitative signals: The McNamara Fallacy of over-reliance on quantification. Solution: structured qualitative methods (interviews, case studies, observation) alongside quantitative metrics.
- Static metrics in dynamic environments: Using the same metrics as context changes. Solution: regular metric review (quarterly) and rotation to prevent gaming and maintain relevance.
- Disconnected metrics: Tracking metrics that don't predict outcomes. Solution: validate predictive power empirically; do leading indicators actually predict lagging outcomes?
Integration with Existing Frameworks
Effective measurement systems don't exist in isolation; they integrate with:
- OKRs (Objectives and Key Results): Objectives provide direction; key results are measurable outcomes. Metrics track progress toward key results. Pioneered by Andy Grove at Intel, popularized by John Doerr.
- KPIs (Key Performance Indicators): Critical metrics that drive accountability. They should be chosen carefully; not everything measurable is key.
- Statistical Process Control: Shewhart and Deming's methods for distinguishing common cause (system) from special cause (anomaly) variation. Prevents overreaction to noise.
- PDCA/PDSA cycles: Plan-Do-Check-Act (Deming) embeds measurement in continuous-improvement loops.
Building effective measurement systems requires a synthesis of multiple disciplines: systems thinking for understanding feedback and unintended consequences, critical thinking for questioning what we measure and why, decision theory for connecting metrics to action, and organizational psychology for understanding how humans respond to measurement. As George Box said, "All models are wrong, but some are useful"; the art is building useful ones.
Frequently Asked Questions About Metrics, Measurement & Evaluation
What is Goodhart's Law and why does it matter for metrics?
Goodhart's Law states that when a measure becomes a target, it ceases to be a good measure. People optimize for the metric rather than the underlying goal, often gaming the system in ways that hit the number while missing the point. Example: optimizing for lines of code produces bloated code, not quality software. Goodhart's Law matters because it reveals why so many metric systems fail: the metric becomes the goal instead of a proxy for the actual goal.
What's the difference between leading and lagging indicators?
Lagging indicators measure outcomes after they occur: revenue, customer churn, conversion rate. They tell you what happened but come too late to change it. Leading indicators predict future outcomes: pipeline meetings, feature usage, engagement metrics. They're actionable in real time but harder to identify. Effective measurement systems combine both: lagging indicators to validate results, leading indicators to drive action. Focus energy on leading indicators that actually predict the lagging outcomes you care about.
How do I choose the right metrics to track?
Choose metrics by asking: 1) What decision will this metric inform? (if none, don't track it), 2) Is this a leading or lagging indicator? (you need both), 3) Can this metric be gamed? (if yes, pair it with constraints), 4) Does this measure what I actually care about, or just what's easy to measure? (don't mistake convenient for correct), 5) Will I act differently based on this number? (if not, it's vanity). Good metrics are actionable, accessible, auditable, and aligned with actual goals.
What are North Star Metrics and how do I identify mine?
A North Star Metric is the single metric that best captures the core value you deliver to customers. It focuses the entire organization on what matters. Examples: Airbnb tracks nights booked, Facebook tracks daily active users, Slack tracks messages sent. To identify yours: 1) What value do customers actually get from your product?, 2) What behavior indicates they're getting that value?, 3) Is this metric leading (predicts future revenue) or just vanity?, 4) Does improving this metric grow your business? A good North Star is measurable, actionable, leading, and revenue-correlated.
How do I prevent metrics from being gamed?
Prevent gaming by: 1) Use paired metrics: track quality alongside quantity (e.g., customer acquisition AND retention), 2) Measure what matters, not what's easy (hard-to-game metrics are often better proxies), 3) Include constraints: "increase conversion rate without increasing bounce rate", 4) Monitor for unintended consequences (watch what happens to other metrics when one improves), 5) Rotate metrics periodically (prevents long-term optimization games), 6) Focus on outcomes over outputs (measure results, not just activity). Remember: if a metric can be gamed, it will be.
What is the difference between vanity metrics and actionable metrics?
Vanity metrics make you feel good but don't inform decisions: total users, page views, downloads. They go up over time regardless of whether your business is healthy. Actionable metrics tell you what to do differently: conversion rate by channel, activation rate, revenue per customer, churn rate by cohort. They segment data to reveal insights and predict outcomes. Test: If the metric doubled tomorrow, would you know what action to take? If yes, it's actionable. If no, it's vanity. Focus ruthlessly on actionable metrics; ignore vanity metrics no matter how impressive they look.
How do I build effective dashboards that people actually use?
Build effective dashboards by: 1) Start with decisions: what decisions will this dashboard inform? Work backward from there, 2) Show trends over time: single numbers lack context; show direction and velocity, 3) Use hierarchy: most important metric at top, supporting metrics below, 4) Include thresholds: what number triggers action? Make it visible, 5) Segment data: averages hide insights; show cohorts, channels, segments, 6) Keep it simple: limit to 5-7 key metrics; more creates cognitive overload, 7) Match update frequency to decision cadence. Bad dashboards show everything. Good dashboards show what matters.
What is the McNamara Fallacy and how do I avoid it?
The McNamara Fallacy is the error of relying solely on quantitative measurements while ignoring qualitative factors that can't be measured. Named after Defense Secretary Robert McNamara, who measured Vietnam War success by body counts while missing political and social factors. Avoid it by: 1) Acknowledging what you can't measure (many important things are qualitative), 2) Combining quantitative metrics with qualitative research (surveys, interviews, observation), 3) Watching for what metrics miss (gaming, context, unintended consequences), 4) Using judgment alongside data (metrics inform decisions, they don't make them). Not everything that counts can be counted, and not everything that can be counted counts.