You measure what matters, right? Revenue, user growth, engagement, efficiency. You track KPIs, build dashboards, review metrics weekly. You're data-driven. Yet decisions don't improve. Teams game the numbers. Efforts misalign. The measurement system that should guide you creates confusion instead.

The problem isn't measuring—it's measuring badly. Most measurement systems suffer from predictable failures: too many metrics (so nothing stands out as important), the wrong metrics (measuring activity, not outcomes), gaming-prone metrics (optimizing the number, not the goal), or disconnected metrics (no relationship to strategy). A useful measurement system does the opposite: it focuses attention, reveals truth, resists gaming, and actually improves decisions.

As Douglas Hubbard argues in How to Measure Anything, "If it's important enough to manage, it's important enough to measure—and if it seems immeasurable, that is usually just a failure of imagination."

Designing measurement systems that work requires understanding what makes metrics useful, how systems fail, and how to build frameworks that inform rather than mislead.


What Makes a Measurement System Useful?

The Purpose of Measurement

Not to track everything, but to improve decision-making and action.

"In God we trust; all others must bring data." — W. Edwards Deming, statistician and quality management pioneer

A useful measurement system:

  • Clarifies what success looks like
  • Reveals when you're on or off track
  • Informs resource allocation
  • Enables learning and improvement
  • Aligns team efforts

A useless measurement system:

  • Generates reports no one uses
  • Measures activity without outcomes
  • Creates perverse incentives
  • Obscures reality behind metrics
  • Diverts effort to gaming numbers

Characteristics of Useful Measurement Systems

Characteristic | Why It Matters
Aligned with strategy | Metrics must connect to actual goals, not proxy activities
Actionable | Data should inform specific decisions; if no action is possible, why measure?
Timely | Data arrives when decisions are made, not weeks later
Balanced | Multiple perspectives prevent over-optimization of one dimension
Simple | Few, clear metrics beat many confused ones
Gaming-resistant | Hard to manipulate without actual improvement
Leading and lagging | Predict the future (leading) and confirm results (lagging)

The Fundamental Tension: Comprehensiveness vs. Focus

The Comprehensive Measurement Trap

Natural impulse: Measure everything that might matter.

Result:

  • 50+ metrics tracked
  • Nobody knows which matter most
  • Cognitive overload
  • Everything measured, nothing managed

Problem: When everything is important, nothing is important.


Focus Beats Comprehensiveness

A consistent finding in the performance measurement literature: organizations with 3-7 key metrics per goal outperform those tracking 20 or more.

As the adage often attributed to Peter Drucker goes, "what gets measured gets managed." But only if you measure the right things: measure the wrong things and you manage the wrong things.

Why focus works:

Focused System (3-7 metrics) | Comprehensive System (20+ metrics)
Clear priorities | Confused priorities
Memorable | Forgettable
Attention concentrated | Attention diffused
Gaming visible | Gaming hidden in noise
Actionable insights | Overwhelming data

Rule: If you can't remember your key metrics, you have too many.


The 80/20 of Measurement

Principle: 20% of metrics provide 80% of decision value.

Implication: Identify the critical few and track them rigorously. Check the rest only occasionally, or ignore them.

Example:

Organization | Critical Few Metrics | Secondary/Occasional
SaaS company | MRR growth, net revenue retention, CAC:LTV | 20+ other metrics (track quarterly)
Hospital | Patient outcomes, readmission rate, safety incidents | Operational efficiency metrics
University | Graduation rate, job placement, research output | Countless process metrics

The discipline: Resisting the urge to promote everything to "key metric" status.


Step 1: Start With Strategy

Metrics Must Connect to Goals

Broken approach:

  • Pick metrics because they're measurable
  • Track metrics because competitors do
  • Measure what's easy to measure

Effective approach:

  • Define strategic goals
  • Identify drivers of those goals
  • Measure drivers

The Strategy-Metrics Cascade

Level | Question | Example
Mission | Why do we exist? | "Make knowledge accessible"
Strategic Goal | What does success look like? | "Be primary resource for 10M learners"
Key Driver | What causes goal achievement? | "Content quality + discoverability"
Metric | How do we measure the driver? | "Content depth score, organic traffic, retention rate"

Alignment test: Can you trace each metric back to a strategic goal? If not, why measure it?
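
To make the alignment test concrete, here is a minimal Python sketch, using hypothetical goals, drivers, and metric names, that represents the cascade as data and flags any tracked metric that cannot be traced back to a strategic goal:

```python
# Sketch of the alignment test. Goals, drivers, and metrics below are illustrative
# assumptions, not a prescribed set.

cascade = {
    "Be primary resource for 10M learners": {           # strategic goal
        "Content quality": ["content depth score"],     # driver -> metrics
        "Discoverability": ["organic traffic", "30-day retention rate"],
    },
}

tracked_metrics = [
    "content depth score",
    "organic traffic",
    "30-day retention rate",
    "articles published",   # activity metric with no driver behind it
]

aligned = {m for drivers in cascade.values()
             for metrics in drivers.values()
             for m in metrics}

for metric in tracked_metrics:
    status = "aligned" if metric in aligned else "NOT TRACEABLE -- why measure it?"
    print(f"{metric}: {status}")
```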


Common Misalignment Problems

Problem | Example | Fix
Activity metrics | "Articles published" | Measure outcomes: "Knowledge gained (retention, application)"
Vanity metrics | "Total registered users" | Measure engagement: "Active users, completion rates"
Lagging only | "Annual revenue" | Add leading: "Pipeline velocity, win rate"
One-dimensional | "Revenue only" | Add: "Customer satisfaction, product quality"

Step 2: Identify Key Performance Drivers

What Drives Success?

Critical question: What factors, if improved, would most advance strategic goals?

Framework:

Goal | Key Drivers | How to Identify
Revenue growth | New customer acquisition, retention, expansion | Historical analysis, cohort studies
Customer satisfaction | Product quality, support responsiveness, ease of use | Surveys, correlation analysis
Operational efficiency | Process bottlenecks, automation level, error rates | Value stream mapping, time studies

Leading vs. Lagging Indicators

Lagging indicators:

  • Measure results
  • Historical (what happened)
  • Hard to influence directly
  • Examples: Revenue, profit, market share

Leading indicators:

  • Predict future results
  • Forward-looking
  • Actionable
  • Examples: Sales pipeline, customer retention, product quality

A balanced system needs both:

Lagging (Outcome) | Leading (Driver)
Revenue | Sales pipeline value, win rate
Customer satisfaction | Support ticket resolution time, product bugs
Employee retention | Employee engagement scores
Market share | Product quality ratings, brand awareness

Rule: If a system has only lagging indicators, you know your results but get no early warning that would let you improve them.

"A system that produces data but no learning is not a measurement system—it is a reporting system. The two are not the same." — Russell Ackoff, systems theorist and organizational theorist


Step 3: Select Core Metrics

The Selection Process

For each strategic goal:

  1. Identify 2-4 key drivers
  2. For each driver, select 1-2 metrics
  3. Result: 3-7 metrics per goal

Example: SaaS Company's Growth Goal

Driver | Metric 1 | Metric 2
Acquisition | New MRR | CAC (Customer Acquisition Cost)
Retention | Net Revenue Retention | Churn rate
Expansion | Expansion MRR | % customers expanding
Total: 6 core metrics
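
The six metrics above are simple ratios and sums. A small illustrative calculation, using entirely made-up monthly figures (none of these numbers come from a real company), might look like this:

```python
# Illustrative calculations for the six core metrics, with hypothetical inputs.

starting_mrr  = 100_000   # MRR at start of month ($)
new_mrr       = 12_000    # from newly acquired customers
expansion_mrr = 5_000     # upgrades by existing customers
churned_mrr   = 4_000     # lost to cancellations and downgrades

sales_marketing_spend = 60_000
new_customers         = 40
avg_lifetime_value    = 9_000    # assumed LTV per customer
pct_customers_expanding = 0.18   # would come from customer-level data

ending_mrr = starting_mrr + new_mrr + expansion_mrr - churned_mrr

mrr_growth_rate       = (ending_mrr - starting_mrr) / starting_mrr
net_revenue_retention = (starting_mrr + expansion_mrr - churned_mrr) / starting_mrr
churn_rate            = churned_mrr / starting_mrr
cac                   = sales_marketing_spend / new_customers
ltv_to_cac            = avg_lifetime_value / cac

print(f"MRR growth: {mrr_growth_rate:.1%}, NRR: {net_revenue_retention:.1%}, "
      f"churn: {churn_rate:.1%}, CAC: ${cac:,.0f}, LTV:CAC: {ltv_to_cac:.1f}, "
      f"expanding: {pct_customers_expanding:.0%}")
```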


Criteria for Good Metrics

A good metric is:

Criterion | Definition | Example
Understandable | Anyone can grasp its meaning | "Customer retention %" vs "Complex cohort survival index"
Comparable | Trends over time, benchmarks | Month-over-month, industry comparison
Ratio or rate | Normalized (not absolute) | "Conversion rate" better than "conversions"
Behavior-changing | Influences decisions | Revenue per customer → focus on expansion

Source: Lean Analytics by Croll & Yoskovitz


The SMART Metric Test

Metrics should be:

Attribute | Question | Bad Example | Good Example
Specific | Precisely defined? | "User engagement" | "Daily active users (logged in + action)"
Measurable | Can it be quantified? | "Brand strength" | "Net Promoter Score"
Actionable | Can you influence it? | "Market conditions" | "Sales conversion rate"
Relevant | Connects to a goal? | "Page views" (vanity) | "Content completion rate" (engagement)
Time-bound | Has an update frequency? | "Eventually" | "Updated weekly"
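
One way to operationalize the SMART test is to treat each metric definition as structured data and check that every attribute is actually filled in. The sketch below is a hypothetical helper, not a standard tool; field names and the example metric are assumptions:

```python
from dataclasses import dataclass

@dataclass
class MetricDefinition:
    name: str
    definition: str            # Specific: precise definition
    unit: str                  # Measurable: how it is quantified
    owner_can_influence: bool  # Actionable
    linked_goal: str           # Relevant: traceable to a strategic goal
    update_frequency: str      # Time-bound

def smart_issues(m: MetricDefinition) -> list[str]:
    """Return a list of SMART attributes the definition fails to satisfy."""
    issues = []
    if not m.definition.strip():
        issues.append("not specific: missing precise definition")
    if not m.unit.strip():
        issues.append("not measurable: no unit or quantification")
    if not m.owner_can_influence:
        issues.append("not actionable: owner cannot influence it")
    if not m.linked_goal.strip():
        issues.append("not relevant: no linked strategic goal")
    if not m.update_frequency.strip():
        issues.append("not time-bound: no update frequency")
    return issues

dau = MetricDefinition("Daily active users", "users who logged in and took an action",
                       "users/day", True, "Grow engaged learner base", "daily")
print(smart_issues(dau) or "passes the SMART test")
```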

Step 4: Balance Multiple Perspectives

The Balanced Scorecard Framework

Problem: Over-optimization of one dimension damages others.

Solution: Measure across multiple perspectives.

Kaplan & Norton's Balanced Scorecard (1992):

Perspective | Questions | Example Metrics
Financial | How do we look to shareholders? | Revenue growth, profitability, ROI
Customer | How do customers see us? | Satisfaction, retention, NPS
Internal Process | What must we excel at? | Cycle time, quality, innovation rate
Learning & Growth | How can we improve? | Employee skills, engagement, R&D investment

Key insight: Excellence in all four predicts long-term success; optimizing only financial metrics often destroys value.


Example: Hospital Measurement System

Balanced approach:

Dimension | Metric | Why
Clinical outcomes | Mortality rate, complication rate | Core mission
Patient experience | Satisfaction scores, wait times | Quality of care
Operational | Bed utilization, procedure cost | Efficiency
Staff | Nurse turnover, training hours | Capability
Financial | Operating margin | Sustainability

Prevents: Cutting costs at the expense of outcomes, or maximizing satisfaction at the expense of financial viability.


Step 5: Build Gaming Resistance

Goodhart's Law: "When a measure becomes a target, it ceases to be a good measure"

Mechanism:

  • People optimize for metric
  • Metric diverges from underlying goal
  • Metric becomes meaningless

Examples:

Metric as Target | Gaming Behavior | True Goal Undermined
Call center: Calls handled | Rush customers off the phone | Customer satisfaction
Hospital: Mortality rate | Refuse high-risk patients | Patient care
Software: Lines of code | Write verbose code | Code quality
Sales: Number of deals | Close small, unprofitable deals | Revenue quality

Strategies to Reduce Gaming

Strategy 1: Use Complementary Metrics

Approach: Pair metrics that counterbalance each other.

Metric A (Can Be Gamed) | Metric B (Prevents Gaming) | Effect
Quantity (calls handled) | Quality (customer satisfaction) | Can't rush if quality is measured
Speed (response time) | Accuracy (error rate) | Can't be fast and sloppy
Revenue | Customer acquisition cost | Can't buy revenue at any price
Growth | Retention | Can't churn through customers
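
Complementary pairs can also be checked mechanically. A minimal sketch, assuming hypothetical metric names and an arbitrary threshold, flags periods where the gameable metric improves while its counterbalance degrades:

```python
# Period-over-period changes for paired metrics (values are invented).
# A quantity metric rising while its paired quality metric falls is a warning sign
# of gaming rather than real improvement.

metric_pairs = [
    ("calls_handled", "customer_satisfaction"),
    ("response_speed", "accuracy"),
]

change = {"calls_handled": +0.15, "customer_satisfaction": -0.08,
          "response_speed": +0.05, "accuracy": +0.01}

DEGRADE_THRESHOLD = -0.05  # assumed tolerance before a drop counts as degradation

for quantity, quality in metric_pairs:
    if change[quantity] > 0 and change[quality] < DEGRADE_THRESHOLD:
        print(f"Review needed: {quantity} is up {change[quantity]:.0%} while "
              f"{quality} dropped {abs(change[quality]):.0%} -- possible gaming.")
```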

Strategy 2: Focus on Outcomes, Not Outputs

Output (Gameable) | Outcome (Meaningful)
Features shipped | Customer problems solved
Marketing campaigns run | Leads generated, conversion rate
Training hours delivered | Skills demonstrated, performance improvement
Reports produced | Decisions informed, actions taken

Principle: Measure results, not activities.


Strategy 3: Maintain Qualitative Judgment

Don't rely solely on quantitative metrics.

Hybrid approach:

Quantitative Metric | Qualitative Assessment
Sales conversion rate | Win/loss analysis: why we won or lost
Customer satisfaction score | Customer interviews: what matters
Code quality metrics | Peer code review: actual quality judgment

Reason: Numbers are gameable; human judgment (properly structured) is harder to fool.

As Donald Wheeler, statistician and quality expert, puts it: "Every data set contains noise. Some data sets also contain signals. Before you can detect a signal, you have to filter out the noise." Relying on numbers alone, without judgment, makes it easy to mistake that noise for signal.


Strategy 4: Rotate or Evolve Metrics

When a metric becomes target:

  • Gaming strategies develop
  • Metric loses predictive power

Solution: Periodically change what you measure.

Example: Google continually updates its search-quality signals and ranking algorithms, in part to stay ahead of SEO gaming.


Step 6: Set Appropriate Measurement Frequency

Match Frequency to Decision Cycle

Principle: Measure as often as you need to make decisions, no more.

Metric | Typical Frequency | Why
Financial results | Monthly/Quarterly | Slow-moving; decision cycle is monthly
Website traffic | Daily/Weekly | Fast-moving; can react quickly
Customer satisfaction | Quarterly | Changes slowly; surveys have cost
Employee engagement | Annually/Biannually | Slow to change; survey fatigue is an issue

The Noise vs. Signal Trade-off

High-frequency measurement:

  • Pro: Detect changes quickly
  • Con: Noise overwhelms signal; random variation looks meaningful

Low-frequency measurement:

  • Pro: Clearer trends
  • Con: Miss timely intervention opportunities

Example:

Daily Revenue Tracking | Monthly Revenue Tracking
See random fluctuations | See clear trends
Panic over noise | Respond to actual changes
Constant reaction | Thoughtful response

Best practice: Track high-frequency, decide at lower frequency (moving averages, trend lines).
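
A small sketch of that best practice, using invented daily figures: the raw series is tracked every day, but the decision signal is a trailing moving average rather than any single day.

```python
# Hypothetical daily revenue figures ($k). Track daily, decide on the 7-day average.
daily_revenue = [10.2, 9.1, 11.5, 8.7, 10.9, 12.3, 9.8,
                 10.4, 11.1, 9.5, 10.8, 11.9, 10.1, 12.6]

window = 7
moving_avg = [
    sum(daily_revenue[i - window + 1 : i + 1]) / window
    for i in range(window - 1, len(daily_revenue))
]

print("Latest day:               ", daily_revenue[-1])
print("7-day moving average:     ", round(moving_avg[-1], 2))
print("Trend over the period:    ", round(moving_avg[-1] - moving_avg[0], 2))
```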


Step 7: Test and Iterate

Metrics Are Hypotheses

Initial metrics are guesses about what matters.

Test:

  • Do improvements in metric correlate with actual goal progress?
  • Do teams make better decisions with this metric?
  • Is metric being gamed?

If not, change the metric.
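
The first test, whether movement in the metric predicts movement in the goal, can be run in a few lines. The quarterly figures below are hypothetical, and statistics.correlation requires Python 3.10 or later:

```python
from statistics import correlation  # available in Python 3.10+

pipeline_value = [1.8, 2.1, 1.6, 2.4, 2.0, 2.7, 2.2, 3.0]   # leading metric ($M), by quarter
next_q_revenue = [0.9, 1.1, 0.8, 1.3, 1.0, 1.4, 1.1, 1.6]   # outcome one quarter later ($M)

r = correlation(pipeline_value, next_q_revenue)
print(f"Correlation with next-quarter revenue: {r:.2f}")
if abs(r) < 0.3:   # threshold is an illustrative assumption
    print("Weak predictor -- consider replacing this metric.")
```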


The Validation Process

Question | How to Test | Action If Fails
Does metric predict outcome? | Correlation analysis | Replace with better predictor
Do decisions improve? | Decision audit | Simplify or reframe metric
Is it gamed? | Behavior observation | Add counterbalancing metric
Is it used? | Review meeting analysis | Remove metric if unused

Evolution Over Time

As organization matures:

Stage | Focus | Metrics
Early stage | Survival, product-market fit | Cash runway, user feedback
Growth stage | Scaling, efficiency | Growth rate, unit economics
Mature stage | Optimization, innovation | Market share, profitability

Measurement system must evolve with strategy.


Common Measurement System Mistakes

Mistake 1: Too Many Metrics

Problem: 50+ metrics tracked

Result:

  • No clear priorities
  • Gaming hidden in complexity
  • Analysis paralysis

Fix: Ruthlessly prune to 3-7 per major goal


Mistake 2: Measuring Only Lagging Indicators

Problem: Only track outcomes (revenue, profit)

Result: Know when you've failed, but can't prevent failure

Fix: Add leading indicators (pipeline, quality, engagement)


Mistake 3: No Connection to Strategy

Problem: Metrics chosen because they're available

Result: Measure things that don't matter

Fix: Start with strategy, derive metrics


Mistake 4: One-Dimensional Measurement

Problem: Financial metrics only

Result: Short-term optimization, long-term value destruction

Fix: Balanced scorecard approach


Mistake 5: Static Metrics

Problem: Never change what you measure

Result: Gaming develops, metrics lose meaning

Fix: Periodic review and evolution


Mistake 6: Targets Without Context

Problem: "Increase X by 20%"

Result: Gaming, sandbagging, arbitrary goals

Fix: Understand drivers; set targets based on what's achievable and valuable


Advanced Concepts

Diagnostic vs. Prescriptive Metrics

Diagnostic metrics: Tell you what happened.
Prescriptive metrics: Tell you what to do.

Example:

Diagnostic | Prescriptive
"Revenue dropped 10%" | "Win rate decreased because competitive pricing changed; need new positioning"
"Churn increased" | "Customers churning lack feature X; prioritize development"

Best systems: Provide both diagnosis and prescription.


Metrics at Different Organizational Levels

Different levels need different metrics:

Level | Focus | Metric Examples
Executive | Strategic progress | Market share, brand strength, financial health
Department | Function performance | Sales conversion, product quality, support satisfaction
Team | Operational execution | Story points completed, bugs fixed, calls handled
Individual | Personal contribution | Tasks completed, skills developed, feedback scores

Alignment: Individual → Team → Department → Executive metrics should cascade.


Real-Time vs. Periodic Dashboards

Real-time dashboards:

  • For operational metrics (website uptime, system load)
  • When immediate action required

Periodic reporting:

  • For strategic metrics (market position, brand)
  • When thoughtful analysis needed

Mistake: Making everything real-time creates noise and urgency bias.


Case Study: Redesigning a Failed Measurement System

The Problem

Software company with broken metrics:

Old Metric | Problem
Lines of code written | Incentivized verbose, low-quality code
Features shipped | Quantity over quality; features nobody used
Bug count | Hid bugs by not reporting them
Sprint velocity | Inflated story point estimates

Result: Metrics looked good, product quality terrible, customers churning.


The Redesign Process

Step 1: Strategy clarity

  • Goal: Build product customers love and retain

Step 2: Identify drivers

  • Product quality
  • Customer value delivered
  • Team capability

Step 3: New metrics

Old Metric | New Metric | Why Better
Lines of code | Code quality score (peer review + automated analysis) | Measures quality
Features shipped | Features adopted (% of customers using) | Measures value
Bug count | Customer-reported bugs, time to fix | Can't hide; measures impact
Sprint velocity | Delivered value (customer outcome) | Focuses on outcomes

Step 4: Balance

  • Added customer satisfaction (quarterly NPS)
  • Added team health (engagement survey)

Step 5: Gaming resistance

  • Multiple complementary metrics
  • Qualitative review (demos, code review)
  • Metric rotation (change technical quality metrics annually)

The Results

After 6 months:

  • Code quality improved (fewer production bugs)
  • Feature adoption increased (only valuable features built)
  • Customer retention improved
  • Team satisfaction increased (not gaming metrics)

Key insight: Fewer, better metrics focused on outcomes beat many activity metrics.


Practical Implementation

Building Your Measurement System

Timeline:

Phase | Duration | Activities
1. Strategy | 1-2 weeks | Clarify goals, identify drivers
2. Metric design | 2-3 weeks | Select metrics, define calculations
3. Infrastructure | 4-8 weeks | Build data collection, dashboards
4. Pilot | 1-3 months | Test with one team/function
5. Refine | 2-4 weeks | Fix issues discovered in pilot
6. Rollout | 4-8 weeks | Extend to the organization
7. Ongoing | Continuous | Review quarterly, evolve as needed

The Measurement System Document

Create written document:

Section | Contents
Strategy | Goals, key drivers
Core metrics | 3-7 per major goal, with definitions
Calculation | Exactly how each metric is computed
Frequency | How often measured and reported
Ownership | Who is responsible for each metric
Targets | Expected ranges (not rigid)
Review process | How often the system itself is reviewed

Purpose: Clarity, alignment, reference.
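
The document can also be kept in machine-readable form so that dashboards and alerts read from the same definitions. The record below is a hypothetical example of a single metric's entry; field names mirror the sections above and all values are illustrative:

```python
# Hypothetical entry in a machine-readable measurement system document.
metric_spec = {
    "name": "Net revenue retention",
    "goal": "Durable revenue growth",
    "driver": "Retention and expansion of existing customers",
    "calculation": "(starting MRR + expansion MRR - churned MRR) / starting MRR",
    "frequency": "monthly",
    "owner": "VP Customer Success",
    "expected_range": (1.00, 1.20),   # guidance, not a rigid target
    "review": "quarterly, alongside the rest of the system",
}

low, high = metric_spec["expected_range"]
observed = 0.97   # invented monthly reading
if not (low <= observed <= high):
    print(f"{metric_spec['name']} at {observed:.0%} is outside the expected range "
          f"({low:.0%}-{high:.0%}); owner: {metric_spec['owner']}")
```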


Communication and Adoption

Measurement systems fail without adoption.

Keys to adoption:

Factor | How
Clarity | Everyone understands what metrics mean
Relevance | Metrics connect to daily work
Visibility | Dashboards accessible, discussed in meetings
Action | Metrics inform actual decisions
Trust | Metrics seen as fair, not punitive

Conclusion: Measurement as a System

Key principles:

  1. Focus beats comprehensiveness (3-7 metrics per goal)
  2. Start with strategy (metrics derive from goals)
  3. Balance dimensions (financial, customer, process, growth)
  4. Resist gaming (complementary metrics, qualitative judgment)
  5. Match frequency to decisions (measure when you can act)
  6. Iterate (metrics are hypotheses; test and evolve)

Good measurement systems:

  • Clarify priorities
  • Reveal truth
  • Inform decisions
  • Resist manipulation
  • Evolve with strategy

Bad measurement systems:

  • Obscure priorities
  • Create gaming
  • Generate reports nobody uses
  • Persist unchanged
  • Disconnect from goals

The difference is design: measurement is too important to leave to accident.


What Research Shows About Measurement System Design

Forty years of research on organizational performance measurement systems has produced substantial, convergent findings about what makes these systems work or fail. Several researchers have been particularly influential.

Robert Kaplan and David Norton's Balanced Scorecard research (beginning with their 1992 Harvard Business Review paper and continuing through multiple books) established the foundational empirical case for multi-perspective measurement. Their research across hundreds of organizations showed that companies relying solely on financial measurement systems consistently underinvested in the drivers of future performance. The mechanism was straightforward: financial metrics are lagging indicators that reflect decisions made 12 to 24 months earlier. By the time a decline in customer satisfaction or process quality shows up in financial results, the causal factors have typically been deteriorating for years. A measurement system that includes only financial metrics provides no early warning.

Kaplan and Norton's research also identified a subtler failure: even companies that tracked customer and operational metrics alongside financial ones frequently failed to connect them. They tracked employee training hours, customer satisfaction scores, and process cycle times, but could not explain how improvements in any one of them were expected to drive improvements in another. The strategy map framework they developed in response required organizations to specify explicit causal hypotheses: we believe that improving employee skills in X will reduce defect rates in process Y, which will improve customer retention Z, which will grow revenue W. Each arrow was a testable hypothesis. This turned measurement system design from a data collection exercise into a scientific program for learning how the organization actually creates value.

W. Edwards Deming's statistical process control framework provides the operational foundation for useful measurement system design. Deming's insight, developed from Walter Shewhart's earlier work at Bell Labs and applied most influentially in postwar Japan, was that most variation in organizational outcomes is produced by system factors, not individual performance. When quality problems occur in a manufacturing process, approximately 85 percent of the variation is attributable to the process itself -- materials, equipment, procedures, environmental conditions -- and only 15 percent to individual worker behavior. This has a direct implication for measurement system design: systems should be designed to reveal process variation and enable system improvement, not to evaluate and rank individuals.

Deming's control charts provided a specific measurement tool for this purpose: tracking a metric over time and distinguishing between common cause variation (random fluctuation within a stable system) and special cause variation (signals that the system has changed). This distinction is critical for useful measurement systems: responding to common cause variation as though it were a signal produces "tampering" -- interventions that increase rather than decrease overall variation. Measurement systems that lack this capability for distinguishing signal from noise consistently lead to management by exception that makes things worse.
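
A simplified sketch of that signal-versus-noise test: compute control limits from a stable baseline period and classify new observations. A proper individuals chart would estimate limits from moving ranges; the plain standard deviation used here is a simplification, and all figures are invented.

```python
from statistics import mean, stdev

baseline   = [52, 48, 50, 51, 49, 47, 53, 50, 48, 52, 51, 49]  # stable historical weeks
new_points = [50, 46, 61]                                        # latest weeks

centre = mean(baseline)
sigma  = stdev(baseline)
upper, lower = centre + 3 * sigma, centre - 3 * sigma   # rough 3-sigma control limits

for x in new_points:
    if x > upper or x < lower:
        kind = "special cause -- investigate"
    else:
        kind = "common cause -- no action (tampering makes it worse)"
    print(f"{x}: {kind} (limits {lower:.1f}..{upper:.1f})")
```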

Donald Campbell's program evaluation research shaped the design of public sector measurement systems through his documentation of what he called "experimenting society" -- the idea that social programs should be treated as experiments that generate data for improvement rather than political commitments that must be defended. Campbell's Law, as formalized in 1979, was drawn from his observation that social programs evaluated on narrow outcome metrics consistently gamed those metrics. His proposed solution was methodological pluralism: measurement systems should use multiple methods with different vulnerability profiles. A program that can game its primary quantitative metric is less likely to successfully game a qualitative case study investigation, a randomized controlled trial, or a population-level administrative data analysis simultaneously.

Douglas Hubbard's measurement economics framework addresses the cost-benefit dimension of measurement system design. His central argument in How to Measure Anything (2014) is that organizations systematically build measurement systems that are too large because they do not apply economic analysis to measurement decisions. Every measurement has a cost: data collection, analysis, storage, and the opportunity cost of attention. Every measurement has an expected benefit: the value of the information for the decisions it informs. A measurement worth making is one where the expected benefit exceeds the cost. Hubbard's research found that most organizations can eliminate 60 to 80 percent of their tracked metrics without significant loss of decision quality, because most metrics are either redundant, not decision-relevant, or provide information about questions the organization already has sufficient certainty to decide.
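
That cost-benefit logic can be illustrated with a toy expected-value-of-information calculation. All numbers below are invented; the point is only the structure of the comparison between what a decision is worth with and without the measurement.

```python
# Toy measurement-economics sketch: is a study worth its cost?
p_success        = 0.6        # current belief that a planned feature will succeed
value_if_success = 500_000    # payoff if it lands ($)
loss_if_failure  = -200_000   # cost of building something nobody adopts ($)

# Best decision without measuring: build only if expected value is positive.
ev_build        = p_success * value_if_success + (1 - p_success) * loss_if_failure
ev_without_info = max(ev_build, 0)            # 0 = decide not to build

# With perfect information we would build only in the success case.
ev_with_perfect_info = p_success * value_if_success + (1 - p_success) * 0

evpi = ev_with_perfect_info - ev_without_info  # ceiling on what any study is worth
measurement_cost = 40_000                      # e.g. a customer research study (assumed)

print(f"EV without measuring: ${ev_without_info:,.0f}")
print(f"EVPI (ceiling on measurement value): ${evpi:,.0f}")
print("Worth measuring" if evpi > measurement_cost else "Not worth measuring")
```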


Real-World Case Studies in Measurement System Design

Intel's OKR measurement architecture. Andy Grove's implementation of OKRs at Intel in the 1970s is the most influential example of measurement system design in technology companies. Grove's system had several design features that addressed specific failure modes. First, objectives were qualitative (directional goals) while key results were quantitative (specific, measurable outcomes) -- this separated goal-setting from measurement, preventing the confusion between what you want to achieve and what you can count. Second, OKRs were set and reviewed quarterly, creating short feedback loops that allowed rapid adjustment when metrics proved not to predict the outcomes they were supposed to track. Third, OKRs were transparent across the organization -- everyone could see what every team was measuring and why. This transparency created horizontal accountability: teams could not optimize their own metrics in ways that damaged others' outcomes without it being visible.

When Google adopted OKRs in 1999, it added a specific design element: key results were expected to be aspirational, with a "sweet spot" achievement rate of 60 to 70 percent. Consistently achieving 100 percent on key results indicated that goals were too conservative -- teams were sandbagging to ensure they hit numbers rather than stretching to maximize value creation. This design feature directly addressed a failure mode that Goodhart's Law predicts: when key results become targets, people set them at levels they can comfortably achieve. The 60 to 70 percent norm built anti-sandbagging pressure into the measurement system itself.

The NHS balanced measurement evolution. The National Health Service's journey from single-metric to multi-dimensional measurement illustrates how measurement systems should evolve in response to evidence of gaming. The initial focus on waiting time metrics (introduced in the early 2000s) produced documented improvement in waiting times alongside documented gaming: ambulances held outside emergency departments, administrative pausing of waiting lists, reclassification of referrals. The response, developed through the NHS Institute for Innovation and Improvement and documented in multiple Audit Commission reports, was to expand the measurement system to make gaming one dimension costly on others.

The resulting NHS measurement framework includes: clinical outcome metrics (mortality, complication, readmission rates), patient experience metrics (from independent patient surveys), safety metrics (adverse events, near-misses, medication errors), access metrics (waiting times), and efficiency metrics (cost per episode, bed utilization). Gaming waiting times at the expense of patient safety would now be visible in the safety metrics. Gaming outcome metrics by avoiding high-risk patients would show up in access metrics. No single metric could be improved through pure administrative manipulation without creating signals in other dimensions. This is the core principle of complementary metric design: each metric limits the gaming space for the others.

Enron's measurement system failure. Enron's collapse illustrates what happens when measurement systems are designed to report favorably rather than to reveal truth. The company's reporting metrics (revenue, earnings per share, credit ratings, analyst recommendations) were all technically compliant with applicable standards. The measurement system failure was not fraud (though fraud existed) but design: the system measured what could be reported in the most favorable terms under existing rules, rather than what actually indicated business health.

Jeff Skilling, Enron's CEO, had an MBA and was sophisticated about financial measurement. The measurement system he oversaw tracked mark-to-market revenue (projected future cash flows counted as current income), managed earnings per share through asset disposals timed for quarterly reporting cycles, and maintained credit ratings through off-balance-sheet debt vehicles. Each individual metric was technically defensible. The ensemble was systematically misleading. A well-designed measurement system would have required cash flow from operations alongside revenue (immediately revealing the divergence), economic value added rather than accounting earnings, and transparency about off-balance-sheet obligations. The absence of these measures was not oversight -- it was design.

Toyota's visual measurement system. Toyota's production system, which influenced lean manufacturing methodology globally, embedded measurement into the physical production process rather than treating it as a separate reporting function. The andon cord -- which any worker could pull to stop the production line when a defect was detected -- created real-time measurement at the point of production. The quality measurement was inseparable from the production process itself. This design feature eliminated several common measurement system failures: no reporting lag (the measurement happened when the event occurred), no misalignment between who detects problems and who reports them (the worker who detected was the worker who triggered measurement), and no disincentive to report problems (the expected response was investigation and improvement, not punishment).

The Toyota system also built specific measurement system features to resist gaming: stopping the line was rewarded, not penalized. Workers who identified problems frequently were recognized as contributors to improvement. This directly addressed the failure mode in which measurement systems designed around punishment incentivize concealment rather than identification of problems.


Evidence-Based Principles for Useful Measurement System Design

Principle 1: Design for learning, not for reporting. The most consistent finding across Deming, Kaplan and Norton, and Campbell is that measurement systems designed primarily to produce favorable reports consistently fail to provide the information needed for improvement. Useful measurement systems are designed around a different question: what would we need to know to make better decisions and improve performance? The reporting function follows from this design; it does not drive it.

Principle 2: Build causal theories before selecting metrics. Kaplan and Norton's strategy maps, Grove's OKR objective-key result distinction, and Hubbard's decision analysis framework all converge on the same principle: measurement selection should be driven by explicit causal theories about how activities produce outcomes. The theory specifies what to measure (the causal factors), what the relationship should be (the predicted direction and magnitude), and what evidence would confirm or refute the theory. Without this structure, measurement systems become collections of available data rather than instruments for testing causal hypotheses.

Principle 3: Use complementary metrics that limit each other's gaming space. A single metric can almost always be improved through gaming. Multiple metrics that measure different aspects of the same goal create trade-offs that make gaming costly. Speed and quality metrics together (customer support response time and resolution quality) are harder to game simultaneously than either alone. Revenue and customer satisfaction metrics together resist strategies that boost revenue at the expense of customer relationships. The design principle is to identify the most likely gaming strategies for each metric and then select complementary metrics that would make those strategies visible or costly.

Principle 4: Match measurement frequency to decision cycles. Deming's statistical process control insights apply directly to measurement frequency. Measuring too frequently produces noise that overwhelms signal, leading to overreaction to random variation. Measuring too infrequently misses genuine changes in time to act on them. The appropriate frequency depends on the natural variability of the process and the decision cycle: operational metrics that inform daily decisions need daily or real-time measurement; strategic metrics that inform quarterly resource allocation decisions need monthly or quarterly measurement. A common design failure is applying the measurement frequency appropriate for operational metrics to strategic metrics, creating the illusion of signal in what is primarily noise.


References

  1. Kaplan, R. S., & Norton, D. P. (1992). "The Balanced Scorecard: Measures That Drive Performance." Harvard Business Review, 70(1), 71–79.

  2. Kaplan, R. S., & Norton, D. P. (1996). The Balanced Scorecard: Translating Strategy into Action. Harvard Business School Press.

  3. Croll, A., & Yoskovitz, B. (2013). Lean Analytics: Use Data to Build a Better Startup Faster. O'Reilly Media.

  4. Goodhart, C. A. E. (1975). "Problems of Monetary Management: The U.K. Experience." In Papers in Monetary Economics (Vol. 1). Reserve Bank of Australia.

  5. Hubbard, D. W. (2014). How to Measure Anything: Finding the Value of Intangibles in Business (3rd ed.). Wiley.

  6. Marr, B. (2012). Key Performance Indicators: The 75+ Measures Every Manager Needs to Know. FT Press.

  7. Austin, R. D. (1996). Measuring and Managing Performance in Organizations. Dorset House.

  8. Parmenter, D. (2015). Key Performance Indicators: Developing, Implementing, and Using Winning KPIs (3rd ed.). Wiley.

  9. Behn, R. D. (2003). "Why Measure Performance? Different Purposes Require Different Measures." Public Administration Review, 63(5), 586–606.

  10. Kerr, S. (1975). "On the Folly of Rewarding A, While Hoping for B." Academy of Management Journal, 18(4), 769–783.

  11. Meyer, M. W., & Gupta, V. (1994). "The Performance Paradox." Research in Organizational Behavior, 16, 309–369.

  12. Haas, M. R., & Kleingeld, A. (1999). "Multilevel Design of Performance Measurement Systems: Enhancing Strategic Dialogue Throughout the Organization." Management Accounting Research, 10(3), 233–261.

  13. De Waal, A. A. (2003). "Behavioral Factors Important for the Successful Implementation and Use of Performance Management Systems." Management Decision, 41(8), 688–697.

  14. Eccles, R. G. (1991). "The Performance Measurement Manifesto." Harvard Business Review, 69(1), 131–137.

  15. Neely, A., Gregory, M., & Platts, K. (2005). "Performance Measurement System Design: A Literature Review and Research Agenda." International Journal of Operations & Production Management, 25(12), 1228–1263.


About This Series: This article is part of a larger exploration of measurement, metrics, and evaluation. For related concepts, see [Why Metrics Often Mislead], [Goodhart's Law Breaks Metrics], [Vanity Metrics vs Meaningful Metrics], and [KPIs Explained Without Buzzwords].

Frequently Asked Questions

What makes a measurement system useful?

Clear alignment with goals, actionable metrics, resistant to gaming, appropriate granularity, timely data, and actually influences decisions.

How do you design a measurement system?

Start with strategy, identify key drivers, select 3-7 core metrics per goal, balance leading and lagging indicators, test and iterate.

Should measurement systems be comprehensive?

No. Focus beats comprehensiveness. Too many metrics create noise, dilute attention, and make nothing seem important.

What is the balanced scorecard approach?

Measuring multiple perspectives—financial, customer, internal processes, learning/growth—to prevent over-optimization of any single dimension.

How do you prevent gaming in measurement systems?

Use multiple complementary metrics, focus on outcomes over outputs, avoid rigid targets, maintain qualitative judgment, rotate metrics.

When should measurement systems change?

When strategy shifts, when metrics get gamed, when they no longer predict outcomes, or when they stop informing decisions.

What's the right frequency for measurement?

Match to decision cycles—measure often enough to inform action but not so frequently that noise overwhelms signal.

How do you know if your measurement system works?

Metrics inform decisions, improvements in metrics correlate with real performance, gaming is minimal, and goals are actually advancing.