You measure what matters, right? Revenue, user growth, engagement, efficiency. You track KPIs, build dashboards, review metrics weekly. You're data-driven. Yet decisions don't improve. Teams game the numbers. Efforts misalign. The measurement system that should guide you creates confusion instead.
The problem isn't measuring—it's measuring badly. Most measurement systems suffer from predictable failures: too many metrics (nothing is important), wrong metrics (measure activity not outcomes), gaming-prone metrics (optimize the number not the goal), or disconnected metrics (no relationship to strategy). A useful measurement system does the opposite: focuses attention, reveals truth, resists gaming, and actually improves decisions.
As Douglas Hubbard argues in How to Measure Anything, if something is important enough to manage, it is important enough to measure; what looks immeasurable is usually just a failure of imagination.
Designing measurement systems that work requires understanding what makes metrics useful, how systems fail, and how to build frameworks that inform rather than mislead.
What Makes a Measurement System Useful?
The Purpose of Measurement
Not to track everything. To improve decision-making and actions.
"In God we trust; all others must bring data." — widely attributed to W. Edwards Deming, statistician and quality management pioneer
A useful measurement system:
- Clarifies what success looks like
- Reveals when you're on or off track
- Informs resource allocation
- Enables learning and improvement
- Aligns team efforts
A useless measurement system:
- Generates reports no one uses
- Measures activity without outcomes
- Creates perverse incentives
- Obscures reality behind metrics
- Diverts effort to gaming numbers
Characteristics of Useful Measurement Systems
| Characteristic | Why It Matters |
|---|---|
| Aligned with strategy | Metrics must connect to actual goals, not proxy activities |
| Actionable | Data should inform specific decisions; if no action possible, why measure? |
| Timely | Data arrives when decisions are made, not weeks later |
| Balanced | Multiple perspectives prevent over-optimization of one dimension |
| Simple | Few, clear metrics beat many confused ones |
| Gaming-resistant | Hard to manipulate without actual improvement |
| Leading and lagging | Predict future (leading) and confirm results (lagging) |
The Fundamental Tension: Comprehensiveness vs. Focus
The Comprehensive Measurement Trap
Natural impulse: Measure everything that might matter.
Result:
- 50+ metrics tracked
- Nobody knows which matter most
- Cognitive overload
- Everything measured, nothing managed
Problem: When everything is important, nothing is important.
Focus Beats Comprehensiveness
Research finding: Organizations that focus on 3-7 key metrics per goal tend to outperform those tracking 20 or more.
As the maxim often attributed to Peter Drucker puts it, "What gets measured gets managed." The corollary matters just as much: measure the wrong things and you will manage the wrong things.
Why focus works:
| Focused System (3-7 metrics) | Comprehensive System (20+ metrics) |
|---|---|
| Clear priorities | Confused priorities |
| Memorable | Forgettable |
| Attention concentrated | Attention diffused |
| Gaming visible | Gaming hidden in noise |
| Actionable insights | Overwhelming data |
Rule: If you can't remember your key metrics, you have too many.
The 80/20 of Measurement
Principle: 20% of metrics provide 80% of decision value.
Implication: Identify critical few, track rigorously. Ignore rest or check only occasionally.
Example:
| Organization | Critical Few Metrics | Secondary/Occasional |
|---|---|---|
| SaaS company | MRR growth, net revenue retention, CAC:LTV | 20+ other metrics (track quarterly) |
| Hospital | Patient outcomes, readmission rate, safety incidents | Operational efficiency metrics |
| University | Graduation rate, job placement, research output | Countless process metrics |
The discipline: Resisting the urge to promote everything to "key metric" status.
Step 1: Start With Strategy
Metrics Must Connect to Goals
Broken approach:
- Pick metrics because they're measurable
- Track metrics because competitors do
- Measure what's easy to measure
Effective approach:
- Define strategic goals
- Identify drivers of those goals
- Measure drivers
The Strategy-Metrics Cascade
| Level | Question | Example |
|---|---|---|
| Mission | Why do we exist? | "Make knowledge accessible" |
| Strategic Goal | What does success look like? | "Be primary resource for 10M learners" |
| Key Driver | What causes goal achievement? | "Content quality + discoverability" |
| Metric | How do we measure driver? | "Content depth score, organic traffic, retention rate" |
Alignment test: Can you trace each metric back to a strategic goal? If not, why measure it?
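The alignment test lends itself to a mechanical check: represent the cascade as plain mappings and flag any metric that cannot be traced through a driver to a goal. A minimal sketch in Python (all names invented for illustration):

```python
# Represent the cascade as plain mappings; any metric that cannot be
# traced through a driver to a goal fails the alignment test.
goals = {"reach_10m_learners"}
drivers = {  # driver -> the goal it serves
    "content_quality": "reach_10m_learners",
    "discoverability": "reach_10m_learners",
}
metrics = {  # metric -> the driver it measures
    "content_depth_score": "content_quality",
    "organic_traffic": "discoverability",
    "total_page_views": None,  # tracked, but traces to no driver
}

def untraceable(metrics, drivers, goals):
    """Return metrics that cannot be traced back to a strategic goal."""
    return sorted(
        m for m, d in metrics.items()
        if d not in drivers or drivers[d] not in goals
    )

print(untraceable(metrics, drivers, goals))  # ['total_page_views']
```

Anything the check returns is a candidate for removal, or a prompt to make its causal link to strategy explicit.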
Common Misalignment Problems
| Problem | Example | Fix |
|---|---|---|
| Activity metrics | "Articles published" | Measure outcomes: "Knowledge gained (retention, application)" |
| Vanity metrics | "Total registered users" | Measure engagement: "Active users, completion rates" |
| Lagging only | "Annual revenue" | Add leading: "Pipeline velocity, win rate" |
| One-dimensional | "Revenue only" | Add: "Customer satisfaction, product quality" |
Step 2: Identify Key Performance Drivers
What Drives Success?
Critical question: What factors, if improved, would most advance strategic goals?
Framework:
| Goal | Key Drivers | How to Identify |
|---|---|---|
| Revenue growth | New customer acquisition, retention, expansion | Historical analysis, cohort studies |
| Customer satisfaction | Product quality, support responsiveness, ease of use | Surveys, correlation analysis |
| Operational efficiency | Process bottlenecks, automation level, error rates | Value stream mapping, time studies |
Leading vs. Lagging Indicators
Lagging indicators:
- Measure results
- Historical (what happened)
- Hard to influence directly
- Examples: Revenue, profit, market share
Leading indicators:
- Predict future results
- Forward-looking
- Actionable
- Examples: Sales pipeline, customer retention, product quality
A balanced system needs both:
| Lagging (Outcome) | Leading (Driver) |
|---|---|
| Revenue | Sales pipeline value, win rate |
| Customer satisfaction | Support ticket resolution time, product bugs |
| Employee retention | Employee engagement scores |
| Market share | Product quality ratings, brand awareness |
Rule: If a system has only lagging indicators, you know the results but can't improve them.
"A system that produces data but no learning is not a measurement system—it is a reporting system. The two are not the same." — Russell Ackoff, systems theorist
Step 3: Select Core Metrics
The Selection Process
For each strategic goal:
- Identify 2-4 key drivers
- For each driver, select 1-2 metrics
- Result: 3-7 metrics per goal
Example: SaaS Company's Growth Goal
| Driver | Metric 1 | Metric 2 |
|---|---|---|
| Acquisition | New MRR | CAC (Customer Acquisition Cost) |
| Retention | Net Revenue Retention | Churn rate |
| Expansion | Expansion MRR | % customers expanding |
Total: 6 core metrics
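The arithmetic behind two of these metrics is simple enough to sketch. A toy example with invented monthly figures, using one common formulation of Net Revenue Retention (definitions vary by company):

```python
# Illustrative monthly figures (all numbers invented for the example).
starting_mrr  = 100_000  # MRR at start of month
expansion_mrr = 5_000    # upgrades by existing customers
churned_mrr   = 8_000    # lost to cancellations and downgrades

# Net Revenue Retention: existing-customer revenue kept and grown,
# deliberately excluding MRR from newly acquired customers.
nrr = (starting_mrr + expansion_mrr - churned_mrr) / starting_mrr
gross_churn_rate = churned_mrr / starting_mrr

print(f"NRR: {nrr:.1%}")                       # NRR: 97.0%
print(f"Gross churn: {gross_churn_rate:.1%}")  # Gross churn: 8.0%
```

Excluding new MRR from the NRR numerator is the design choice that makes it a retention metric: acquisition cannot mask churn.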
Criteria for Good Metrics
A good metric is:
| Criterion | Definition | Example |
|---|---|---|
| Understandable | Anyone can grasp meaning | "Customer retention %" vs "Complex cohort survival index" |
| Comparable | Trends over time, benchmarks | Month-over-month, industry comparison |
| Ratio or rate | Normalized (not absolute) | "Conversion rate" better than "conversions" |
| Behavior-changing | Influences decisions | Revenue per customer → focus on expansion |
Source: Lean Analytics by Croll & Yoskovitz
The SMART Metric Test
Metrics should be:
| Attribute | Question | Bad Example | Good Example |
|---|---|---|---|
| Specific | Precisely defined? | "User engagement" | "Daily active users (logged in + action)" |
| Measurable | Can be quantified? | "Brand strength" | "Net Promoter Score" |
| Actionable | Can you influence it? | "Market conditions" | "Sales conversion rate" |
| Relevant | Connects to goal? | "Page views" (vanity) | "Content completion rate" (engagement) |
| Time-bound | Has update frequency? | "Eventually" | "Updated weekly" |
Step 4: Balance Multiple Perspectives
The Balanced Scorecard Framework
Problem: Over-optimization of one dimension damages others.
Solution: Measure across multiple perspectives.
Kaplan & Norton's Balanced Scorecard (1992):
| Perspective | Questions | Example Metrics |
|---|---|---|
| Financial | How do we look to shareholders? | Revenue growth, profitability, ROI |
| Customer | How do customers see us? | Satisfaction, retention, NPS |
| Internal Process | What must we excel at? | Cycle time, quality, innovation rate |
| Learning & Growth | How can we improve? | Employee skills, engagement, R&D investment |
Key insight: Excellence in all four predicts long-term success; optimizing only financial metrics often destroys value.
Example: Hospital Measurement System
Balanced approach:
| Dimension | Metric | Why |
|---|---|---|
| Clinical outcomes | Mortality rate, complication rate | Core mission |
| Patient experience | Satisfaction scores, wait times | Quality of care |
| Operational | Bed utilization, procedure cost | Efficiency |
| Staff | Nurse turnover, training hours | Capability |
| Financial | Operating margin | Sustainability |
Prevents: Cutting costs at the expense of outcomes, or maximizing satisfaction at the expense of financial viability.
Step 5: Build Gaming Resistance
Goodhart's Law, in Marilyn Strathern's widely quoted phrasing: "When a measure becomes a target, it ceases to be a good measure."
Mechanism:
- People optimize for metric
- Metric diverges from underlying goal
- Metric becomes meaningless
Examples:
| Metric as Target | Gaming Behavior | True Goal Undermined |
|---|---|---|
| Call center: Calls handled | Rush customers off phone | Customer satisfaction |
| Hospital: Mortality rate | Refuse high-risk patients | Patient care |
| Software: Lines of code | Write verbose code | Code quality |
| Sales: Number of deals | Close small, unprofitable deals | Revenue quality |
Strategies to Reduce Gaming
Strategy 1: Use Complementary Metrics
Approach: Pair metrics that counterbalance each other.
| Metric A (Can Be Gamed) | Metric B (Prevents Gaming) | Effect |
|---|---|---|
| Quantity (calls handled) | Quality (customer satisfaction) | Can't rush if quality measured |
| Speed (response time) | Accuracy (error rate) | Can't be fast and sloppy |
| Revenue | Customer acquisition cost | Can't buy revenue at any price |
| Growth | Retention | Can't churn through customers |
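The pairing logic can be expressed as a simple rule: treat an improvement in the gameable metric as suspect when its counterbalancing metric deteriorates. A sketch with an illustrative (not standard) threshold:

```python
def gaming_signal(gameable_delta, counter_delta, threshold=0.05):
    """Flag a suspect improvement: the gameable metric went up while
    its counterbalancing metric deteriorated beyond the threshold.
    The 5% threshold is illustrative, not a standard."""
    return gameable_delta > 0 and counter_delta < -threshold

# Calls handled up 20% while satisfaction fell 10%: likely gaming.
print(gaming_signal(0.20, -0.10))  # True
# Both improved: no gaming signal.
print(gaming_signal(0.15, 0.03))   # False
```

A flagged pair is a prompt for investigation, not an automatic verdict; the point is that the complementary metric makes the trade-off visible.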
Strategy 2: Focus on Outcomes, Not Outputs
| Output (Gameable) | Outcome (Meaningful) |
|---|---|
| Features shipped | Customer problems solved |
| Marketing campaigns run | Leads generated, conversion rate |
| Training hours delivered | Skills demonstrated, performance improvement |
| Reports produced | Decisions informed, actions taken |
Principle: Measure results, not activities.
Strategy 3: Maintain Qualitative Judgment
Don't rely solely on quantitative metrics.
Hybrid approach:
| Quantitative Metric | Qualitative Assessment |
|---|---|
| Sales conversion rate | Win/loss analysis: why we won/lost |
| Customer satisfaction score | Customer interviews: what matters |
| Code quality metrics | Peer code review: actual quality judgment |
Reason: Numbers are gameable; human judgment (properly structured) is harder to fool.
As Donald Wheeler, statistician and quality expert, puts it: "Every data set contains noise. Some data sets also contain signals. Before you can detect a signal, you have to filter out the noise." Pure quantitative data without judgment amplifies that noise.
Strategy 4: Rotate or Evolve Metrics
When a metric becomes target:
- Gaming strategies develop
- Metric loses predictive power
Solution: Periodically change what you measure.
Example: Google reportedly keeps evolving its search-quality signals in part to stay ahead of SEO gaming.
Step 6: Set Appropriate Measurement Frequency
Match Frequency to Decision Cycle
Principle: Measure as often as you need to make decisions, no more.
| Metric | Typical Frequency | Why |
|---|---|---|
| Financial results | Monthly/Quarterly | Slow-moving, decision cycle is monthly |
| Website traffic | Daily/Weekly | Fast-moving, can react quickly |
| Customer satisfaction | Quarterly | Changes slowly, surveys have cost |
| Employee engagement | Annually/Biannually | Slow to change, survey fatigue issue |
The Noise vs. Signal Trade-off
High-frequency measurement:
- Pro: Detect changes quickly
- Con: Noise overwhelms signal; random variation looks meaningful
Low-frequency measurement:
- Pro: Clearer trends
- Con: Miss timely intervention opportunities
Example:
| Daily Revenue Tracking | Monthly Revenue Tracking |
|---|---|
| See random fluctuations | See clear trends |
| Panic over noise | Respond to actual changes |
| Constant reaction | Thoughtful response |
Best practice: Track high-frequency, decide at lower frequency (moving averages, trend lines).
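The best practice above can be sketched with a trailing moving average: collect daily, but look at a smoothed series before reacting. A minimal Python illustration with invented daily figures:

```python
from statistics import mean

def moving_average(series, window):
    """Smooth a high-frequency series so decisions track the trend,
    not the day-to-day noise."""
    return [mean(series[i - window + 1:i + 1])
            for i in range(window - 1, len(series))]

# Invented daily revenue: noisy day to day, trending upward overall.
daily = [100, 92, 110, 95, 108, 104, 118, 99, 121, 113, 125, 117]
weekly_view = moving_average(daily, window=7)
print([round(x, 1) for x in weekly_view])  # a smoother, rising series
```

The raw series swings by 20+ units between adjacent days; the 7-day view makes the underlying upward trend visible without the noise.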
Step 7: Test and Iterate
Metrics Are Hypotheses
Initial metrics are guesses about what matters.
Test:
- Do improvements in metric correlate with actual goal progress?
- Do teams make better decisions with this metric?
- Is metric being gamed?
If not, change the metric.
The Validation Process
| Question | How to Test | Action If Fails |
|---|---|---|
| Does metric predict outcome? | Correlation analysis | Replace with better predictor |
| Do decisions improve? | Decision audit | Simplify or reframe metric |
| Is it gamed? | Behavior observation | Add counterbalancing metric |
| Is it used? | Review meeting analysis | Remove metric if unused |
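The first validation question is a correlation check. A small pure-Python sketch with invented quarterly figures (real validation needs far more data and attention to confounders):

```python
from statistics import mean, stdev

def pearson(xs, ys):
    """Pearson correlation between a candidate leading metric and
    the outcome it is supposed to predict."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)
    return cov / (stdev(xs) * stdev(ys))

# Invented figures: quarterly pipeline value ($M) vs next quarter's revenue.
pipeline = [1.2, 1.5, 1.1, 1.8, 2.0, 1.7]
revenue_next_q = [0.9, 1.1, 0.8, 1.3, 1.5, 1.2]
r = pearson(pipeline, revenue_next_q)
print(round(r, 2))  # high positive r suggests the metric is predictive
```

A metric whose correlation with the outcome is near zero (or unstable across periods) fails the test and should be replaced with a better predictor.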
Evolution Over Time
As organization matures:
| Early Stage | Growth Stage | Mature Stage |
|---|---|---|
| Focus: Survival, product-market fit | Focus: Scaling, efficiency | Focus: Optimization, innovation |
| Metrics: Cash runway, user feedback | Metrics: Growth rate, unit economics | Metrics: Market share, profitability |
Measurement system must evolve with strategy.
Common Measurement System Mistakes
Mistake 1: Too Many Metrics
Problem: 50+ metrics tracked
Result:
- No clear priorities
- Gaming hidden in complexity
- Analysis paralysis
Fix: Ruthlessly prune to 3-7 per major goal
Mistake 2: Measuring Only Lagging Indicators
Problem: Only track outcomes (revenue, profit)
Result: Know when you've failed, but can't prevent failure
Fix: Add leading indicators (pipeline, quality, engagement)
Mistake 3: No Connection to Strategy
Problem: Metrics chosen because they're available
Result: Measure things that don't matter
Fix: Start with strategy, derive metrics
Mistake 4: One-Dimensional Measurement
Problem: Financial metrics only
Result: Short-term optimization, long-term value destruction
Fix: Balanced scorecard approach
Mistake 5: Static Metrics
Problem: Never change what you measure
Result: Gaming develops, metrics lose meaning
Fix: Periodic review and evolution
Mistake 6: Targets Without Context
Problem: "Increase X by 20%"
Result: Gaming, sandbagging, arbitrary goals
Fix: Understand drivers; set targets based on what's achievable and valuable
Advanced Concepts
Diagnostic vs. Prescriptive Metrics
Diagnostic metrics: Tell you what happened.
Prescriptive metrics: Tell you what to do.
Example:
| Diagnostic | Prescriptive |
|---|---|
| "Revenue dropped 10%" | "Win rate decreased because competitive pricing changed; need new positioning" |
| "Churn increased" | "Customers churning lack feature X; prioritize development" |
Best systems: Provide both diagnosis and prescription.
Metrics at Different Organizational Levels
Different levels need different metrics:
| Level | Focus | Metric Examples |
|---|---|---|
| Executive | Strategic progress | Market share, brand strength, financial health |
| Department | Function performance | Sales conversion, product quality, support satisfaction |
| Team | Operational execution | Story points completed, bugs fixed, calls handled |
| Individual | Personal contribution | Tasks completed, skills developed, feedback scores |
Alignment: Individual → Team → Department → Executive metrics should cascade.
Real-Time vs. Periodic Dashboards
Real-time dashboards:
- For operational metrics (website uptime, system load)
- When immediate action required
Periodic reporting:
- For strategic metrics (market position, brand)
- When thoughtful analysis needed
Mistake: Making everything real-time creates noise and urgency bias.
Case Study: Redesigning a Failed Measurement System
The Problem
Software company with broken metrics:
| Old Metric | Problem |
|---|---|
| Lines of code written | Incentivized verbose, low-quality code |
| Features shipped | Quantity over quality; features nobody used |
| Bug count | Hid bugs by not reporting them |
| Sprint velocity | Inflated story point estimates |
Result: Metrics looked good, product quality terrible, customers churning.
The Redesign Process
Step 1: Strategy clarity
- Goal: Build product customers love and retain
Step 2: Identify drivers
- Product quality
- Customer value delivered
- Team capability
Step 3: New metrics
| Old Metric | New Metric | Why Better |
|---|---|---|
| Lines of code | Code quality score (peer review + automated analysis) | Measures quality |
| Features shipped | Features adopted (% customers using) | Measures value |
| Bug count | Customer-reported bugs, time to fix | Can't hide; measures impact |
| Sprint velocity | Delivered value (customer outcome) | Focuses on outcomes |
Step 4: Balance
- Added customer satisfaction (quarterly NPS)
- Added team health (engagement survey)
Step 5: Gaming resistance
- Multiple complementary metrics
- Qualitative review (demos, code review)
- Metric rotation (change technical quality metrics annually)
The Results
After 6 months:
- Code quality improved (fewer production bugs)
- Feature adoption increased (only valuable features built)
- Customer retention improved
- Team satisfaction increased (not gaming metrics)
Key insight: Fewer, better metrics focused on outcomes beat many activity metrics.
Practical Implementation
Building Your Measurement System
Timeline:
| Phase | Duration | Activities |
|---|---|---|
| 1. Strategy | 1-2 weeks | Clarify goals, identify drivers |
| 2. Metric design | 2-3 weeks | Select metrics, define calculation |
| 3. Infrastructure | 4-8 weeks | Build data collection, dashboards |
| 4. Pilot | 1-3 months | Test with one team/function |
| 5. Refine | 2-4 weeks | Fix issues discovered in pilot |
| 6. Rollout | 4-8 weeks | Extend to organization |
| 7. Ongoing | Continuous | Review quarterly, evolve as needed |
The Measurement System Document
Create written document:
| Section | Contents |
|---|---|
| Strategy | Goals, key drivers |
| Core metrics | 3-7 per major goal, with definitions |
| Calculation | Exactly how each metric computed |
| Frequency | How often measured, reported |
| Ownership | Who responsible for each metric |
| Targets | Expected ranges (not rigid) |
| Review process | How often system itself reviewed |
Purpose: Clarity, alignment, reference.
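The document can also live as structured data, which keeps definitions unambiguous and machine-checkable. A sketch using a Python dataclass (field names mirror the table above; the values are invented examples):

```python
from dataclasses import dataclass

@dataclass
class MetricDefinition:
    """One entry in the measurement system document."""
    name: str
    goal: str            # strategic goal it traces to
    calculation: str     # exactly how the metric is computed
    frequency: str       # how often measured and reported
    owner: str           # who is responsible for the metric
    target_range: tuple  # expected range, not a rigid target

nrr = MetricDefinition(
    name="Net Revenue Retention",
    goal="Durable revenue growth",
    calculation="(start MRR + expansion - churn) / start MRR",
    frequency="monthly",
    owner="VP Customer Success",
    target_range=(1.00, 1.20),
)
print(nrr.name, nrr.frequency)
```

Storing definitions this way makes the quarterly system review concrete: a script can flag metrics with no owner, no goal, or no defined calculation.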
Communication and Adoption
Measurement systems fail without adoption.
Keys to adoption:
| Factor | How |
|---|---|
| Clarity | Everyone understands what metrics mean |
| Relevance | Metrics connect to daily work |
| Visibility | Dashboards accessible, discussed in meetings |
| Action | Metrics inform actual decisions |
| Trust | Metrics seen as fair, not punitive |
Conclusion: Measurement as a System
Key principles:
- Focus beats comprehensiveness (3-7 metrics per goal)
- Start with strategy (metrics derive from goals)
- Balance dimensions (financial, customer, process, growth)
- Resist gaming (complementary metrics, qualitative judgment)
- Match frequency to decisions (measure when you can act)
- Iterate (metrics are hypotheses; test and evolve)
Good measurement systems:
- Clarify priorities
- Reveal truth
- Inform decisions
- Resist manipulation
- Evolve with strategy
Bad measurement systems:
- Obscure priorities
- Create gaming
- Generate reports nobody uses
- Persist unchanged
- Disconnect from goals
The difference is design. Measurement is too important to do accidentally.
What Research Shows About Measurement System Design
Forty years of research on organizational performance measurement systems has produced substantial, convergent findings about what makes these systems work or fail. Several researchers have been particularly influential.
Robert Kaplan and David Norton's Balanced Scorecard research (beginning with their 1992 Harvard Business Review paper and continuing through multiple books) established the foundational empirical case for multi-perspective measurement. Their research across hundreds of organizations showed that companies relying solely on financial measurement systems consistently underinvested in the drivers of future performance. The mechanism was straightforward: financial metrics are lagging indicators that reflect decisions made 12 to 24 months earlier. By the time a decline in customer satisfaction or process quality shows up in financial results, the causal factors have typically been deteriorating for years. A measurement system that includes only financial metrics provides no early warning.
Kaplan and Norton's research also identified a subtler failure: even companies that tracked customer and operational metrics alongside financial ones frequently failed to connect them. They tracked employee training hours, customer satisfaction scores, and process cycle times, but could not explain how improvements in any one of them were expected to drive improvements in another. The strategy map framework they developed in response required organizations to specify explicit causal hypotheses: we believe that improving employee skills in X will reduce defect rates in process Y, which will improve customer retention Z, which will grow revenue W. Each arrow was a testable hypothesis. This turned measurement system design from a data collection exercise into a scientific program for learning how the organization actually creates value.
W. Edwards Deming's statistical process control framework provides the operational foundation for useful measurement system design. Deming's insight, developed from Walter Shewhart's earlier work at Bell Labs and applied most influentially in postwar Japan, was that most variation in organizational outcomes is produced by system factors, not individual performance. When quality problems occur in a manufacturing process, approximately 85 percent of the variation is attributable to the process itself -- materials, equipment, procedures, environmental conditions -- and only 15 percent to individual worker behavior. This has a direct implication for measurement system design: systems should be designed to reveal process variation and enable system improvement, not to evaluate and rank individuals.
Deming's control charts provided a specific measurement tool for this purpose: tracking a metric over time and distinguishing between common cause variation (random fluctuation within a stable system) and special cause variation (signals that the system has changed). This distinction is critical for useful measurement systems: responding to common cause variation as though it were a signal produces "tampering" -- interventions that increase rather than decrease overall variation. Measurement systems that lack this capability for distinguishing signal from noise consistently lead to management by exception that makes things worse.
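The common-cause vs. special-cause distinction can be illustrated with a simplified individuals-style control chart. This is a sketch, not production SPC: real individuals charts typically estimate sigma from moving ranges rather than the naive standard deviation used here, and all figures are invented.

```python
from statistics import mean, stdev

def special_causes(series):
    """Flag points outside mean +/- 3 sigma as special-cause signals;
    everything inside the limits is common-cause variation, and reacting
    to it is the 'tampering' Deming warned against. (Simplified: real
    individuals charts estimate sigma from moving ranges instead.)"""
    centre, sigma = mean(series), stdev(series)
    lcl, ucl = centre - 3 * sigma, centre + 3 * sigma
    return [(i, x) for i, x in enumerate(series) if not lcl <= x <= ucl]

# Invented weekly defect counts: a stable process with one genuine shift.
defects = [12, 14, 11, 13, 12, 15, 13, 12, 45, 13, 12, 14]
print(special_causes(defects))  # [(8, 45)] -- only the shift is flagged
```

Note that the weekly swings between 11 and 15 produce no signals: a manager who intervened on those would be tampering with a stable system.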
Donald Campbell's program evaluation research shaped the design of public sector measurement systems through his documentation of what he called the "experimenting society" -- the idea that social programs should be treated as experiments that generate data for improvement rather than political commitments that must be defended. Campbell's Law, formalized in 1979, grew out of his observation that social programs evaluated on narrow outcome metrics consistently gamed those metrics. His proposed solution was methodological pluralism: measurement systems should use multiple methods with different vulnerability profiles. A program that can game its primary quantitative metric is less likely to simultaneously game a qualitative case study, a randomized controlled trial, and a population-level administrative data analysis.
Douglas Hubbard's measurement economics framework addresses the cost-benefit dimension of measurement system design. His central argument in How to Measure Anything (2014) is that organizations systematically build measurement systems that are too large because they do not apply economic analysis to measurement decisions. Every measurement has a cost: data collection, analysis, storage, and the opportunity cost of attention. Every measurement has an expected benefit: the expected value of the information for decisions. A measurement is worth making when the expected benefit exceeds the cost. Hubbard's research found that most organizations can eliminate 60 to 80 percent of their tracked metrics without significant loss of decision quality, because most metrics are redundant, not decision-relevant, or informative only about questions the organization is already certain enough about to decide.
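Hubbard's economic logic can be sketched as an expected-value-of-information calculation: no measurement is worth more than the gap between deciding with the information and deciding without it. A toy example with invented payoffs and probabilities:

```python
# Decision: launch a feature or not. States: it succeeds (p) or fails.
# All numbers invented for the illustration.
p_success = 0.5
payoff_launch_success = 500_000   # profit if we launch and it works
payoff_launch_failure = -200_000  # loss if we launch and it flops
payoff_skip = 0                   # do nothing, earn nothing

# Best expected value WITHOUT any further measurement:
ev_launch = (p_success * payoff_launch_success
             + (1 - p_success) * payoff_launch_failure)
ev_without = max(ev_launch, payoff_skip)

# With perfect information we would launch only in the success state:
ev_with = (p_success * payoff_launch_success
           + (1 - p_success) * payoff_skip)

# Expected value of perfect information: ceiling on any measurement's worth.
evpi = ev_with - ev_without
print(evpi)  # 100000.0
```

If a customer study costs more than the EVPI (and any real study resolves only part of the uncertainty, so is worth less), it fails Hubbard's test and should not be run.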
Real-World Case Studies in Measurement System Design
Intel's OKR measurement architecture. Andy Grove's implementation of OKRs at Intel in the 1970s is the most influential example of measurement system design in technology companies. Grove's system had several design features that addressed specific failure modes. First, objectives were qualitative (directional goals) while key results were quantitative (specific, measurable outcomes) -- this separated goal-setting from measurement, preventing the confusion between what you want to achieve and what you can count. Second, OKRs were set and reviewed quarterly, creating short feedback loops that allowed rapid adjustment when metrics proved not to predict the outcomes they were supposed to track. Third, OKRs were transparent across the organization -- everyone could see what every team was measuring and why. This transparency created horizontal accountability: teams could not optimize their own metrics in ways that damaged others' outcomes without it being visible.
When Google adopted OKRs in 1999, it added a specific design element: key results were expected to be aspirational, with a "sweet spot" achievement rate of 60 to 70 percent. Consistently achieving 100 percent on key results indicated that goals were too conservative -- teams were sandbagging to ensure they hit numbers rather than stretching to maximize value creation. This design feature directly addressed a failure mode that Goodhart's Law predicts: when key results become targets, people set them at levels they can comfortably achieve. The 60 to 70 percent norm built anti-sandbagging pressure into the measurement system itself.
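The 60-to-70-percent norm can be expressed as a simple grading check. A sketch with illustrative thresholds (not an official Google formula):

```python
def okr_health(scores):
    """Average key-result grades (0-1 scale) and flag the two failure
    modes the 0.6-0.7 sweet-spot norm is meant to catch. Thresholds
    here are illustrative, not an official formula."""
    avg = sum(scores) / len(scores)
    if avg > 0.85:
        return avg, "possible sandbagging: goals too conservative"
    if avg < 0.4:
        return avg, "goals likely unrealistic"
    return avg, "healthy stretch"

print(okr_health([0.7, 0.6, 0.65]))   # healthy stretch
print(okr_health([1.0, 1.0, 0.95]))   # possible sandbagging
```

A team landing in the flagged high band quarter after quarter is the anti-sandbagging signal the norm is designed to surface.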
The NHS balanced measurement evolution. The National Health Service's journey from single-metric to multi-dimensional measurement illustrates how measurement systems should evolve in response to evidence of gaming. The initial focus on waiting time metrics (introduced in the early 2000s) produced documented improvement in waiting times alongside documented gaming: ambulances held outside emergency departments, administrative pausing of waiting lists, reclassification of referrals. The response, developed through the NHS Institute for Innovation and Improvement and documented in multiple Audit Commission reports, was to expand the measurement system to make gaming one dimension costly on others.
The resulting NHS measurement framework includes: clinical outcome metrics (mortality, complication, readmission rates), patient experience metrics (from independent patient surveys), safety metrics (adverse events, near-misses, medication errors), access metrics (waiting times), and efficiency metrics (cost per episode, bed utilization). Gaming waiting times at the expense of patient safety would now be visible in the safety metrics. Gaming outcome metrics by avoiding high-risk patients would show up in access metrics. No single metric could be improved through pure administrative manipulation without creating signals in other dimensions. This is the core principle of complementary metric design: each metric limits the gaming space for the others.
Enron's measurement system failure. Enron's collapse illustrates what happens when measurement systems are designed to report favorably rather than to reveal truth. The company's reporting metrics (revenue, earnings per share, credit ratings, analyst recommendations) were all technically compliant with applicable standards. The measurement system failure was not fraud (though fraud existed) but design: the system measured what could be reported in the most favorable terms under existing rules, rather than what actually indicated business health.
Jeff Skilling, Enron's CEO, had an MBA and was sophisticated about financial measurement. The measurement system he oversaw tracked mark-to-market revenue (projected future cash flows counted as current income), managed earnings per share through asset disposals timed for quarterly reporting cycles, and maintained credit ratings through off-balance-sheet debt vehicles. Each individual metric was technically defensible. The ensemble was systematically misleading. A well-designed measurement system would have required cash flow from operations alongside revenue (immediately revealing the divergence), economic value added rather than accounting earnings, and transparency about off-balance-sheet obligations. The absence of these measures was not oversight -- it was design.
Toyota's visual measurement system. Toyota's production system, which influenced lean manufacturing methodology globally, embedded measurement into the physical production process rather than treating it as a separate reporting function. The andon cord -- which any worker could pull to stop the production line when a defect was detected -- created real-time measurement at the point of production. The quality measurement was inseparable from the production process itself. This design feature eliminated several common measurement system failures: no reporting lag (the measurement happened when the event occurred), no misalignment between who detects problems and who reports them (the worker who detected was the worker who triggered measurement), and no disincentive to report problems (the expected response was investigation and improvement, not punishment).
The Toyota system also built specific measurement system features to resist gaming: stopping the line was rewarded, not penalized. Workers who identified problems frequently were recognized as contributors to improvement. This directly addressed the failure mode in which measurement systems designed around punishment incentivize concealment rather than identification of problems.
Evidence-Based Principles for Useful Measurement System Design
Principle 1: Design for learning, not for reporting. The most consistent finding across Deming, Kaplan and Norton, and Campbell is that measurement systems designed primarily to produce favorable reports consistently fail to provide the information needed for improvement. Useful measurement systems are designed around a different question: what would we need to know to make better decisions and improve performance? The reporting function follows from this design; it does not drive it.
Principle 2: Build causal theories before selecting metrics. Kaplan and Norton's strategy maps, Grove's OKR objective-key result distinction, and Hubbard's decision analysis framework all converge on the same principle: measurement selection should be driven by explicit causal theories about how activities produce outcomes. The theory specifies what to measure (the causal factors), what the relationship should be (the predicted direction and magnitude), and what evidence would confirm or refute the theory. Without this structure, measurement systems become collections of available data rather than instruments for testing causal hypotheses.
Principle 3: Use complementary metrics that limit each other's gaming space. A single metric can almost always be improved through gaming. Multiple metrics that measure different aspects of the same goal create trade-offs that make gaming costly. Speed and quality metrics together (customer support response time and resolution quality) are harder to game simultaneously than either alone. Revenue and customer satisfaction metrics together resist strategies that boost revenue at the expense of customer relationships. The design principle is to identify the most likely gaming strategies for each metric and then select complementary metrics that would make those strategies visible or costly.
Principle 4: Match measurement frequency to decision cycles. Deming's statistical process control insights apply directly to measurement frequency. Measuring too frequently produces noise that overwhelms signal, leading to overreaction to random variation. Measuring too infrequently misses genuine changes in time to act on them. The appropriate frequency depends on the natural variability of the process and the decision cycle: operational metrics that inform daily decisions need daily or real-time measurement; strategic metrics that inform quarterly resource allocation decisions need monthly or quarterly measurement. A common design failure is applying the measurement frequency appropriate for operational metrics to strategic metrics, creating the illusion of signal in what is primarily noise.
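Deming's distinction between noise and signal can be operationalized with Shewhart-style control limits: treat a new observation as signal only when it falls outside bands derived from a stable baseline period. The sketch below uses the common three-sigma convention; the baseline values are illustrative.

```python
import statistics

def control_limits(baseline: list[float], sigmas: float = 3.0) -> tuple[float, float]:
    """Compute lower and upper control limits from a stable baseline period."""
    mean = statistics.mean(baseline)
    sd = statistics.stdev(baseline)  # sample standard deviation
    return mean - sigmas * sd, mean + sigmas * sd

def is_signal(value: float, baseline: list[float]) -> bool:
    """True if the observation lies outside the control limits,
    i.e. is unlikely to be routine variation."""
    lo, hi = control_limits(baseline)
    return value < lo or value > hi
```

A point inside the limits is routine variation and warrants no reaction; reacting to it anyway is the overreaction-to-noise failure described above. A point outside the limits justifies investigation.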
References
Kaplan, R. S., & Norton, D. P. (1992). "The Balanced Scorecard: Measures That Drive Performance." Harvard Business Review, 70(1), 71–79.
Kaplan, R. S., & Norton, D. P. (1996). The Balanced Scorecard: Translating Strategy into Action. Harvard Business School Press.
Croll, A., & Yoskovitz, B. (2013). Lean Analytics: Use Data to Build a Better Startup Faster. O'Reilly Media.
Goodhart, C. A. E. (1975). "Problems of Monetary Management: The U.K. Experience." In Papers in Monetary Economics (Vol. 1). Reserve Bank of Australia.
Hubbard, D. W. (2014). How to Measure Anything: Finding the Value of Intangibles in Business (3rd ed.). Wiley.
Marr, B. (2012). Key Performance Indicators: The 75+ Measures Every Manager Needs to Know. FT Press.
Austin, R. D. (1996). Measuring and Managing Performance in Organizations. Dorset House.
Parmenter, D. (2015). Key Performance Indicators: Developing, Implementing, and Using Winning KPIs (3rd ed.). Wiley.
Behn, R. D. (2003). "Why Measure Performance? Different Purposes Require Different Measures." Public Administration Review, 63(5), 586–606.
Kerr, S. (1975). "On the Folly of Rewarding A, While Hoping for B." Academy of Management Journal, 18(4), 769–783.
Meyer, M. W., & Gupta, V. (1994). "The Performance Paradox." Research in Organizational Behavior, 16, 309–369.
de Haas, M., & Kleingeld, A. (1999). "Multilevel Design of Performance Measurement Systems: Enhancing Strategic Dialogue Throughout the Organization." Management Accounting Research, 10(3), 233–261.
De Waal, A. A. (2003). "Behavioral Factors Important for the Successful Implementation and Use of Performance Management Systems." Management Decision, 41(8), 688–697.
Eccles, R. G. (1991). "The Performance Measurement Manifesto." Harvard Business Review, 69(1), 131–137.
Neely, A., Gregory, M., & Platts, K. (2005). "Performance Measurement System Design: A Literature Review and Research Agenda." International Journal of Operations & Production Management, 25(12), 1228–1263.
About This Series: This article is part of a larger exploration of measurement, metrics, and evaluation. For related concepts, see [Why Metrics Often Mislead], [Goodhart's Law Breaks Metrics], [Vanity Metrics vs Meaningful Metrics], and [KPIs Explained Without Buzzwords].
Frequently Asked Questions
What makes a measurement system useful?
Clear alignment with goals, actionable metrics, resistance to gaming, appropriate granularity, timely data, and actual influence on decisions.
How do you design a measurement system?
Start with strategy, identify key drivers, select 3-7 core metrics per goal, balance leading and lagging indicators, test and iterate.
Should measurement systems be comprehensive?
No. Focus beats comprehensiveness. Too many metrics create noise, dilute attention, and make nothing seem important.
What is the balanced scorecard approach?
Measuring multiple perspectives—financial, customer, internal processes, learning/growth—to prevent over-optimization of any single dimension.
How do you prevent gaming in measurement systems?
Use multiple complementary metrics, focus on outcomes over outputs, avoid rigid targets, maintain qualitative judgment, rotate metrics.
When should measurement systems change?
When strategy shifts, when metrics get gamed, when they no longer predict outcomes, or when they stop informing decisions.
What's the right frequency for measurement?
Match to decision cycles—measure often enough to inform action but not so frequently that noise overwhelms signal.
How do you know if your measurement system works?
Metrics inform decisions, improvements in metrics correlate with real performance, gaming is minimal, and goals are actually advancing.