You announce a new metric: customer response time will now be tracked for every support interaction. Within days, support tickets get closed faster. Success? Not quite. Customers complain about receiving "Is this resolved?" messages before their issue is actually fixed. Agents discovered they could close tickets quickly by asking customers to confirm resolution, then reopening if needed—technically improving the metric while degrading actual support quality.

This is measurement's paradox: the act of measuring changes what you're measuring. Announce you'll track something, and behavior immediately shifts—sometimes in desirable directions, often in perverse ones. The metric itself becomes the objective, not the underlying goal it was meant to represent.

This phenomenon appears everywhere: students study to the test instead of learning, employees optimize measured behaviors while neglecting unmeasured but critical work, organizations hit metric targets while real performance deteriorates. Understanding why and how measurement changes behavior is essential for using metrics effectively without being deceived by them.


The Core Mechanisms

Mechanism 1: Attention and Focus

When something is measured, it captures attention.

Before measurement:

  • Support team handles tickets based on judgment
  • Some agents prioritize complex issues
  • Others focus on quick wins
  • Natural variation, no clear priority signal

After announcing response time metric:

  • Entire team focuses on response time
  • Complex issues get delayed (hurt response time average)
  • Quick issues get prioritized
  • Unmeasured aspects (solution quality, customer satisfaction) get less attention

Why it happens:

  • Limited attention and working memory
  • Measurement creates salience
  • Tracked items feel more important
  • Untracked items fade to background

The principle: What gets measured gets attention. What doesn't get measured gets ignored.

"You can expect what you inspect." — W. Edwards Deming, quality management theorist and statistician


Mechanism 2: Accountability and Evaluation

Measurement creates accountability.

When performance is measured:

  • Results become visible
  • Comparisons become possible (across people, teams, time periods)
  • Evaluation feels imminent
  • Stakes feel higher

Behavioral response:

  • Increased effort (positive)
  • Strategic behavior to look good (neutral to negative)
  • Gaming and manipulation (negative)

The relationship between incentives and behavior is rarely straightforward—people respond to what is rewarded, not necessarily what is intended.

Example: Sales Quotas

Before quotas measured:

  • Salespeople balance short-term deals with relationship building
  • Mix of large and small deals
  • Long-term customer value prioritized

After monthly quotas measured:

  • End-of-month scramble to hit numbers
  • Discount offers to close marginal deals
  • Pressure customers for early commitments
  • Delay deals to next month if quota already hit (sandbagging)

Metric changed behavior, not always positively.


Mechanism 3: Feedback Loops

Measurement provides feedback that enables learning and adjustment.

Positive feedback loops:

  • See metric → understand performance → adjust behavior → see result
  • Enables improvement when metric aligns with goals

Example: Personal fitness tracker

  • See daily step count
  • Notice low days
  • Adjust: take stairs, walk during lunch
  • See improvement
  • Reinforced behavior

But feedback cuts both ways—it also enables gaming.

Negative feedback loops:

  • See metric → realize it's tracked → optimize metric appearance
  • Behavior shifts to metric, not underlying goal

Example: Teacher evaluation by test scores

  • See that raises depend on scores
  • Realize teaching to test boosts scores more than deep learning
  • Shift instruction: test-taking strategies, narrow curriculum
  • Scores improve, actual learning may decline

Mechanism 4: Signaling and Interpretation

Choosing to measure something sends a message.

Implicit message: "This is important."

Even without explicit consequences, measurement signals priorities.

Example: Company adds "innovation" metric

Announced: "We'll track number of ideas submitted per employee"

Interpretation (even if unstated):

  • Innovation matters to leadership
  • Quantity of ideas is valued
  • Submitting ideas is career-positive

Behavioral response:

  • More ideas submitted (desired)
  • Many low-quality ideas to hit numbers (undesired)
  • Less time refining good ideas (undesired)

The metric signaled "submit lots of ideas," not "innovate thoughtfully."


The Hawthorne Effect

The Original Studies

Background: Western Electric Hawthorne Works (1924-1932)

Initial question: Does lighting affect productivity?

Study design:

  • Increase lighting → productivity improves
  • Decrease lighting → productivity still improves
  • Change nothing (control) → productivity still improves

Interpretation: Workers improved simply because they were being studied and observed, independent of actual changes.

Conclusion: Observation itself changes behavior.

"One of the most important findings of the Hawthorne studies is that the attitude of the workers toward their supervisors was the crucial factor in determining their efficiency." — Elton Mayo, industrial psychologist and lead interpreter of the Hawthorne Works research


Modern Understanding

Contemporary research refined the original interpretation:

Key factors:

  1. Attention and novelty: Being studied made workers feel special, valued
  2. Feedback: Workers got more feedback during study periods
  3. Autonomy: Research gave workers some control over conditions
  4. Demand characteristics: Workers inferred expectations and complied

Critically: The specific intervention (lighting) mattered less than the fact of being observed and studied.


Implications for Measurement

The Hawthorne effect means:

1. Measuring changes what you're measuring

  • Before measurement: natural behavior
  • During measurement: people aware they're observed alter behavior

2. Short-term improvements may not persist

  • Initial novelty creates bump
  • Effect fades as measurement becomes routine
  • Need to distinguish Hawthorne bump from real improvement

3. Blinding and hidden measurement have ethical issues

  • Can reduce observer effect
  • But raise consent and privacy concerns
  • Often not feasible in organizational settings

Examples in Practice

Example 1: Monitoring employee computer usage

Announced monitoring:

  • Productivity metrics improve immediately
  • People appear more focused
  • Result: Combination of real focus + gaming (looks busy, minimizes non-work windows)

Effect fades:

  • After weeks, people adapt
  • Find ways to appear productive while doing other things
  • Or focus less as monitoring becomes routine

Example 2: Customer satisfaction surveys

When customers know they'll be surveyed:

  • Employees become extra attentive during survey periods
  • Experience improves
  • Scores go up
  • After survey period, attention drops, scores decline

The measurement itself temporarily improved the experience; it did not create sustainable change.


The Observer Effect

Beyond Hawthorne: Measurement as Intervention

Observer effect: The act of measurement changes the system being measured.

Distinction from Hawthorne:

  • Hawthorne: People change behavior when observed
  • Observer effect: Measurement itself alters what's measured (even independent of awareness)

Examples

Example 1: Asking about voting intentions

Phenomenon: Surveying people about voting makes them more likely to vote.

Mechanism:

  • Being asked activates identity ("Am I a person who votes?")
  • Creates commitment (stated intention)
  • Increases salience (voting is now on the mind)

Result: Polls don't just measure voting intention—they increase it.


Example 2: Weighing yourself daily

Phenomenon: Daily weigh-ins change weight beyond just awareness.

Mechanism:

  • Weight becomes salient daily
  • Each weigh-in is a decision point
  • Creates short feedback loops
  • Motivates micro-adjustments

Result: Daily weighing doesn't just track weight—it influences it.


Example 3: Tracking work hours

Phenomenon: Time tracking changes how people work.

Mechanism:

  • Awareness of time passing alters pace
  • Tasks get broken into trackable units
  • Non-trackable work (thinking, collaboration) may decrease
  • Billing by hour creates incentive to work slowly

Result: Time tracking changes both productivity and work quality.


"What Gets Measured Gets Managed"

The Principle

Common saying: "What gets measured gets managed."

Meaning:

  1. Attention: Measured things receive focus
  2. Accountability: Metrics create responsibility
  3. Improvement: Measurement enables optimization
  4. Prioritization: Unmeasured things deprioritized

When It's Good

Beneficial cases:

Situation → metric → positive behavior change:

  • Vague goals → define a clear metric → creates focus, enables coordination
  • Hidden performance → make it visible → identifies problems, highlights successes
  • No feedback → provide measurement → enables learning and adjustment
  • Ambiguous priorities → measure what matters → aligns team efforts

Example: Safety in manufacturing

Before measurement:

  • Accidents happen but not systematically tracked
  • No visibility into causes
  • Varies by manager

After measurement (days since accident, incident reports):

  • Accidents visible
  • Patterns identified
  • Improvement possible
  • Safety becomes priority

Measurement improved outcomes.


When It's Bad: Goodhart's Law

Goodhart's Law: "When a measure becomes a target, it ceases to be a good measure."

"Any observed statistical regularity will tend to collapse once pressure is placed upon it for control purposes." — Charles Goodhart, economist and Bank of England adviser, 1975

Why:

  • People optimize for metric, not underlying goal
  • Gaming, distortion, tunnel vision
  • Metric decouples from what it was meant to represent

Example: Teaching to standardized tests

Metric: Student test scores

Intended goal: Improve student learning

What happened:

  • Teachers optimize for test performance specifically
  • Narrow curriculum to tested topics
  • Teach test-taking strategies
  • Actual learning breadth decreases
  • Scores rise, learning quality questionable

Metric (test scores) became target, ceased to represent learning.
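The decoupling can be illustrated with a toy simulation. This is a minimal sketch with invented numbers, not a model of any real school system: the proxy (test score) rewards both genuine learning and test-specific preparation, the goal rewards only the first, and an agent who rationally optimizes the proxy shifts effort toward the gameable component.

```python
def simulate(rounds=10, gain_from_learning=1.0, gain_from_test_prep=1.5):
    """Toy Goodhart dynamic: an agent splits effort between genuine learning
    and test-specific prep, then shifts effort toward whatever pays off more
    on the measured proxy (the test score)."""
    effort_learning, effort_prep = 0.5, 0.5
    for r in range(1, rounds + 1):
        # The proxy rewards both kinds of effort; the real goal rewards only one.
        test_score = gain_from_learning * effort_learning + gain_from_test_prep * effort_prep
        real_learning = gain_from_learning * effort_learning
        print(f"round {r}: test_score={test_score:.2f}  real_learning={real_learning:.2f}")
        # Rational response to the target: move effort toward the component
        # with the higher payoff per unit of effort.
        if gain_from_test_prep > gain_from_learning:
            effort_prep = min(1.0, effort_prep + 0.05)
            effort_learning = 1.0 - effort_prep

simulate()
```

The printed series shows the proxy climbing while the quantity it was meant to represent falls, which is exactly the pattern the test-score example describes.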


The Dual Nature

"What gets measured gets managed" is both:

  • Focus: directs attention to important areas (positive), but ignores unmeasured yet important factors (negative)
  • Feedback: enables learning and improvement, but also enables gaming and optimizing for the metric itself
  • Accountability: creates responsibility, but also pressure to hit numbers regardless of method
  • Alignment: coordinates efforts toward goals, but also toward metrics that may diverge from those goals

Key: Whether measurement improves or degrades performance depends on metric design and organizational response.


Positive Uses of Measurement's Influence

Strategy 1: Measure What Truly Matters

If measurement changes behavior, measure the behavior you want.

Wrong: Measure output (features shipped, calls made)

  • Incentivizes quantity over quality
  • Ignores unmeasured outcomes (user value, deal quality)

Right: Measure outcome (user retention, revenue)

  • Incentivizes actual goal achievement
  • Can't easily game without real improvement

Strategy 2: Use Multiple Complementary Metrics

Single metrics get gamed. Balanced metrics resist gaming.

Example: Customer support

Single metric: Response time

  • Gaming: Close tickets fast, reopen later
  • Degraded: Quality, actual resolution

Balanced metrics:

  • Response time (speed)
  • Customer satisfaction score (quality)
  • First-contact resolution rate (effectiveness)
  • Reopened ticket rate (gaming detection)

Harder to game all simultaneously.
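As a minimal sketch of what "balanced" can look like in practice (the ticket fields, values, and thresholds below are hypothetical, invented purely for illustration), a scorecard that reports speed alongside quality and a reopen check makes gaming the speed metric visible:

```python
from statistics import mean

# Hypothetical ticket records; field names and values are illustrative only.
tickets = [
    {"response_minutes": 10, "csat": 4.5, "first_contact_resolution": True,  "reopened": False},
    {"response_minutes": 6,  "csat": 2.0, "first_contact_resolution": False, "reopened": True},
    {"response_minutes": 5,  "csat": 3.0, "first_contact_resolution": False, "reopened": True},
    {"response_minutes": 12, "csat": 4.8, "first_contact_resolution": True,  "reopened": False},
]

def support_scorecard(tickets):
    """Report speed alongside quality metrics so no single dimension
    can be optimized invisibly at the expense of the others."""
    return {
        "avg_response_minutes": mean(t["response_minutes"] for t in tickets),
        "avg_csat": mean(t["csat"] for t in tickets),
        "first_contact_resolution": mean(t["first_contact_resolution"] for t in tickets),
        "reopen_rate": mean(t["reopened"] for t in tickets),
    }

card = support_scorecard(tickets)
print(card)

# A fast average paired with a high reopen rate suggests tickets are being
# closed before they are actually resolved: the gaming pattern from the intro.
if card["avg_response_minutes"] < 10 and card["reopen_rate"] > 0.25:
    print("Warning: speed gains may be coming from premature closes")
```

Each added metric acts as a counterweight: gaming response time now shows up as a worse reopen rate and satisfaction score.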


Strategy 3: Communicate the "Why"

Explain the goal behind the metric.

Without "why":

  • Metric feels like arbitrary target
  • Invites gaming
  • Loses meaning

With "why":

  • Metric connected to purpose
  • Gaming feels like betraying mission
  • People focus on goal, not just metric

Example:

Announcement: "We'll track average delivery time."

  • Response: Focus on fast delivery, possibly sacrificing accuracy

Better announcement: "We'll track delivery time because customers need products when promised. Let's aim for fast, reliable delivery."

  • Response: Balance speed with reliability

Strategy 4: Review and Rotate Metrics

Metrics degrade over time as people learn to game them.

Solution:

  • Periodically review whether metrics still predict goals
  • Rotate metrics when gaming becomes problematic
  • Keep people focused on goals, not gaming specific metrics

Example: Rotating quality metrics in manufacturing

  • Year 1: Track defect rate
  • People optimize for defect metric (may hide edge cases)
  • Year 2: Switch to customer return rate
  • Harder to game (real customer impact)
  • Forces focus back on actual quality
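A minimal sketch of what the periodic review might look like, assuming you keep paired monthly readings of the tracked metric and a harder-to-game outcome (the numbers below are invented for illustration): when the correlation that originally justified the metric decays, that is the signal to rotate.

```python
from statistics import correlation  # available in Python 3.10+

# Hypothetical monthly readings (invented numbers): the tracked metric and
# the outcome it is supposed to predict.
defect_rate = [4.1, 3.8, 3.2, 2.9, 2.1, 1.8, 1.5, 1.2]   # keeps "improving"
return_rate = [5.0, 4.6, 4.2, 4.1, 4.2, 4.3, 4.4, 4.5]   # stalls, then worsens

early = correlation(defect_rate[:4], return_rate[:4])
recent = correlation(defect_rate[4:], return_rate[4:])

print(f"early correlation: {early:+.2f}, recent correlation: {recent:+.2f}")
if early > 0.7 and recent < 0.3:
    print("The metric no longer tracks the outcome; consider rotating to a harder-to-game measure")
```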

Strategy 5: Combine Quantitative and Qualitative

Numbers can be gamed. Stories reveal gaming.

Quantitative: Response time improved 30%.
Qualitative: Customer feedback reveals issues weren't actually resolved.

Together: Reveals that metric improved through gaming, not real improvement.


Negative Consequences of Measurement

Problem 1: Tunnel Vision

Focusing intensely on measured aspects blinds you to unmeasured but critical factors.

Example: Hospital emergency department

Metric: Average wait time

Optimization:

  • Triage faster
  • Start treatment quickly (even if not complete)
  • Move patients through system rapidly

Unmeasured but important:

  • Thoroughness of diagnosis
  • Patient understanding of treatment
  • Post-discharge outcomes

Result: Wait times down, but diagnostic errors and readmissions may increase.


Problem 2: Crowding Out Intrinsic Motivation

External measurement can undermine internal drive.

"Control leads to compliance; autonomy leads to engagement." — Daniel Pink, author of Drive: The Surprising Truth About What Motivates Us

Before measurement:

  • Employees motivated by craft, purpose, autonomy
  • Work quality driven by pride
  • Discretionary effort common

After heavy measurement:

  • Motivation shifts to hitting numbers
  • "Why bother if it's not measured?"
  • Discretionary effort declines

Research finding (Deci & Ryan): Extrinsic rewards and measurement can reduce intrinsic motivation for tasks people previously enjoyed.


Problem 3: Metric Fixation

Mistaking the metric for the goal.

The map gets mistaken for the territory.

"Metric fixation is the belief that it is possible and desirable to replace judgment, acquired by personal experience and talent, with numerical indicators of comparative performance based upon standardized data." — Jerry Muller, historian and author of The Tyranny of Metrics

Example: Academic citations

Original purpose: Citations as proxy for research impact

Metric fixation:

  • Researchers optimize for citation count
  • Strategic citation rings
  • Self-citation
  • Publish incrementally (more papers = more citations)

Result: Citation counts rise, but don't reliably indicate true research impact anymore.


Problem 4: Creating Perverse Incentives

"The more any quantitative social indicator is used for social decision-making, the more subject it will be to corruption pressures and the more apt it will be to distort and corrupt the social processes it is intended to monitor." — Donald Campbell, social scientist, 1979 (now known as Campbell's Law)

Metrics can incentivize opposite of intended behavior.

Example: Surgeon mortality rates

Intent: Measure surgeon skill, improve patient outcomes

Perverse incentive:

  • High-risk patients increase mortality rates
  • Surgeons avoid high-risk patients to protect stats
  • Sickest patients can't find surgeons

Result: Metric meant to improve care creates access barriers for those who most need it.


Managing Measurement's Influence

Principle 1: Accept That Measurement Changes Behavior

Don't pretend measurement is neutral observation.

Instead:

  • Design metrics assuming they'll shape behavior
  • Ask: "If people optimize for this metric, what behavior results?"
  • Choose metrics that incentivize desired behavior

Principle 2: Measure Outcomes, Not Just Outputs

Outputs: Activities (features shipped, calls made)
Outcomes: Results (user value, deals closed)

Outputs are easier to game and less aligned with goals. See Designing Useful Measurement Systems for guidance on choosing the right unit of measurement.


Principle 3: Use Metrics to Guide, Not Punish

Measurement for learning vs. measurement for judgment:

  • Purpose: learning and improvement → effect on behavior: honest reporting, problem-solving focus
  • Purpose: punishment and consequences → effect on behavior: gaming, hiding problems, risk aversion

When metrics tied to punishment:

  • Underreporting of issues
  • Gaming to avoid consequences
  • Optimization for metric, not goal

When metrics used for learning:

  • Transparent sharing
  • Focus on improvement
  • Less gaming

Principle 4: Maintain Qualitative Understanding

Don't let metrics replace actual understanding.

Balanced approach:

  • Use metrics for scale, trends, and patterns
  • Use qualitative input (conversations, observations, stories) for context, mechanisms, and edge cases

Metrics without qualitative input: Easy to miss gaming and lose context.
Qualitative input without metrics: Hard to scale, spot trends, or prioritize.


Principle 5: Monitor for Unintended Consequences

Regularly ask:

  • Is the metric still predicting the goal?
  • Are people gaming it?
  • What unmeasured factors are suffering?
  • What perverse incentives have emerged?

If measurement creates more problems than it solves, change or eliminate it.
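One way to make that review routine, sketched under the assumption that the headline metric is logged alongside at least one independent outcome (the names and numbers below are invented): flag the Goodhart pattern where the metric keeps improving while the outcome it is supposed to represent does not.

```python
def diverging(metric_series, outcome_series, window=3):
    """Return True when the tracked metric keeps improving over the last
    `window` readings while the outcome it represents does not.
    Assumes higher is better for both series."""
    recent_metric = metric_series[-window:]
    recent_outcome = outcome_series[-window:]
    metric_improving = all(b > a for a, b in zip(recent_metric, recent_metric[1:]))
    outcome_improving = recent_outcome[-1] > recent_outcome[0]
    return metric_improving and not outcome_improving

# Hypothetical quarterly readings: tickets closed per day vs. customer retention.
tickets_closed_per_day = [40, 44, 51, 58, 66]
customer_retention = [0.91, 0.92, 0.92, 0.90, 0.89]

if diverging(tickets_closed_per_day, customer_retention):
    print("Metric and outcome are diverging; check for gaming, tunnel vision, or a stale proxy")
```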


Conclusion: Measurement Is Intervention

Key insight: Measurement is never neutral observation. It's an intervention that changes the system.

Why measurement changes behavior:

  1. Focus: Measured things get attention
  2. Accountability: Metrics create responsibility and evaluation
  3. Feedback: Metrics enable optimization (for better or worse)
  4. Signaling: Measurement communicates priorities

Implications:

Positive potential:

  • Focus attention on what matters
  • Enable learning and improvement
  • Align efforts toward goals
  • Provide feedback for adjustment

Negative risks:

  • Tunnel vision (unmeasured factors neglected)
  • Gaming (optimizing metric appearance, not underlying goal)
  • Crowding out intrinsic motivation
  • Perverse incentives

The path forward:

  • Measure what truly matters (not proxies)
  • Use multiple complementary metrics (resist gaming)
  • Communicate why metrics matter (connect to purpose)
  • Balance metrics with qualitative understanding
  • Monitor for gaming and unintended consequences
  • Accept that measurement shapes behavior—design accordingly

"What gets measured gets managed"—for better or worse.

Design measurement systems assuming they'll change behavior. Because they will.


What Research Shows About Measurement and Behavior

The field examining how measurement changes behavior draws on economics, psychology, sociology, and organizational theory. Several researchers have defined the intellectual landscape with particular precision.

Charles Goodhart identified the core mechanism in 1975 while advising the Bank of England on monetary policy. His observation was empirical rather than theoretical: every time the Bank used a monetary aggregate as a policy target, the statistical relationship between that aggregate and inflation broke down. Commercial banks changed their behavior in response to the target, invalidating the model underlying it. Marilyn Strathern later formalized the generalized principle in 1997: "When a measure becomes a target, it ceases to be a good measure." The mechanism is not malice or stupidity. People respond rationally to what is measured and rewarded. When the measure diverges from the goal, rational optimization of the measure produces irrational outcomes from the perspective of the goal.

Donald Campbell, a social scientist who worked extensively on evaluation methodology, articulated the political dimension in 1979. Campbell's Law states: "The more any quantitative social indicator is used for social decision-making, the more subject it will be to corruption pressures and the more apt it will be to distort and corrupt the social processes it is intended to monitor." Campbell was describing not just individual gaming but systemic institutional pressure. When high-stakes decisions -- funding, employment, reputation -- are tied to a metric, organizations develop cultures oriented toward the metric rather than the underlying goal. This is not a rare failure; Campbell argued it is the predictable default whenever measurement is high-stakes.

Jerry Muller's 2018 book The Tyranny of Metrics provided the most comprehensive empirical survey of Goodhart and Campbell dynamics across multiple sectors: education, healthcare, policing, universities, military, and business. Muller's central finding was that metric fixation -- the institutional belief that quantitative measurement of performance can substitute for substantive judgment -- consistently produces the same pattern: metrics improve on paper while genuine performance stagnates or declines, and the organizations most damaged are those serving populations who cannot exit or complain effectively: students, patients, prisoners.

Elton Mayo and the Hawthorne studies (1924-1932) established the foundational empirical demonstration that observation itself changes behavior. The original finding -- that productivity improved whether lighting was increased or decreased -- has been subject to significant reanalysis since. Later researchers including Richard Franke and James Kaul (1978) argued that the productivity improvements were driven partly by the presence of attentive management and partly by the novelty effect of being studied. Contemporary meta-analyses are more skeptical of the dramatic original claims. The robust finding that survives scrutiny is narrower but still important: people alter their behavior when they know they are being observed, and the direction of the alteration is typically toward what they perceive the observer wants to see. This is consequential for any measurement system: the measurement process is not passive data collection but an active intervention that reshapes what it purports to observe.

W. Edwards Deming brought these dynamics into management practice most influentially. Deming's 14 Points for Management, developed from his work transforming Japanese manufacturing quality after World War II, include explicit warnings against management by numbers alone. Point 11 calls for the elimination of numerical quotas, arguing that quotas focus on numbers rather than quality and lead to exactly the gaming and moral shortcuts that Goodhart and Campbell described. Deming's alternative was management by understanding processes -- using measurement to understand variation and improve systems rather than to evaluate and rank individuals. His approach treated metrics as diagnostic tools for system improvement, not as targets whose achievement proves performance.


Real-World Case Studies: Measurement Changing Behavior

Soviet production quotas and the nail factory. Soviet industrial planning provides the most documented historical example of measurement driving dysfunctional behavior at scale. When factories were assigned production quotas measured by weight (tons of nails produced), they manufactured large, heavy, useless nails that maximized the metric while violating the purpose. When planners switched to quotas measured by unit count, factories switched to tiny, useless nails. The pattern repeated across sectors: shoe factories, assigned quotas by number of pairs, produced shoes that were all the same size; glass factories, assigned quotas by weight, produced glass that was impractically thick. The command economy was not uniquely susceptible to this problem -- the same dynamic appears in any system where people are evaluated on proxies rather than outcomes. The Soviet example is simply unusually well-documented and unusually extreme because the feedback loops that normally limit dysfunction (market prices, customer complaints, competition) were absent.

NHS ambulance response times. The UK's National Health Service adopted a target in the 1990s that ambulances should respond to life-threatening emergencies within 8 minutes. The target was genuine and important -- response time is a real predictor of survival in cardiac arrest and major trauma. As the target became high-stakes, behavioral adaptations emerged. First responders on motorcycles were dispatched to "stop the clock" when they arrived at a scene, even though they lacked the equipment to provide definitive treatment. Dispatchers developed sophisticated practices for categorizing calls as non-urgent (longer target times) rather than urgent. When researchers at the Audit Commission investigated in the 2000s, they found that the metric had improved substantially while patient outcomes data showed more ambiguous results. The behavior change measurement induced was real -- it just was not always the behavior the measurement was intended to induce.

Teacher performance pay and test score gaming. The No Child Left Behind Act (2001) in the United States tied school funding to standardized test score performance, creating exactly the high-stakes measurement conditions that Campbell's Law predicts will produce corruption. Investigations in Atlanta, Washington D.C., Philadelphia, and other districts documented systematic test score manipulation: in Atlanta, 178 teachers and principals were implicated in erasing and changing student answer sheets. The documented fraud represented only the most extreme response to measurement pressure. The subtler and more widespread response was legitimate but educationally questionable: narrowing curriculum to tested subjects, reducing time on art, music, and physical education, and concentrating teaching resources on "bubble students" who were close to proficiency thresholds. Research by economists Brian Jacob and Steven Levitt (2003) developed statistical methods to detect gaming from the pattern of answer changes and showed it was far more common than official investigations suggested.

Wells Fargo's cross-selling metrics. Between 2002 and 2016, Wells Fargo measured and rewarded employees for the number of financial products each customer held. The strategic goal was to become the "store" for customers' complete financial lives. The measurement was supposed to track progress toward this goal. What it actually tracked was product count -- which could be increased by opening accounts customers neither wanted nor knew about. Approximately 3.5 million accounts were opened fraudulently. The metric improved dramatically while the underlying goal (genuine customer relationship depth) was actively undermined: customers who discovered unauthorized accounts lost trust and closed existing legitimate accounts. Measurement pressure did change behavior, as intended -- but the behavior it changed was not the behavior the goal required.

Google search quality metrics. Google's experience with its search quality metrics provides a more constructive example of managing measurement-behavior dynamics. Google employs teams of "quality raters" who evaluate search results against detailed guidelines, generating data that informs but does not directly determine the algorithm. The company has also documented its experience with optimization metrics: when it optimized directly for click-through rate, users clicked more but reported lower satisfaction. When it optimized for user-reported satisfaction, click-through rates initially declined but long-term engagement improved. The lesson Google drew was that measurement systems require multiple metrics that counterbalance each other, and that the relationship between any single metric and user value requires constant validation.


Evidence-Based Principles for Managing Measurement Effects

Principle 1: Design metrics assuming they will be optimized, not just observed. The naive assumption behind most metric systems is that people will read the number as information and respond to what it represents. The accurate assumption, supported by Goodhart, Campbell, and decades of organizational research, is that people will optimize the number itself. Effective metric design asks: "If someone tried to hit this number without achieving the underlying goal, how would they do it?" Then it builds defenses -- typically through complementary metrics that make gaming one dimension costly on another.

Principle 2: Stakes determine gaming intensity. Campbell's Law is explicitly about high-stakes metrics. Low-stakes measurement -- tracking things out of curiosity or for operational awareness -- generates much less dysfunctional optimization. The implication is that organizations should be selective about which metrics carry consequences. Tying compensation, career advancement, or organizational funding to metrics creates intense pressure that will reshape behavior. This is not always wrong -- sometimes that pressure is exactly what is needed. But it should be a deliberate choice, made with awareness of the distortions it will create.

Principle 3: Qualitative judgment cannot be fully replaced by quantitative metrics. Deming's insistence on judgment alongside measurement, and Muller's historical documentation of what happens when institutions try to replace judgment with metrics, converge on the same point. Measurement is useful for scale, trend identification, and systematic comparison. It is poor at capturing context, quality, novelty, and the subtle judgment calls that determine whether work is genuinely valuable. Measurement systems that eliminate qualitative assessment in favor of pure quantification consistently produce gaming and metric fixation. The most functional systems use measurement to structure judgment, not to replace it.

Principle 4: Explain the purpose behind the metric. Research on intrinsic motivation (Deci and Ryan, 2000) shows that people respond differently to metrics depending on whether they understand and endorse the underlying goal. When people understand why a metric matters -- what customer outcome it represents, what problem it tracks -- gaming feels like a violation of purpose rather than a rational response to incentive. When the metric is presented as an arbitrary target with consequences, gaming feels like rational self-protection. The practical implication is that metric communication should always include the causal theory: this metric predicts this outcome because of this mechanism.


References

  1. Mayo, E. (1933). The Human Problems of an Industrial Civilization. Macmillan.

  2. Roethlisberger, F. J., & Dickson, W. J. (1939). Management and the Worker: An Account of a Research Program Conducted by the Western Electric Company, Hawthorne Works, Chicago. Harvard University Press.

  3. Goodhart, C. (1975). "Problems of Monetary Management: The U.K. Experience." Papers in Monetary Economics (Reserve Bank of Australia).

  4. Campbell, D. T. (1979). "Assessing the Impact of Planned Social Change." Evaluation and Program Planning, 2(1), 67–90.

  5. Deci, E. L., & Ryan, R. M. (2000). "The 'What' and 'Why' of Goal Pursuits: Human Needs and the Self-Determination of Behavior." Psychological Inquiry, 11(4), 227–268.

  6. Muller, J. Z. (2018). The Tyranny of Metrics. Princeton University Press.

  7. Strathern, M. (1997). "'Improving Ratings': Audit in the British University System." European Review, 5(3), 305–321.

  8. Austin, R. D. (1996). Measuring and Managing Performance in Organizations. Dorset House.

  9. Kerr, S. (1975). "On the Folly of Rewarding A, While Hoping for B." Academy of Management Journal, 18(4), 769–783.

  10. Ridgway, V. F. (1956). "Dysfunctional Consequences of Performance Measurements." Administrative Science Quarterly, 1(2), 240–247.

  11. Pink, D. H. (2009). Drive: The Surprising Truth About What Motivates Us. Riverhead Books.

  12. Kohn, A. (1999). Punished by Rewards: The Trouble with Gold Stars, Incentive Plans, A's, Praise, and Other Bribes. Houghton Mifflin.

  13. Seddon, J. (2008). Systems Thinking in the Public Sector: The Failure of the Reform Regime...and a Manifesto for a Better Way. Triarchy Press.

  14. Levitt, S. D., & Dubner, S. J. (2005). Freakonomics: A Rogue Economist Explores the Hidden Side of Everything. William Morrow.

  15. Power, M. (1997). The Audit Society: Rituals of Verification. Oxford University Press.


About This Series: This article is part of a larger exploration of measurement, metrics, and evaluation. For related concepts, see [Why Metrics Often Mislead], [Goodhart's Law Breaks Metrics], [Designing Useful Measurement Systems], and [Vanity Metrics vs Meaningful Metrics].

Frequently Asked Questions

Why does measurement change behavior?

People alter behavior when they know they're being measured—through attention focus, accountability, feedback loops, and signaling priorities.

What is the Hawthorne effect?

The Hawthorne effect is when people change behavior simply because they're being observed or studied, independent of the intervention.

What is the observer effect?

The observer effect is when measurement itself changes what's being measured—the act of observing alters the phenomenon.

Is behavior change from measurement good or bad?

Depends. Good if it improves aligned behavior, bad if it creates gaming, distortion, or focus on metrics over real goals.

What does 'what gets measured gets managed' mean?

Measurement signals importance and creates accountability, naturally directing attention and effort toward measured areas.

Can you measure without changing behavior?

Rarely if people know about it. Hidden measurement has ethical issues. Better to accept influence and design measurement thoughtfully.

How do you use measurement's influence positively?

Measure what truly matters, communicate why it matters, use metrics to guide not punish, and balance multiple complementary measures.

What's the danger of measurement changing behavior?

People may optimize for metrics rather than real goals, game the system, or focus narrowly on measured aspects while ignoring unmeasured but important factors.