How Metrics Influence Behavior: The Psychology of Measurement, Why People Optimize for What's Measured, How Goodhart's Law Corrupts Targets, and What Makes Metrics Drive Good Behavior Instead of Gaming
In 2016, Wells Fargo, one of the largest banks in the United States, was fined $185 million after regulators discovered that bank employees had opened roughly 2 million unauthorized accounts in customers' names--an estimate later revised to approximately 3.5 million. Employees had created fake email addresses, forged signatures, and secretly moved customer money between accounts--all without customer knowledge or consent.
The employees who committed this fraud were not career criminals. They were ordinary bank tellers and personal bankers, many of them earning $12 to $15 per hour. They committed fraud because Wells Fargo had implemented a metric-driven management system that tracked the number of financial products "sold" to each customer--a metric the company called "cross-selling." The target was eight products per household, an internal goal that CEO John Stumpf championed as "Going for Gr-eight." Employees who met their cross-selling targets received bonuses, recognition, and job security. Employees who missed their targets were subjected to intense pressure from managers, public humiliation in team meetings, and ultimately termination.
The metric--products per household--was supposed to measure customer engagement. The theory was that customers who used more Wells Fargo products (checking account, savings account, credit card, mortgage, investment account) were more engaged, more satisfied, and more profitable. If employees could get each customer to eight products, the bank's revenue would grow substantially.
The metric did not measure customer engagement. It measured the number of accounts associated with a customer's name. And when the metric became a high-stakes target with powerful rewards and punishments attached, employees found the easiest way to hit it: create accounts that customers had not requested and did not know about. The metric was optimized beautifully--products per household rose steadily. The underlying goal--genuine customer engagement--was destroyed as millions of customers were defrauded by their own bank.
The Wells Fargo scandal is the most vivid recent illustration of a phenomenon that has been observed, studied, and documented for decades: metrics change behavior. They change behavior predictably, powerfully, and often in ways that the metric designers did not intend and would not have wanted. Understanding how and why metrics influence behavior is essential for anyone who designs measurement systems, sets targets, manages performance, or works in an environment where metrics are used to evaluate and reward.
Why Do Metrics Influence Behavior?
Why do metrics influence behavior? People optimize for what is measured because metrics signal what the organization values, metrics create accountability by making performance visible, and metrics are often tied to rewards (bonuses, promotions, recognition) or consequences (warnings, demotions, termination). When you measure something and attach consequences to it, you are not merely observing behavior--you are shaping it.
The Signaling Effect
The act of choosing to measure something sends a powerful signal about what matters. When an organization decides to track customer satisfaction scores, it signals that customer satisfaction matters. When it decides to track lines of code written, it signals that code output matters. When it decides to track hours worked, it signals that presence matters. These signals are interpreted by employees as guidance about what the organization actually values--regardless of what the organization says it values in mission statements and company meetings.
This signaling effect operates even when there are no explicit rewards or punishments attached to the metric. The mere act of measurement creates awareness, and awareness changes behavior. This is a variant of the Hawthorne Effect, first observed in the 1920s and 1930s at the Western Electric Hawthorne Works factory in Cicero, Illinois. Researchers studying the relationship between lighting conditions and worker productivity discovered that productivity improved whenever any change was made to working conditions--including changes that should have reduced productivity. The workers' behavior changed not because of the environmental change itself but because they were aware of being observed and measured.
The Hawthorne Effect has been debated and refined in the decades since, but the core insight remains robust: measurement changes behavior through the mechanism of attention. What gets measured gets attended to, and what gets attended to gets effort.
The Accountability Effect
Metrics make performance visible, and visibility creates accountability. When a salesperson's conversion rate is displayed on a dashboard that their manager reviews weekly, the salesperson is accountable for that number in a way they would not be if the number were not tracked. The dashboard transforms conversion rate from something the salesperson might think about occasionally into something they think about constantly.
This accountability can be productive when the metric accurately represents the desired outcome. A surgeon whose post-operative infection rate is tracked and reported will pay more attention to sterile technique. A factory whose defect rate is monitored will invest more effort in quality control. A customer service team whose first-call resolution rate is visible will work harder to resolve issues without callbacks.
But accountability can be destructive when the metric does not accurately represent the desired outcome. The surgeon whose mortality rate is published may avoid operating on the sickest patients--who are most likely to die regardless of surgical quality--to keep their numbers looking good. The factory whose defect rate is monitored may reclassify borderline defects as acceptable rather than genuinely improving quality. The customer service team whose call duration is monitored may rush callers off the phone rather than genuinely resolving their issues.
The accountability effect is amplified enormously when metrics are tied to formal consequences. A metric that is tracked but has no impact on compensation, promotion, or job security influences behavior moderately through the signaling and attention mechanisms. A metric that is tied to a bonus influences behavior strongly. A metric that is tied to continued employment influences behavior overwhelmingly--as the Wells Fargo case demonstrates.
The Crowding Out Effect
One of the most counterintuitive findings in behavioral economics is that external measurement and rewards can destroy intrinsic motivation. When people are intrinsically motivated to do something--they find it interesting, meaningful, or satisfying in itself--adding external metrics and rewards can actually reduce their motivation and performance.
This phenomenon, known as the crowding out effect or overjustification effect, was demonstrated in a classic experiment by psychologist Edward Deci in 1971. Deci gave college students a puzzle to solve. Students who were paid to solve the puzzle spent less time working on it during a free-choice period (when they could do anything they wanted) than students who were not paid. The payment transformed the puzzle from an intrinsically enjoyable activity into work, and when the payment stopped, so did the motivation.
Can metrics destroy motivation? Yes--external measurement can undermine intrinsic motivation, especially for complex or creative work that requires autonomy, mastery, and purpose. Daniel Pink's synthesis of motivation research in Drive identifies three conditions for intrinsic motivation: autonomy (control over your work), mastery (the feeling of improving), and purpose (connection to something meaningful). Metrics that reduce autonomy (by dictating exactly what to optimize), that measure outputs rather than growth (counting widgets rather than recognizing skill development), or that disconnect work from purpose (reducing meaningful activity to a number) can systematically destroy the intrinsic motivation that drives the highest-quality work.
This does not mean that metrics should never be used for creative or complex work. It means that the design of metrics for such work must be extraordinarily careful, preserving the conditions for intrinsic motivation while providing the information needed for organizational learning and improvement.
What Is Goodhart's Law?
What is Goodhart's Law? "When a measure becomes a target, it ceases to be a good measure." This principle, originally articulated by British economist Charles Goodhart in the context of monetary policy in 1975 and later generalized by anthropologist Marilyn Strathern, describes the most fundamental and pervasive failure mode of metric-driven management.
How Goodhart's Law Works in Practice
Goodhart's Law operates through a predictable mechanism:
1. A metric is chosen that correlates with a desired outcome. Customer satisfaction scores correlate with customer loyalty. Lines of code correlate with software output. Test scores correlate with student learning. Products per household correlate with customer engagement.

2. The metric is made a target. Management tells employees: increase customer satisfaction scores. Increase lines of code. Increase test scores. Increase products per household.

3. People optimize for the metric rather than for the underlying outcome. They find ways to increase the number without increasing the thing the number was supposed to measure. They give customers discounts for completing satisfaction surveys (inflating scores without improving actual satisfaction). They write verbose, repetitive code (inflating line counts without improving software quality). They teach to the test (inflating scores without deepening learning). They open unauthorized accounts (inflating product counts without increasing engagement).

4. The correlation between the metric and the outcome breaks. Once the metric is being gamed, it no longer measures what it was originally supposed to measure. The customer satisfaction scores are high, but customers are not actually satisfied. The line counts are high, but the software is not actually better. The test scores are high, but the students have not actually learned. The product counts are high, but customers are being defrauded.
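The collapse is easy to reproduce in a toy model. The Python sketch below is a deliberately simplified simulation--the effort split, the gaming multiplier, and the pressure values are invented for illustration. Each agent divides one unit of effort between genuine work and gaming; as target pressure rises, the proxy metric climbs while the outcome it was supposed to track falls.

```python
import random

random.seed(42)

def simulate(agents=1000, target_pressure=0.0):
    """Each agent splits one unit of effort between genuine work and
    gaming. Target pressure shifts effort toward gaming, because gaming
    raises the proxy more cheaply than genuine work does."""
    proxy_total = outcome_total = 0.0
    for _ in range(agents):
        gaming_share = min(1.0, random.random() * 0.2 + target_pressure)
        genuine = 1.0 - gaming_share
        outcome_total += genuine                      # the thing we actually want
        proxy_total += genuine + 3.0 * gaming_share   # gaming inflates the proxy
    return proxy_total / agents, outcome_total / agents

for pressure in (0.0, 0.4, 0.8):
    proxy, outcome = simulate(target_pressure=pressure)
    print(f"pressure={pressure:.1f}  proxy={proxy:.2f}  outcome={outcome:.2f}")
```

Under these made-up parameters, the proxy sits near 1.2 and the outcome near 0.9 at zero pressure; at high pressure the proxy climbs toward 2.8 while the outcome collapses toward 0.1. The number management sees improves as the thing management wants deteriorates.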
Examples of Goodhart's Law Across Domains
Education: Teaching to the test. The No Child Left Behind Act in the United States tied school funding to standardized test scores in reading and mathematics. Schools responded by increasing instructional time for tested subjects and reducing time for untested subjects (science, social studies, art, music, physical education). Within tested subjects, instruction narrowed to focus on the specific skills and question formats that appeared on the tests. Test scores improved; the breadth and depth of education declined. The metric (test scores) was optimized; the goal (educated students) was undermined.
Research by Daniel Koretz, published in The Testing Charade, documented systematic score inflation across states: test scores rose dramatically on state-specific assessments while showing little or no improvement on independent measures like the National Assessment of Educational Progress (NAEP). The state tests were being gamed; the underlying learning was not improving.
Healthcare: Surgical report cards. In the 1990s, New York State began publishing cardiac surgery mortality rates by hospital and by individual surgeon, with the goal of improving surgical quality through transparency and accountability. The transparency did produce some quality improvements--surgeons paid more attention to surgical technique and post-operative care. But it also produced a predictable Goodhart's Law effect: surgeons began avoiding high-risk patients who were most likely to die, because operating on these patients would worsen the surgeon's published mortality rate.
A study by Dranove, Kessler, McClellan, and Satterthwaite published in the Journal of Political Economy found that the report cards reduced patient welfare on net: reported surgical outcomes improved (because surgeons tried harder), but surgeons shifted away from the sickest patients--who needed surgery most--and health outcomes for those high-risk patients deteriorated. Doctors were optimizing their metric (reported mortality rate) at the expense of the goal the metric was supposed to serve (saving the most lives).
Policing: Crime statistics gaming. When police departments are evaluated based on crime statistics--particularly the number of reported crimes in their jurisdiction--officers face an incentive to reduce reported crime rather than reduce actual crime. Techniques include downgrading felonies to misdemeanors (reporting a burglary as "trespassing"), discouraging victims from filing reports, or reclassifying crimes to categories that are not tracked. The CompStat system in New York City, which used crime statistics to hold precinct commanders accountable, produced documented cases of statistic manipulation in multiple precincts.
Software development: Lines of code. When programmers are evaluated by lines of code produced, they write more code--but not necessarily better code. Verbose solutions replace elegant ones. Copy-paste replaces refactoring. Code that should be deleted (because it is no longer needed) is kept (because removing it would reduce the line count). The metric incentivizes quantity at the expense of quality, maintainability, and simplicity.
IBM reportedly discovered this effect decades ago and tried measuring programmers by the number of bugs they fixed. Programmers promptly began writing buggier code so they would have more bugs to fix. The metric was abandoned.
Call centers: Average handle time. When call center agents are evaluated by average handle time (the average duration of customer calls), they develop strategies to minimize call duration: transferring customers to other departments unnecessarily (which stops their own call timer), hanging up on difficult callers, and rushing through interactions without resolving the customer's actual problem. The metric is optimized; customer satisfaction degrades.
How Do Metrics Create Unintended Consequences?
How do metrics create unintended consequences? Metrics focus attention on what is measured at the expense of what is unmeasured, enable gaming when rewards are attached, encourage short-term thinking when metrics are evaluated over short time horizons, and may destroy intrinsic motivation for work that was previously meaningful in itself.
The Attention Narrowing Effect
Every metric focuses attention, and focused attention necessarily means unfocused inattention. When a salesperson is measured on revenue, they focus on closing deals and may neglect customer relationships, product feedback, or team collaboration--activities that are valuable but unmeasured. When a teacher is measured on test scores, they focus on tested content and may neglect creativity, critical thinking, social development, and intellectual curiosity--outcomes that are valuable but unmeasured.
This narrowing effect is most damaging when the unmeasured dimensions are the most important ones. In many domains, the most important outcomes are the hardest to measure: trust, culture, innovation, long-term customer loyalty, employee development, ethical behavior. When organizations measure what is easy to measure (revenue, output, speed) and leave the important-but-hard-to-measure dimensions unmeasured, the easy-to-measure dimensions improve while the important dimensions deteriorate.
V.F. Ridgway identified this pattern in his 1956 paper "Dysfunctional Consequences of Performance Measurements," one of the earliest systematic analyses of how metrics distort behavior. Ridgway concluded that "quantitative measures of performance are tools, and are undoubtedly useful. But research indicates that indiscriminate use and undue confidence and reliance in them result from failure to recognize their limitations."
The Short-Termism Effect
Metrics evaluated over short time periods (weekly, monthly, quarterly) create incentives to optimize for short-term results at the expense of long-term outcomes. A salesperson measured on quarterly revenue may push customers into purchases they will regret (generating short-term revenue but long-term churn). A manager measured on quarterly costs may defer maintenance and training (reducing short-term costs but creating long-term problems). A CEO measured on quarterly earnings may cut research and development (boosting short-term profits but undermining long-term competitiveness).
This short-termism effect is particularly severe in publicly traded companies, where quarterly earnings reports drive stock prices and executive compensation. Survey research by Graham, Harvey, and Rajgopal found that 78 percent of surveyed executives would sacrifice long-term economic value to meet short-term earnings targets. The quarterly earnings metric--and the stock price consequences attached to it--systematically incentivizes the destruction of long-term value in pursuit of short-term numbers.
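The arithmetic behind this trade-off fits in a few lines. The sketch below uses invented cash flows and an assumed discount rate purely for illustration: cutting a research program flatters the current period's reported number while destroying net present value.

```python
def npv(cashflows, rate=0.10):
    """Net present value of annual cash flows, discounted at `rate`.
    Year 0 is the current period."""
    return sum(cf / (1 + rate) ** t for t, cf in enumerate(cashflows))

# Hypothetical figures: a $10M R&D program paid for now returns
# $6M per year in years 2-5. Cutting it flatters current earnings.
keep_rd = [-10, 0, 6, 6, 6, 6]
cut_rd  = [0, 0, 0, 0, 0, 0]

print(f"NPV if R&D kept: {npv(keep_rd):+.1f}M")  # roughly +7.3M
print(f"NPV if R&D cut:  {npv(cut_rd):+.1f}M")   # 0.0M, but this quarter looks $10M better
```

A manager judged on this quarter's earnings rationally cuts the program; a manager judged on long-term value rationally keeps it. The metric's time horizon determines which choice looks "good."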
The Gaming Taxonomy
Why do people game metrics? When metrics are tied to rewards, punishment, or status, gaming the metric is often easier than improving the underlying performance the metric is supposed to represent. Gaming takes several forms:
Cherry-picking: Selecting which cases to include in the metric to make the number look better. Surgeons who avoid high-risk patients (simulated in the sketch after this list). Schools that exclude low-performing students from tests. Fund managers who close unsuccessful funds and market only the survivors.
Threshold manipulation: Doing just enough to cross a target threshold without genuine improvement. Sales teams that pull revenue forward from next quarter to meet this quarter's target. Students who study only to the level needed for a passing grade rather than for understanding.
Definition shifting: Changing how inputs to the metric are categorized to improve the number without changing reality. Hospitals that reclassify patient deaths as "comfort care transitions." Police departments that downgrade crimes to less serious categories.
Output substitution: Producing what the metric measures instead of what the metric was intended to represent. Teachers who teach test-taking strategies instead of subject knowledge. Customer service agents who rush calls to reduce handle time instead of solving customer problems.
Effort reallocation: Shifting effort from unmeasured but valuable activities to measured activities. Researchers who prioritize publishing more papers (measured) over doing more rigorous research (unmeasured). Employees who prioritize visible, measurable activities over behind-the-scenes work that is equally important but harder to quantify.
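Cherry-picking is mechanical enough to simulate. The Python sketch below is a toy model of the surgical report-card case; every probability is invented for illustration, not drawn from clinical data. It shows how declining high-risk patients improves a surgeon's reported mortality rate even as total deaths rise.

```python
import random

random.seed(7)

def cherry_pick_simulation(patients=10_000, avoid_high_risk=False):
    """Toy model: each patient has a surgical death risk and a higher
    risk of dying untreated. A surgeon who declines high-risk cases
    improves their reported mortality rate while total deaths rise.
    All probabilities are invented for illustration."""
    operated = operated_deaths = untreated_deaths = 0
    for _ in range(patients):
        high_risk = random.random() < 0.2
        surgical_risk = 0.15 if high_risk else 0.02
        untreated_risk = 0.40 if high_risk else 0.05
        if avoid_high_risk and high_risk:
            untreated_deaths += random.random() < untreated_risk
        else:
            operated += 1
            operated_deaths += random.random() < surgical_risk
    reported_rate = operated_deaths / operated
    return reported_rate, operated_deaths + untreated_deaths

for avoid in (False, True):
    rate, total_deaths = cherry_pick_simulation(avoid_high_risk=avoid)
    print(f"avoid_high_risk={avoid}: reported mortality {rate:.1%}, total deaths {total_deaths}")
```

Under these made-up numbers, the reported rate falls from roughly 4.6 percent to roughly 2 percent while total deaths roughly double--a divergence consistent with what the report-card research found.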
What's the Difference Between Measurement and Targets?
What's the difference between measurement and targets? Measurement is observation--tracking a number to understand what is happening. Targets create pressure--specifying what the number should be and attaching consequences to achieving or missing it. Setting targets changes behavior far more than merely measuring, because targets activate the reward and punishment mechanisms that drive gaming and optimization.
A hospital that measures its surgical infection rate is observing reality. A hospital that sets a target of zero infections and penalizes surgeons who have infections is creating incentives that may produce actual quality improvement--but may also produce under-reporting of infections, reclassification of infections as non-infections, and avoidance of high-risk surgeries.
The distinction matters enormously for metric design. Measurement is relatively safe--the distortions it introduces (Hawthorne Effect, attention narrowing) are moderate. Targets with consequences are powerful but dangerous--the distortions they introduce (gaming, cherry-picking, short-termism, motivation crowding) can be severe enough to make the metric counterproductive.
W. Edwards Deming, the quality management pioneer, argued against numerical targets for exactly this reason. His Fourteen Points for Management include the injunction to eliminate numerical quotas for the workforce and numerical goals for management. His reasoning: targets without methods produce gaming and distortion. "If you have a stable system, then there is no use to specify a goal. You will get whatever the system will deliver. A goal beyond the capability of the system will not be reached."
Deming did not oppose measurement. He opposed the attachment of targets and consequences to measurements without simultaneously providing the methods, resources, and system changes needed to achieve the targets legitimately. The Wells Fargo scandal is a perfect illustration of Deming's concern: the target (eight products per household) was set without the methods (genuine customer engagement strategies) needed to achieve it legitimately, so employees achieved the target illegitimately.
How Do You Prevent Metrics Gaming?
How do you prevent metrics gaming? Complete prevention is impossible--any metric that matters enough to create incentives will attract gaming. But gaming can be significantly reduced through thoughtful metric design and organizational practices:
Use Multiple Balanced Metrics
A single metric is easily gamed because optimizing one number is straightforward. Multiple metrics that balance each other are much harder to game because improving one at the expense of another does not produce an overall improvement.
Example: Instead of measuring customer service agents solely on call duration, measure a balanced set: call duration, customer satisfaction score, first-call resolution rate, and quality audit scores. An agent who rushes calls (gaming duration) will see their satisfaction and resolution scores decline. An agent who chats endlessly with customers (gaming satisfaction) will see their duration increase. The balanced set makes it difficult to game any individual metric without exposure on the others.
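A minimal sketch of how such a balanced composite might be scored follows; the weights, normalization ranges, and agent figures are assumptions for illustration, not a standard formula.

```python
from dataclasses import dataclass

@dataclass
class AgentStats:
    avg_handle_minutes: float      # lower is better
    csat: float                    # customer satisfaction, 0-100
    first_call_resolution: float   # fraction of calls resolved, 0-1

def balanced_score(s: AgentStats) -> float:
    """Normalize each metric to roughly 0-1 and average them, so that
    improving one metric at another's expense nets out. The weights and
    normalization ranges are assumptions, not a standard formula."""
    speed = max(0.0, 1.0 - s.avg_handle_minutes / 15.0)  # 15 min treated as worst case
    return round((speed + s.csat / 100.0 + s.first_call_resolution) / 3.0, 3)

honest = AgentStats(avg_handle_minutes=8.0, csat=82.0, first_call_resolution=0.78)
rusher = AgentStats(avg_handle_minutes=4.0, csat=55.0, first_call_resolution=0.40)

print("honest agent:", balanced_score(honest))   # ~0.69
print("rushing agent:", balanced_score(rusher))  # ~0.56 -- gaming duration does not pay
```

The design choice that matters is the coupling: because rushing calls drags down satisfaction and resolution, the rusher's composite falls below the honest agent's even though the rusher "wins" on duration.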
Robert Kaplan and David Norton's Balanced Scorecard framework is explicitly designed to address this problem. The Balanced Scorecard requires organizations to track metrics across four perspectives: financial, customer, internal processes, and learning and growth. The balance prevents over-optimization of any single perspective at the expense of the others.
Maintain Human Judgment Alongside Metrics
Metrics should inform human judgment, not replace it. A manager who evaluates employee performance should consider metrics as one input among many--alongside direct observation, peer feedback, qualitative assessment, and contextual understanding. When metrics are the sole basis for evaluation, gaming becomes the rational strategy because metrics are the only thing that matters.
Monitor for Manipulation
Actively look for signs of gaming: sudden improvements in a metric without corresponding improvements in related metrics, unusual patterns in data that suggest manipulation, or complaints from customers or other stakeholders that contradict what the metrics say.
Example: If reported crime drops 20 percent in a police precinct but citizen complaints about crime increase, the divergence suggests that crime statistics are being manipulated rather than that crime is actually declining.
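This kind of cross-check can be partially automated. The sketch below uses invented thresholds and data for illustration; a real detector would need seasonality adjustments and statistical testing. It flags periods where a reported metric improves sharply while an independent signal moves the other way.

```python
def divergence_flags(reported, independent, threshold=0.10):
    """Flag periods where the reported metric improves by more than
    `threshold` while an independent signal worsens by more than
    `threshold`. A crude gaming indicator, not proof of manipulation;
    the threshold is an assumption for illustration."""
    flags = []
    for t in range(1, len(reported)):
        reported_change = (reported[t] - reported[t - 1]) / reported[t - 1]
        independent_change = (independent[t] - independent[t - 1]) / independent[t - 1]
        if reported_change <= -threshold and independent_change >= threshold:
            flags.append(t)
    return flags

reported_crime = [100, 95, 75, 72]       # precinct's reported incidents per period
citizen_complaints = [40, 42, 50, 51]    # independent signal from the public
print(divergence_flags(reported_crime, citizen_complaints))
# [2] -- reported crime fell ~21% while complaints rose ~19%
```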
Iterate Metric Design
Metrics should be treated as hypotheses about what matters, not as permanent truth. When gaming is detected, change the metric. When unintended consequences emerge, adjust the measurement system. The metric designers should expect gaming and plan for iterative improvement rather than treating the initial metric design as final.
What Makes Metrics Drive Good Behavior?
What makes metrics drive good behavior? Metrics drive good behavior when they are aligned with actual goals (not just proxies), when they are hard to game (measuring outcomes rather than easily manipulated outputs), when they are balanced with other metrics and with human judgment, and when they are treated as information rather than as targets with punitive consequences.
Align Metrics with Outcomes, Not Activities
The most robust metrics measure outcomes (what actually happened) rather than activities (what people did). Measuring patient health outcomes is harder to game than measuring number of procedures performed. Measuring customer lifetime value is harder to game than measuring number of sales closed. Measuring software reliability in production is harder to game than measuring lines of code written.
Outcome metrics are harder to game because the outcome is what you actually care about. If someone figures out how to improve the outcome metric, they have, by definition, improved the outcome--which is exactly what you wanted. Activity metrics can be improved without improving the outcome, which is why they are susceptible to gaming.
Use Leading and Lagging Indicators Together
Leading indicators predict future outcomes (pipeline volume predicts future revenue). Lagging indicators confirm past outcomes (quarterly revenue confirms whether the business performed). Using both together creates a measurement system that is harder to game: manipulating a leading indicator without producing the expected lagging result raises a red flag, and a lagging indicator is harder to manipulate in the first place because it records an actual outcome rather than a predicted one.
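One way to operationalize the cross-check is sketched below; the conversion rate, tolerance, and figures are assumptions for illustration. It compares each period's realized revenue against what the prior period's pipeline predicted and flags large shortfalls.

```python
def pipeline_red_flags(pipeline, revenue, conversion=0.25, tolerance=0.20):
    """Compare each quarter's realized revenue against what the prior
    quarter's pipeline predicted. A shortfall beyond `tolerance`
    suggests the leading indicator may be inflated. The conversion
    rate and tolerance are assumptions for illustration."""
    flags = []
    for q in range(1, len(revenue)):
        predicted = pipeline[q - 1] * conversion
        if predicted > 0 and (predicted - revenue[q]) / predicted > tolerance:
            flags.append((q, predicted, revenue[q]))
    return flags

pipeline_by_quarter = [400, 420, 900, 880]   # leading: qualified pipeline ($k)
revenue_by_quarter = [0, 105, 104, 110]      # lagging: realized revenue ($k)

for quarter, predicted, actual in pipeline_red_flags(pipeline_by_quarter, revenue_by_quarter):
    print(f"Q{quarter}: pipeline predicted ~{predicted:.0f}k, booked {actual}k -- check pipeline quality")
```

In this invented data, pipeline more than doubles in quarter 2 while revenue stays flat in quarter 3, so the check flags the quarter: either the pipeline was padded or something broke in conversion, and either way the divergence deserves investigation.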
Create Psychological Safety Around Metrics
When metrics are used punitively--as weapons to identify and punish underperformers--people game them defensively. When metrics are used informationally--as tools to understand what is happening and identify opportunities for improvement--people engage with them honestly.
Amy Edmondson's research on psychological safety demonstrates that teams where people feel safe raising concerns, admitting mistakes, and discussing problems honestly produce better outcomes than teams where people feel threatened. The same principle applies to metrics: an environment where a declining metric triggers collaborative problem-solving produces better outcomes than an environment where a declining metric triggers blame and punishment.
Design for the Gaming You'll Get
Accept that some gaming will occur and design metrics that make the most productive forms of gaming the easiest path. If you measure customer satisfaction and employees discover that being friendlier to customers improves the score, that is productive gaming--the gaming produces the behavior you actually want. If you measure customer satisfaction and employees discover that offering unauthorized discounts improves the score, that is destructive gaming--the gaming produces behavior you do not want. The design challenge is creating metrics where the easiest way to improve the number is to genuinely improve the underlying outcome.
The fundamental insight about metrics and behavior is this: you are not choosing whether to influence behavior. You are choosing how to influence it. Any metric you implement will change behavior. Any target you set will create incentives. Any reward you attach to a number will motivate optimization of that number. The question is not whether these effects will occur--they will, with the regularity of a natural law--but whether the behavior the metrics encourage is the behavior you actually want. Getting that right requires understanding not just what to measure but how measurement itself reshapes the thing being measured.
References and Further Reading
Goodhart, C.A.E. (1984). "Problems of Monetary Management: The U.K. Experience." In Monetary Theory and Practice. Palgrave. https://en.wikipedia.org/wiki/Goodhart%27s_law
Muller, J.Z. (2018). The Tyranny of Metrics. Princeton University Press. https://press.princeton.edu/books/hardcover/9780691174952/the-tyranny-of-metrics
Kerr, S. (1975). "On the Folly of Rewarding A, While Hoping for B." Academy of Management Journal, 18(4), 769-783. https://doi.org/10.2307/255378
Koretz, D. (2017). The Testing Charade: Pretending to Make Schools Better. University of Chicago Press. https://press.uchicago.edu/ucp/books/book/chicago/T/bo27209989.html
Dranove, D., Kessler, D., McClellan, M. & Satterthwaite, M. (2003). "Is More Information Better? The Effects of 'Report Cards' on Health Care Providers." Journal of Political Economy, 111(3), 555-588. https://doi.org/10.1086/374180
Deci, E.L. (1971). "Effects of Externally Mediated Rewards on Intrinsic Motivation." Journal of Personality and Social Psychology, 18(1), 105-115. https://doi.org/10.1037/h0030644
Pink, D.H. (2009). Drive: The Surprising Truth About What Motivates Us. Riverhead Books. https://www.danpink.com/books/drive/
Kaplan, R.S. & Norton, D.P. (1996). The Balanced Scorecard: Translating Strategy into Action. Harvard Business School Press. https://hbr.org/1992/01/the-balanced-scorecard-measures-that-drive-performance-2
Deming, W.E. (1986). Out of the Crisis. MIT Press. https://en.wikipedia.org/wiki/Out_of_the_Crisis
Ridgway, V.F. (1956). "Dysfunctional Consequences of Performance Measurements." Administrative Science Quarterly, 1(2), 240-247. https://doi.org/10.2307/2390989
Graham, J.R., Harvey, C.R. & Rajgopal, S. (2005). "The Economic Implications of Corporate Financial Reporting." Journal of Accounting and Economics, 40(1-3), 3-73. https://doi.org/10.1016/j.jacceco.2005.01.002
Edmondson, A.C. (2018). The Fearless Organization. Wiley. https://fearlessorganization.com/
Campbell, D.T. (1979). "Assessing the Impact of Planned Social Change." Evaluation and Program Planning, 2(1), 67-90. https://doi.org/10.1016/0149-7189(79)90048-X
Stumpf, J. (2016). Testimony before United States Senate Committee on Banking, Housing, and Urban Affairs. https://www.banking.senate.gov/
Strathern, M. (1997). "'Improving Ratings': Audit in the British University System." European Review, 5(3), 305-321. https://doi.org/10.1002/(SICI)1234-981X(199707)5:3<305::AID-EURO184>3.0.CO;2-4