In 2016, Wells Fargo, one of the largest banks in the United States, was fined $185 million after regulators discovered that bank employees had opened approximately 3.5 million unauthorized accounts in customers' names. Employees had created fake email addresses, forged signatures, and secretly moved customer money between accounts--all without customer knowledge or consent.
The employees who committed this fraud were not career criminals. They were ordinary bank tellers and personal bankers, many of them earning $12 to $15 per hour. They committed fraud because Wells Fargo had implemented a metric-driven management system that tracked the number of financial products "sold" to each customer--a metric the company called "cross-selling." The target was eight products per household, an internal goal that CEO John Stumpf championed as "Going for Gr-eight." Employees who met their cross-selling targets received bonuses, recognition, and job security. Employees who missed their targets were subjected to intense pressure from managers, public humiliation in team meetings, and ultimately termination.
The metric--products per household--was supposed to measure customer engagement. The theory was that customers who used more Wells Fargo products (checking account, savings account, credit card, mortgage, investment account) were more engaged, more satisfied, and more profitable. If employees could get each customer to eight products, the bank's revenue would grow substantially.
The metric did not measure customer engagement. It measured the number of accounts associated with a customer's name. And when the metric became a high-stakes target with powerful rewards and punishments attached, employees found the easiest way to hit the target: create accounts that customers had not requested and did not know about. The metric optimized beautifully--products per household rose steadily. The underlying goal--genuine customer engagement--was destroyed as millions of customers were defrauded by their own bank.
The Wells Fargo scandal is the most vivid recent illustration of a phenomenon that has been observed, studied, and documented for decades: metrics change behavior. They change behavior predictably, powerfully, and often in ways that the metric designers did not intend and would not have wanted. Understanding how and why metrics influence behavior is essential for anyone who designs measurement systems, sets targets, manages performance, or works in an environment where metrics are used to evaluate and reward.
Why Do Metrics Influence Behavior?
Why do metrics influence behavior? People optimize for what is measured because metrics signal what the organization values, metrics create accountability by making performance visible, and metrics are often tied to rewards (bonuses, promotions, recognition) or consequences (warnings, demotions, termination). When you measure something and attach consequences to it, you are not merely observing behavior--you are shaping it.
The Signaling Effect
The act of choosing to measure something sends a powerful signal about what matters. When an organization decides to track customer satisfaction scores, it signals that customer satisfaction matters. When it decides to track lines of code written, it signals that code output matters. When it decides to track hours worked, it signals that presence matters. These signals are interpreted by employees as guidance about what the organization actually values--regardless of what the organization says it values in mission statements and company meetings.
"What gets measured gets managed." -- Peter Drucker
This signaling effect operates even when there are no explicit rewards or punishments attached to the metric. The mere act of measurement creates awareness, and awareness changes behavior. This is a variant of the Hawthorne Effect, first observed in the 1920s and 1930s at the Western Electric Hawthorne Works factory in Cicero, Illinois. Researchers studying the relationship between lighting conditions and worker productivity discovered that productivity improved whenever any change was made to working conditions--including changes that should have reduced productivity. The workers' behavior changed not because of the environmental change itself but because they were aware of being observed and measured.
The Hawthorne Effect has been debated and refined in the decades since, but the core insight remains robust: measurement changes behavior through the mechanism of attention. What gets measured gets attended to, and what gets attended to gets effort.
The Accountability Effect
Metrics make performance visible, and visibility creates accountability. When a salesperson's conversion rate is displayed on a dashboard that their manager reviews weekly, the salesperson is accountable for that number in a way they would not be if the number were not tracked. The dashboard transforms conversion rate from something the salesperson might think about occasionally into something they think about constantly.
This accountability can be productive when the metric accurately represents the desired outcome. A surgeon whose post-operative infection rate is tracked and reported will pay more attention to sterile technique. A factory whose defect rate is monitored will invest more effort in quality control. A customer service team whose first-call resolution rate is visible will work harder to resolve issues without callbacks.
But accountability can be destructive when the metric does not accurately represent the desired outcome. The surgeon whose mortality rate is published may avoid operating on the sickest patients--who are most likely to die regardless of surgical quality--to keep their numbers looking good. The factory whose defect rate is monitored may reclassify borderline defects as acceptable rather than genuinely improving quality. The customer service team whose call duration is monitored may rush callers off the phone rather than genuinely resolving their issues.
The accountability effect is amplified enormously when metrics are tied to formal consequences. A metric that is tracked but has no impact on compensation, promotion, or job security influences behavior moderately through the signaling and attention mechanisms. A metric that is tied to a bonus influences behavior strongly. A metric that is tied to continued employment influences behavior overwhelmingly--as the Wells Fargo case demonstrates.
The Crowding Out Effect
One of the most counterintuitive findings in behavioral economics is that external measurement and rewards can destroy intrinsic motivation. When people are intrinsically motivated to do something--they find it interesting, meaningful, or satisfying in itself--adding external metrics and rewards can actually reduce their motivation and performance.
"The more a human behavior is controlled by an external event, the less it will be performed when the external event is removed." -- Edward Deci
This phenomenon, known as the crowding out effect or overjustification effect, was demonstrated in a classic experiment by psychologist Edward Deci in 1971. Deci gave college students a puzzle to solve. Students who were paid to solve the puzzle spent less time working on it during a free-choice period (when they could do anything they wanted) than students who were not paid. The payment transformed the puzzle from an intrinsically enjoyable activity into work, and when the payment stopped, so did the motivation.
Can metrics destroy motivation? Yes--external measurement can undermine intrinsic motivation, especially for complex or creative work that requires autonomy, mastery, and purpose. Daniel Pink's synthesis of motivation research in Drive identifies three conditions for intrinsic motivation: autonomy (control over your work), mastery (the feeling of improving), and purpose (connection to something meaningful). Metrics that reduce autonomy (by dictating exactly what to optimize), that measure outputs rather than growth (counting widgets rather than recognizing skill development), or that disconnect work from purpose (reducing meaningful activity to a number) can systematically destroy the intrinsic motivation that drives the highest-quality work.
This does not mean that metrics should never be used for creative or complex work. It means that the design of metrics for such work must be extraordinarily careful, preserving the conditions for intrinsic motivation while providing the information needed for organizational learning and improvement.
Goodhart's Law in Practice: Domain Examples
| Domain | Intended Goal | Metric Used | Gaming Behavior | Outcome |
|---|---|---|---|---|
| Banking (Wells Fargo) | Customer engagement | Products per household | Unauthorized account creation | 3.5 million fake accounts; $185 million fine |
| Education (No Child Left Behind) | Student learning | Standardized test scores | Teaching to the test; excluding struggling students | Scores rose; NAEP showed little real improvement |
| Healthcare (NY cardiac surgery) | Save most lives | Surgeon mortality rate | Avoiding high-risk patients | Overall cardiac mortality increased |
| Policing | Reduce crime | Reported crime statistics | Downgrading felonies; discouraging reports | Statistics improved; actual crime unchanged |
| Software development | Code quality | Lines of code | Verbose, repetitive code; avoiding deletion | Inflated output; worse maintainability |
| Academic publishing | Research quality | Citation count / h-index | Citation cartels; self-citation | Metric inflation without scientific impact |
| Soviet manufacturing | Economic output | Production quota (units) | Large quantities of small useless items | Chronic shortage of useful goods |
What Is Goodhart's Law?
What is Goodhart's Law? "When a measure becomes a target, it ceases to be a good measure." This principle, originally articulated by British economist Charles Goodhart in the context of monetary policy in 1975 and later generalized by anthropologist Marilyn Strathern, describes the most fundamental and pervasive failure mode of metric-driven management.
"When a measure becomes a target, it ceases to be a good measure." -- Charles Goodhart
How Goodhart's Law Works in Practice
Goodhart's Law operates through a predictable mechanism:
1. A metric is chosen that correlates with a desired outcome. Customer satisfaction scores correlate with customer loyalty. Lines of code correlate with software output. Test scores correlate with student learning. Number of products per household correlates with customer engagement.
2. The metric is made a target. Management tells employees: increase customer satisfaction scores. Increase lines of code. Increase test scores. Increase products per household.
3. People optimize for the metric rather than for the underlying outcome. They find ways to increase the number without increasing the thing the number was supposed to measure. They give customers discounts for completing satisfaction surveys (inflating scores without improving actual satisfaction). They write verbose, repetitive code (inflating line counts without improving software quality). They teach to the test (inflating scores without deepening learning). They open unauthorized accounts (inflating product counts without increasing engagement).
4. The correlation between the metric and the outcome breaks. Once the metric is being gamed, it no longer measures what it was originally supposed to measure. The customer satisfaction scores are high, but customers are not actually satisfied. The line counts are high, but the software is not actually better. The test scores are high, but the students have not actually learned. The product counts are high, but customers are being defrauded.
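This four-step mechanism can be sketched as a small simulation. Everything here is illustrative, not empirical: a hypothetical population of agents splits a fixed effort budget between genuine work (which improves the real outcome) and gaming (which inflates only the metric), and the metric-outcome correlation collapses once the metric becomes a high-stakes target and effort shifts toward gaming.

```python
import random

random.seed(0)

def true_outcome(effort_real):
    """The underlying goal: improves only with genuine effort."""
    return effort_real + random.gauss(0, 0.5)

def metric(effort_real, effort_gaming):
    """The proxy: responds to genuine effort AND to gaming."""
    return effort_real + effort_gaming + random.gauss(0, 0.5)

def correlation(xs, ys):
    """Pearson correlation coefficient, hand-rolled to stay dependency-free."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

def simulate(gaming_share, agents=500):
    """Each agent splits an effort budget between real work and gaming."""
    metrics, outcomes = [], []
    for _ in range(agents):
        budget = random.uniform(0, 10)
        real = budget * (1 - gaming_share)
        gamed = budget * gaming_share
        metrics.append(metric(real, gamed))
        outcomes.append(true_outcome(real))
    return correlation(metrics, outcomes)

# Before targeting: effort is genuine, so the proxy tracks the outcome closely.
before = simulate(gaming_share=0.0)
# After targeting: most effort flows into gaming, and the correlation degrades.
after = simulate(gaming_share=0.9)
```

The `gaming_share` parameter is the model's only real assumption: how much of total effort shifts from genuine work to gaming once consequences are attached. Even a crude model like this reproduces the qualitative pattern -- the proxy keeps rising while its link to the outcome decays.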
Examples of Goodhart's Law Across Domains
Education: Teaching to the test. The No Child Left Behind Act in the United States tied school funding to standardized test scores in reading and mathematics. Schools responded by increasing instructional time for tested subjects and reducing time for untested subjects (science, social studies, art, music, physical education). Within tested subjects, instruction narrowed to focus on the specific skills and question formats that appeared on the tests. Test scores improved; the breadth and depth of education declined. The metric (test scores) was optimized; the goal (educated students) was undermined.
Research by Daniel Koretz, published in The Testing Charade, documented systematic score inflation across states: test scores rose dramatically on state-specific assessments while showing little or no improvement on independent measures like the National Assessment of Educational Progress (NAEP). The state tests were being gamed; the underlying learning was not improving.
Healthcare: Surgical report cards. In the 1990s, New York State began publishing cardiac surgery mortality rates by hospital and by individual surgeon, with the goal of improving surgical quality through transparency and accountability. The transparency did produce some quality improvements--surgeons paid more attention to surgical technique and post-operative care. But it also produced a predictable Goodhart's Law effect: surgeons began avoiding high-risk patients who were most likely to die, because operating on these patients would worsen the surgeon's published mortality rate.
A study by Dranove, Kessler, McClellan, and Satterthwaite published in the Journal of Political Economy found that the report cards produced a net negative health effect: while average surgical outcomes improved (because surgeons tried harder), overall cardiac mortality in New York actually increased because the sickest patients--who needed surgery most--were being denied it. Doctors were optimizing their metric (reported mortality rate) at the expense of the goal the metric was supposed to serve (saving the most lives).
Policing: Crime statistics gaming. When police departments are evaluated based on crime statistics--particularly the number of reported crimes in their jurisdiction--officers face an incentive to reduce reported crime rather than reduce actual crime. Techniques include downgrading felonies to misdemeanors (reporting a burglary as "trespassing"), discouraging victims from filing reports, or reclassifying crimes to categories that are not tracked. The CompStat system in New York City, which used crime statistics to hold precinct commanders accountable, produced documented cases of statistic manipulation in multiple precincts.
Software development: Lines of code. When programmers are evaluated by lines of code produced, they write more code--but not necessarily better code. Verbose solutions replace elegant ones. Copy-paste replaces refactoring. Code that should be deleted (because it is no longer needed) is kept (because removing it would reduce the line count). The metric incentivizes quantity at the expense of quality, maintainability, and simplicity.
IBM reportedly discovered this effect decades ago and tried measuring programmers by the number of bugs they fixed. Programmers promptly began writing buggier code so they would have more bugs to fix. The metric was abandoned.
Call centers: Average handle time. When call center agents are evaluated by average handle time (the average duration of customer calls), they develop strategies to minimize call duration: transferring customers to other departments unnecessarily (ending their own call timer), hanging up on difficult callers (reducing average handle time), and rushing through interactions without resolving the customer's actual problem (ending the call quickly). The metric is optimized; customer satisfaction degrades.
How Do Metrics Create Unintended Consequences?
How do metrics create unintended consequences? Metrics focus attention on what is measured at the expense of what is unmeasured, enable gaming when rewards are attached, encourage short-term thinking when metrics are evaluated over short time horizons, and may destroy intrinsic motivation for work that was previously meaningful in itself.
The Attention Narrowing Effect
Every metric focuses attention, and focused attention necessarily means unfocused inattention. When a salesperson is measured on revenue, they focus on closing deals and may neglect customer relationships, product feedback, or team collaboration--activities that are valuable but unmeasured. When a teacher is measured on test scores, they focus on tested content and may neglect creativity, critical thinking, social development, and intellectual curiosity--outcomes that are valuable but unmeasured.
This narrowing effect is most damaging when the unmeasured dimensions are the most important ones. In many domains, the most important outcomes are the hardest to measure: trust, culture, innovation, long-term customer loyalty, employee development, ethical behavior. When organizations measure what is easy to measure (revenue, output, speed) and leave the important-but-hard-to-measure dimensions unmeasured, the easy-to-measure dimensions improve while the important dimensions deteriorate.
V.F. Ridgway identified this pattern in his 1956 paper "Dysfunctional Consequences of Performance Measurements," one of the earliest systematic analyses of how metrics distort behavior.
"Quantitative measures of performance are tools, and are undoubtedly useful. But research indicates that indiscriminate use and undue confidence and reliance in them result from failure to recognize their limitations." -- V.F. Ridgway
The Short-Termism Effect
Metrics evaluated over short time periods (weekly, monthly, quarterly) create incentives to optimize for short-term results at the expense of long-term outcomes. A salesperson measured on quarterly revenue may push customers into purchases they will regret (generating short-term revenue but long-term churn). A manager measured on quarterly costs may defer maintenance and training (reducing short-term costs but creating long-term problems). A CEO measured on quarterly earnings may cut research and development (boosting short-term profits but undermining long-term competitiveness).
This short-termism effect is particularly severe in publicly traded companies, where quarterly earnings reports drive stock prices and executive compensation. Research by John Graham, Campbell Harvey, and Shiva Rajgopal found that 78 percent of CFOs admitted to taking actions that sacrificed long-term value to meet short-term earnings targets. The quarterly earnings metric--and the stock price consequences attached to it--systematically incentivizes destruction of long-term value in pursuit of short-term numbers.
The Gaming Taxonomy
Why do people game metrics? When metrics are tied to rewards, punishment, or status, gaming the metric is often easier than improving the underlying performance the metric is supposed to represent. Gaming takes several forms:
Cherry-picking: Selecting which cases to include in the metric to make the number look better. Surgeons who avoid high-risk patients. Schools that exclude low-performing students from tests. Fund managers who close unsuccessful funds and market only the survivors.
Threshold manipulation: Doing just enough to cross a target threshold without genuine improvement. Sales teams that pull revenue forward from next quarter to meet this quarter's target. Students who study only to the level needed for a passing grade rather than for understanding.
Definition shifting: Changing how inputs to the metric are categorized to improve the number without changing reality. Hospitals that reclassify patient deaths as "comfort care transitions." Police departments that downgrade crimes to less serious categories.
Output substitution: Producing what the metric measures instead of what the metric was intended to represent. Teachers who teach test-taking strategies instead of subject knowledge. Customer service agents who rush calls to reduce handle time instead of solving customer problems.
Effort reallocation: Shifting effort from unmeasured but valuable activities to measured activities. Researchers who prioritize publishing more papers (measured) over doing more rigorous research (unmeasured). Employees who prioritize visible, measurable activities over behind-the-scenes work that is equally important but harder to quantify.
What's the Difference Between Measurement and Targets?
What's the difference between measurement and targets? Measurement is observation--tracking a number to understand what is happening. Targets create pressure--specifying what the number should be and attaching consequences to achieving or missing it. Setting targets changes behavior far more than merely measuring, because targets activate the reward and punishment mechanisms that drive gaming and optimization.
A hospital that measures its surgical infection rate is observing reality. A hospital that sets a target of zero infections and penalizes surgeons who have infections is creating incentives that may produce actual quality improvement--but may also produce under-reporting of infections, reclassification of infections as non-infections, and avoidance of high-risk surgeries.
The distinction matters enormously for metric design. Measurement is relatively safe--the distortions it introduces (Hawthorne Effect, attention narrowing) are moderate. Targets with consequences are powerful but dangerous--the distortions they introduce (gaming, cherry-picking, short-termism, motivation crowding) can be severe enough to make the metric counterproductive.
W. Edwards Deming, the quality management pioneer, argued against numerical targets for exactly this reason. In his Fourteen Points for Management, Deming advocated "Eliminate numerical quotas for the workforce and numerical goals for management."
"If you have a stable system, then there is no use to specify a goal. You will get whatever the system will deliver. A goal beyond the capability of the system will not be reached." -- W. Edwards Deming
His reasoning: targets without methods produce gaming and distortion.
Deming did not oppose measurement. He opposed the attachment of targets and consequences to measurements without simultaneously providing the methods, resources, and system changes needed to achieve the targets legitimately. The Wells Fargo scandal is a perfect illustration of Deming's concern: the target (eight products per household) was set without the methods (genuine customer engagement strategies) needed to achieve it legitimately, so employees achieved the target illegitimately.
How Do You Prevent Metrics Gaming?
How do you prevent metrics gaming? Complete prevention is impossible--any metric that matters enough to create incentives will attract gaming. But gaming can be significantly reduced through thoughtful metric design and organizational practices:
Use Multiple Balanced Metrics
A single metric is easily gamed because optimizing one number is straightforward. Multiple metrics that balance each other are much harder to game because improving one at the expense of another does not produce an overall improvement.
Example: Instead of measuring customer service agents solely on call duration, measure a balanced set: call duration, customer satisfaction score, first-call resolution rate, and quality audit scores. An agent who rushes calls (gaming duration) will see their satisfaction and resolution scores decline. An agent who chats endlessly with customers (gaming satisfaction) will see their duration increase. The balanced set makes it difficult to game any individual metric without exposure on the others.
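A minimal sketch of that balancing logic, with invented agent profiles and an assumed equal-weight composite (the weights, the duration cap, and the agent numbers are all illustrative, not a recommended scoring formula):

```python
# Hypothetical agents: average call minutes, satisfaction (0-100), first-call resolution (%)
agents = {
    "rushes calls":    {"minutes": 3.0,  "csat": 62, "fcr": 55},
    "chats endlessly": {"minutes": 14.0, "csat": 95, "fcr": 80},
    "balanced":        {"minutes": 6.5,  "csat": 88, "fcr": 85},
}

def composite(a):
    """Equal-weight composite. Shorter calls score higher, but the duration
    score is capped at 0-100 so duration alone cannot dominate the total."""
    duration_score = max(0.0, 100 - 10 * a["minutes"])  # 0 min -> 100, 10+ min -> 0
    return (duration_score + a["csat"] + a["fcr"]) / 3

# Rank agents by the balanced composite rather than any single metric.
ranked = sorted(agents, key=lambda name: composite(agents[name]), reverse=True)
```

Under any single metric, one of the gaming strategies wins: "rushes calls" tops the duration ranking, "chats endlessly" tops the satisfaction ranking. Under the composite, the agent who games one dimension pays for it on the others, and the genuinely balanced performer comes out ahead.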
Robert Kaplan and David Norton's Balanced Scorecard framework is explicitly designed to address this problem. The Balanced Scorecard requires organizations to track metrics across four perspectives: financial, customer, internal processes, and learning and growth. The balance prevents over-optimization of any single perspective at the expense of the others.
Maintain Human Judgment Alongside Metrics
Metrics should inform human judgment, not replace it. A manager who evaluates employee performance should consider metrics as one input among many--alongside direct observation, peer feedback, qualitative assessment, and contextual understanding. When metrics are the sole basis for evaluation, gaming becomes the rational strategy because metrics are the only thing that matters.
Monitor for Manipulation
Actively look for signs of gaming: sudden improvements in a metric without corresponding improvements in related metrics, unusual patterns in data that suggest manipulation, or complaints from customers or other stakeholders that contradict what the metrics say.
Example: If reported crime drops 20 percent in a police precinct but citizen complaints about crime increase, the divergence suggests that crime statistics are being manipulated rather than that crime is actually declining.
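Divergence checks like this are easy to automate. The sketch below is a simplified illustration: it flags any period in which an official metric falls sharply while an independent signal rises sharply, using an arbitrary 15 percent threshold and made-up numbers that mirror the precinct example.

```python
def divergence_flags(reported, independent, threshold=0.15):
    """Flag period indices where the official metric and an independent signal
    move in opposite directions by more than `threshold` (fractional change).
    Both inputs are equal-length series of per-period values."""
    flags = []
    for t in range(1, len(reported)):
        r_change = (reported[t] - reported[t - 1]) / reported[t - 1]
        i_change = (independent[t] - independent[t - 1]) / independent[t - 1]
        # Official number falling while the independent signal rises is the
        # signature of manipulation rather than genuine improvement.
        if r_change < -threshold and i_change > threshold:
            flags.append(t)
    return flags

reported_crime = [100, 95, 76, 74]       # official statistics: down 20% in period 2
citizen_complaints = [40, 42, 55, 56]    # independent signal: up ~31% in period 2

flags = divergence_flags(reported_crime, citizen_complaints)
```

The same pattern applies to any metric pair the organization expects to move together: satisfaction surveys versus churn, reported defects versus warranty claims, test scores versus independent assessments.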
Iterate Metric Design
Metrics should be treated as hypotheses about what matters, not as permanent truth. When gaming is detected, change the metric. When unintended consequences emerge, adjust the measurement system. The metric designers should expect gaming and plan for iterative improvement rather than treating the initial metric design as final.
What Makes Metrics Drive Good Behavior?
What makes metrics drive good behavior? Metrics drive good behavior when they are aligned with actual goals (not just proxies), when they are hard to game (measuring outcomes rather than easily manipulated outputs), when they are balanced with other metrics and with human judgment, and when they are treated as information rather than as targets with punitive consequences.
Align Metrics with Outcomes, Not Activities
The most robust metrics measure outcomes (what actually happened) rather than activities (what people did). Measuring patient health outcomes is harder to game than measuring number of procedures performed. Measuring customer lifetime value is harder to game than measuring number of sales closed. Measuring software reliability in production is harder to game than measuring lines of code written.
Outcome metrics are harder to game because the outcome is what you actually care about. If someone figures out how to improve the outcome metric, they have, by definition, improved the outcome--which is exactly what you wanted. Activity metrics can be improved without improving the outcome, which is why they are susceptible to gaming.
Use Leading and Lagging Indicators Together
Leading indicators predict future outcomes (pipeline volume predicts future revenue). Lagging indicators confirm past outcomes (quarterly revenue confirms whether the business performed). Using both together creates a measurement system that is harder to game because manipulating a leading indicator without producing the expected lagging result raises a red flag, and manipulating a lagging indicator is harder because it measures an actual outcome rather than a predicted one.
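One way to operationalize this pairing is a consistency check: predict the lagging result from the leading indicator, then flag quarters where reality falls far short of the prediction. The sketch below uses an assumed 25 percent pipeline close rate, a 30 percent shortfall tolerance, and invented quarterly numbers -- all illustrative parameters, not industry benchmarks.

```python
def expected_revenue(pipeline, close_rate=0.25):
    """Leading-indicator model: pipeline value times an assumed close rate."""
    return pipeline * close_rate

def pipeline_looks_inflated(pipeline, actual_revenue, tolerance=0.30):
    """True when the lagging result falls far short of what the leading
    indicator predicted -- a sign the leading metric may have been padded."""
    predicted = expected_revenue(pipeline)
    shortfall = (predicted - actual_revenue) / predicted
    return shortfall > tolerance

# Hypothetical quarters: (pipeline value, realized revenue)
quarters = [
    (400_000, 105_000),  # prediction 100k, actual 105k: consistent
    (900_000, 110_000),  # prediction 225k, actual 110k: pipeline likely padded
]

flags = [pipeline_looks_inflated(p, r) for p, r in quarters]
```

A salesperson can inflate pipeline entries, but the inflated entries eventually fail to convert, and the leading/lagging gap exposes the padding automatically.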
Create Psychological Safety Around Metrics
When metrics are used punitively--as weapons to identify and punish underperformers--people game them defensively. When metrics are used informationally--as tools to understand what is happening and identify opportunities for improvement--people engage with them honestly.
Amy Edmondson's research on psychological safety demonstrates that teams where people feel safe raising concerns, admitting mistakes, and discussing problems honestly produce better outcomes than teams where people feel threatened. The same principle applies to metrics: an environment where a declining metric triggers collaborative problem-solving produces better outcomes than an environment where a declining metric triggers blame and punishment.
Design for the Gaming You'll Get
Accept that some gaming will occur and design metrics that make the most productive forms of gaming the easiest path. If you measure customer satisfaction and employees discover that being friendlier to customers improves the score, that is productive gaming--the gaming produces the behavior you actually want. If you measure customer satisfaction and employees discover that offering unauthorized discounts improves the score, that is destructive gaming--the gaming produces behavior you do not want. The design challenge is creating metrics where the easiest way to improve the number is to genuinely improve the underlying outcome.
The fundamental insight about metrics and behavior is this: you are not choosing whether to influence behavior. You are choosing how to influence it. Any metric you implement will change behavior. Any target you set will create incentives. Any reward you attach to a number will motivate optimization of that number. The question is not whether these effects will occur--they will, with the regularity of a natural law--but whether the behavior the metrics encourage is the behavior you actually want. Getting that right requires understanding not just what to measure but how measurement itself reshapes the thing being measured.
What Research Shows About Metrics and Behavior
Charles Goodhart's original observation appeared in a 1975 paper examining why monetary policy targets kept breaking down. Goodhart, then an economic advisor to the Bank of England, noticed that whenever the Bank targeted a specific monetary aggregate -- M3, say -- the statistical relationship between that aggregate and economic activity promptly deteriorated. Market participants changed their behavior in response to the target, breaking the correlation that had made the aggregate useful as a measure. Anthropologist Marilyn Strathern generalized this observation to all measurement systems in her 1997 paper "'Improving Ratings': Audit in the British University System," coining the formulation now widely cited: "When a measure becomes a target, it ceases to be a good measure."
Donald Campbell, a social psychologist at Northwestern University, articulated a related principle independently in 1979, now called "Campbell's Law": "The more any quantitative social indicator is used for social decision-making, the more subject it will be to corruption pressures and the more apt it will be to distort and corrupt the social processes it is intended to monitor." Campbell's version emphasizes the corruption mechanism -- not just that metrics break down, but that they actively corrupt the processes they measure. A school measured on test scores does not merely fail to capture learning; it actively reorganizes instruction in ways that damage learning.
Steven Kerr's 1975 paper "On the Folly of Rewarding A, While Hoping for B" in the Academy of Management Journal provided the most systematic catalogue of institutional metric failures in organizational research. Kerr documented cases from medicine, academia, government, and business where measurement systems created incentives to optimize for the measured proxy at the expense of the actual goal. His most cited example: universities that rewarded faculty for publication counts produced faculty who maximized publication counts rather than knowledge. The paper remains one of the most downloaded articles in management science, having documented in 1975 a dynamic that organizations continue to rediscover.
Edward Deci and Richard Ryan's Self-Determination Theory, developed at the University of Rochester from the 1970s onward, provides the psychological mechanism behind the crowding-out effect. Their research distinguishes between intrinsic motivation (doing something because it is inherently interesting or satisfying) and extrinsic motivation (doing something for an external reward). Adding external rewards to an intrinsically motivated activity consistently reduces intrinsic motivation in experimental settings -- a finding replicated in over 100 studies across cultures. The key mechanism: external rewards shift the perceived locus of causality from internal to external, converting self-directed behavior into compliance behavior, with all the monitoring and gaming dynamics that compliance produces.
Real-World Case Studies in Metric Dysfunction
The Wells Fargo cross-selling scandal (2016) is the textbook case, but the pattern appears across sectors with depressing consistency.
The British National Health Service's four-hour emergency wait-time target, introduced in 2000, produced exactly the dynamic Campbell's Law predicts. By 2004, 98 percent of patients were being seen within four hours, a dramatic apparent improvement. An investigation by the Healthcare Commission in 2009 found that hospitals had gamed the target through multiple mechanisms: logging patients as "seen" when a nurse briefly assessed them even without treatment, creating temporary observation units that did not officially count as emergency beds, and in some cases instructing ambulance crews to circle hospitals until a bed was available so the clock would not start. The target improved; patient care did not. The NHS subsequently replaced the single metric with a dashboard of five indicators -- a direct application of balanced measurement principles.
The Soviet Union's experience with production quotas, documented by the historian Robert Conquest and later the economist Alec Nove, provides the longest-running natural experiment in metric gaming. Soviet factory managers were measured on output quantity, with the predictable result that managers optimized for output at the expense of quality, variety, and resource efficiency. When quotas were set in units, factories produced large quantities of small, useless items. When quotas were switched to weight, factories produced massive quantities of heavy items. When quotas were set in value terms, factories produced expensive items regardless of demand. Each metric was gamed within months of introduction. The Soviet planning system's chronic dysfunction was substantially a metric design failure.
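The unit/weight/value progression can be captured in a toy model: give one factory a fixed labor budget and three possible products, and the "optimal" product flips as the quota's unit of account changes. All product figures below are invented for illustration.

```python
# A toy model of quota gaming: the same factory, with the same labor
# budget, makes whichever product maximizes the quota's unit of account.
BUDGET_HOURS = 1_000

PRODUCTS = {
    #                hours/unit  tons/unit  rubles/unit (invented figures)
    "small nails":   (0.001,     0.0001,      0.01),
    "heavy girders": (10.0,      2.0,       300.00),
    "luxury clocks": (5.0,       0.01,     1000.00),
}

def best_under_quota(metric):
    """Return the product maximizing `metric` output for the labor budget."""
    def output(item):
        hours, tons, rubles = PRODUCTS[item]
        n = BUDGET_HOURS / hours               # units producible
        return {"units": n, "weight": n * tons, "value": n * rubles}[metric]
    return max(PRODUCTS, key=output)

for metric in ("units", "weight", "value"):
    print(f"quota in {metric}: factory makes {best_under_quota(metric)}")
```

The factory's capabilities never change; only the denomination of the quota does, and each denomination selects a different -- and differently useless -- product mix.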
In academic publishing, the widespread use of citation counts and the h-index as proxies for research quality has produced a documented increase in citation practices that inflate these metrics without reflecting actual scientific impact. A 2017 study by John Ioannidis and colleagues at Stanford found that approximately 250 researchers worldwide were cited more than 2,000 times per year -- a rate biologically impossible through actual reading and careful reference selection. Citation cartels, strategic self-citation, and the citing of papers based on their citation count rather than their content are all gaming behaviors that Goodhart's Law predicts and the data confirms.
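The h-index itself is easy to compute, and computing it makes the gaming surface visible: a handful of citations aimed at papers just below the threshold moves the index without any new scientific impact. The citation counts below are invented for illustration.

```python
def h_index(citation_counts):
    """Largest h such that at least h papers have >= h citations each."""
    counts = sorted(citation_counts, reverse=True)
    h = 0
    for rank, cites in enumerate(counts, start=1):
        if cites >= rank:
            h = rank
        else:
            break
    return h

# Ten papers by a hypothetical researcher, by citation count:
papers = [25, 18, 12, 9, 6, 4, 3, 2, 1, 0]
print(h_index(papers))                         # -> 5

# Five strategic self-citations added to every paper lift the index
# two places with no new research being read or built upon:
gamed = [c + 5 for c in papers]
print(h_index(gamed))                          # -> 7
```

Because the index only moves at its threshold, strategic citation needs to target only the papers sitting just below rank h -- the cheapest possible intervention, and exactly the behavior citation cartels exhibit.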
The Science Behind Metric Gaming
Why do rational, ethical people engage in metric gaming? Research in behavioral economics provides a consistent answer: the perceived legitimacy of gaming depends on how closely the metric is identified with the goal.
Francesca Gino at Harvard Business School and Dan Ariely at Duke University have studied the "fudge factor" in ethical behavior -- the degree to which people rationalize borderline behavior without identifying it as dishonest. Their research, summarized in Ariely's 2012 book The Honest Truth About Dishonesty, found that people are more willing to cheat when they stand at a physical or psychological distance from the dishonest act itself. Metric gaming creates exactly this distance: the employee who creates an unauthorized bank account is not "stealing from the customer" in their own mental accounting; they are "hitting the cross-sell target." The metric acts as a cognitive buffer that makes the harmful behavior legible as compliance behavior.
Research by Uri Gneezy (UC San Diego) and Aldo Rustichini (University of Minnesota) documented a related effect in their famous 2000 study published in the Journal of Legal Studies. When an Israeli daycare center introduced a fine for parents who picked up their children late, late pickups increased rather than decreased. Before the fine, parents experienced social obligation (the psychological cost of inconveniencing the teachers); after the fine, they experienced a transaction (a payment for extra time). The metric -- the fine -- transformed the social relationship into a commercial one, eliminating the non-monetary motivation that had been producing the desired behavior. This monetization-of-motivation effect means that introducing financial metrics into previously non-monetary contexts can destroy the existing motivation structures those contexts depended on.
References and Further Reading
Goodhart, C.A.E. (1984). "Problems of Monetary Management: The U.K. Experience." In Monetary Theory and Practice. Palgrave. https://en.wikipedia.org/wiki/Goodhart%27s_law
Muller, J.Z. (2018). The Tyranny of Metrics. Princeton University Press. https://press.princeton.edu/books/hardcover/9780691174952/the-tyranny-of-metrics
Kerr, S. (1975). "On the Folly of Rewarding A, While Hoping for B." Academy of Management Journal, 18(4), 769-783. https://doi.org/10.2307/255378
Koretz, D. (2017). The Testing Charade: Pretending to Make Schools Better. University of Chicago Press. https://press.uchicago.edu/ucp/books/book/chicago/T/bo27209989.html
Dranove, D., Kessler, D., McClellan, M. & Satterthwaite, M. (2003). "Is More Information Better? The Effects of 'Report Cards' on Health Care Providers." Journal of Political Economy, 111(3), 555-588. https://doi.org/10.1086/374180
Deci, E.L. (1971). "Effects of Externally Mediated Rewards on Intrinsic Motivation." Journal of Personality and Social Psychology, 18(1), 105-115. https://doi.org/10.1037/h0030644
Pink, D.H. (2009). Drive: The Surprising Truth About What Motivates Us. Riverhead Books. https://www.danpink.com/books/drive/
Kaplan, R.S. & Norton, D.P. (1996). The Balanced Scorecard: Translating Strategy into Action. Harvard Business School Press. https://hbr.org/1992/01/the-balanced-scorecard-measures-that-drive-performance-2
Deming, W.E. (1986). Out of the Crisis. MIT Press. https://en.wikipedia.org/wiki/Out_of_the_Crisis
Ridgway, V.F. (1956). "Dysfunctional Consequences of Performance Measurements." Administrative Science Quarterly, 1(2), 240-247. https://doi.org/10.2307/2390989
Graham, J.R., Harvey, C.R. & Rajgopal, S. (2005). "The Economic Implications of Corporate Financial Reporting." Journal of Accounting and Economics, 40(1-3), 3-73. https://doi.org/10.1016/j.jacceco.2005.01.002
Edmondson, A.C. (2018). The Fearless Organization. Wiley. https://fearlessorganization.com/
Campbell, D.T. (1979). "Assessing the Impact of Planned Social Change." Evaluation and Program Planning, 2(1), 67-90. https://doi.org/10.1016/0149-7189(79)90048-X
Stumpf, J. (2016). Testimony before United States Senate Committee on Banking, Housing, and Urban Affairs. https://www.banking.senate.gov/
Strathern, M. (1997). "'Improving Ratings': Audit in the British University System." European Review, 5(3), 305-321. https://doi.org/10.1002/(SICI)1234-981X(199707)5:3<305::AID-EURO184>3.0.CO;2-4
Drucker, P.F. (1954). The Practice of Management. Harper & Row.
Deci, E.L. & Ryan, R.M. (1985). Intrinsic Motivation and Self-Determination in Human Behavior. Plenum Press.
Gneezy, U. & Rustichini, A. (2000). "A Fine Is a Price." Journal of Legal Studies, 29(1), 1-17.
Ariely, D. (2012). The Honest Truth About Dishonesty. Harper.
Ariely, D. (2010). "You Are What You Measure." Harvard Business Review, 88(6), 38.
Pfeffer, J. & Sutton, R.I. (2006). Hard Facts, Dangerous Half-Truths and Total Nonsense: Profiting from Evidence-Based Management. Harvard Business School Press.
Bohnet, I. (2016). What Works: Gender Equality by Design. Harvard University Press.
Espeland, W.N. & Sauder, M. (2007). "Rankings and Reactivity: How Public Measures Recreate Social Worlds." American Journal of Sociology, 113(1), 1-40.
Power, M. (1997). The Audit Society: Rituals of Verification. Oxford University Press.
Frequently Asked Questions
Why do metrics influence behavior?
People optimize for what is measured. Metrics signal what an organization values, create accountability, and are often tied to rewards or consequences.
What is Goodhart's Law?
In Marilyn Strathern's popular paraphrase: "When a measure becomes a target, it ceases to be a good measure." Once a metric carries stakes, people optimize the measure itself rather than the underlying goal.
How do metrics create unintended consequences?
Metrics focus attention on what is measured at the expense of what is not, invite gaming, encourage short-term thinking, and can crowd out intrinsic motivation.
What's the difference between measurement and targets?
Measurement is observation; a target attaches stakes to that observation. Setting targets changes behavior far more than measurement alone does.
Why do people game metrics?
When a metric is tied to rewards, punishment, or status, it is often easier to optimize the metric than to improve the underlying performance it is meant to capture.
Can metrics destroy motivation?
Yes. External measurement and rewards can undermine intrinsic motivation, especially for complex or creative work that depends on autonomy.
How do you prevent metrics gaming?
Use multiple balanced metrics, keep human judgment in the loop, make gaming costly, monitor for manipulation, and iterate on the metric design.
What makes metrics drive good behavior?
Good metrics are aligned with the actual goal, hard to game, focused on outcomes rather than outputs, and balanced against important factors that resist measurement.