Incentive Design Failures Explained: When Reward Systems Backfire
In 1902, the French colonial government in Hanoi faced a rat problem. The city's sewers teemed with rats spreading disease. Desperate for a solution, officials announced a bounty: citizens would receive payment for every dead rat they delivered—specifically, for every rat tail presented as proof.
The program seemed successful initially. Thousands of rat tails flooded in daily. The government paid out considerable sums. Officials congratulated themselves on the effective incentive structure.
Then inspectors discovered something unexpected: rats running around Hanoi without tails. Citizens had figured out that catching rats, cutting off tails, and releasing them alive produced more long-term income than killing rats. The rat population actually increased—farmers even began breeding rats for the bounty. The incentive had created exactly the opposite outcome from what was intended.
This phenomenon—perverse incentives producing behaviors contrary to goals—wasn't unique to colonial Hanoi. It's a universal pattern. Whenever incentives are designed carelessly, humans optimize for what's measured rather than what's intended, gaming systems in creative ways designers never anticipated.
From corporate sales commissions destroying customer relationships to educational testing mandates undermining actual learning, from unlimited vacation policies paradoxically reducing time off to stock options encouraging accounting fraud—incentive design failures follow recognizable patterns across domains.
This article analyzes real incentive design failures: the specific mechanisms that caused them to backfire, why they're so common, patterns that predict failure, lessons for designing better incentive systems, and frameworks for avoiding common pitfalls.
Case Study 1: Wells Fargo's Sales Quotas—Fraud at Scale
The Incentive: Wells Fargo set aggressive cross-selling quotas: eight accounts per customer ("Eight is great," in company parlance). Compensation, bonuses, and job security were tied to hitting targets.
The Intent: Increase customer engagement and cross-selling, driving revenue growth and customer relationships.
What Actually Happened:
Between 2011 and 2016, employees created roughly 3.5 million unauthorized accounts without customer knowledge or consent:
- Opening checking accounts customers never requested
- Issuing credit cards customers didn't know about
- Transferring funds between accounts to trigger fees
- Forging customer signatures
- Creating fake email addresses and PINs
Why It Backfired:
1. Impossible targets: Eight accounts per customer wasn't legitimately achievable for most employees. The choice: miss targets (and lose your job) or cheat.
2. Short-term pressure: Monthly quotas created constant urgency. No time for building genuine relationships.
3. No quality metrics: Only quantity mattered. Whether customers wanted or used accounts was irrelevant to incentives.
4. Punitive culture: Branch managers publicly humiliated employees missing targets, creating fear-driven environment.
5. Asymmetric risk: Employees who cheated might keep jobs; those who didn't definitely lost them.
Outcome:
- $3 billion in fines
- 5,300 employees fired
- CEO resigned
- Massive reputation damage
- Criminal investigations
- Customers harmed by fees, credit impacts
Lesson: When incentives create existential pressure without ethical guardrails or quality measures, people will game the system to survive.
Case Study 2: Microsoft's Stack Ranking—Innovation Killer
The Incentive: Microsoft implemented "stack ranking": managers were forced to rate employees on a curve, with fixed percentages in each category (top 20%, middle 70%, bottom 10%). The bottom 10% were typically fired or denied bonuses.
The Intent: Identify and reward top performers, weed out poor performers, create meritocracy.
What Actually Happened:
The system became infamous for destroying Microsoft's culture:
Perverse behaviors:
- Avoided joining strong teams: Would rather be best performer on weak team than middle performer on strong team
- Sabotaged colleagues: Direct reports were competitors for limited "top performer" slots
- Hoarded information: Helping colleagues made them competitive threats
- Risk aversion: Ambitious projects with failure risk threatened rankings
- Politics over performance: Focused on impression management vs. actual work
- Talent flight: Top performers left for companies without forced ranking
Impact on innovation:
- Teams fragmented rather than collaborated
- People worked on safe, incremental projects
- Long-term bets (like cloud computing, initially) were avoided
- Employees competed with each other more than with external rivals
Why It Backfired:
1. Zero-sum game: One person's gain was another's loss. Created competition instead of collaboration.
2. Forced distribution assumption: Assumed every team has poor performers. Reality: strong teams might have no poor performers, weak teams might have many.
3. Metrics gaming: Performance became about ratings management, not actual contribution.
4. Short-term focus: Quarterly or annual reviews incentivized visible short-term work over long-term value creation.
Outcome:
- Microsoft stagnated for years
- Lost mobile and cloud leadership initially
- Toxic culture that took years to repair
- Microsoft eliminated stack ranking in late 2013, shortly before Satya Nadella became CEO
- Post-elimination: Culture improved, innovation accelerated, stock price tripled
Lesson: Competitive incentives within teams destroy collaboration. Forced distributions assume talent is distributed uniformly across teams, an assumption that rarely holds.
Case Study 3: Teaching to the Test—Educational Metric Fixation
The Incentive: No Child Left Behind (2001) and subsequent policies tied school funding, teacher bonuses, and job security to standardized test scores.
The Intent: Improve educational outcomes, ensure accountability, close achievement gaps.
What Actually Happened:
Perverse behaviors:
1. Teaching to the test: Curriculum narrowed to tested subjects (reading, math). Art, music, science, and social studies were reduced or eliminated.
2. Strategic student focus: Teachers concentrated on "bubble kids" (borderline pass/fail). High performers and struggling students were neglected, since neither group could move pass rates.
3. Gaming tactics:
- Suspending low-performing students on test days
- Encouraging weak students to stay home
- Pushing struggling students into special education (exempted from testing)
- Extended test-prep replacing actual teaching
4. Outright cheating: Atlanta, Washington D.C., and other districts had widespread teacher/administrator cheating—changing answer sheets, giving answers during tests.
Why It Backfired:
1. Single metric dominance: Test scores became sole measure of success, crowding out actual learning.
2. Campbell's Law: "The more any quantitative social indicator is used for social decision-making, the more subject it will be to corruption pressures and the more apt it will be to distort and corrupt the social processes it is intended to monitor."
3. Measurement substitution: Test scores are proxy for learning, not learning itself. When proxy becomes target, relationship breaks.
4. Ignores complexity: Learning is multidimensional. Single number can't capture reading comprehension, critical thinking, creativity, problem-solving, collaboration, etc.
Outcome:
- Students learned test-taking, not critical thinking
- Education quality didn't improve (scores rose, actual knowledge didn't)
- Teacher morale plummeted
- Curriculum narrowing harmed well-rounded education
- Cheating scandals damaged trust
- Policies gradually rolled back due to failures
Lesson: Single-metric optimization produces gaming and narrow optimization at expense of broader goals. Proxies break when they become targets.
Case Study 4: Cobra Effect—British India's Snake Bounty
The Incentive: The British colonial administration in Delhi faced a venomous cobra problem. The government offered a bounty for every dead cobra delivered (much like Hanoi's rat bounty).
The Intent: Reduce cobra population, protect citizens.
What Actually Happened:
Phase 1: Initially successful—many cobras killed and delivered.
Phase 2: Enterprising individuals started breeding cobras for bounty income. More profitable than catching wild cobras.
Phase 3: Government discovered breeding, eliminated bounty program.
Phase 4: Breeders released their now-worthless captive cobras. The cobra population ended up larger than before the program started.
Why It Backfired:
1. Incentivized supply: Paid for dead cobras without verifying they were wild vs. bred.
2. No consideration of second-order effects: Didn't anticipate breeding response.
3. Exit strategy lacking: Ending bounty caused worse problem than original.
4. Open-ended payment: No cap on payments or verification of sources.
"Cobra Effect" becomes term: Describes solutions that make problems worse through perverse incentives.
Lesson: Incentives without consideration of gaming mechanisms and second-order effects can make problems worse. People will create supply of whatever you pay for.
Case Study 5: Unlimited Vacation—Paradox of Choice
The Incentive: Tech companies (Netflix, others) replaced accrued vacation days with "unlimited vacation"—take as much time off as needed, no tracking.
The Intent: Treat employees like adults, reduce HR overhead, eliminate vacation liability on balance sheets, attract talent with "unlimited" perk.
What Actually Happened:
Employees took less vacation with unlimited policies than with fixed allocations:
Why?
1. Ambiguity anxiety: No clear norm for "acceptable" amount. Fear taking "too much."
2. Social comparison: Competitive workplaces created negative signaling—taking vacation implied less commitment.
3. Tragedy of the commons: A fixed allotment was an individual right; unlimited vacation felt like taking from company goodwill.
4. Loss of endowment effect: Fixed vacation felt like "mine." Unlimited vacation felt like asking for permission each time.
5. Manager discretion: Approval now subjective. Employees worried about relationships.
Outcome:
- Employee burnout increased
- Some companies reverted to fixed vacation
- Others added mandatory minimums (Kickstarter: required 18 days)
- Accounting benefit remained (no vacation liability) but employee benefit disappeared
Why It Backfired:
Intent: More freedom
Reality: More anxiety about boundaries and social norms
Lesson: Removing constraints doesn't always increase freedom. Sometimes structure provides psychological safety. Incentives work differently when framed as taking from commons vs. using personal allocation.
Case Study 6: Commission-Only Sales—Burning Customer Relationships
The Incentive: Sales reps paid purely on commission—no base salary, compensation entirely from closed deals.
The Intent: Align sales incentives with revenue, motivate aggressive selling, pay only for results.
What Actually Happened:
Perverse behaviors:
1. Overselling: Selling customers products they don't need to hit quotas
2. Misrepresentation: Exaggerating product capabilities to close deals
3. High-pressure tactics: Aggressive closing techniques damaging brand
4. Cherry-picking customers: Focusing only on easy, high-value deals; ignoring relationship building
5. Churn acceleration: Closing bad-fit customers who cancelled quickly
6. Zero long-term thinking: Only current month's commission mattered
Why It Backfired:
1. Misaligned timescales: Sales rep cared about closing; company cared about customer lifetime value.
2. Adverse selection: Commission-only attracted people optimizing for short-term income, not customer relationships.
3. Reputation damage: Aggressive tactics harmed brand, making future sales harder.
4. Retention ignored: No incentive for customer success post-sale. High churn.
Outcome:
- Customer complaints
- High churn rates
- Damaged brand reputation
- Legal issues from misrepresentation
- Many companies shifted to base + commission models
Lesson: Pure transaction incentives ignore relationship and long-term value. Misaligned time horizons between individual and organization create agency problems.
Case Study 7: Stock Options—Short-Term Thinking and Fraud
The Incentive: Executive compensation tied to stock price through options—profit when stock price rises.
The Intent: Align executives with shareholders, incentivize long-term value creation.
What Actually Happened:
1990s-2000s corporate scandals:
Enron:
- Executives with massive stock option compensation
- Incentivized showing ever-increasing profits
- Used accounting fraud to inflate earnings
- Stock soared on false numbers
- Collapsed spectacularly, destroying shareholder value
WorldCom:
- Similar pattern—stock options incentivized earnings growth
- $11 billion accounting fraud to meet targets
- Bankruptcy, criminal convictions
Broader effects:
1. Short-termism: Options vest over 3-5 years. Executives optimized for stock price during vesting period, not long-term health.
2. Earnings management: "Beat estimates" mentality led to aggressive (sometimes fraudulent) accounting.
3. Risk-taking: Stock options are "heads I win, tails I don't lose much"—incentivized excessive risk.
4. Stock buybacks over investment: Repurchasing stock boosts price short-term but reduces capital for R&D, workers, infrastructure.
Why It Backfired:
1. Proxy gaming: Stock price is proxy for company health. When it becomes target, relationship breaks.
2. Asymmetric incentives: Unlimited upside, limited downside (options worthless if price falls, but executives don't lose money).
3. Time horizon mismatch: Options vest short-term; company health is long-term.
Lesson: Financial incentives tied to easily-manipulated metrics encourage gaming. Asymmetric risk profiles incentivize excessive risk-taking.
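A toy payoff comparison makes the asymmetry concrete (the prices and the 50/50 gamble below are invented for illustration):

```python
# Options floor the holder's downside at zero; stock grants do not.
# A gamble that creates no value for shareholders can still have
# positive expected value for an option holder.

def option_payoff(stock_price: float, strike: float) -> float:
    return max(stock_price - strike, 0.0)  # downside floored at zero

def stock_payoff(stock_price: float, grant_price: float) -> float:
    return stock_price - grant_price       # downside is real

strike = grant = 100.0
outcomes = [180.0, 20.0]  # risky bet: 50/50 jump to 180 or crash to 20

ev_option = sum(option_payoff(s, strike) for s in outcomes) / len(outcomes)
ev_stock = sum(stock_payoff(s, grant) for s in outcomes) / len(outcomes)

print(f"option holder EV: {ev_option:+.0f}")  # +40: take the gamble
print(f"stock holder EV:  {ev_stock:+.0f}")   # +0: shareholders gain nothing
```

The same logic explains why restricted stock, which keeps the downside real, aligns time horizons better than quickly exercisable options.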
Why Incentive Design Is So Hard
Common patterns causing failure:
Reason 1: Goodhart's Law
"When a measure becomes a target, it ceases to be a good measure."
Mechanism:
- Metric initially correlates with goal (test scores ~ learning)
- Make metric a target (pay teachers for test scores)
- People optimize metric directly (teach to test)
- Metric decouples from goal (scores rise, learning doesn't)
Universal pattern: Metrics work when observed. They break when weaponized as targets.
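To make the decoupling mechanical, here is a minimal toy model. Every coefficient is invented; they encode only the assumption that test prep moves the score more than it moves learning:

```python
# A minimal sketch of Goodhart's Law (illustrative numbers, not data).
# An agent splits a fixed effort budget between real teaching and test
# prep. The test score is a proxy that rewards prep; learning rewards
# teaching. Targeting the proxy raises it while the true goal falls.

def outcomes(effort_teaching: float) -> tuple[float, float]:
    """Return (test_score, actual_learning) for a given effort split."""
    effort_prep = 1.0 - effort_teaching
    test_score = 0.5 * effort_teaching + 1.0 * effort_prep  # proxy metric
    learning = 1.0 * effort_teaching + 0.2 * effort_prep    # true goal
    return test_score, learning

# Before the metric is a target: effort follows professional judgment.
before = outcomes(effort_teaching=0.8)
# After scores determine pay: the agent maximizes the proxy directly.
after = outcomes(effort_teaching=0.0)

print(f"before targeting: score={before[0]:.2f}, learning={before[1]:.2f}")
print(f"after targeting:  score={after[0]:.2f}, learning={after[1]:.2f}")
# score rises (0.60 -> 1.00) while learning collapses (0.84 -> 0.20)
```

The proxy was informative exactly as long as nobody was paid to move it.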
Reason 2: Cobra Effect (Unintended Consequences)
Pattern: Solution makes problem worse through incentive structure
Examples:
- Rat tails → rat breeding
- Cobra bounty → cobra breeding
- Bug bounties → bug creation
- Article word count minimums → verbose useless content
Why it's common: Humans are creative optimizers. Designers can't anticipate every gaming strategy.
Reason 3: Campbell's Law
"The more any quantitative social indicator is used for social decision-making, the more subject it will be to corruption pressures."
Mechanism: Higher stakes → more pressure → more gaming → more corruption
Example: Low-stakes customer surveys (minor gaming). High-stakes teacher evaluations (widespread cheating).
Reason 4: Multitask Problem
Pattern: Incentivizing one dimension reduces performance on other important dimensions
Example:
- Incentivize quantity → quality falls
- Incentivize speed → accuracy falls
- Incentivize individual performance → collaboration falls
Why it happens: Attention and effort are limited. People focus on incentivized dimensions at the expense of non-incentivized ones, as the toy model below makes concrete.
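A sketch of that trade-off, assuming one unit of effort and diminishing returns on each dimension (all parameters invented):

```python
# A toy multitask model. A worker splits one unit of effort between
# quantity and quality, with diminishing returns on each. If pay rewards
# only quantity, the pay-maximizing split drives quality to zero.

from math import sqrt

def outputs(effort_quantity: float) -> tuple[float, float]:
    """(quantity produced, quality produced) for a given effort split."""
    return 10 * sqrt(effort_quantity), 10 * sqrt(1 - effort_quantity)

def pay(effort_quantity: float, quality_weight: float) -> float:
    quantity, quality = outputs(effort_quantity)
    return quantity + quality_weight * quality

for weight in (0.0, 1.0):  # quality ignored vs. quality paid equally
    best = max((e / 100 for e in range(101)), key=lambda e: pay(e, weight))
    quantity, quality = outputs(best)
    print(f"quality weight {weight}: quantity={quantity:.1f}, quality={quality:.1f}")
# weight 0.0 -> quantity=10.0, quality=0.0 (quality collapses)
# weight 1.0 -> quantity=7.1, quality=7.1 (balanced split)
```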
Reason 5: Crowding Out Intrinsic Motivation
Pattern: Adding extrinsic incentives reduces intrinsic motivation
Example:
- Teachers who taught for love of teaching → teach for test scores → burnout
- Creative work incentivized financially → creativity falls
- Volunteer work → paid → fewer volunteers (monetary payment crowds out social value)
Mechanism: Extrinsic incentives reframe activity from meaningful to transactional.
Principles for Better Incentive Design
How to avoid these failures?
Principle 1: Align Incentives with Actual Goals, Not Proxies
Bad: Incentivize metric that proxies for goal
Good: Incentivize goal directly, or use multiple metrics balanced against each other
Example:
- Don't incentivize: Number of sales
- Do incentivize: Customer lifetime value (requires quality, retention)
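A back-of-envelope comparison shows why the two bonus bases rank people differently. The reps, margins, and churn rates below are hypothetical, and the CLV formula is the simplest geometric-retention version (monthly margin / monthly churn):

```python
# Rep A closes many bad-fit deals that churn fast; Rep B closes fewer,
# better-fit deals. A sales-count bonus rewards A; a CLV bonus rewards B.

def customer_lifetime_value(monthly_margin: float, monthly_churn: float) -> float:
    """Expected total margin per customer under geometric retention."""
    return monthly_margin / monthly_churn

reps = {
    "Rep A (volume)":  {"deals": 30, "margin": 50.0, "churn": 0.25},
    "Rep B (quality)": {"deals": 12, "margin": 50.0, "churn": 0.04},
}

for name, r in reps.items():
    clv = customer_lifetime_value(r["margin"], r["churn"])
    print(f"{name}: deals={r['deals']:>2}  total CLV={r['deals'] * clv:>8,.0f}")
# Rep A wins on deal count (30 vs 12); Rep B generates 2.5x the value
# (12 * 1,250 = 15,000 vs 30 * 200 = 6,000).
```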
Principle 2: Expect Gaming and Design Against It
Assume: People will find every loophole
Design:
- Close obvious loopholes
- Monitor for unexpected gaming
- Have human judgment override metrics when gaming detected
- Iterate based on observed behavior
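What "monitor for unexpected gaming" can look like in practice: a minimal sketch, assuming you can log one gaming signature alongside the incentivized metric. Here the signature is new accounts that sit dormant; the data shape and the 50% threshold are invented:

```python
# Don't just count the incentivized metric (accounts opened); also watch
# a signature of gaming: here, accounts with no activity after opening.

from collections import defaultdict

# (rep_id, account_had_any_activity_within_60_days)
new_accounts = [
    ("rep_1", True), ("rep_1", True), ("rep_1", False),
    ("rep_2", False), ("rep_2", False), ("rep_2", False), ("rep_2", True),
]

DORMANCY_ALERT = 0.5  # flag reps where >50% of new accounts sit unused

stats = defaultdict(lambda: [0, 0])  # rep -> [dormant, total]
for rep, active in new_accounts:
    stats[rep][0] += 0 if active else 1
    stats[rep][1] += 1

for rep, (dormant, total) in stats.items():
    rate = dormant / total
    flag = "  <-- investigate" if rate > DORMANCY_ALERT else ""
    print(f"{rep}: {dormant}/{total} dormant ({rate:.0%}){flag}")
```

The specific signature will differ per system; the design point is that every incentive should ship with at least one counter-metric watched from day one.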
Principle 3: Use Multiple Balanced Metrics
Bad: Single metric dominance
Good: Multiple metrics in tension, preventing optimization of one at expense of others
Example:
- Don't just measure: Customer acquisition
- Also measure: Customer satisfaction, retention, profitability, referrals
- Prevents gaming by having to balance competing priorities
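One way to encode "metrics in tension" is to combine them with a geometric mean rather than a weighted sum: a weighted sum lets a stellar score on one metric paper over a tanked one, while a geometric mean drags the composite toward zero if any metric collapses. A sketch (metric names and values are assumptions):

```python
# Composite score where no single metric can be sacrificed: the geometric
# mean of normalized metrics punishes any dimension that approaches zero.

import math

def balanced_score(metrics: dict[str, float]) -> float:
    """Geometric mean of metrics, each pre-normalized to the 0..1 range."""
    values = list(metrics.values())
    return math.prod(values) ** (1 / len(values))

gamer = {"acquisition": 1.0, "satisfaction": 0.2, "retention": 0.1}
balanced = {"acquisition": 0.7, "satisfaction": 0.7, "retention": 0.7}

print(f"gamer:    {balanced_score(gamer):.2f}")     # 0.27: gaming punished
print(f"balanced: {balanced_score(balanced):.2f}")  # 0.70: balance rewarded
```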
Principle 4: Maintain Human Judgment
Bad: Algorithmic decisions based purely on metrics
Good: Metrics inform, humans decide
Why: Humans detect gaming and context that machines miss
Example: Stack ranking failed when mechanical. Performance reviews work better when managers have discretion informed by multiple factors.
Principle 5: Consider Time Horizons
Bad: Short-term incentives for long-term goals
Good: Match incentive timescale to goal timescale
Example:
- Short-term goal (quarterly sales): Quarterly bonuses OK
- Long-term goal (company growth): Restricted stock vesting over years, not options exercisable quickly
Principle 6: Test Small, Iterate
Bad: Roll out incentive system company-wide immediately
Good: Pilot with small group, observe behaviors, adjust, then scale
Why: Gaming strategies emerge over time. Small-scale testing reveals issues before major damage.
Principle 7: Preserve Intrinsic Motivation
Bad: Heavy extrinsic incentives for inherently meaningful work
Good: Light extrinsic incentives (avoid exploitation) + nurture intrinsic motivation (autonomy, mastery, purpose)
Example: Teachers, nurses, scientists often motivated by mission. Heavy pay-for-performance can crowd out this motivation.
Warning Signs of Bad Incentives
How to detect incentive problems early?
Warning Sign 1: People Optimizing for Letter, Not Spirit
Manifestation: Technically meeting targets while clearly undermining goals
Example: Call center reps hanging up to hit "calls per hour" target
Response: Revise incentives to measure actual goal
Warning Sign 2: Increased Metric, Declining Real Performance
Manifestation: Numbers look great, actual results deteriorate
Example: Test scores rising, but students can't solve novel problems
Response: Metric has decoupled from goal—find better measure
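One way to operationalize "find better measure" (or at least to notice you need one) is to audit the real outcome on a sample and track its correlation with the proxy over time. A sketch with made-up quarterly series (`statistics.correlation` requires Python 3.10+):

```python
# Periodically audit the true outcome on a sample, then check whether the
# proxy still tracks it. A falling correlation is the decoupling signal.

from statistics import correlation

# Quarterly proxy (e.g., test scores) and audited outcome (e.g.,
# performance on novel, un-coached problems) for the same cohorts.
proxy   = [62, 65, 70, 74, 79, 84, 88, 92]
audited = [60, 63, 66, 68, 69, 68, 67, 66]

early = correlation(proxy[:4], audited[:4])
late = correlation(proxy[4:], audited[4:])

print(f"early correlation: {early:+.2f}")  # near +1: proxy tracks goal
print(f"late correlation:  {late:+.2f}")   # negative: proxy has decoupled
```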
Warning Sign 3: Growing Complexity in Gaming Strategies
Manifestation: Increasingly elaborate tactics to hit metrics
Example: Wells Fargo employees' fake account strategies became more sophisticated over time
Response: Incentive structure is broken—redesign or abandon
Warning Sign 4: Ethical Complaints or Corner-Cutting
Manifestation: People uncomfortable with what incentives are making them do
Example: Teachers expressing moral distress about teaching to test vs. actual education
Response: Incentives are creating ethical conflicts—reassess
Warning Sign 5: Good Performers Leaving
Manifestation: Top talent exits the organization rather than participating in the incentive system
Example: Microsoft engineers leaving to avoid stack ranking
Response: Incentives are selecting against desired behaviors
Conclusion: The Incentive Design Paradox
The paradox: Organizations need incentives to motivate behavior, but incentives inevitably create gaming and unintended consequences.
The key insights:
1. Goodhart's Law is universal—metrics work when observed, break when weaponized as targets. People optimize what's measured, not what's intended.
2. Gaming is inevitable—humans are creative optimizers. Every incentive will be gamed in ways designers don't anticipate. Design assuming gaming will happen.
3. Single metrics are dangerous—optimizing one dimension reduces others. Use multiple balanced metrics, maintain human judgment, preserve complexity rather than reducing to single number.
4. Unintended consequences dominate—Wells Fargo's fake accounts, Microsoft's innovation death, educational teaching to test—perverse incentives destroy more value than well-designed incentives create.
5. Intrinsic motivation matters—extrinsic incentives can crowd out intrinsic motivation for meaningful work. Heavy pay-for-performance isn't always better.
6. Time horizons must align—short-term incentives for long-term goals create gaming. Match incentive timescale to goal timescale.
7. Iterate and monitor—incentive design isn't one-time. Pilot small, observe behaviors, detect gaming, adjust. Continuous monitoring and iteration essential.
The Hanoi rat bounty seemed clever: pay for results, reduce rats. Its designers didn't anticipate rat farming. Wells Fargo didn't anticipate widespread fraud when it set quotas; Microsoft didn't anticipate that stack ranking would kill collaboration; educators didn't anticipate that high-stakes testing would crowd out learning.
Good intentions aren't enough. Incentive design requires thinking through second-order effects, anticipating gaming, using multiple balanced metrics, maintaining human judgment, and iterating based on observed behaviors.
As Charlie Munger observed: "Show me the incentive and I'll show you the outcome."
The question isn't whether to use incentives. It's whether you'll design them thoughtfully—with awareness of Goodhart's Law, Campbell's Law, cobra effects, and multitask problems—or learn these lessons expensively through failures.
History shows: Bad incentive design is reliably expensive. Good incentive design is hard but essential. The choice is investing effort upfront in thoughtful design, or paying far more later in perverse behaviors, gaming, fraud, and outcomes opposite to intent.
References
Goodhart, C. A. E. (1984). Problems of monetary management: The UK experience. In Monetary theory and practice (pp. 91–121). Palgrave Macmillan. https://doi.org/10.1007/978-1-349-17295-5_4
Campbell, D. T. (1979). Assessing the impact of planned social change. Evaluation and Program Planning, 2(1), 67–90. https://doi.org/10.1016/0149-7189(79)90048-X
Kerr, S. (1995). On the folly of rewarding A, while hoping for B. Academy of Management Perspectives, 9(1), 7–14. https://doi.org/10.5465/ame.1995.9503133142
Gneezy, U., & Rustichini, A. (2000). Pay enough or don't pay at all. The Quarterly Journal of Economics, 115(3), 791–810. https://doi.org/10.1162/003355300554917
Pink, D. H. (2009). Drive: The surprising truth about what motivates us. Riverhead Books.
Prendergast, C. (1999). The provision of incentives in firms. Journal of Economic Literature, 37(1), 7–63. https://doi.org/10.1257/jel.37.1.7
Baker, G. P. (1992). Incentive contracts and performance measurement. Journal of Political Economy, 100(3), 598–614. https://doi.org/10.1086/261831
Oyer, P., & Schaefer, S. (2011). Personnel economics: Hiring and incentives. In O. Ashenfelter & D. Card (Eds.), Handbook of labor economics (Vol. 4B, pp. 1769–1823). Elsevier.