Incentive Design Failures Explained: When Reward Systems Backfire
In 1902, the French colonial government in Hanoi faced a rat problem. The city's sewers teemed with rats spreading disease. Desperate for a solution, officials announced a bounty: citizens would receive payment for every dead rat they delivered—specifically, for every rat tail presented as proof.
The program seemed successful initially. Thousands of rat tails flooded in daily. The government paid out considerable sums. Officials congratulated themselves on the effective incentive structure.
Then inspectors discovered something unexpected: rats running around Hanoi without tails. Citizens had figured out that catching rats, cutting off tails, and releasing them alive produced more long-term income than killing rats. The rat population actually increased—farmers even began breeding rats for the bounty. The incentive had created exactly the opposite outcome from what was intended.
This phenomenon—perverse incentives producing behaviors contrary to goals—wasn't unique to colonial Hanoi. It's a universal pattern. Whenever incentives are designed carelessly, humans optimize for what's measured rather than what's intended, gaming systems in creative ways designers never anticipated.
From corporate sales commissions destroying customer relationships to educational testing mandates undermining actual learning, from unlimited vacation policies paradoxically reducing time off to stock options encouraging accounting fraud—incentive design failures follow recognizable patterns across domains.
This article analyzes real incentive design failures: the specific mechanisms that caused them to backfire, why they're so common, patterns that predict failure, lessons for designing better incentive systems, and frameworks for avoiding common pitfalls.
Case Study 1: Wells Fargo's Sales Quotas—Fraud at Scale
The Incentive: Wells Fargo set aggressive cross-selling quotas: eight accounts per customer ("Eight is great," in company parlance). Compensation, bonuses, and job security were tied to hitting targets.
The Intent: Increase customer engagement and cross-selling, driving revenue growth and customer relationships.
What Actually Happened:
Between 2011 and 2016, employees created roughly 3.5 million unauthorized accounts without customer knowledge or consent:
- Opening checking accounts customers never requested
- Issuing credit cards customers didn't know about
- Transferring funds between accounts to trigger fees
- Forging customer signatures
- Creating fake email addresses and PINs
Why It Backfired:
1. Impossible targets: Eight accounts per customer wasn't legitimately achievable for most employees. The choice: miss targets (and lose your job) or cheat.
2. Short-term pressure: Monthly quotas created constant urgency. No time for building genuine relationships.
3. No quality metrics: Only quantity mattered. Whether customers wanted or used accounts was irrelevant to incentives.
4. Punitive culture: Branch managers publicly humiliated employees missing targets, creating fear-driven environment.
5. Asymmetric risk: Employees who cheated might keep jobs; those who didn't definitely lost them.
Outcome:
- $3 billion in fines
- 5,300 employees fired
- CEO resigned
- Massive reputation damage
- Criminal investigations
- Customers harmed by fees, credit impacts
Lesson: When incentives create existential pressure without ethical guardrails or quality measures, people will game the system to survive.
Case Study 2: Microsoft's Stack Ranking—Innovation Killer
The Incentive: Microsoft implemented "stack ranking": managers were forced to rate employees on a curve, with fixed percentages in each category (top 20%, middle 70%, bottom 10%). The bottom 10% were typically fired or denied bonuses.
The Intent: Identify and reward top performers, weed out poor performers, create meritocracy.
What Actually Happened:
The system became infamous for destroying Microsoft's culture:
Perverse behaviors:
- Avoided joining strong teams: Would rather be best performer on weak team than middle performer on strong team
- Sabotaged colleagues: Direct reports were competitors for limited "top performer" slots
- Hoarded information: Helping colleagues made them competitive threats
- Risk aversion: Ambitious projects with failure risk threatened rankings
- Politics over performance: Focused on impression management vs. actual work
- Talent flight: Top performers left for companies without forced ranking
Impact on innovation:
- Teams fragmented rather than collaborated
- People worked on safe, incremental projects
- Long-term bets (like cloud computing, initially) were avoided
- Employees competed with each other more than with external rivals
Why It Backfired:
1. Zero-sum game: One person's gain was another's loss. Created competition instead of collaboration.
2. Forced distribution assumption: Assumed every team has poor performers. Reality: strong teams might have no poor performers, weak teams might have many.
3. Metrics gaming: Performance became about ratings management, not actual contribution.
4. Short-term focus: Quarterly or annual reviews incentivized visible short-term work over long-term value creation.
Outcome:
- Microsoft stagnated for years
- Lost mobile and cloud leadership initially
- Toxic culture that took years to repair
- Microsoft eliminated stack ranking in late 2013, shortly before Satya Nadella became CEO
- Post-elimination: Culture improved, innovation accelerated, stock price tripled
Lesson: Competitive incentives within teams destroy collaboration. Forced distributions assume talent is distributed uniformly across teams, an assumption that rarely holds.
Case Study 3: Teaching to the Test—Educational Metric Fixation
The Incentive: No Child Left Behind (2001) and subsequent policies tied school funding, teacher bonuses, and job security to standardized test scores.
The Intent: Improve educational outcomes, ensure accountability, close achievement gaps.
What Actually Happened:
Perverse behaviors:
1. Teaching to the test: Curriculum narrowed to tested subjects (reading, math). Art, music, science, and social studies were reduced or eliminated.
2. Strategic student focus: Teachers concentrated on "bubble kids" (borderline pass/fail). High performers and struggling students were neglected, since neither group could move pass rates.
3. Gaming tactics:
- Suspending low-performing students on test days
- Encouraging weak students to stay home
- Pushing struggling students into special education (exempted from testing)
- Extended test-prep replacing actual teaching
4. Outright cheating: Atlanta, Washington D.C., and other districts had widespread teacher/administrator cheating—changing answer sheets, giving answers during tests.
Why It Backfired:
1. Single metric dominance: Test scores became sole measure of success, crowding out actual learning.
2. Campbell's Law: "The more any quantitative social indicator is used for social decision-making, the more subject it will be to corruption pressures and the more apt it will be to distort and corrupt the social processes it is intended to monitor."
3. Measurement substitution: Test scores are proxy for learning, not learning itself. When proxy becomes target, relationship breaks.
4. Ignores complexity: Learning is multidimensional. Single number can't capture reading comprehension, critical thinking, creativity, problem-solving, collaboration, etc.
Outcome:
- Students learned test-taking, not critical thinking
- Education quality didn't improve (scores rose, actual knowledge didn't)
- Teacher morale plummeted
- Curriculum narrowing harmed well-rounded education
- Cheating scandals damaged trust
- Policies gradually rolled back due to failures
Lesson: Single-metric optimization produces gaming and narrow optimization at expense of broader goals. Proxies break when they become targets.
Case Study 4: Cobra Effect—British India's Snake Bounty
The Incentive: The British colonial administration in Delhi faced a venomous cobra problem. The government offered a bounty for every dead cobra delivered (much like Hanoi's rat bounty).
The Intent: Reduce cobra population, protect citizens.
What Actually Happened:
Phase 1: Initially successful—many cobras killed and delivered.
Phase 2: Enterprising individuals started breeding cobras for bounty income. More profitable than catching wild cobras.
Phase 3: Government discovered breeding, eliminated bounty program.
Phase 4: Breeders released their now-worthless captive cobras. The cobra population ended up larger than before the program started.
Why It Backfired:
1. Incentivized supply: Paid for dead cobras without verifying they were wild vs. bred.
2. No consideration of second-order effects: Didn't anticipate breeding response.
3. Exit strategy lacking: Ending bounty caused worse problem than original.
4. Open-ended payment: No cap on payments or verification of sources.
"Cobra Effect" becomes term: Describes solutions that make problems worse through perverse incentives.
Lesson: Incentives without consideration of gaming mechanisms and second-order effects can make problems worse. People will create supply of whatever you pay for.
Case Study 5: Unlimited Vacation—Paradox of Choice
The Incentive: Tech companies (Netflix, others) replaced accrued vacation days with "unlimited vacation"—take as much time off as needed, no tracking.
The Intent: Treat employees like adults, reduce HR overhead, eliminate vacation liability on balance sheets, attract talent with "unlimited" perk.
What Actually Happened:
Employees took less vacation with unlimited policies than with fixed allocations:
Why?
1. Ambiguity anxiety: No clear norm for "acceptable" amount. Fear taking "too much."
2. Social comparison: Competitive workplaces created negative signaling—taking vacation implied less commitment.
3. Tragedy of the commons: A fixed allotment was an individual right; unlimited vacation felt like taking from company goodwill.
4. Loss of endowment effect: Fixed vacation felt like "mine." Unlimited vacation felt like asking for permission each time.
5. Manager discretion: Approval now subjective. Employees worried about relationships.
Outcome:
- Employee burnout increased
- Some companies reverted to fixed vacation
- Others added mandatory minimums (Kickstarter: required 18 days)
- Accounting benefit remained (no vacation liability) but employee benefit disappeared
Why It Backfired:
Intent: More freedom
Reality: More anxiety about boundaries and social norms
Lesson: Removing constraints doesn't always increase freedom. Sometimes structure provides psychological safety. Incentives work differently when framed as taking from commons vs. using personal allocation.
Case Study 6: Commission-Only Sales—Burning Customer Relationships
The Incentive: Sales reps paid purely on commission—no base salary, compensation entirely from closed deals.
The Intent: Align sales incentives with revenue, motivate aggressive selling, pay only for results.
What Actually Happened:
Perverse behaviors:
1. Overselling: Selling customers products they don't need to hit quotas
2. Misrepresentation: Exaggerating product capabilities to close deals
3. High-pressure tactics: Aggressive closing techniques damaging brand
4. Cherry-picking customers: Focusing only on easy, high-value deals; ignoring relationship building
5. Churn acceleration: Closing bad-fit customers who cancelled quickly
6. Zero long-term thinking: Only current month's commission mattered
Why It Backfired:
1. Misaligned timescales: Sales rep cared about closing; company cared about customer lifetime value.
2. Adverse selection: Commission-only attracted people optimizing for short-term income, not customer relationships.
3. Reputation damage: Aggressive tactics harmed brand, making future sales harder.
4. Retention ignored: No incentive for customer success post-sale. High churn.
Outcome:
- Customer complaints
- High churn rates
- Damaged brand reputation
- Legal issues from misrepresentation
- Many companies shifted to base + commission models
Lesson: Pure transaction incentives ignore relationship and long-term value. Misaligned time horizons between individual and organization create agency problems.
Case Study 7: Stock Options—Short-Term Thinking and Fraud
The Incentive: Executive compensation tied to stock price through options—profit when stock price rises.
The Intent: Align executives with shareholders, incentivize long-term value creation.
What Actually Happened:
1990s-2000s corporate scandals:
Enron:
- Executives with massive stock option compensation
- Incentivized showing ever-increasing profits
- Used accounting fraud to inflate earnings
- Stock soared on false numbers
- Collapsed spectacularly, destroying shareholder value
WorldCom:
- Similar pattern—stock options incentivized earnings growth
- $11 billion accounting fraud to meet targets
- Bankruptcy, criminal convictions
Broader effects:
1. Short-termism: Options vest over 3-5 years. Executives optimized for stock price during vesting period, not long-term health.
2. Earnings management: "Beat estimates" mentality led to aggressive (sometimes fraudulent) accounting.
3. Risk-taking: Stock options are "heads I win, tails I don't lose much"—incentivized excessive risk.
4. Stock buybacks over investment: Repurchasing stock boosts price short-term but reduces capital for R&D, workers, infrastructure.
Why It Backfired:
1. Proxy gaming: Stock price is proxy for company health. When it becomes target, relationship breaks.
2. Asymmetric incentives: Unlimited upside, limited downside (options worthless if price falls, but executives don't lose money).
3. Time horizon mismatch: Options vest short-term; company health is long-term.
Lesson: Financial incentives tied to easily-manipulated metrics encourage gaming. Asymmetric risk profiles incentivize excessive risk-taking.
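A toy payoff comparison makes the asymmetry concrete (the prices and the 50/50 gamble below are invented for illustration):

```python
# Options floor the holder's downside at zero; stock grants do not.
# A gamble that creates no value for shareholders can still have
# positive expected value for an option holder.

def option_payoff(stock_price: float, strike: float) -> float:
    return max(stock_price - strike, 0.0)  # downside floored at zero

def stock_payoff(stock_price: float, grant_price: float) -> float:
    return stock_price - grant_price       # downside is real

strike = grant = 100.0
outcomes = [180.0, 20.0]  # risky bet: 50/50 jump to 180 or crash to 20

ev_option = sum(option_payoff(s, strike) for s in outcomes) / len(outcomes)
ev_stock = sum(stock_payoff(s, grant) for s in outcomes) / len(outcomes)

print(f"option holder EV: {ev_option:+.0f}")  # +40: take the gamble
print(f"stock holder EV:  {ev_stock:+.0f}")   # +0: shareholders gain nothing
```

The same logic explains why restricted stock, which keeps the downside real, aligns time horizons better than quickly exercisable options.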
Why Incentive Design Is So Hard
Common patterns causing failure:
Reason 1: Goodhart's Law
"When a measure becomes a target, it ceases to be a good measure."
Mechanism:
- Metric initially correlates with goal (test scores ~ learning)
- Make metric a target (pay teachers for test scores)
- People optimize metric directly (teach to test)
- Metric decouples from goal (scores rise, learning doesn't)
Universal pattern: Metrics work when observed. They break when weaponized as targets.
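To make the decoupling mechanical, here is a minimal toy model. Every coefficient is invented; they encode only the assumption that test prep moves the score more than it moves learning:

```python
# A minimal sketch of Goodhart's Law (illustrative numbers, not data).
# An agent splits a fixed effort budget between real teaching and test
# prep. The test score is a proxy that rewards prep; learning rewards
# teaching. Targeting the proxy raises it while the true goal falls.

def outcomes(effort_teaching: float) -> tuple[float, float]:
    """Return (test_score, actual_learning) for a given effort split."""
    effort_prep = 1.0 - effort_teaching
    test_score = 0.5 * effort_teaching + 1.0 * effort_prep  # proxy metric
    learning = 1.0 * effort_teaching + 0.2 * effort_prep    # true goal
    return test_score, learning

# Before the metric is a target: effort follows professional judgment.
before = outcomes(effort_teaching=0.8)
# After scores determine pay: the agent maximizes the proxy directly.
after = outcomes(effort_teaching=0.0)

print(f"before targeting: score={before[0]:.2f}, learning={before[1]:.2f}")
print(f"after targeting:  score={after[0]:.2f}, learning={after[1]:.2f}")
# score rises (0.60 -> 1.00) while learning collapses (0.84 -> 0.20)
```

The proxy was informative exactly as long as nobody was paid to move it.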
Reason 2: Cobra Effect (Unintended Consequences)
Pattern: Solution makes problem worse through incentive structure
Examples:
- Rat tails → rat breeding
- Cobra bounty → cobra breeding
- Bug bounties → bug creation
- Article word count minimums → verbose useless content
Why it's common: Humans are creative optimizers. Designers can't anticipate every gaming strategy.
Reason 3: Campbell's Law
"The more any quantitative social indicator is used for social decision-making, the more subject it will be to corruption pressures."
Mechanism: Higher stakes → more pressure → more gaming → more corruption
Example: Low-stakes customer surveys (minor gaming). High-stakes teacher evaluations (widespread cheating).
Reason 4: Multitask Problem
Pattern: Incentivizing one dimension reduces performance on other important dimensions
Example:
- Incentivize quantity → quality falls
- Incentivize speed → accuracy falls
- Incentivize individual performance → collaboration falls
Why it happens: Attention and effort are limited. People focus on incentivized dimensions at the expense of non-incentivized ones, as the toy model below makes concrete.
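A sketch of that trade-off, assuming one unit of effort and diminishing returns on each dimension (all parameters invented):

```python
# A toy multitask model. A worker splits one unit of effort between
# quantity and quality, with diminishing returns on each. If pay rewards
# only quantity, the pay-maximizing split drives quality to zero.

from math import sqrt

def outputs(effort_quantity: float) -> tuple[float, float]:
    """(quantity produced, quality produced) for a given effort split."""
    return 10 * sqrt(effort_quantity), 10 * sqrt(1 - effort_quantity)

def pay(effort_quantity: float, quality_weight: float) -> float:
    quantity, quality = outputs(effort_quantity)
    return quantity + quality_weight * quality

for weight in (0.0, 1.0):  # quality ignored vs. quality paid equally
    best = max((e / 100 for e in range(101)), key=lambda e: pay(e, weight))
    quantity, quality = outputs(best)
    print(f"quality weight {weight}: quantity={quantity:.1f}, quality={quality:.1f}")
# weight 0.0 -> quantity=10.0, quality=0.0 (quality collapses)
# weight 1.0 -> quantity=7.1, quality=7.1 (balanced split)
```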
Reason 5: Crowding Out Intrinsic Motivation
Pattern: Adding extrinsic incentives reduces intrinsic motivation
Example:
- Teachers who taught for love of teaching → teach for test scores → burnout
- Creative work incentivized financially → creativity falls
- Volunteer work → paid → fewer volunteers (monetary payment crowds out social value)
Mechanism: Extrinsic incentives reframe activity from meaningful to transactional.
Principles for Better Incentive Design
How to avoid these failures?
Principle 1: Align Incentives with Actual Goals, Not Proxies
Bad: Incentivize metric that proxies for goal
Good: Incentivize goal directly, or use multiple metrics balanced against each other
Example:
- Don't incentivize: Number of sales
- Do incentivize: Customer lifetime value (requires quality, retention)
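A back-of-envelope comparison shows why the two bonus bases rank people differently. The reps, margins, and churn rates below are hypothetical, and the CLV formula is the simplest geometric-retention version (monthly margin / monthly churn):

```python
# Rep A closes many bad-fit deals that churn fast; Rep B closes fewer,
# better-fit deals. A sales-count bonus rewards A; a CLV bonus rewards B.

def customer_lifetime_value(monthly_margin: float, monthly_churn: float) -> float:
    """Expected total margin per customer under geometric retention."""
    return monthly_margin / monthly_churn

reps = {
    "Rep A (volume)":  {"deals": 30, "margin": 50.0, "churn": 0.25},
    "Rep B (quality)": {"deals": 12, "margin": 50.0, "churn": 0.04},
}

for name, r in reps.items():
    clv = customer_lifetime_value(r["margin"], r["churn"])
    print(f"{name}: deals={r['deals']:>2}  total CLV={r['deals'] * clv:>8,.0f}")
# Rep A wins on deal count (30 vs 12); Rep B generates 2.5x the value
# (12 * 1,250 = 15,000 vs 30 * 200 = 6,000).
```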
Principle 2: Expect Gaming and Design Against It
Assume: People will find every loophole
Design:
- Close obvious loopholes
- Monitor for unexpected gaming
- Have human judgment override metrics when gaming detected
- Iterate based on observed behavior
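What "monitor for unexpected gaming" can look like in practice: a minimal sketch, assuming you can log one gaming signature alongside the incentivized metric. Here the signature is new accounts that sit dormant; the data shape and the 50% threshold are invented:

```python
# Don't just count the incentivized metric (accounts opened); also watch
# a signature of gaming: here, accounts with no activity after opening.

from collections import defaultdict

# (rep_id, account_had_any_activity_within_60_days)
new_accounts = [
    ("rep_1", True), ("rep_1", True), ("rep_1", False),
    ("rep_2", False), ("rep_2", False), ("rep_2", False), ("rep_2", True),
]

DORMANCY_ALERT = 0.5  # flag reps where >50% of new accounts sit unused

stats = defaultdict(lambda: [0, 0])  # rep -> [dormant, total]
for rep, active in new_accounts:
    stats[rep][0] += 0 if active else 1
    stats[rep][1] += 1

for rep, (dormant, total) in stats.items():
    rate = dormant / total
    flag = "  <-- investigate" if rate > DORMANCY_ALERT else ""
    print(f"{rep}: {dormant}/{total} dormant ({rate:.0%}){flag}")
```

The specific signature will differ per system; the design point is that every incentive should ship with at least one counter-metric watched from day one.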
Principle 3: Use Multiple Balanced Metrics
Bad: Single metric dominance
Good: Multiple metrics in tension, preventing optimization of one at expense of others
Example:
- Don't just measure: Customer acquisition
- Also measure: Customer satisfaction, retention, profitability, referrals
- Prevents gaming by having to balance competing priorities
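One way to encode "metrics in tension" is to combine them with a geometric mean rather than a weighted sum: a weighted sum lets a stellar score on one metric paper over a tanked one, while a geometric mean drags the composite toward zero if any metric collapses. A sketch (metric names and values are assumptions):

```python
# Composite score where no single metric can be sacrificed: the geometric
# mean of normalized metrics punishes any dimension that approaches zero.

import math

def balanced_score(metrics: dict[str, float]) -> float:
    """Geometric mean of metrics, each pre-normalized to the 0..1 range."""
    values = list(metrics.values())
    return math.prod(values) ** (1 / len(values))

gamer = {"acquisition": 1.0, "satisfaction": 0.2, "retention": 0.1}
balanced = {"acquisition": 0.7, "satisfaction": 0.7, "retention": 0.7}

print(f"gamer:    {balanced_score(gamer):.2f}")     # 0.27: gaming punished
print(f"balanced: {balanced_score(balanced):.2f}")  # 0.70: balance rewarded
```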
Principle 4: Maintain Human Judgment
Bad: Algorithmic decisions based purely on metrics
Good: Metrics inform, humans decide
Why: Humans detect gaming and context that machines miss
Example: Stack ranking failed when mechanical. Performance reviews work better when managers have discretion informed by multiple factors.
Principle 5: Consider Time Horizons
Bad: Short-term incentives for long-term goals
Good: Match incentive timescale to goal timescale
Example:
- Short-term goal (quarterly sales): Quarterly bonuses OK
- Long-term goal (company growth): Restricted stock vesting over years, not options exercisable quickly
Principle 6: Test Small, Iterate
Bad: Roll out incentive system company-wide immediately
Good: Pilot with small group, observe behaviors, adjust, then scale
Why: Gaming strategies emerge over time. Small-scale testing reveals issues before major damage.
Principle 7: Preserve Intrinsic Motivation
Bad: Heavy extrinsic incentives for inherently meaningful work
Good: Light extrinsic incentives (avoid exploitation) + nurture intrinsic motivation (autonomy, mastery, purpose)
Example: Teachers, nurses, scientists often motivated by mission. Heavy pay-for-performance can crowd out this motivation.
Warning Signs of Bad Incentives
How to detect incentive problems early?
Warning Sign 1: People Optimizing for Letter, Not Spirit
Manifestation: Technically meeting targets while clearly undermining goals
Example: Call center reps hanging up to hit "calls per hour" target
Response: Revise incentives to measure actual goal
Warning Sign 2: Increased Metric, Declining Real Performance
Manifestation: Numbers look great, actual results deteriorate
Example: Test scores rising, but students can't solve novel problems
Response: Metric has decoupled from goal—find better measure
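One way to operationalize "find better measure" (or at least to notice you need one) is to audit the real outcome on a sample and track its correlation with the proxy over time. A sketch with made-up quarterly series (`statistics.correlation` requires Python 3.10+):

```python
# Periodically audit the true outcome on a sample, then check whether the
# proxy still tracks it. A falling correlation is the decoupling signal.

from statistics import correlation

# Quarterly proxy (e.g., test scores) and audited outcome (e.g.,
# performance on novel, un-coached problems) for the same cohorts.
proxy   = [62, 65, 70, 74, 79, 84, 88, 92]
audited = [60, 63, 66, 68, 69, 68, 67, 66]

early = correlation(proxy[:4], audited[:4])
late = correlation(proxy[4:], audited[4:])

print(f"early correlation: {early:+.2f}")  # near +1: proxy tracks goal
print(f"late correlation:  {late:+.2f}")   # negative: proxy has decoupled
```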
Warning Sign 3: Growing Complexity in Gaming Strategies
Manifestation: Increasingly elaborate tactics to hit metrics
Example: Wells Fargo employees' fake account strategies became more sophisticated over time
Response: Incentive structure is broken—redesign or abandon
Warning Sign 4: Ethical Complaints or Corner-Cutting
Manifestation: People uncomfortable with what incentives are making them do
Example: Teachers expressing moral distress about teaching to test vs. actual education
Response: Incentives are creating ethical conflicts—reassess
Warning Sign 5: Good Performers Leaving
Manifestation: Top talent exits the organization rather than participating in the incentive system
Example: Microsoft engineers leaving to avoid stack ranking
Response: Incentives are selecting against desired behaviors
Conclusion: The Incentive Design Paradox
The paradox: Organizations need incentives to motivate behavior, but incentives inevitably create gaming and unintended consequences.
The key insights:
1. Goodhart's Law is universal—metrics work when observed, break when weaponized as targets. People optimize what's measured, not what's intended.
2. Gaming is inevitable—humans are creative optimizers. Every incentive will be gamed in ways designers don't anticipate. Design assuming gaming will happen.
3. Single metrics are dangerous—optimizing one dimension reduces others. Use multiple balanced metrics, maintain human judgment, preserve complexity rather than reducing to single number.
4. Unintended consequences dominate—Wells Fargo's fake accounts, Microsoft's innovation death, educational teaching to test—perverse incentives destroy more value than well-designed incentives create.
5. Intrinsic motivation matters—extrinsic incentives can crowd out intrinsic motivation for meaningful work. Heavy pay-for-performance isn't always better.
6. Time horizons must align—short-term incentives for long-term goals create gaming. Match incentive timescale to goal timescale.
7. Iterate and monitor—incentive design isn't one-time. Pilot small, observe behaviors, detect gaming, adjust. Continuous monitoring and iteration essential.
The Hanoi rat bounty seemed clever: pay for results, reduce rats. Its designers didn't anticipate rat farming. Wells Fargo didn't anticipate widespread fraud when it set quotas; Microsoft didn't anticipate that stack ranking would kill collaboration; educators didn't anticipate that high-stakes testing would crowd out learning.
Good intentions aren't enough. Incentive design requires thinking through second-order effects, anticipating gaming, using multiple balanced metrics, maintaining human judgment, and iterating based on observed behaviors.
As Charlie Munger observed: "Show me the incentive and I'll show you the outcome."
The question isn't whether to use incentives. It's whether you'll design them thoughtfully—with awareness of Goodhart's Law, Campbell's Law, cobra effects, and multitask problems—or learn these lessons expensively through failures.
History shows: Bad incentive design is reliably expensive. Good incentive design is hard but essential. The choice is investing effort upfront in thoughtful design, or paying far more later in perverse behaviors, gaming, fraud, and outcomes opposite to intent.
References
Goodhart, C. A. E. (1984). Problems of monetary management: The UK experience. In Monetary theory and practice (pp. 91–121). Palgrave Macmillan. https://doi.org/10.1007/978-1-349-17295-5_4
Campbell, D. T. (1979). Assessing the impact of planned social change. Evaluation and Program Planning, 2(1), 67–90. https://doi.org/10.1016/0149-7189(79)90048-X
Kerr, S. (1995). On the folly of rewarding A, while hoping for B. Academy of Management Perspectives, 9(1), 7–14. https://doi.org/10.5465/ame.1995.9503133142
Gneezy, U., & Rustichini, A. (2000). Pay enough or don't pay at all. The Quarterly Journal of Economics, 115(3), 791–810. https://doi.org/10.1162/003355300554917
Pink, D. H. (2009). Drive: The surprising truth about what motivates us. Riverhead Books.
Prendergast, C. (1999). The provision of incentives in firms. Journal of Economic Literature, 37(1), 7–63. https://doi.org/10.1257/jel.37.1.7
Baker, G. P. (1992). Incentive contracts and performance measurement. Journal of Political Economy, 100(3), 598–614. https://doi.org/10.1086/261831
Oyer, P., & Schaefer, S. (2011). Personnel economics: Hiring and incentives. In O. Ashenfelter & D. Card (Eds.), Handbook of labor economics (Vol. 4B, pp. 1769–1823). Elsevier.