In colonial India, the British administration in Delhi grew concerned about the city's cobra population. Their solution was straightforward: offer a bounty for every dead cobra. Citizens would collect the snakes, kill them, and claim payment. Simple incentives, measurable outcomes, problem solved.

Except it wasn't. Enterprising locals began breeding cobras specifically to collect the bounty. When the colonial government discovered this and cancelled the program, the cobra farmers released their now-worthless snakes, and the population surged beyond its original level.

This story — possibly apocryphal, but instructive regardless — gave its name to a phenomenon that plagues every organization that measures performance: the Cobra Effect. When you create an incentive for a metric, rational actors respond to the incentive rather than the underlying goal. The measure improves while the reality it was supposed to represent gets worse.

Understanding how and why metric gaming happens, what forms it takes in modern organizations, and how to design measurement systems that resist it, is one of the most practically valuable things a manager or team lead can learn.

Goodhart's Law: The Theory Behind the Problem

The formal version of this problem was stated by British economist Charles Goodhart in a 1975 paper on monetary policy. Goodhart's Law, as it came to be known, states:

"Any observed statistical regularity will tend to collapse once pressure is placed upon it for control purposes."

The simplified version, widely attributed to anthropologist Marilyn Strathern, captures it more memorably: "When a measure becomes a target, it ceases to be a good measure."

Goodhart's original context was narrow — he was observing how targeting monetary aggregates in UK policy caused those aggregates to lose their predictive value. But the principle generalizes widely. Every metric is a proxy for something we actually care about. The metric works as a proxy only so long as no one is optimizing for the metric itself. Once it becomes a target, the relationship between the metric and the underlying reality breaks down, because people can optimize for the number without changing (or while actively degrading) the underlying reality.

Why Metric Gaming Is Rational

Before examining examples, it is important to establish that metric gaming is not primarily a moral failure. It is the rational response to misaligned incentives.

Employees face two kinds of demands simultaneously:

  1. Do the actual work that achieves the organization's goals.
  2. Perform well on whatever metrics are used to evaluate and reward them.

When these two demands are aligned, metric gaming is unnecessary — doing good work naturally produces good metrics. But when they diverge, employees face a choice between doing their job well and appearing to do their job well. In most organizational environments, appearance wins, because:

  • Visibility: Metrics are visible to management; underlying quality is often not.
  • Speed: Metric gains are immediate; real-world outcomes may lag by months.
  • Reward: Compensation, promotion, and recognition are tied to metrics.
  • Safety: Missing metrics has consequences; subtly degrading quality often does not.

This is not cynicism. It is the predictable output of any system that rewards proxy measures rather than real outcomes. As W. Edwards Deming argued repeatedly in his work on quality management, the vast majority of performance problems are system problems, not individual failures.

How Metric Gaming Manifests in Practice

Call Center Metrics Gaming

The call center is the canonical example of metric gaming because its measurement systems are so explicit and the consequences so visible.

Common call center metrics include:

  • Average Handle Time (AHT): time spent per call
  • First Call Resolution (FCR): percentage of issues resolved on first contact
  • Customer Satisfaction Score (CSAT): post-call survey ratings
  • Calls per Hour: volume throughput

When AHT is targeted, agents rush customers off the phone. Calls get shorter. AHT improves. FCR plummets as issues are not actually resolved. Customer satisfaction falls. When FCR is subsequently targeted, agents may put callers on hold indefinitely rather than escalating, claim resolution on calls that are not resolved, or instruct customers not to call back.

When CSAT becomes the key metric, agents game surveys directly — coaching customers to give high ratings, suggesting at the end of calls "Is there anything I could have done to deserve a 10 today?", or excluding difficult customers from survey invitations where possible.

The individual metrics each look fine. The customer experience deteriorates.

Agile Velocity Inflation

In agile software development, velocity measures the number of story points a team completes per sprint. It is designed as a planning tool — to help teams estimate how much work they can take on — not as a performance metric.

When management begins using velocity as a measure of team productivity, the gaming begins almost immediately.

Teams facing velocity pressure typically respond by:

  • Inflating story point estimates: looks like higher points per story; hides the fact that no actual complexity increased.
  • Breaking stories into many small tasks: looks like more completions per sprint; hides artificial fragmentation of work.
  • Prioritizing easy "low-hanging fruit": looks like improving velocity; hides accumulating technical debt and deferred complex work.
  • Marking stories complete prematurely: looks like a clean sprint burndown; hides rework pushed into future sprints.
  • Avoiding hard-to-estimate work: looks like predictable velocity; hides delays to important, complex work.

The result is a team whose velocity numbers climb while actual output — features users value, technical quality, system reliability — stagnates or declines. Velocity, intended as a team planning tool, has been converted into a performance theater metric.

This dynamic was described prophetically in Robert D. Austin's 1996 book Measuring and Managing Performance in Organizations, which remains one of the most rigorous treatments of why measurement dysfunction occurs.
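A team can run a quick sanity check on its own numbers by tracking points per story alongside raw velocity: if velocity climbs mainly because the average estimate per story is climbing, inflation is a likelier explanation than real throughput. A minimal sketch in Python, using invented sprint data purely for illustration:

```python
# Sketch: distinguish real throughput gains from estimate inflation.
# Each entry is hypothetical: (total story points, stories completed).
sprints = [(30, 10), (33, 10), (40, 10), (48, 10)]

for i, (points, stories) in enumerate(sprints, start=1):
    avg_estimate = points / stories  # points per completed story
    print(f"Sprint {i}: velocity={points}, avg estimate={avg_estimate:.1f}")

# Velocity rose 60% (30 -> 48) while stories per sprint stayed flat,
# so the entire gain came from larger estimates: an inflation signature.
first, last = sprints[0], sprints[-1]
inflation_ratio = (last[0] / last[1]) / (first[0] / first[1])
print(f"Estimate inflation factor: {inflation_ratio:.2f}")
```

A ratio near 1.0 would mean estimates held steady and the extra points reflect more stories actually finished; here the factor of 1.6 accounts for the whole velocity gain.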

Sales Metrics Gaming

Sales organizations face similar dynamics. When call volume is measured, salespeople make shallow calls that do not advance relationships. When demos booked is the target, salespeople book unqualified demos that waste engineering time. When pipeline value is tracked, they inflate deal sizes or include early-stage prospects in committed pipeline.

When annual quota achievement is the primary metric and carries cliff effects (nothing at 99%, a bonus at 100%+), salespeople engage in sandbagging: once they have either hit quota or given up on it, they deliberately hold deals for the next period, creating artificial lumpiness in revenue that undermines forecasting.
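Sandbagging leaves a visible footprint in close dates: deals cluster at period boundaries far more than an even spread would produce. A minimal sketch that measures the share of deals closing in the final week of a quarter (the close dates are invented for illustration):

```python
from datetime import date

# Hypothetical close dates for one quarter (Q1, ending March 31).
closes = [date(2024, 1, 15), date(2024, 2, 20), date(2024, 3, 27),
          date(2024, 3, 28), date(2024, 3, 29), date(2024, 3, 30)]

quarter_end = date(2024, 3, 31)
# Count deals that closed within the last 7 days of the quarter.
last_week = sum(1 for d in closes if (quarter_end - d).days < 7)
share = last_week / len(closes)
print(f"{share:.0%} of deals closed in the final week")

# ~8% (7 of 91 days) would be expected if closes were spread evenly;
# a figure several times that suggests deals are being held and batched.
```

The same calculation run per rep, rather than per team, tends to show who is timing deals against the quota clock.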

Healthcare Metrics Gaming

The UK's National Health Service provides a well-documented case. After waiting time targets were introduced in emergency departments (e.g., patients must be seen within four hours), hospitals found creative ways to meet the metric without reducing actual waits:

  • Patients were assessed briefly by a nurse to formally "start the clock," then placed back in the waiting area.
  • Ambulance handovers were delayed at hospital entrances so the four-hour clock did not start.
  • Admissions were reclassified as "day cases" to avoid the target's scope.

The metric improved. The experience of patients waiting in ambulances or hallways did not.

The Deeper Problem: Measurement Changes What Is Measured

There is a subtler effect beyond gaming. The act of measuring something changes how people think about and perform the underlying activity. This is sometimes called the observer effect in management.

When teachers are evaluated on test scores, they teach to the test. This is rational, but it changes the nature of education — more time drilling tested skills, less time on untested creative thinking, collaborative projects, or content outside the exam's scope. Over time, what is tested tends to become what is taught, even among teachers who believe in broader educational goals.

This effect operates even without explicit gaming. The measurement creates salience that reshapes attention and effort allocation. Things that are measured feel important; things that are not measured can feel like they do not count, even when everyone knows intellectually that they do.

Alison Davis-Blake, former dean of the University of Michigan's Ross School of Business, observed this in academic research evaluation: when citation counts became a dominant measure of research impact, academic writing subtly shifted toward citation-maximizing strategies — citing authors likely to reciprocate, writing short papers that create citation chains, and targeting high-impact journals regardless of whether they were the best fit for the work.

Designing Metrics That Are Harder to Game

No metric is entirely ungameable. The goal is not perfection but making gaming harder, more visible, and less rewarding. Several design principles help.

Measure Outcomes, Not Activities

Activity metrics (calls made, features shipped, hours worked) are the easiest to game because they measure effort that can be performed without producing results. Outcome metrics (revenue generated, bugs in production, customer retention rate) are harder to fake because they require the underlying reality to change.

The tradeoff is that outcome metrics lag: you learn what this quarter's revenue was, but not what caused it, and not in time to act. This is why effective measurement systems combine both.

Use Portfolios of Correlated Metrics

If gaming one metric always shows up negatively in another, gaming becomes costly. A call center that measures both AHT and FCR simultaneously creates a natural tension: rushing customers to reduce AHT tends to reduce FCR. Salespeople who inflate pipeline value face scrutiny when close rates fall. Software teams that inflate velocity face consequences when deploy-to-production rates and bug rates are also tracked.

The key is choosing metrics that are genuinely correlated with the outcome and that pull in different directions if gamed.
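The tension can be made operational: rather than reviewing AHT and FCR separately, compare an agent's movement on both at once. An agent whose handle time fell sharply while resolution also fell is a candidate for a qualitative look. A minimal sketch, with invented agent data and arbitrary thresholds:

```python
# Sketch: use a correlated-metric pair to flag possible gaming.
# Figures are hypothetical: (AHT change %, FCR change %) vs. prior quarter.
agents = {
    "A": (-25.0, -12.0),  # much faster calls, worse resolution: suspicious
    "B": (-5.0, +3.0),    # modestly faster, resolution improved: healthy
    "C": (+2.0, +1.0),    # roughly flat on both dimensions
}

def flag_for_review(aht_delta, fcr_delta, aht_drop=-15.0, fcr_drop=-5.0):
    """Flag when AHT falls sharply AND FCR falls: the signature of rushing."""
    return aht_delta <= aht_drop and fcr_delta <= fcr_drop

flagged = [name for name, (aht, fcr) in agents.items()
           if flag_for_review(aht, fcr)]
print(flagged)  # only agent "A" trips both thresholds
```

A flag here is a prompt for call audits or coaching, not a verdict; the point is that gaming one metric now produces a visible anomaly in the pair.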

Build in Qualitative Checks

Quantitative metrics should be accompanied by qualitative assessment that cannot be easily gamed: customer interviews, peer reviews, manager observation, random audits. These are harder to scale but they catch systematic gaming that aggregated numbers miss.

Rotate and Refresh Metrics

Once employees learn to game a metric, the game tends to be played indefinitely. Periodically replacing or substantially revising metrics disrupts established gaming strategies and forces renewed attention to underlying performance. This is operationally disruptive but often worth it in contexts where gaming is severe.

Measure at Multiple Levels

Gaming that looks rational at the individual level often becomes irrational at the team or organizational level. Measurement systems that aggregate upward and make team-level patterns visible can expose individual gaming that is invisible in individual data.

Separate Incentive and Learning Metrics

One powerful principle from Austin's work: separate the metrics used for performance management (and tied to pay and promotion) from the metrics used for learning and improvement. When a metric is tied to rewards, gaming becomes inevitable. Diagnostic metrics used only by teams for their own improvement are less subject to gaming because the incentive to distort them is weaker.

Balancing Lag and Lead Indicators

The distinction between lag indicators (what happened) and lead indicators (what predicts what will happen) is fundamental to designing measurement systems that drive improvement rather than game-playing.

  • Lag indicators (e.g., revenue, customer churn, product defects): measure actual outcomes, but arrive too late to act on.
  • Lead indicators (e.g., prospect meetings, feature cycle time, employee engagement): enable early course correction, but are proxies for outcomes and therefore gameable.

Organizations that measure only lag indicators are flying blind until it is too late. Organizations that measure only lead indicators are optimizing proxies and may be gaming their way to metric success while the underlying business deteriorates.

The most effective measurement frameworks, like the Balanced Scorecard (Kaplan and Norton) and OKRs (Objectives and Key Results), explicitly combine both: ambitious outcome goals (what we want to achieve) supported by leading activity metrics (how we are tracking toward those goals), with regular review cycles that check whether the lead metrics are actually predicting the outcomes.

Conclusion: The Measurement Paradox

Organizations face an inescapable paradox: without measurement, there is no visibility into performance, no way to learn, and no accountability. With measurement, rational actors optimize for measures rather than underlying goals, and the measures gradually lose their validity.

The solution is not to stop measuring. It is to:

  1. Treat all metrics as provisional and imperfect proxies, not as reality itself.
  2. Design measurement portfolios with correlated metrics that make gaming costly.
  3. Separate learning metrics from incentive metrics where possible.
  4. Build in qualitative checks and audits.
  5. Revisit and rotate metrics regularly.
  6. Measure outcomes, not just activities.
  7. Fix the system before blaming the people.

The Cobra Effect is not a sign of employees behaving badly. It is a sign of organizations designing incentive systems without thinking through their second-order effects. Cobras bred for bounty money are as predictable as story points inflated for velocity dashboards. The question is whether leadership will recognize the pattern before the snakes get released.

Frequently Asked Questions

What is Goodhart's Law?

Goodhart's Law states that "when a measure becomes a target, it ceases to be a good measure." Originally formulated by economist Charles Goodhart in 1975 regarding monetary policy, it describes the general phenomenon where optimizing for a metric distorts the underlying reality the metric was designed to track.

What is the Cobra Effect?

The Cobra Effect refers to unintended consequences where a solution worsens the problem it was meant to solve. The term comes from a story about colonial India, where the British government offered bounties for dead cobras to reduce the snake population. Enterprising locals began breeding cobras to collect the bounty, and when the program was cancelled, the bred cobras were released, leaving the population larger than before.

How do agile development teams game velocity metrics?

Agile velocity measures story points completed per sprint. When velocity becomes a management target, teams inflate point estimates for new stories, break work into smaller chunks to show more completions, and prioritize easily completed tasks over important but complex ones. The result is that velocity numbers rise while actual output and quality stagnate or decline.

What is the difference between lead and lag indicators?

Lag indicators measure outcomes after they have occurred, such as quarterly revenue or customer churn rate. Lead indicators measure activities that predict future outcomes, such as number of prospect meetings or feature completion rate. Effective measurement systems use both: lag indicators to confirm outcomes and lead indicators to enable course correction before it is too late.

How can organizations design metrics that are harder to game?

Harder-to-game metrics tend to measure outcomes rather than activities, use multiple correlated indicators so gaming one shows up in others, are close to the ultimate goal rather than a proxy, and are periodically rotated or replaced before optimization distorts them. Qualitative measures, customer surveys, and random audits also help catch gaming that quantitative metrics miss.