In 1975, the New York Police Department began routing crime data through a new accountability system. Precinct commanders were evaluated on the numbers — robbery rates, felony counts, response times. The logic was straightforward: if you measure crime, you can manage it. The problem was equally straightforward in retrospect. Officers across the five boroughs began downgrading felonies to misdemeanors, discouraging victims from filing formal reports, and reclassifying robberies as "lost property." Crime statistics improved. Crime did not. The metric had replaced the reality it was supposed to represent.

This pattern — the predictable degradation of any social indicator once it becomes the primary basis for reward and punishment — was named and formalized a year later by sociologist Donald T. Campbell. In a 1976 paper titled "Assessing the Impact of Planned Social Change," Campbell wrote what would become one of the most consequential observations in applied social science:

"The more any quantitative social indicator is used for social decision-making, the more subject it will be to corruption pressures and the more apt it will be to distort and corrupt the social processes it is intended to monitor." — Donald T. Campbell, Assessing the Impact of Planned Social Change (1976)

Campbell's Law is not a critique of measurement. It is a precise claim about what happens to measurement under a specific institutional condition: when quantitative indicators carry high-stakes consequences for the people being measured. Under that condition, rational behavior by individuals produces predictable system-level corruption. Understanding why this happens — and what, if anything, can be done about it — is essential for anyone designing accountability systems, evaluating organizational performance, or interpreting statistics generated under institutional pressure.


Key Definitions

Campbell's Law — The principle that any quantitative social indicator used for high-stakes social decision-making will be subject to corruption pressures, causing it to distort the very social process it was designed to monitor. Proposed by Donald T. Campbell (1976).

Indicator — A measurable variable used as a proxy for something that is harder or impossible to measure directly. Crime statistics are an indicator of public safety. Test scores are an indicator of educational achievement. Return on equity is an indicator of business performance.

Proxy measure — A substitute variable used in place of the actual outcome of interest, on the grounds that the proxy reliably correlates with the outcome under normal conditions. The problem Campbell identified is that "normal conditions" change once the proxy becomes a target.

Metric gaming — The practice of optimizing a measured variable without improving or while actively degrading the underlying outcome it represents. Distinct from fraud in that it often involves fully legal, defensible behavior; distinct from genuine performance improvement in that the underlying outcome does not improve.

Goodhart's Law — A related principle attributed to British economist Charles Goodhart (1975), popularly paraphrased as "When a measure becomes a target, it ceases to be a good measure." Narrower in origin (monetary economics) but similar in mechanism. See the comparison below.

High-stakes accountability — A measurement regime in which rewards, punishments, funding, or career consequences are directly tied to measured performance on specific indicators. Campbell's Law operates most powerfully under high-stakes accountability.

The experimenting society — Campbell's proposed remedy: an institutional culture committed to rigorous evaluation, willingness to act on evidence, and humility about the limits of any single indicator. He considered it rare.


The Mechanism: Why Rational Actors Corrupt Metrics

Campbell's Law operates through a simple mechanism that does not require bad actors, dishonesty, or even awareness of the dynamic. It emerges from the combination of three ordinary features of institutional life:

1. Agents optimize for what is measured. When rewards and punishments are tied to specific metrics, individuals and organizations rationally devote effort toward improving those metrics. This is not gaming — it is a basic response to incentive structures.

2. Metrics are proxies, not outcomes. A test score is not learning. A crime statistic is not public safety. A citation count is not intellectual contribution. Every quantitative indicator is a proxy for something that is harder to observe directly. The proxy correlates with the outcome under normal conditions — that correlation is why the proxy was chosen.

3. Optimizing the proxy is not the same as improving the outcome. Once agents understand that the proxy is what is being measured, they can improve the proxy through means that do not improve or that actively harm the underlying outcome. Teaching specifically to test formats improves test scores without necessarily improving learning. Reclassifying crimes reduces crime statistics without reducing crime.

The result is that every high-stakes measurement regime contains the seeds of its own corruption. The indicators that administrators rely on to assess performance become progressively less reliable as the people being assessed learn to manage them.
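
The divergence between proxy and outcome can be made concrete with a toy model. The Python sketch below is an illustration, not anything Campbell specified: a single agent splits a fixed effort budget between real work, which improves the true outcome, and metric management, which inflates only the measured proxy. All functional forms and parameter values (the effort budget, the gaming payoff curve, the gaming cost) are assumptions chosen for clarity.

```python
import numpy as np

BUDGET = 10.0        # total effort available to the agent (assumed)
GAMING_COST = 1.5    # personal cost (risk, effort) per unit of gaming (assumed)

def true_outcome(real_work):
    # The thing we actually care about (learning, safety, care quality).
    return real_work

def measured_proxy(real_work, gaming):
    # With no gaming the proxy tracks the outcome one-for-one;
    # gaming inflates the proxy (with diminishing returns) while
    # contributing nothing to the outcome itself.
    return real_work + 6.0 * np.sqrt(gaming)

def best_allocation(stakes):
    # Grid-search the gaming effort that maximizes the agent's payoff:
    # reward proportional to the measured proxy, minus gaming costs.
    gaming = np.linspace(0.0, BUDGET, 1001)
    payoff = stakes * measured_proxy(BUDGET - gaming, gaming) - GAMING_COST * gaming
    return gaming[np.argmax(payoff)]

for stakes in (0.5, 1.0, 2.0, 4.0):
    g = best_allocation(stakes)
    print(f"stakes={stakes:3.1f}  gaming={g:4.2f}  "
          f"proxy={measured_proxy(BUDGET - g, g):5.2f}  "
          f"outcome={true_outcome(BUDGET - g):5.2f}")
```

As the stakes attached to the proxy rise, the payoff-maximizing share of effort devoted to gaming rises with them: the measured number improves while the underlying outcome deteriorates, which is exactly the signature Campbell described.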

"Show me the incentive and I'll show you the outcome." — Charlie Munger, Vice Chairman, Berkshire Hathaway


Campbell's Law vs. Goodhart's Law

The two most frequently conflated principles in the literature on measurement distortion are Campbell's Law and Goodhart's Law. They describe closely related but meaningfully distinct phenomena.

| Dimension | Campbell's Law | Goodhart's Law |
| --- | --- | --- |
| Originator | Donald T. Campbell, sociologist, 1976 | Charles Goodhart, economist, 1975 |
| Original domain | Social policy and program evaluation | Monetary policy |
| Core formulation | Quantitative social indicators degrade under high-stakes decision-making pressure | Statistical regularities collapse when placed under control pressure |
| Mechanism emphasis | Corruption of underlying social processes by institutional incentives | Breakdown of statistical relationships when behavior changes in response to policy |
| Scope | Explicitly social: institutions, organizations, accountability systems | Originally economic; subsequently generalized |
| Includes fraud? | Yes; Campbell explicitly discussed escalation to outright corruption | Not explicitly; focuses on the statistical problem |
| Proposed remedy | The experimenting society: multiple indicators, rigorous evaluation | Multiple diverse measures; separating measurement from control |

Goodhart's formulation, as popularly restated by Marilyn Strathern, is more memorable: "when a measure becomes a target, it ceases to be a good measure." Campbell's is more specific about the mechanism — it is not simply that the measure ceases to be good, but that the underlying process it was designed to monitor is itself distorted.

Both principles ultimately describe the same class of failure. The practical difference lies in what each emphasizes: Goodhart focuses on the statistical deterioration of measurement validity; Campbell focuses on the institutional and political dynamics that produce both the pressure and the corruption.


Classic Cases

Education: Standardized Testing

Standardized testing in American public education provides the most thoroughly documented illustration of Campbell's Law. Under the No Child Left Behind Act of 2001 and subsequent accountability legislation, standardized test scores became the primary measure used to evaluate schools, teachers, and districts, with direct financial and operational consequences for poor performance.

The behavioral responses were documented extensively by researchers in the decade that followed:

Curriculum narrowing. Schools reduced instructional time on subjects not covered by state accountability tests — art, music, physical education, science, social studies — in order to concentrate resources on tested subjects. Studies in high-stakes accountability districts found reductions of 30–75 minutes per week in non-tested subjects.

Test-specific instruction. Teaching practice shifted from conceptual understanding toward format familiarity, test-taking strategies, and practiced item types. Students became better at answering multiple-choice questions about the specific content and in the specific format of the accountability test without necessarily developing deeper command of the subject.

Exclusion of low-performing students. Schools in some states found mechanisms to remove low-performing students from tested populations: special education classifications increased in accountability-sensitive categories, chronic-absence patterns shifted, and administrative transfers spiked around testing windows.

Outright cheating. The most dramatic response occurred in Atlanta, Georgia, where an independent investigation commissioned by the Governor found that 178 teachers and principals across 44 schools had participated in systematic erasure and correction of student answer sheets. The cheating had been occurring for years. Administrators who raised concerns were discouraged or threatened. Thirty-five educators were indicted under the Georgia RICO statute.

Test scores in many states improved substantially. Scores on the National Assessment of Educational Progress — a separate assessment not subject to the same accountability stakes — showed more modest and inconsistent improvement. The divergence between the two trend lines is consistent with Campbell's Law: the high-stakes indicator improved while the lower-stakes, harder-to-game indicator showed weaker effects.

"We tend to get what we measure. What we measure is not always what matters." — Daniel Yankelovich, Coming to Public Judgment (1991)

Policing: Crime Statistics

The New York CompStat system, introduced in 1994 by Commissioner William Bratton, made precinct-level crime statistics the primary accountability mechanism for precinct commanders. Commanders whose statistics showed improvement were promoted; those whose statistics didn't improve were replaced. The system was credited with contributing to New York's dramatic crime decline in the 1990s and was replicated by police departments worldwide.

It also produced systematic distortion that was documented across multiple investigations:

The NYPD's own internal audits, the city's Inspector General, and academic researchers studying the department documented a consistent pattern of downgrading — reclassifying crimes to lower categories to improve statistics. Robberies became "lost property." Burglaries became "criminal mischief." Rapes were classified as unfounded. The intensity of downgrading pressure increased as statistical accountability became more central to promotion decisions.

Adrian Schoolcraft, an NYPD officer in Brooklyn, secretly recorded roll-call meetings in which supervisors explicitly instructed officers to downgrade crime reports and to discourage victims from filing. When he reported his concerns internally, he was forcibly removed from his apartment and involuntarily committed to a psychiatric facility. A subsequent lawsuit resulted in a settlement of approximately $600,000.

Similar patterns have been documented in London's Metropolitan Police, in the management of hospital waiting times in the NHS, and in Vietnam War-era military body counts, which David Halberstam described as "the statistical strategy": the appearance of progress produced by optimizing the metric that headquarters was measuring rather than the military reality it was supposed to represent.

Finance: The 2008 Mortgage Crisis

The most catastrophic financial crisis of the twenty-first century illustrates Campbell's Law operating at systemic scale. Mortgage-backed securities received ratings from credit rating agencies whose revenue depended on the very issuers they were rating (the issuer-pays model). AAA ratings became the gating criterion for institutional investment, creating intense pressure to achieve those ratings regardless of underlying credit quality.

Issuers responded by restructuring instruments specifically to achieve target ratings. The rating became the target; actual credit quality became secondary. As long as home prices continued rising, the divergence between ratings and underlying risk was concealed. When mortgage defaults accelerated, the scale of the distortion became apparent all at once. The measurement system had been corrupted so thoroughly that it provided essentially no signal about the actual risk it was designed to measure.

Healthcare: NHS Waiting Targets

The British National Health Service introduced a four-hour waiting time target for accident and emergency departments in the early 2000s: no patient should wait more than four hours from arrival to treatment, admission, or discharge. The target was designed to reduce genuine patient harm from excessive delays.

Gwyn Bevan and Christopher Hood at the London School of Economics, studying the NHS target regime for a 2006 paper in Public Administration, documented the behavioral responses in detail:

Hospitals created intermediate holding areas — "clinical decision units" — that were not classified as A&E, so that patients could be taken off the measured waiting clock without receiving substantive care. Ambulances circled outside hospitals, delaying the moment of hospital arrival and therefore the start of the clock. Patients were discharged at three hours and fifty-five minutes and readmitted when their conditions deteriorated. The measured performance improved. Independent assessments of actual care quality showed more ambiguous results.


The Corruption of Underlying Processes

Campbell's most important observation is not simply that metrics degrade as targets. It is that the underlying social processes those metrics were designed to monitor are themselves distorted and corrupted by the measurement pressure.

This distinction matters. If only the metric degrades, the damage is limited: we have an unreliable measure, but the reality we care about continues undisturbed. If the underlying process is corrupted, the reality itself changes in response to measurement.

In education, high-stakes testing does not only produce inaccurate test scores. It changes what happens in classrooms — the content of instruction, the relationship between teacher and student, the skills students develop, the professional identity of teachers. The educational process is restructured around the assessment format. A generation of students may develop genuine expertise in a narrow band of tested content while remaining less developed in areas that were instructionally sacrificed. The metric has reshaped the process, not just the number.

In policing, systematic downgrading does not only produce inaccurate crime statistics. It changes how officers interact with communities, how victims experience reporting crime, and whether communities develop the trust in law enforcement that enables effective crime control. The social process of public safety is damaged.

This is why reversing Campbell's Law effects is so difficult. Once institutional processes have been reshaped around a metric, removing or replacing the metric does not automatically restore the original process. Organizations have built skills, incentive structures, professional cultures, and administrative systems around managing the metric. These persist even after the metric changes.


Why Awareness Is Insufficient

A natural response to learning about Campbell's Law is to assume that awareness provides protection. If administrators know that metrics can be gamed, they will be skeptical of good numbers and look for signs of corruption. The evidence suggests this protection is weaker than it appears.

Competitive dynamics undermine individual awareness. When multiple actors compete on the same metric — schools competing on test scores, hospitals competing on quality rankings, research departments competing on grant income — any single actor who reduces gaming faces competitive disadvantage relative to those who continue. The rational individual response to knowing that everyone is gaming is often to game more carefully, not to stop.

Institutional momentum works against reversal. Organizations develop entire administrative structures around metric management — data systems, training programs, reporting processes, specialist roles. These structures create constituencies with interests in the current measurement regime. Challenging the validity of the metric challenges the infrastructure and the careers built around it.

The illusion of objectivity makes quantitative metrics resistant to criticism. A school with improving test scores appears, by the numbers, to be improving. Arguing that the numbers are misleading requires demonstrating that the scores reflect gaming rather than learning — a substantially harder argumentative task than pointing to the scores themselves. This asymmetry protects metric gaming from challenge.

"Not everything that counts can be counted, and not everything that can be counted counts." — Attributed to William Bruce Cameron, Informal Sociology (1963); often misattributed to Albert Einstein


Mitigation Strategies

Campbell himself was pessimistic about systematic solutions, arguing that political incentives to use simple quantitative indicators for high-stakes decisions are persistent. But research and organizational experience suggest several approaches that reduce the effects:

Multiple independent indicators. Using several metrics that cannot all be simultaneously optimized makes pure gaming more difficult. Each indicator serves as a partial check on the others. The key requirement is genuine diversity: metrics that measure different aspects of the underlying goal from genuinely different angles. Highly correlated metrics effectively constitute a single metric and can be gamed together (see the correlation-check sketch after this list).

Metric rotation. Changing the specific metrics used over time imposes costs on gaming that make it less worthwhile relative to actually improving underlying performance. Organizations can prepare deeply for stable metrics; frequent rotation prevents that level of optimization.

Low-stakes measurement. Measures that do not directly drive rewards and punishments are less subject to gaming pressure. Using indicators primarily for internal learning — understanding what is and isn't working — rather than external accountability preserves measurement validity at the cost of reduced external pressure for performance.

Separation of assessment and accountability. Campbell specifically distinguished between measurement for understanding (learning whether an intervention works) and measurement for accountability (rewarding and punishing actors). The same indicator serves both functions poorly. Keeping them separate — with different measurement systems for each purpose — preserves the validity of at least the measurement-for-understanding system.

Qualitative checks alongside quantitative indicators. Periodic expert reviews, direct observation, interviews, and narrative assessment provide information that is harder to game because it is harder to standardize. Qualitative evidence of process deterioration can alert evaluators that quantitative metrics are diverging from the underlying reality.

External, independent measurement. When those being evaluated control how their performance is measured and reported, gaming opportunities multiply. External auditors, regulatory inspections, independent researchers, and separation of measurement from reporting chains make it harder to conceal the gap between metric and reality.
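
One operational check on the genuine-diversity requirement is to look at how strongly a dashboard's indicators have historically moved together. The sketch below is a minimal illustration under assumed data: the indicator names, the synthetic numbers, and the 0.9 threshold are all hypothetical, and Pearson correlation is only one possible test.

```python
import numpy as np

def diversity_audit(history, threshold=0.9):
    """Flag indicator pairs so correlated that they are likely to
    move (and be gamed) together.

    history: dict mapping indicator name -> 1-D array of past values,
             all the same length. Threshold of 0.9 is an assumption.
    """
    names = list(history)
    values = np.vstack([history[n] for n in names])
    corr = np.corrcoef(values)  # pairwise Pearson correlations
    flagged = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if abs(corr[i, j]) >= threshold:
                flagged.append((names[i], names[j], corr[i, j]))
    return flagged

# Hypothetical example: three school indicators over eight periods.
rng = np.random.default_rng(42)
math_scores = rng.normal(70, 5, 8)
history = {
    "math_scores": math_scores,
    "reading_scores": math_scores + rng.normal(0, 1, 8),  # near-duplicate
    "attendance": rng.normal(90, 3, 8),                   # independent
}
for a, b, r in diversity_audit(history):
    print(f"{a} and {b} are highly correlated (r={r:+.2f}); "
          f"they may function as a single gameable metric.")
```

Highly correlated pairs are candidates for consolidation or replacement, since an actor who optimizes one has effectively optimized both.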


The Experimenting Society

Campbell's constructive proposal was what he called "the experimenting society" — an institutional culture characterized by genuine commitment to learning from evidence, rigorous evaluation using multiple methods, willingness to abandon programs that evidence shows are not working, and humility about the limits of any single indicator.

He believed this was rare because the institutional incentives run in the opposite direction: leaders who commission evaluations of their programs have interests in good results; administrations that have publicly committed to a program resist evidence of failure; measurement systems develop constituencies that defend them. The experimenting society requires institutions to value learning over self-justification — a demand that conflicts with basic dynamics of organizational survival and political accountability.

Campbell spent his career articulating what genuine program evaluation would require and why institutions reliably fall short of it. His law describes the mechanism by which well-intentioned measurement regimes degrade. His experimenting society describes what would have to be different for them not to.

For related concepts, see Goodhart's Law explained, first-order vs second-order effects, and how metrics influence behavior.


Frequently Asked Questions

What is Campbell's Law?

Campbell's Law states that the more any quantitative social indicator is used for social decision-making, the more subject it will be to corruption pressures and the more apt it will be to distort and corrupt the social processes it is intended to monitor. Proposed by sociologist Donald T. Campbell in 1976.

How is Campbell's Law different from Goodhart's Law?

Both describe how measures degrade when they become targets, but Campbell's Law specifically addresses social indicators used in high-stakes policy decisions and emphasizes corruption of the underlying social process. Goodhart's Law is narrower, originating in economics: "when a measure becomes a target, it ceases to be a good measure." Campbell's formulation is broader and more explicitly political.

What is the classic example of Campbell's Law?

Standardized testing in education. When test scores are used to evaluate teachers, fund schools, or determine student promotion, schools shift resources toward test preparation at the expense of broader learning. The test score rises while actual educational outcomes may stagnate or decline.

Why does measurement corruption happen?

Agents under pressure to perform on a specific metric have strong incentives to optimize that metric directly rather than the underlying outcome it was meant to represent. This is rational behavior given the incentive structure, not necessarily dishonesty, though gaming can escalate into outright fraud.

Can Campbell's Law be avoided?

Not fully, but its effects can be reduced by rotating metrics regularly, using multiple independent measures, separating evaluators from those being evaluated, and keeping measures low-stakes until their validity is well-established.

Does Campbell's Law apply outside government and education?

Yes. It applies wherever quantitative metrics drive high-stakes decisions: sales quotas, hospital readmission rates, software engineering velocity metrics, academic citation counts, social media engagement scores, and police arrest quotas all show characteristic patterns of Campbell's Law in action.

What did Donald Campbell himself say was the solution?

Campbell argued that evaluating social programs requires what he called "the experimenting society" — a commitment to rigorous randomized evaluation, willingness to abandon programs that don't work, and humility about the limits of any single indicator. He was pessimistic about whether institutions would actually adopt this.

How does Campbell's Law relate to fraud?

Campbell did not require fraud — ordinary optimization by rational actors under misaligned incentives produces the phenomenon. But when pressures are high enough and monitoring weak enough, gaming escalates into outright manipulation: falsifying test scores, cherry-picking hospital patients, or misreporting crime statistics.