A school district wants to improve student reading. It introduces a reading incentive program: students who read a certain number of books earn prizes. Book checkouts from the school library surge. On the surface, the program looks like a success.
But follow-up research finds something troubling. Students who participated in the reward program read fewer books on their own after it ends than students who were never enrolled. And when they do read, they choose shorter, simpler books: books that can be finished quickly, not books they find genuinely interesting.
The incentive designed to promote reading has reduced the desire to read.
This pattern — an intervention that achieves its measured target while undermining its actual purpose — is the cobra effect applied to education. It occurs throughout educational systems, from early childhood through higher education, and understanding where and why it happens is essential for anyone trying to improve how learning works.
The Cobra Effect: A Quick Primer
The cobra effect is named for a story from colonial India: the British administration, alarmed by venomous snakes in Delhi, offered a bounty for dead cobras. Enterprising locals began breeding cobras to collect the bounty. When the administration discovered the scheme and cancelled the reward, breeders released their now-worthless snakes, leaving Delhi worse off than before.
The pattern: a metric is chosen to represent an important goal. Incentives are attached to the metric. People optimize for the metric. The metric and the goal diverge. The system is worse off than if no incentive had been introduced.
The economist Charles Goodhart articulated a version of this principle in 1975. In its popular formulation (a paraphrase due to anthropologist Marilyn Strathern), Goodhart's Law states: "When a measure becomes a target, it ceases to be a good measure." Originally developed in the context of monetary policy, the principle applies with equal force to education.
The cobra effect and Goodhart's Law identify the same structural problem from slightly different angles. Goodhart's Law focuses on the epistemic degradation of the measure — it stops being informative once it becomes a target. The cobra effect focuses on the behavioral response — people find ways to hit the metric that do not serve the underlying goal, and sometimes actively undermine it. Education systems have created multiple versions of this dynamic.
Teaching to the Test
No educational cobra effect is more extensively documented than teaching to the test — the pattern in which high-stakes accountability for standardized test scores produces higher test scores without producing more learning.
The mechanism is straightforward. When schools face consequences — funding changes, public ranking, potential closure — based on standardized test results, every rational educational decision is pulled toward maximizing those scores. This affects:
Time allocation: Subjects that are not tested lose instructional time. Art, music, physical education, history, and social studies have all seen documented, significant reductions in instructional time since the introduction of high-stakes testing regimes in the US following the No Child Left Behind Act of 2002.
Instructional approach: Deep exploration of topics that develops genuine understanding gives way to efficient coverage of likely test content. Problem-solving practice is replaced by test-format practice.
Student selection and inclusion: In some accountability systems, schools face incentives to exclude low-performing students from testing through special education designations, administrative transfers, or simply discouraging their enrollment.
Teaching to items: In the most extreme cases, teachers have been provided with advance knowledge of test questions — either through leaked materials or through the practice of using previous years' tests as the entire curriculum.
A 2012 longitudinal analysis by Jennings and Bearak, published in Educational Researcher, examined how much instructional time had shifted toward tested subjects in a large urban district following No Child Left Behind. They found that social studies instruction had declined by an average of 75 minutes per week — approximately 45 hours per year — in elementary schools, while math and reading instruction increased. The exchange was efficient from an accountability standpoint and potentially damaging from the standpoint of civic education.
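As a quick sanity check on those figures, assuming a typical 36-week US school year (an assumption; the study's exact calendar may differ):

```python
minutes_per_week = 75
school_weeks = 36  # assumption: typical US school year length

hours_per_year = minutes_per_week * school_weeks / 60
print(hours_per_year)  # 45.0 hours of social studies instruction lost per year
```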
What the Research Found
A committee of the National Research Council reviewed over a decade of evidence from the No Child Left Behind era. Its 2011 report, Incentives and Test-Based Accountability in Public Education, reached a sobering conclusion: despite billions of dollars spent and enormous effort directed at test-based accountability, evidence of genuine improvement in student learning was weak and inconsistent.
Scores on state tests rose significantly. Scores on independent assessments — particularly the National Assessment of Educational Progress (NAEP), which could not be directly taught to — rose far less. The gap between state test score improvements and NAEP improvements is a signature of score inflation through teaching to the test.
| Assessment | Score Trend Post-NCLB |
|---|---|
| State standardized tests | Large, rapid improvements |
| NAEP (independent national assessment) | Modest, slower improvements |
| PISA international assessments | US performance stagnant or declining |
| SAT college readiness measures | Mixed, with some concerns about declining verbal skills |
| International Baccalaureate completion | Increasing, suggesting some schools are shifting toward depth-focused models |
Similar patterns have been documented in England following the introduction of league tables and test-based accountability in the 1990s, in Australia following national testing reform, and in various Asian education systems with high-stakes examination traditions.
A particularly striking finding came from economist Thomas Kane and colleagues' analysis of New York City test score gains in the mid-2000s. While state test scores rose sharply, representing, by some interpretations, years of accelerated learning, independent external assessments showed far smaller gains. Kane estimated that approximately two-thirds of the apparent gain in state test scores reflected test preparation rather than genuine skill improvement.
"When a measure becomes a target, it ceases to be a good measure. In education, this is not a theoretical concern — it is documented, replicated, and still largely unaddressed." — Goodhart's Law applied to schooling
The Atlanta Cheating Scandal and Its Implications
The ultimate outcome of teaching-to-the-test pressures is outright score manipulation. The Atlanta Public Schools cheating scandal of 2009-2011, in which investigators found that teachers and administrators in 44 of 56 schools had erased and corrected answers on standardized tests, illustrates the endpoint of the accountability-optimization dynamic.
The scandal revealed not a few rogue actors but a systemic response to accountability pressure. Prosecutors and investigators found that the pressure to show score gains was so intense that educators at multiple levels of the system participated in or facilitated cheating. Atlanta was not unique — subsequent investigations in Philadelphia, Washington D.C., and other districts found similar patterns.
The Atlanta case is instructive not because it represents normal teacher behavior, but because it shows the direction of pressure that high-stakes accountability creates. Most teachers respond to that pressure by shifting instruction toward tested content, not by falsifying scores. But both responses are optimizing for the metric rather than the underlying goal.
Grade Inflation: The Credential That Means Less
Grade inflation — the long-term upward drift in the grades awarded by educational institutions — is another cobra effect, operating more slowly but with substantial consequences.
The mechanism: grades are intended to communicate something about student achievement. When grades are used for consequential purposes — college admissions, scholarship selection, job screening — there are strong incentives on both sides of the grading relationship to inflate them.
Students and families push for higher grades because the stakes are high. Faculty and institutions respond because grade complaints are unpleasant, grading leniently improves teaching evaluations, and maintaining rigorous standards puts graduates at a disadvantage relative to graduates from less rigorous institutions.
The result: across American colleges and universities, the average GPA has risen from approximately 2.5 in the 1950s to approximately 3.1-3.2 today, with elite institutions showing even more pronounced inflation. An A is now the most commonly awarded grade at most American colleges.
Stuart Rojstaczer and Christopher Healy, who compiled the most comprehensive dataset on grade inflation in higher education, found in a 2012 study published in Teachers College Record that As accounted for approximately 43 percent of all grades at American four-year colleges and universities, up from 15 percent in 1940. The upward trend was consistent across public and private institutions, though faster at private colleges.
What does this mean for the signal value of grades? Precisely what you would expect: as everyone earns higher grades, the information content of any particular grade decreases. Employers and graduate schools respond by placing less weight on grades and more weight on other credentials — which then face their own inflation pressure.
The Institutional Trap
Grade inflation illustrates a key feature of educational cobra effects: they often emerge from individually rational behavior that produces collectively irrational outcomes.
A single professor maintaining rigorous standards while peers inflate grades is not rewarding their students' efforts — they are penalizing them in a competition where the currency has been devalued everywhere else. The rational response for a student-welfare-conscious professor is to match the inflation, even if it reduces the signal quality of their own grades.
This structure — where individual rationality produces collective irrationality — is sometimes called a coordination failure. Like a crowd where everyone stands to see better (and everyone ends up with a worse view than if everyone had remained seated), grade inflation is a collective action problem: individual actors have incentives that produce outcomes nobody prefers.
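The coordination failure can be made concrete with a toy two-professor grading game. The payoff numbers below are illustrative assumptions, not estimates from the literature; only their ordering matters.

```python
# Toy two-professor grading game. Each entry gives
# (outcome for A's students, outcome for B's students).
payoffs = {
    ("rigorous", "rigorous"): (3, 3),  # informative grades, fair competition
    ("rigorous", "inflate"):  (1, 4),  # A's students lose to B's inflated GPAs
    ("inflate",  "rigorous"): (4, 1),
    ("inflate",  "inflate"):  (2, 2),  # signal degraded for everyone
}

def best_response(opponent_action):
    """Professor A's best reply, given the other professor's grading policy."""
    return max(["rigorous", "inflate"],
               key=lambda action: payoffs[(action, opponent_action)][0])

# Inflating is the dominant strategy whatever the other professor does...
print(best_response("rigorous"), best_response("inflate"))  # inflate inflate
# ...yet mutual inflation (2, 2) is worse for both than mutual rigor (3, 3).
```

Whatever a colleague does, inflating is the individually rational choice, yet the all-inflate outcome is worse for everyone than the all-rigorous one: the signature of a coordination failure.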
Breaking out of this trap requires coordinated action at the institutional level — something that is politically difficult and rarely sustained. MIT and a handful of other institutions have experimented with grade distribution requirements and transcript policies that contextualize grades against class distributions, but these remain exceptions.
International Comparisons
Grade inflation is not equally severe across national educational systems. Countries with external examination systems — where final grades are set by bodies independent of the teaching institution — show less grade inflation because the optimization pressure falls on independent exam boards rather than individual teachers. The UK's A-level system, the International Baccalaureate, and national examinations in France and Germany all show more stable grade distributions over time than the US higher education system, though they face their own forms of test-preparation distortion.
The comparison suggests that the structural solution to grade inflation involves separating the assessment function from the teaching function — allowing teachers to grade without having their students' grades affect their own evaluations or their institutions' reputations.
Performance Pay for Teachers: The Experiment That Failed
One of the most extensively tested incentive interventions in education is performance pay for teachers — financial bonuses linked to student test score improvements.
The policy intuition seems reasonable: doctors, lawyers, and salespeople face financial incentives linked to performance. Why shouldn't teachers? If higher pay can attract and retain better teachers and motivate greater effort, student outcomes should improve.
The evidence, especially from the most rigorous trials, has been strikingly negative.
The Nashville POINT Study
The Program on Incentives in Teaching (POINT) study in Nashville, Tennessee was one of the most rigorous tests of teacher performance pay ever conducted. Vanderbilt University researchers ran a randomized controlled trial from 2007 to 2009, offering middle school math teachers bonuses of up to $15,000 (roughly a third of a typical teacher's annual salary) for demonstrated improvement in student test scores.
The result: no significant difference in student math achievement between the treatment and control groups. The bonuses did not improve outcomes, even in the subject where the incentive was most direct and the measurement most straightforward.
Similar null results emerged from a major randomized trial in New York City, and a comprehensive review by economists at the London School of Economics found that the evidence across multiple countries and designs does not support the conclusion that teacher performance pay reliably improves student outcomes. (Some trials in Israel and India have reported more positive short-run effects, but these have not replicated in US settings.)
A 2016 meta-analysis by Podgursky and Springer examining 27 studies of teacher performance pay found an average effect size near zero across studies, with the better-designed studies (randomized controlled trials) consistently showing smaller effects than the observational studies — suggesting that positive findings in earlier research were largely artifacts of selection effects rather than genuine incentive effects.
Why Performance Pay Fails
Several mechanisms explain why:
Teaching is already intrinsically motivated: Research on teacher motivation consistently finds that effective teachers are primarily motivated by student outcomes and professional pride, not financial reward. Adding extrinsic financial incentives does not increase motivation; it can actually displace intrinsic motivation (the "crowding out" effect, documented by Deci, Koestner, and Ryan in a 1999 meta-analysis of 128 studies).
The measurement problem: Student test score gains are influenced by many factors outside the teacher's control — student home environment, previous teachers, peer effects, socioeconomic factors. Using them as a measure of individual teacher performance is noisy and unreliable, making the incentive feel arbitrary to teachers. Research by Rothstein (2010) found that a substantial portion of apparent teacher value-added effects measured over one year were attributable to student sorting and prior achievement, not teacher effectiveness.
Collaboration vs. competition: High-performing schools are characterized by strong teacher collaboration — sharing approaches, co-planning lessons, providing peer feedback. Performance pay introduces competition for a limited bonus pool, reducing the cooperative behavior that actually drives school quality.
Teaching to the test compounds: When performance pay is linked to test scores, the teaching-to-the-test pressures described above intensify further, creating a double cobra effect.
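The measurement problem above can be illustrated with a small simulation. The variance shares are illustrative assumptions, not estimates from the research: when noise from factors outside the teacher's control swamps true teacher effects, one-year value-added rankings barely reproduce from year to year.

```python
import random

random.seed(1)

# Toy model of one-year teacher "value-added": a small true teacher effect
# plus large noise from factors outside the teacher's control (home
# environment, prior teachers, peer effects). Variance shares are assumptions.
TEACHER_SD, NOISE_SD = 1.0, 3.0
true_effects = [random.gauss(0, TEACHER_SD) for _ in range(1000)]

def measured_value_added(true_effect):
    return true_effect + random.gauss(0, NOISE_SD)

year1 = [measured_value_added(t) for t in true_effects]
year2 = [measured_value_added(t) for t in true_effects]

def corr(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

# Theoretical year-to-year correlation: 1.0 / (1.0 + 9.0) = 0.1.
print(round(corr(year1, year2), 2))
```

With these assumptions the year-to-year correlation of measured value-added is around 0.1: a teacher ranked highly one year is barely more likely than chance to rank highly the next, which is why score-linked bonuses feel arbitrary to teachers.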
What Does Work for Teacher Quality
The evidence that performance pay fails does not mean teacher quality is unimportant — the research consistently shows it is one of the most important school-level predictors of student outcomes. The question is what actually improves it.
Eric Hanushek's long-running research program on teacher effectiveness has found that the difference between a highly effective teacher and an ineffective one is equivalent to approximately one year of additional schooling, one of the largest effect sizes in education research. The challenge is that this effectiveness is not well predicted by credentials or experience alone, which is what standard compensation schedules reward.
The interventions with the strongest evidence for improving teaching effectiveness are largely professional rather than financial: instructional coaching that provides ongoing feedback on classroom practice, professional learning communities that enable teachers to learn from each other, and structured curriculum materials that reduce the cognitive burden of lesson planning and allow teachers to focus on execution. These interventions improve what teachers know how to do — not just how hard they try to do it.
Credential Inflation: Education as Signaling
A longer-running cobra effect operates at the societal level through credential inflation — the progressive escalation of educational credentials required for jobs without corresponding escalation in the actual skills those jobs require.
Economist Bryan Caplan's controversial book The Case Against Education (2018) argues that a substantial portion of the returns to education in modern economies are not human capital returns (education made workers more productive) but signaling returns (education signals pre-existing traits that employers value — intelligence, conscientiousness, conformity).
If Caplan is right — and the evidence, while contested, is significant — then much of what education systems do is not build skills but generate costly signals in an escalating credential competition. As more people attain bachelor's degrees, the signal value of the degree declines. Employers raise the threshold — now a master's degree is needed. People pursue master's degrees. The signal value declines again.
This creates a positional arms race: education that serves the individual competing for position but does not serve society's interest in actual skill development.
Degree inflation, the requirement of college degrees for jobs that previously did not require them, has been documented by Harvard Business School researchers Joseph Fuller and Manjari Raman in a 2017 report, "Dismissed by Degrees." They found that in a significant fraction of job postings requiring college degrees, the requirement had been added in recent years without any change in the actual work content of the job. In the same report, over 60 percent of surveyed hiring managers acknowledged that non-degree holders performed just as well as degree holders in the same roles.
The data is consistent with a signaling interpretation in several labor markets:
- Jobs that once required a high school diploma now require a college degree
- Jobs that once required a college degree now require a master's
- Evidence that degree requirements in many fields exceed the actual skill requirements of the work
- Studies showing that employers often cannot articulate what specific skills the degree requirement is meant to screen for
The costs of this signaling race are not trivial. Rising credential requirements delay workforce entry, saddle young people with debt for credentials that may not reflect genuine skill development, and exclude capable people who cannot afford the time or cost of escalating credentials.
A 2019 analysis by the Federal Reserve Bank of New York found that approximately 40 percent of recent college graduates were working in jobs that did not require a college degree — a figure that has remained persistently elevated since 2000. This underemployment rate is consistent with a system in which credential escalation outpaces genuine skill demand.
The Human Capital vs. Signaling Debate
The signaling interpretation of education is not uncontested. Most economists accept a mixed model: education has some genuine human capital effects (it does make people more productive in some ways) combined with significant signaling effects. The question is the proportion.
Research by economists Enrico Moretti (2004) and others on human capital externalities — the spillover benefits of education to communities and economies — suggests that education does produce genuine productivity benefits beyond what signaling alone would predict. But these studies tend to examine average effects across entire economies, which is consistent with some education being genuinely productive while much of the marginal credential escalation is purely signaling.
The practical implication is the same regardless of how the debate resolves: the current system spends enormous resources on credential production that may serve competitive positioning more than genuine learning.
What Intrinsic Motivation Research Shows
The deepest insight into education's incentive problems comes from decades of research on intrinsic motivation — the drive to engage in activities for their own sake rather than for external rewards.
The foundational work was done by Edward Deci and Richard Ryan, whose Self-Determination Theory proposes that human beings have innate psychological needs for autonomy (feeling that one's behavior is self-chosen), competence (feeling effective at meaningful challenges), and relatedness (feeling connected to others). When these needs are met, intrinsic motivation is supported. When they are frustrated, intrinsic motivation declines and extrinsic motivation substitutes.
The Overjustification Effect
The most robust finding in this literature is the overjustification effect: when you add external rewards to an activity people are already intrinsically motivated to do, their intrinsic motivation for that activity declines.
The classic study (Lepper, Greene, and Nisbett, 1973): children who enjoyed drawing were divided into three groups. One group was promised a reward for drawing and received it. One group was not promised a reward but unexpectedly received one. One group received no reward. Two weeks later, children who had been promised and given a reward spent significantly less time drawing than children who had received no reward or an unexpected reward.
The reward had converted an enjoyable activity into work — something done for payment rather than pleasure.
This effect has been replicated hundreds of times across multiple domains, including reading, puzzle-solving, and academic learning. Deci, Koestner, and Ryan's 1999 meta-analysis of 128 studies confirmed that tangible, expected, contingent rewards reliably undermine intrinsic motivation. The effect is particularly strong when the reward is controlling in character — when it is explicitly contingent on performance — rather than informational.
Its implications for education are serious:
- Reward-based reading programs reduce reading motivation in the long run
- Grade-focused learning environments reduce curiosity and learning for its own sake
- Competitive academic rankings reduce collaborative learning and increase performance anxiety without improving deep understanding
A 2000 study by Gottfried and colleagues followed children from early childhood through adolescence and found that those who attended schools with more autonomous, less controlling learning environments maintained significantly higher intrinsic academic motivation through high school than those in more controlling environments, even after controlling for initial motivation levels and family background.
What Supports Intrinsic Learning Motivation
Research consistently identifies the following conditions as supporting genuine learning engagement:
Autonomy in approach: Letting students choose how to approach a learning task — even within a defined topic — increases engagement and learning quality. A study by Patall, Cooper, and Robinson (2008) meta-analyzed 41 experimental studies and found that providing choice significantly increased intrinsic motivation, effort, and task performance.
Optimal challenge: Tasks that are neither too easy (boring) nor too hard (overwhelming) — what Mihaly Csikszentmihalyi called the flow zone — produce the highest engagement and intrinsic motivation. Csikszentmihalyi's research on flow, based on experience sampling studies with thousands of participants, found that optimal challenge is among the most consistently reported conditions for deep engagement.
Competence feedback: Informational feedback about what was learned and what could be improved supports intrinsic motivation. Evaluative feedback (grades, comparisons to peers) is more likely to reduce it. Research by Deci and colleagues found that even positive feedback reduces intrinsic motivation when it is perceived as controlling rather than informational.
Purpose and connection: Learning that is connected to students' genuine interests, real-world applications, and meaningful questions they actually have motivates more deeply than learning disconnected from personal relevance.
Autonomy-supportive teaching: Research comparing "controlling" and "autonomy-supportive" teachers finds that students of autonomy-supportive teachers show higher intrinsic motivation, better conceptual learning, and lower anxiety — even when controlling for student ability. A longitudinal study by Vansteenkiste, Sierens, Soenens, Luyckx, and Lens (2009) followed students over a school year and found that autonomy-supportive teaching predicted not only motivation but also academic achievement and psychological well-being.
The Interaction Between Reward Systems and Learning Quality
A subtle but important dimension of educational incentive research is the effect of reward systems not just on motivation quantity (how much effort students invest) but on learning quality (how deeply and effectively they process material).
Research on learning goals versus performance goals by Dweck and colleagues found that students oriented toward performance goals — doing well relative to others, earning high grades, appearing competent — process material more superficially and show less creative, transfer-appropriate thinking than students oriented toward learning goals — understanding the material, developing skills, mastering challenges.
The concern is that grade-focused incentive systems systematically cultivate performance goals over learning goals. Students who learn primarily to earn grades ask "What will be on the test?" Students who learn primarily to understand ask "How does this connect to what I already know?" The former orientation produces higher grades in many educational contexts; the latter produces deeper understanding and better transfer to real-world applications.
Benware and Deci (1984) demonstrated this experimentally: students who were told they would need to teach material to others (a learning-goal frame) showed significantly better conceptual understanding on subsequent tests than students told they would be tested on the same material (a performance-goal frame). The content, time, and effort invested were identical; the orientation to the learning task produced different outcomes.
Fixing the Incentives: What Actually Works
Given what the evidence shows, what does actually work in educational incentive design?
Formative over summative assessment: Assessments used primarily to give students and teachers information about learning progress (formative) rather than to rank, grade, or sanction (summative) support learning without producing teaching-to-the-test dynamics.
Research by Black and Wiliam (1998), synthesized in their influential paper "Inside the Black Box," found that improving formative assessment practices in classrooms produced effect sizes of 0.4 to 0.7 standard deviations — equivalent to moving an average student from the 50th to roughly the 65th to 75th percentile. This is among the largest effect sizes recorded for low-cost educational interventions. The key features of effective formative assessment: it provides specific information about what to improve (not just a grade), it occurs during learning rather than at the end, and it is used by both teachers and students to adjust their next actions.
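The percentile figures above follow directly from the effect sizes, assuming normally distributed achievement. A minimal conversion:

```python
from statistics import NormalDist

def percentile_after_shift(effect_size_sd, start_percentile=50):
    """Percentile an average student reaches after improving by
    `effect_size_sd` standard deviations (normality assumed)."""
    z = NormalDist().inv_cdf(start_percentile / 100)
    return 100 * NormalDist().cdf(z + effect_size_sd)

print(round(percentile_after_shift(0.4)))  # 66
print(round(percentile_after_shift(0.7)))  # 76
```

A 0.4 SD gain moves a median student to roughly the 66th percentile and a 0.7 SD gain to roughly the 76th, matching the range Black and Wiliam report.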
School-level rather than teacher-level accountability: When schools are held accountable collectively, the competition effects that reduce collaboration are avoided while incentives for system-level improvement are preserved.
Professional development over performance pay: Investment in teachers' skills, knowledge, and working conditions shows substantially better evidence for improving outcomes than financial incentives. The most important determinant of teacher effectiveness is expertise, which grows through professional support and learning communities. A 2017 review by Darling-Hammond and colleagues found that sustained professional development programs (typically 40+ hours over a school year) produced effect sizes averaging 0.54 on student achievement, substantially larger than any financial incentive study has found.
Intrinsic motivation design: Curriculum and classroom environments designed around the conditions that support intrinsic motivation — autonomy, challenge, feedback, relevance — produce more durable learning without the backfire effects of external reward systems.
Credential reform: Experiments with competency-based credentials, skill certifications, and portfolio assessment represent alternatives to degree credential inflation, though their adoption faces significant institutional and cultural resistance. The success of professional certifications in fields like information technology and project management — where specific skill verification has replaced degree requirements for many roles — suggests that alternatives are viable when there is sufficient external demand pressure.
Reduced reliance on a single metric: Accountability systems that use multiple measures — attendance, teacher quality indicators, graduation rates, college persistence, and student growth measures alongside test scores — create less concentrated optimization pressure than those relying primarily on a single test score outcome.
Why Incentive Problems Persist Despite Evidence
If the evidence is clear that many educational incentive systems produce cobra effects, why do they persist?
Several political and institutional forces explain the persistence:
Visibility asymmetry: Test score improvements are visible and measurable; the long-term declines in intrinsic motivation and deeper learning are not. Politicians and administrators respond to visible metrics.
Blame assignment: When students and schools perform poorly, there is political demand to identify and hold someone accountable. Incentive systems satisfy this demand even when they do not improve outcomes.
Alternative difficulty: Formative assessment requires professional judgment that is harder to standardize than test scores. Professional development is expensive and its effects are diffuse. The alternatives to simple incentive systems are genuinely harder to implement at scale.
Temporal mismatch: The negative effects of incentive misalignment often appear over years or decades, while political incentives operate on electoral cycles. Decision-makers have weak incentives to solve problems whose costs will be borne after their tenure.
The cobra effect in education is not inevitable. But addressing it requires resisting the intuition that more accountability and stronger incentives always produce better results. Sometimes they do. Often, they produce higher numbers on the metrics we are measuring and less of the thing we actually care about.
Naming that dynamic is the first step toward designing educational systems that genuinely serve learning rather than the appearance of it.
"The most important problem is not teaching the wrong things. It is rewarding the wrong behaviors in the process of teaching the right things, and thereby replacing the desire to know with the desire to score." — synthesized from Alfie Kohn, Punished by Rewards (1993)
Frequently Asked Questions
What is the cobra effect in education?
The cobra effect in education occurs when an educational incentive achieves its measured target while destroying the underlying goal. The classic example is standardized testing: when schools are rewarded or punished based on test scores, teachers teach to the test — producing higher scores without producing more learning. The measure becomes the goal, and the actual goal (learning and development) is crowded out.
What does research say about teaching to the test?
Research consistently shows that heavy test-based accountability narrows the curriculum (reducing time for subjects not tested), increases test preparation at the expense of deeper instruction, produces score inflation that does not transfer to other measures of learning, and increases student anxiety without improving genuine competency. A 2011 National Academy of Sciences report found little evidence that test-based accountability had improved educational outcomes.
Does performance pay for teachers improve student outcomes?
The evidence on teacher performance pay in the US is largely negative. Major randomized controlled trials, including the POINT study in Nashville (Vanderbilt University, 2010), found that performance bonuses for teachers did not reliably improve student test scores, and trials elsewhere that reported short-run gains have not replicated in US settings. In some cases, performance pay disrupted teacher collaboration and reduced intrinsic motivation. The research suggests teaching quality is more responsive to professional support and working conditions than to financial incentives.
What is credential inflation and how does it affect education?
Credential inflation occurs when educational credentials that were previously sufficient for certain roles are replaced by higher credentials, with no corresponding increase in the actual skills required for the job. When employers use degrees as screening tools without requiring degree-specific skills, education becomes a costly signaling mechanism rather than a skills-building process. This drives credential escalation — more years of education required for the same work — without commensurate gains in productivity.
What does intrinsic motivation research say about learning incentives?
Decades of research, beginning with Edward Deci and Richard Ryan's Self-Determination Theory, shows that external rewards for intrinsically interesting activities consistently reduce long-term engagement and learning quality. When students are rewarded for reading, curiosity about reading itself declines. Conditions that support intrinsic motivation — autonomy, competence, and connection to purpose — produce more durable learning, greater creativity, and better transfer to novel problems.