In the summer of 1998, Anthony Greenwald, Debbie McGhee, and Jordan Schwartz published a paper in the Journal of Personality and Social Psychology that would become one of the most cited — and most contested — articles in the history of social psychology. The paper, "Measuring Individual Differences in Implicit Cognition: The Implicit Association Test," introduced a behavioral measure built on a deceptively simple insight: the time it takes to categorize two concepts together reveals how strongly those concepts are associated in memory. Participants sat before a computer and sorted words into categories using two keys. When the pairing was intuitive — flowers with pleasant words, insects with unpleasant words — responses came fast. When the pairing ran against deeply embedded associations — flowers with unpleasant words, insects with pleasant words — responses slowed. That latency gap, the authors argued, was not an artifact of motor awkwardness. It was a window into implicit attitudes: evaluations that operate below conscious control, outside introspective reach, yet active enough to influence behavior.
The immediate reaction in the field was not applause so much as alarm mixed with fascination. Greenwald and Banaji had been arguing since their 1995 Psychological Review paper that implicit cognition was a serious construct — that people harbor attitudes they cannot or will not report, and that these attitudes matter for behavior. But explicit attitude measures had dominated social psychology for decades, and the profession was not prepared for a tool that claimed to circumvent self-report entirely. Critics asked the obvious question almost immediately: what, exactly, does the IAT measure? Does a slower response to the flower+unpleasant pairing indicate a genuine negative attitude toward flowers, or does it merely reflect the statistical structure of word associations in the language, the familiarity of concepts, the relative arousal levels of stimuli? The controversy that erupted in the years following that 1998 publication has never fully resolved, and the IAT remains both the most widely used implicit measure in psychology and the most vigorously disputed.
Project Implicit, the research initiative Greenwald and Banaji launched in 1998, with sites hosted over the years at Yale, Harvard, and the University of Washington, gathered data at a scale that dwarfed anything previously possible in experimental psychology. Millions of participants took IATs online — on race, gender, age, sexuality, weight, disability, and dozens of other social categories. The sheer volume of data was astonishing. By the mid-2000s, the project had collected IAT scores from hundreds of thousands of Americans, and patterns emerged that explicit self-reports had never reliably captured: the majority of white participants showed a pro-white implicit preference even when they explicitly endorsed racial equality; a substantial proportion of women showed implicit associations between men and science that their stated beliefs contradicted. The IAT had become a cultural phenomenon as much as a scientific instrument, appearing in diversity trainings, courtrooms, and national media. Understanding what it actually measures — and what it does not — requires going back to the cognitive science that underlies it.
Explicit vs. Implicit Measures: A Structural Comparison
The IAT was designed to address specific limitations of explicit attitude measures. The differences between the two approaches are not merely methodological; they reflect fundamentally different theories about the architecture of evaluation.
| Dimension | Explicit Measures | Implicit Measures (IAT) |
|---|---|---|
| Method | Self-report scales; participants consciously rate attitudes (e.g., "How favorable are you toward X?") | Response latency to paired categorization tasks; no conscious attitude report required |
| Response time | Unlimited; participants deliberate as long as needed | Milliseconds; speeded responses prevent deliberative correction |
| Social desirability effects | High; participants strategically manage self-presentation, especially on sensitive topics | Low; speeded task design limits opportunity for motivated responding |
| Predictive validity | Strong for deliberative behaviors (voting intentions, product choice) in low-pressure contexts | Moderate for spontaneous behaviors, nonverbal cues, snap judgments; weaker for controlled discriminatory acts |
| Test-retest reliability | Generally high (r = .80+) for stable constructs across multiple administrations | Moderate (r = .40-.60 in many studies); context and mood sensitivity reduce consistency |
| What they predict | Stated intentions, considered decisions, behaviors that allow reflection | Spontaneous reactions, physiological responses, behaviors in time-pressured or ambiguous situations |
The Cognitive Science of Implicit Cognition
The IAT did not invent the concept of implicit mental processes. It inherited a rich intellectual tradition in cognitive psychology and social cognition that had been accumulating for two decades before the 1998 paper appeared.
Spreading Activation and Associative Memory
The theoretical foundation of the IAT rests on associative network models of memory, particularly the spreading activation model developed by Allan Collins and Elizabeth Loftus in their 1975 paper in Psychological Review, "A Spreading-Activation Theory of Semantic Processing." In this framework, concepts in memory are linked by associative pathways, and activating one concept spreads activation to semantically related concepts. If "flower" is strongly associated with "pleasant" in the network, activating "flower" will prime "pleasant," making it easier and faster to categorize. If the association is weak or contradictory, categorization requires additional processing time. The IAT operationalizes this latency difference as a measure of relative association strength. The logic is clean, but the inference from latency to attitude is where the complications begin.
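The spreading-activation logic can be made concrete with a toy sketch. This is purely illustrative, not code from any of the cited papers: the association strengths, the single-step spread, and the decay parameter are all hypothetical stand-ins for quantities a real semantic-memory model would estimate from data.

```python
# Toy illustration of spreading activation: activating a source concept
# passes activation to its neighbours, attenuated by a decay factor and
# by link strength. Concepts receiving more activation are assumed to be
# categorized faster -- the latency logic the IAT builds on.

# Hypothetical association strengths in [0, 1] (illustrative values only).
LINKS = {
    "flower": {"pleasant": 0.8, "unpleasant": 0.1},
    "insect": {"pleasant": 0.1, "unpleasant": 0.7},
}

def spread_activation(source, decay=0.5, initial=1.0):
    """Return the activation reaching each neighbour of `source`
    after one spreading step."""
    return {target: initial * decay * strength
            for target, strength in LINKS.get(source, {}).items()}

act = spread_activation("flower")
# "pleasant" receives more activation than "unpleasant", so pairing
# flower with pleasant should yield the faster categorization.
assert act["pleasant"] > act["unpleasant"]
```

On this picture, the IAT's latency gap is the behavioral trace of unequal activation flow; the disputes reviewed below concern whether latency differences can be read back as association strength so directly.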
Fazio's Evaluative Priming Paradigm
Russell Fazio had been measuring implicit attitudes through evaluative priming since the late 1980s, and his work provides both a useful comparison and a critical alternative to the IAT. In evaluative priming (described in Fazio, Sanbonmatsu, Powell, and Kardes, 1986, Journal of Personality and Social Psychology), a prime word is briefly presented, and participants then evaluate a target word as good or bad. Priming with a congruently valenced word (e.g., a positive prime before a positive target) speeds evaluation; incongruent priming slows it. The attitude is inferred from these facilitation and interference effects.
In their 2003 review in Annual Review of Psychology, Fazio and Michael Olson drew a conceptually important distinction between evaluative priming and the IAT. Evaluative priming captures automatic evaluation — the spontaneous affective tag that fires when a stimulus is encountered. The IAT, they argued, may capture something different: not just automatic evaluation, but the relative ease of performing a categorization task under a particular mapping rule. The distinction matters because factors that affect task performance — such as the salience of category labels, participants' familiarity with the classification rules, or the arousal differential between stimulus categories — can influence IAT scores without indicating anything about underlying attitudes. The IAT's score reflects a response to a task structure; whether that structure faithfully mirrors the structure of implicit memory is an open empirical question.
Wentura and Rothermund's Counter-Associative Account
Dirk Wentura and Klaus Rothermund developed one of the most technically sophisticated critiques of the IAT's validity through a series of studies in the 2000s. Their counter-associative account, detailed in Rothermund and Wentura (2004) in the Journal of Experimental Psychology: General, proposed that IAT effects are driven not by the strength of associations between attitude objects (flowers, insects) and valence (pleasant, unpleasant), but by the structural salience of figure and ground within the categorization task. Specifically, Rothermund and Wentura argued that the IAT measures the relative distinctiveness of categories: when a concept shares the feature of being a "figure" (salient, distinctive) in its respective task block, pairing it with the other figure category produces faster responses. This account makes predictions that differ from the attitude account in cases where valence and salience are dissociated. In studies using low-valence but high-salience stimuli, they found IAT-like effects that could not be explained by attitude strength. Greenwald and colleagues disputed the generalizability of these findings, but the exchange established that the IAT's latency effects are not straightforwardly interpretable as attitude strength without ruling out competing structural explanations.
Four Case Studies in IAT Research
Case Study 1: The Race IAT and Police Shooting Decisions
Keith Payne, Deborah Hall, Daryl Cameron, and Anthony Bishara published "A Process Dissociation Approach to Investigating Implicit and Explicit Processes in Racial Shooting Decisions" in 2010 in the Journal of Experimental Social Psychology. The study addressed a crucial applied question: does implicit racial bias, as measured by the IAT, predict the decision to shoot an unarmed Black target versus an armed white target in a simulated police scenario? Using process dissociation modeling to separate automatic and controlled components of the shooting decision, Payne and colleagues found that the automatic (implicit) component of bias predicted shooting errors in ways that the explicit racial attitude measure did not. Importantly, the relationship between IAT scores and shooting decisions was moderated by cognitive load: under time pressure and when cognitive resources were taxed, implicit associations exerted more influence. The study is frequently cited as evidence for the IAT's real-world relevance, but it also illustrates a recurrent qualification: implicit bias predicts behavior most reliably when deliberative correction is impaired — a condition that applies inconsistently across real-world encounters.
Case Study 2: Nosek, Greenwald, and Banaji's Large-Scale Demographic Findings
Brian Nosek, Greenwald, and Mahzarin Banaji reviewed the accumulating Project Implicit data in a 2007 chapter, "The Implicit Association Test at Age 7: A Methodological and Conceptual Review." Examining over half a million IAT administrations across multiple attitude domains, they found that the correlation between implicit and explicit attitudes varied substantially by topic. For socially sensitive topics — race, gender, age — the implicit-explicit correlation was weak or moderate, ranging from r = .12 to r = .36. For less sensitive topics — preferences for fruit versus vegetables — the correlation was substantially higher. The authors interpreted this dissociation as evidence that explicit measures are distorted by social desirability on sensitive topics, while the IAT captures the underlying attitude more accurately. Critics offered the alternative interpretation: on sensitive topics, implicit and explicit measures capture genuinely different constructs, and there is no independent standard for determining which is "more accurate." The ambiguity is foundational to most subsequent debates about what IAT scores mean.
Case Study 3: Greenwald et al.'s 2009 Meta-Analysis and the Predictive Validity Question
Greenwald, Poehlman, Uhlmann, and Banaji published "Understanding and Using the Implicit Association Test: III. Meta-Analysis of Predictive Validity" in the Journal of Personality and Social Psychology in 2009. Analyzing 122 research reports, they found that the IAT was a statistically significant predictor of criterion behaviors across domains, with an average effect size of r = .274. Critically, the IAT showed incremental predictive validity beyond explicit measures for socially sensitive criteria (e.g., racial bias in hiring simulations, nonverbal behavior in interracial interactions), but not for non-sensitive criteria, where explicit measures performed comparably or better. This meta-analysis represented the field's most comprehensive empirical defense of the IAT's validity at the time. The moderate effect size (accounting for roughly 7.5% of variance in criterion behaviors) was interpreted by the authors as meaningful given the complexity of predicting specific behaviors from general implicit attitudes. Critics would later argue that 7.5% of variance is a thin basis for the applied uses to which the IAT had already been put.
Case Study 4: Oswald et al.'s Reanalysis and the Discriminatory Behavior Problem
Frederick Oswald, Gregory Mitchell, Hart Blanton, James Jaccard, and Philip Tetlock published a direct challenge to the Greenwald et al. meta-analysis in the Journal of Personality and Social Psychology in 2013: "Predicting Ethnic and Racial Discrimination: A Meta-Analysis of IAT Criterion Studies." Using a subset of the same literature but restricting to studies with behavioral discrimination criteria (rather than self-report or physiological measures), Oswald and colleagues found that the IAT predicted discriminatory behavior far less reliably than the Greenwald meta-analysis suggested — and performed no better than simple explicit attitude measures in many domains. They identified several methodological problems in the accumulated literature, including selection bias in which studies were included, publication bias toward positive results, and inconsistency in how criterion behaviors were measured and validated. Greenwald's team responded with a 2015 paper (Journal of Personality and Social Psychology) disputing the inclusion criteria and statistical approach, and the exchange produced one of the most sustained methodological debates in social psychology. The core disagreement — whether the IAT predicts real discrimination meaningfully better than asking someone directly — has not been resolved.
Intellectual Lineage: Who Influenced Whom
The IAT did not emerge in isolation. Its intellectual architecture can be traced through several converging lines of influence.
Greenwald and Banaji's 1995 Psychological Review paper, "Implicit Social Cognition: Attitudes, Self-Esteem, and Stereotypes," synthesized three decades of work on implicit memory — beginning with Daniel Schacter's distinction between explicit and implicit memory in the 1980s and the earlier work of Endel Tulving on episodic and semantic memory — and extended these concepts from memory to social evaluation. The claim that attitudes, like memories, can operate implicitly was radical at the time: social psychology had built its methodological infrastructure entirely on self-report.
Behind that synthesis was the influence of Zajonc's 1980 primacy-of-affect claim and the evaluative priming tradition Fazio had developed independently. Gordon Allport's 1954 The Nature of Prejudice provided the conceptual vocabulary of prejudice as a multi-level phenomenon that does not always correspond to consciously held beliefs. Henri Tajfel's social identity theory contributed the insight that in-group favoritism is a deep structural feature of social cognition, not a personality aberration.
The IAT also drew on the cognitive psychology of automaticity developed by John Bargh, whose work on automatic and controlled processing throughout the 1980s and 1990s — particularly his 1994 "four horsemen" analysis of automaticity (awareness, intention, efficiency, and control) in the Handbook of Social Cognition — established that social stereotypes can activate without intention and without awareness. Bargh, Chen, and Burrows's 1996 study ("Automaticity of Social Behavior") demonstrated that priming participants with elderly stereotypes caused them to walk more slowly in the hallway — an effect of implicit activation on motor behavior that helped establish the behavioral relevance of automatic social cognition. (That specific study has since faced replication difficulties, but the broader framework it exemplified remained influential.)
Empirical Research on Reliability, Validity, and Reduction
The empirical record on the IAT's psychometric properties is more mixed than early advocates suggested and more ambiguous than critics sometimes imply. Test-retest reliability, measured in multiple studies by Nosek and colleagues, typically falls in the range of r = .40 to r = .60 across one-week intervals, which is adequate for group-level comparisons but problematic for using IAT scores diagnostically at the individual level. Psychometric tradition generally requires reliabilities above r = .80 for clinical or applied individual assessment. An IAT score taken today may differ substantially from a score taken two weeks from now, without any intervention, because the score is sensitive to context, mood, the order in which tasks are completed, and which exemplars of each category appear in a given administration.
The question of whether implicit bias can be changed — and whether IAT scores track such change — has generated a large intervention literature. Lai et al. (2014) in the Journal of Experimental Psychology: General compared 17 interventions for reducing racial IAT scores in a large randomized experiment and found that several produced immediate score reductions. However, follow-up work (Lai et al., 2016) tested which of those interventions produced effects that persisted over days. Almost none did. Implicit attitudes, as measured by the IAT, appear highly malleable in the moment and highly resistant to durable change. The finding is important for applied contexts: diversity trainings that produce immediate IAT score improvements may be producing temporary performance effects rather than changes in underlying associative structure.
Limits, Critiques, and Nuances
The most structurally important critique of the IAT may be Blanton and Jaccard's 2006 paper in Psychological Review, "Arbitrary Metrics in Psychology." Their argument was not that the IAT measures nothing — it was that the IAT's scoring metric, the D-score, has no meaningful zero point. The D-score is calculated from the difference in response latencies between two task blocks divided by the standard deviation of all response latencies. A positive D-score is conventionally interpreted as indicating an implicit preference for concept A over concept B. But what does a D-score of zero mean? The scoring formula does not place its midpoint at "no preference" in any theoretically grounded sense. It places it at "equal response latency in both blocks," which is not the same thing. Blanton and Jaccard argued that without an established ratio scale — where zero means genuine neutrality — the IAT cannot be used to classify individuals as biased or unbiased at any particular cutoff. The widespread practice of labeling participants with positive D-scores as showing "implicit bias" is, on this analysis, methodologically indefensible. Greenwald and colleagues have disputed elements of this argument, but the metric question has never been satisfactorily answered by defenders of the IAT's applied uses.
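The core of the scoring logic Blanton and Jaccard target can be sketched in a few lines. This is a simplified illustration of the D-score idea as described above, not the published scoring algorithm, which adds trial filtering, error penalties, and block-wise averaging (Greenwald, Nosek, & Banaji, 2003); the latency values below are made up.

```python
# Simplified D-score sketch: difference in mean response latency between
# the two combined-task blocks, divided by the standard deviation of
# latencies pooled across both blocks.
from statistics import mean, stdev

def d_score(block_compatible, block_incompatible):
    """Latencies in ms for the two critical blocks. A positive D means
    slower responding under the 'incompatible' pairing."""
    pooled_sd = stdev(block_compatible + block_incompatible)
    return (mean(block_incompatible) - mean(block_compatible)) / pooled_sd

# Hypothetical latencies for one participant:
print(round(d_score([600, 650, 620], [750, 800, 770]), 2))  # prints 1.76

# A D of exactly zero means equal mean latency in both blocks -- which,
# as Blanton and Jaccard argue, is not the same as a theoretically
# grounded "no preference" point.
```

The sketch makes the arbitrary-metric problem visible: the formula's zero is defined entirely by latency symmetry, and nothing in the computation anchors any particular D value to a behavioral notion of neutrality or bias.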
Ulrich Schimmack's 2021 paper "The Implicit Association Test: A Method in Search of a Construct," published in Perspectives on Psychological Science, sharpened the methodological critique through a psychometric lens. Schimmack applied a measurement model approach to the IAT's validity evidence, arguing that when the IAT is treated as a measurement instrument and evaluated against standard psychometric criteria — convergent validity with other implicit measures, discriminant validity from explicit measures, criterion validity for behavioral outcomes — its performance is substantially weaker than advocates have claimed. Schimmack's analysis suggested that the IAT is better described as a reliable measure of a cognitive performance variable (response latency under a specific task structure) than as a valid measure of implicit attitudes toward social groups. The distinction matters because a measure can be reliable (stable over time within a session) while still measuring the wrong thing.
None of this means the IAT has revealed nothing. The empirical pattern that participants who report strong explicit in-group preferences also tend to show faster responses in same-valence pairings is real and replicable. The pattern that cognitive load amplifies bias effects in behavioral tasks is consistent with a role for automatic processes. The dissociation between explicit self-report and behavioral outcomes in ambiguous interracial interactions has been documented enough times to warrant attention. What remains contested is whether the IAT is measuring the implicit attitude construct, or a correlated cognitive variable, or both, and in what proportions. The IAT unlocked a research program of enormous scope and importance. Whether it measures what it is said to measure — a genuine implicit attitude, separate from explicit belief, with independent predictive power over real discriminatory behavior — remains a question that tens of millions of data points have not definitively settled.
References
Greenwald, A. G., McGhee, D. E., & Schwartz, J. L. K. (1998). Measuring individual differences in implicit cognition: The Implicit Association Test. Journal of Personality and Social Psychology, 74(6), 1464–1480.
Greenwald, A. G., & Banaji, M. R. (1995). Implicit social cognition: Attitudes, self-esteem, and stereotypes. Psychological Review, 102(1), 4–27.
Nosek, B. A., Greenwald, A. G., & Banaji, M. R. (2007). The Implicit Association Test at age 7: A methodological and conceptual review. In J. A. Bargh (Ed.), Automatic processes in social thinking and behavior (pp. 265–292). Psychology Press.
Greenwald, A. G., Poehlman, T. A., Uhlmann, E. L., & Banaji, M. R. (2009). Understanding and using the Implicit Association Test: III. Meta-analysis of predictive validity. Journal of Personality and Social Psychology, 97(1), 17–41.
Oswald, F. L., Mitchell, G., Blanton, H., Jaccard, J., & Tetlock, P. E. (2013). Predicting ethnic and racial discrimination: A meta-analysis of IAT criterion studies. Journal of Personality and Social Psychology, 105(2), 171–192.
Greenwald, A. G., Banaji, M. R., & Nosek, B. A. (2015). Statistically small effects of the Implicit Association Test can have societally large effects. Journal of Personality and Social Psychology, 108(4), 553–561.
Blanton, H., & Jaccard, J. (2006). Arbitrary metrics in psychology. Psychological Review, 113(1), 62–72.
Fazio, R. H., & Olson, M. A. (2003). Implicit measures in social cognition research: Their meaning and use. Annual Review of Psychology, 54, 297–327.
Rothermund, K., & Wentura, D. (2004). Underlying processes in the Implicit Association Test: Dissociating salience from associations. Journal of Experimental Psychology: General, 133(2), 139–165.
Payne, B. K., Hall, D. L., Cameron, C. D., & Bishara, A. J. (2010). A process dissociation approach to investigating implicit and explicit processes in racial shooting decisions. Journal of Experimental Social Psychology, 46(6), 1049–1056.
Schimmack, U. (2021). The Implicit Association Test: A method in search of a construct. Perspectives on Psychological Science, 16(2), 396–414.
Lai, C. K., Marini, M., Lehr, S. A., Cerruti, C., Shin, J. L., Joy-Gaba, J. A., ... & Nosek, B. A. (2014). Reducing implicit racial preferences: I. A comparative investigation of 17 interventions. Journal of Experimental Psychology: General, 143(4), 1765–1785.
Frequently Asked Questions
What is the Implicit Association Test and how does it work?
The Implicit Association Test (IAT), introduced by Anthony Greenwald, Debbie McGhee, and Jordan Schwartz in their 1998 Journal of Personality and Social Psychology paper, measures the strength of automatic associations between concepts by recording response latencies in a categorization task. In the race IAT, participants sort words and images into four categories — European American faces, African American faces, pleasant words, and unpleasant words — using two response keys. In one block, European American faces share a key with pleasant words; in another, African American faces share a key with pleasant words. If a person has stronger automatic associations between 'European American' and 'pleasant' than between 'African American' and 'pleasant,' they will respond faster in the first configuration. The difference in mean response latency between the two blocks, standardized as the D-score, quantifies the strength and direction of the implicit association. Project Implicit, launched by Greenwald, Mahzarin Banaji, and Brian Nosek, has administered the IAT to over 20 million respondents worldwide, making it the largest psychological dataset of its kind.
What does the IAT actually predict?
Anthony Greenwald, T. Andrew Poehlman, Eric Uhlmann, and Mahzarin Banaji's 2009 Journal of Personality and Social Psychology meta-analysis of 122 IAT studies found that the IAT predicted criterion measures with a mean r of .274, compared to .361 for explicit (self-report) measures in the same studies. The IAT predicted implicit criteria (non-verbal behavior, spontaneous responses, subtle discrimination) better than explicit criteria (deliberate decisions, overt behavior), while explicit measures predicted deliberate behavior better. The key practical question — whether the race IAT predicts racially discriminatory behavior in real decisions — received its most critical examination in Frederick Oswald, Gregory Mitchell, Hart Blanton, James Jaccard, and Philip Tetlock's 2013 Journal of Personality and Social Psychology re-analysis: using different meta-analytic coding choices, Oswald and colleagues found that the race IAT accounted for a negligible portion of variance in discriminatory behavior, and that explicit measures typically outperformed it. Greenwald and colleagues disputed the coding choices, and the debate over which meta-analytic procedures are appropriate remains unresolved.
What is the problem with the D-score and what do the methodological critiques say?
Hart Blanton and James Jaccard's 2006 Psychological Review paper identified a fundamental conceptual problem with the IAT's scoring: the D-score has no natural zero point. A D-score of zero does not mean the person has no implicit association — it means their associations are equally fast in both directions. But whether a given numerical D-score should be interpreted as 'biased,' 'unbiased,' or 'counter-biased' requires a principled benchmark that does not exist. Blanton and Jaccard argued that calling people with positive D-scores 'implicitly biased' makes no more sense than calling people 'short' without defining what height counts as short. Ulrich Schimmack's 2021 Perspectives on Psychological Science paper titled 'The Implicit Association Test: A Method in Search of a Construct' argued more broadly that IAT scores reflect a mixture of implicit attitudes, response strategy, and general processing speed rather than a pure measure of any single implicit construct, and that the construct validity of the IAT has never been adequately established despite 23 years of use.
Can implicit bias as measured by the IAT be changed — and does changing it matter?
Calvin Lai, Maddalena Marini, and colleagues' two major intervention studies (2014 Journal of Experimental Psychology: General with 17 interventions; 2016 Journal of Experimental Psychology: General with nine interventions) found that several procedures could reduce IAT scores in the laboratory — evaluative conditioning, exposure to counter-stereotypic exemplars, implementation intentions, and others. However, the reductions were typically small, were measured immediately after the intervention, and showed little or no persistence at even a one-week follow-up in a subset of studies. More importantly, whether IAT score reductions produced reductions in discriminatory behavior was not established in these intervention studies. The combination of modest, non-persistent IAT change and uncertain behavioral relevance creates a significant problem for the policy claim that implicit bias training — now a major organizational intervention industry — produces lasting reductions in discriminatory behavior. Patricia Devine and colleagues have argued that motivation and awareness together can produce lasting change, but this claim awaits rigorous longitudinal behavioral testing.
What alternatives and complements to the IAT exist?
Russell Fazio and Michael Olson's 2003 Annual Review of Psychology paper distinguished the IAT from evaluative priming measures — procedures in which exposure to a stimulus speeds responses to evaluatively congruent targets. Evaluative priming, along with related latency-based measures such as the Affect Misattribution Procedure and the weapon identification task, is thought to tap automatic attitude activation more directly than the IAT, which also reflects associative structure and categorization strategy. Klaus Rothermund and Dirk Wentura's 2004 Journal of Experimental Psychology: General counter-associative account argued that the IAT measures salience-based figure-ground asymmetries rather than genuine attitude associations, which would undermine its interpretation as a measure of implicit bias. Keith Payne's weapon identification paradigm — in which participants are faster to identify guns after priming with Black faces than with white faces — provided evidence of implicit race-weapon associations that correlate with automatic components of simulated shooting decisions, offering an alternative implicit measure with a more direct theoretical connection to discriminatory outcomes.