In 1951, a psychology experiment at Swarthmore College asked participants to complete what appeared to be a simple visual perception task: which of three lines on a card matched a reference line in length? In most trials the correct answer was obvious. The catch was that each participant was placed in a group where every other person was a confederate of the experimenter, instructed to give the same wrong answer. Solomon Asch found that approximately 75 percent of participants conformed to the group at least once over the course of the experiment, and that about a third of all answers given in the pressure condition matched the group rather than the clearly correct line. When participants were asked afterward why they had answered as they did, many reported genuine uncertainty about whether their perceptions were correct, not simply that they had caved to social pressure.
This experiment haunts the study of human reasoning because it demonstrates something deeply uncomfortable: the process we experience as "seeing clearly" and "thinking carefully" is, under certain social conditions, nothing of the kind. The feeling of certainty accompanies wrong answers just as readily as right ones. The experience of reasoning, the internal sense of having thought something through, provides no reliable guarantee that the output of that reasoning is sound.
Decades of subsequent cognitive psychology, culminating in the work of Daniel Kahneman, Amos Tversky, and their successors, have mapped the landscape of systematic reasoning failure with increasing precision. The findings are humbling, but they are also, crucially, actionable. Knowing specifically how and when reasoning tends to fail is the beginning of being able to do something about it.
"The confidence that people have in their beliefs is not a measure of the quality of evidence but of the coherence of the story that the mind has managed to construct." — Daniel Kahneman, Thinking, Fast and Slow (2011)
Key Definitions
Confirmation bias: The tendency to search for, interpret, favor, and recall information in a way that confirms prior beliefs. Raymond Nickerson's 1998 comprehensive review in Review of General Psychology identified it as one of the most pervasive and consequential biases in human reasoning.
Motivated reasoning: The cognitive process, described by Ziva Kunda in her 1990 Psychological Bulletin paper, by which the desire to reach a particular conclusion distorts the reasoning process used to evaluate evidence. We apply more critical scrutiny to evidence that challenges our preferred conclusions than to evidence that supports them.
Calibration: The alignment between a person's expressed confidence in a belief and their actual accuracy rate. A well-calibrated person who expresses 70 percent confidence in claims is right about 70 percent of the time. Most people are overconfident, meaning their stated confidence exceeds their accuracy rate.
Steel-manning: The practice of constructing the strongest possible version of an opposing argument before evaluating or criticizing it, as opposed to strawmanning, which involves attacking a weakened or misrepresented version.
Epistemic humility: Recognition of the limits and potential fallibility of one's own knowledge, combined with genuine openness to revising beliefs in light of evidence. Distinct from epistemic cowardice, which uses uncertainty as a reason to avoid taking any position at all.
Lateral reading: The practice of evaluating a source by searching for what other sources say about it, rather than evaluating it purely through internal reading. Developed and validated by the Stanford History Education Group.
System 1, System 2, and the Feeling of Reasoning
Daniel Kahneman's dual-process model, developed with Amos Tversky and synthesized for general audiences in Thinking, Fast and Slow (2011), provides the most useful framework for understanding why careful reasoning is so difficult to achieve in practice.
System 1 produces the feeling of understanding without the work of analysis. It generates quick, confident assessments based on pattern recognition, emotional association, and heuristic shortcuts. It is the cognitive system that has an immediate reaction to a headline, that feels certain about a political claim before examining any evidence, and that produces the experience of insight without verifiable process.
System 2 is capable of logical analysis, but it is effortful, slow, and easily overridden by System 1. Critically, it is often engaged not to evaluate System 1's conclusions but to justify them. Jonathan Haidt's research on moral reasoning, particularly his 2001 paper on the social intuitionist model, found that when people are presented with moral dilemmas, they reach their conclusions almost immediately through emotional intuition, and then deploy reasoning primarily to construct justifications for those conclusions. When those justifications are dismantled and people still cling to the judgment, Haidt calls the result "moral dumbfounding"; the underlying pattern of intuition first, justification after appears to extend well beyond the moral domain.
The practical implication is that the feeling of having reasoned carefully is not evidence that you have. System 2 engagement can be self-deception as easily as it can be genuine analysis. What distinguishes actual critical thinking from sophisticated rationalization requires external tests: consistency across cases, willingness to specify what evidence would change your view, and performance on calibration measures over time.
Confirmation Bias: The Most Studied Failure Mode
Raymond Nickerson's 1998 review in Review of General Psychology, titled "Confirmation Bias: A Ubiquitous Phenomenon in Many Guises," remains the most comprehensive account of a bias that appears in virtually every domain of human reasoning that has been studied.
The core phenomenon is simple: people are more likely to search for, notice, and accept information that confirms their current beliefs than information that challenges them. But Nickerson's review shows the mechanism is more complex than simple selective attention. Confirmation bias shapes how questions are framed (people ask leading questions that assume the answer they expect), how evidence is evaluated (disconfirming evidence is subjected to more rigorous scrutiny than confirming evidence), and how memory works (confirming evidence is better recalled than disconfirming evidence).
Wason's 2-4-6 task, a classic experiment, demonstrates the deepest form of this bias. Participants are given the sequence 2-4-6 and told it conforms to a rule. They must discover the rule by proposing their own number sequences and being told whether each sequence conforms. Most participants quickly form a hypothesis (even numbers, or ascending by twos), then test it exclusively by proposing sequences consistent with their hypothesis. They fail to test disconfirming sequences. They never discover that the actual rule is "any ascending sequence." The result is that participants develop high confidence in a wrong rule through a process of testing that felt rigorous but was systematically biased toward confirming what they already believed.
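A small illustration makes the asymmetry concrete. The sketch below uses invented sequences, not data from Wason's study; it shows why a purely confirmatory testing strategy can never expose a hypothesis that is narrower than the true rule, because every confirming test passes, while only a test the hypothesis predicts should fail can reveal that the rule is broader.

```python
# Illustrative sketch (invented sequences): confirmation-only testing in the 2-4-6 task.

def true_rule(seq):
    """The experimenter's actual rule: any strictly ascending sequence."""
    return all(a < b for a, b in zip(seq, seq[1:]))

def my_hypothesis(seq):
    """The participant's guess: ascending in steps of exactly two."""
    return all(b - a == 2 for a, b in zip(seq, seq[1:]))

# Confirmation strategy: propose only sequences the hypothesis predicts will pass.
confirming_tests = [(2, 4, 6), (10, 12, 14), (1, 3, 5)]
# Disconfirmation strategy: propose sequences the hypothesis predicts will FAIL.
disconfirming_tests = [(1, 2, 3), (2, 4, 7), (6, 4, 2)]

for seq in confirming_tests:
    # Every confirming test conforms, so the (wrong) hypothesis is never challenged.
    print(seq, "conforms?", true_rule(seq))   # True, True, True

for seq in disconfirming_tests:
    # (1, 2, 3) and (2, 4, 7) also conform, which the hypothesis says should not happen;
    # only these tests reveal that the rule is broader than "ascending by twos".
    print(seq, "conforms?", true_rule(seq))   # True, True, False
```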
The corrective is the habit of actively seeking disconfirmation: deliberately generating the best evidence against your current belief before evaluating it. This is psychologically uncomfortable, which is why it requires deliberate practice to make it habitual.
Motivated Reasoning and When We Want to Be Wrong
Ziva Kunda's 1990 paper in Psychological Bulletin, "The Case for Motivated Reasoning," systematized an insight that has since been extensively replicated: the conclusions we want to reach shape the reasoning processes we use to reach them.
Kunda's key distinction is between accuracy goals (wanting to reach correct conclusions) and directional goals (wanting to reach particular conclusions). When we have strong directional goals, we unconsciously constrain which evidence we consider, which interpretations we entertain, and which standards we apply. The result looks and feels like reasoning, but the conclusion was largely determined before the process began.
Motivated reasoning is strongest in domains where beliefs are connected to identity, group membership, or self-esteem. This is why political and religious beliefs are particularly resistant to evidence: they are not primarily epistemic claims (beliefs about what is true) but identity claims (statements about who you are and what group you belong to). Telling someone their political belief is factually wrong does not feel like correcting an error; it feels like an attack on their person and community.
Research by Dan Kahan at Yale on cultural cognition finds that higher education and higher scientific literacy do not reduce motivated reasoning on politicized scientific topics. If anything, more educated and scientifically literate people are better at deploying sophisticated arguments in defense of their culturally preferred positions. The relevant variable is not intelligence but rather whether someone has the skill and the motivation to engage their intelligence in the service of accuracy rather than tribal loyalty.
The Dunning-Kruger Effect
In 1999, Cornell psychologists David Dunning and Justin Kruger published a paper titled "Unskilled and Unaware of It," reporting a striking finding: students who scored in the lowest quartile on tests of logical reasoning, grammar, and humor ability overestimated their performance dramatically, typically placing themselves in the 60th percentile despite scoring in the 10th. Students who scored in the top quartile slightly underestimated their performance.
The mechanism Dunning and Kruger proposed was that the metacognitive skills required to recognize competence in a domain are largely the same skills required to produce competence. A person who lacks logical reasoning skills also lacks the ability to detect when their reasoning is flawed. A person who cannot identify a well-constructed argument cannot recognize that their own arguments are poorly constructed.
Subsequent research has replicated the core finding with important nuances. A 2020 study by Gilles Gignac and Marcin Zajenkowski argued that much of the classic pattern can be reproduced by statistical regression to the mean combined with a general better-than-average bias, and that the genuine metacognitive component is smaller than the original paper suggested. The practical takeaway stands: incompetence tends to produce false confidence, and the people most certain they are right in domains where they have limited knowledge should be treated with more skepticism, including when the person is yourself.
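The regression-to-the-mean point is easy to see in a simulation. The sketch below uses a deliberately simple noise model, an assumption made for illustration rather than a reconstruction of Gignac and Zajenkowski's analysis: even when self-estimates are unbiased on average, grouping people by a noisy test score makes the bottom quartile appear to overestimate itself and the top quartile appear to underestimate.

```python
# Illustrative sketch (assumed noise model): regression to the mean alone can
# produce part of the Dunning-Kruger pattern.
import random
from statistics import mean

random.seed(0)
N = 100_000
true_skill = [random.gauss(0, 1) for _ in range(N)]
test_score = [s + random.gauss(0, 1) for s in true_skill]      # noisy measure of skill
self_estimate = [s + random.gauss(0, 1) for s in true_skill]   # noisy but unbiased self-view

def percentile_ranks(values):
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    for position, i in enumerate(order):
        ranks[i] = 100.0 * position / (len(values) - 1)
    return ranks

score_pct = percentile_ranks(test_score)
estimate_pct = percentile_ranks(self_estimate)

bottom = [i for i in range(N) if score_pct[i] < 25]
top = [i for i in range(N) if score_pct[i] >= 75]

print("bottom quartile by score: actual %2.0f, self-estimate %2.0f"
      % (mean([score_pct[i] for i in bottom]), mean([estimate_pct[i] for i in bottom])))
print("top quartile by score:    actual %2.0f, self-estimate %2.0f"
      % (mean([score_pct[i] for i in top]), mean([estimate_pct[i] for i in top])))
# Typical output: the bottom quartile's self-estimates sit well above its scores and the
# top quartile's sit below, even though no simulated person "lacks metacognition".
```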
The corollary, less often discussed, is that genuine expertise tends to produce calibrated humility. Experts who have grappled seriously with a difficult domain typically know how much they do not know, how many unresolved questions remain, and where the limits of current knowledge fall. This is why domain experts often sound less certain than pundits, and why their uncertainty should be read as evidence of reliability rather than weakness.
Philip Tetlock and the Superforecasters
Philip Tetlock's Good Judgment Project, launched in 2011 and reported in Superforecasting (2015, with Dan Gardner), is the most rigorous empirical study of forecasting ability ever conducted. Tens of thousands of ordinary volunteers competed to forecast geopolitical and economic events, with their predictions tracked, scored, and compared over time.
The finding that attracted most attention was that a small group of forecasters, the "superforecasters," dramatically outperformed both chance and the predictions of intelligence analysts with access to classified information. But Tetlock and Gardner's more important finding was identifying the specific cognitive habits that separated the best forecasters from the rest.
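Scoring is what made the comparison possible. Tetlock's tournaments graded forecasters with Brier scores, the mean squared difference between the stated probability and what actually happened. The sketch below uses invented forecasts to show how a string of loud, near-certain calls can lose badly to modest, well-placed probabilities.

```python
# Minimal sketch of Brier scoring (forecasts and outcomes are invented for illustration).

def brier_score(forecasts):
    """Mean squared gap between stated probability and the outcome (0 or 1).
    0.0 is perfect; always saying 50% earns 0.25; lower is better."""
    return sum((p - outcome) ** 2 for p, outcome in forecasts) / len(forecasts)

confident_pundit = [(0.95, 0), (0.90, 1), (0.99, 0)]   # near-certain, often wrong
careful_forecaster = [(0.60, 0), (0.70, 1), (0.40, 0)]  # modest, incremental probabilities

print(round(brier_score(confident_pundit), 3))    # 0.631
print(round(brier_score(careful_forecaster), 3))  # 0.203
```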
Superforecasters share several characteristics. They maintain beliefs as explicit probability estimates rather than categorical positions, which forces them to be precise about their uncertainty and makes updating easier. They update frequently and in small increments as new evidence arrives, rather than making large revisions only when accumulated evidence is overwhelming. They decompose complex questions into smaller, tractable components. They actively seek disconfirming information rather than gravitating toward sources that confirm existing views.
Most importantly, superforecasters exhibit what Tetlock calls active open-mindedness: they treat their current beliefs as hypotheses to be tested, not possessions to be defended. They distinguish between the quality of the reasoning process and the correctness of the outcome, understanding that good reasoning sometimes produces wrong conclusions due to genuinely unpredictable events.
The thinking style that emerges is roughly Bayesian: treating all beliefs as probability estimates derived from prior probability and accumulated evidence, and updating them systematically as new information arrives. You do not have to be a statistician to reason in this mode; the core habit is simply being willing to say "I'm about 65 percent confident in this" rather than "I'm sure" or "I don't know."
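A minimal sketch of that habit, with invented numbers: the belief is held as a probability, and a new piece of evidence moves it in proportion to how much more likely that evidence is if the belief is true than if it is false.

```python
# Minimal sketch of probabilistic updating (all numbers are hypothetical).

def bayes_update(prior, p_evidence_if_true, p_evidence_if_false):
    """Return the posterior probability after observing the evidence."""
    prior_odds = prior / (1 - prior)
    likelihood_ratio = p_evidence_if_true / p_evidence_if_false
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1 + posterior_odds)

# "I'm about 65 percent confident"; the new evidence is twice as likely if I'm right.
print(round(bayes_update(0.65, 0.60, 0.30), 2))  # 0.79 -- a modest, incremental shift
```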
The Cognitive Reflection Test
Shane Frederick's Cognitive Reflection Test (CRT), introduced in a 2005 Journal of Economic Perspectives paper, is a three-question instrument designed to measure the tendency to override intuitive but incorrect responses in favor of reflective correct answers. The most famous item: "A bat and a ball cost $1.10 in total. The bat costs $1.00 more than the ball. How much does the ball cost?"
The intuitive answer is 10 cents, and it is wrong. (If the ball costs 10 cents and the bat costs $1.00 more, the bat costs $1.10 and the total is $1.20.) The correct answer is 5 cents: a 5-cent ball plus a $1.05 bat comes to $1.10. Even among students at elite universities, a significant proportion give the intuitive wrong answer. The test has since been expanded and validated as a measure of the tendency to engage System 2 rather than relying on System 1 in problem-solving.
CRT performance correlates with a range of real-world reasoning outcomes: susceptibility to cognitive biases, performance on probabilistic reasoning tasks, and, in some studies, resistance to misinformation. The relevance is not to demonstrate that some people are fundamentally better reasoners than others (the test is learnable) but to illustrate that the disposition to pause and check intuitive answers is itself a variable, and an important one.
Lateral Reading and the SIFT Method
One of the most practically consequential developments in critical thinking education in recent years came from an unexpected source: studies of how professional fact-checkers evaluate online information compared to professional historians and university students.
The Stanford History Education Group, led by Sam Wineburg, conducted a series of studies in the mid-2010s examining how different groups evaluated the credibility of websites and social media content. The results were surprising: trained historians, who had spent careers developing sophisticated text-analysis skills, were outperformed by professional fact-checkers in accurately assessing source credibility, despite the historians' far greater domain knowledge. The fact-checkers were also significantly faster.
The key difference was method. Historians, and students, read the source deeply, looking for internal cues about credibility: Does the writing sound authoritative? Are there citations? What does the About page say? Fact-checkers immediately left the source and opened multiple new tabs to see what others said about it. They read laterally, not deeply.
Mike Caulfield at Washington State University codified this approach as the SIFT method: Stop (before sharing or believing), Investigate the source, Find better coverage, Trace claims to their original context. The method's central insight is that you cannot reliably evaluate a source from the inside; context and independent assessment are necessary. A professionally designed website with confident prose can be a front for a conspiracy operation. A rough-looking academic blog can be written by a world expert. You cannot tell from internal examination.
Inoculation Theory: Protecting Reasoning in Advance
Inoculation theory, originally developed by the psychologist William McGuire in the 1960s, has been extended by John Cook at George Mason University and Sander van der Linden at Cambridge University into what they describe as a cognitive vaccine against misinformation and manipulative reasoning. The analogy to medical inoculation is direct: just as a weakened pathogen can be used to stimulate an immune response, exposure to weakened forms of a misleading argument can build resistance to the stronger versions.
Inoculation works by explaining in advance the techniques used to manipulate reasoning: false balance, cherry-picking data, misrepresenting scientific consensus, using emotionally compelling anecdotes as substitutes for statistical evidence. When people encounter these techniques in the wild after having learned to recognize them, the techniques lose much of their force.
Van der Linden and colleagues developed the Bad News game, a browser-based experience in which players practice the techniques of misinformation production in order to build resistance to them. Randomized controlled trials found that playing it improved the ability to identify manipulation strategies. Related research on brief prebunking interventions, which expose people to a weakened version of a misleading argument along with the technique being used, found that they significantly reduce the persuasive effect of the real argument when it is encountered later.
The practical implication: learning to recognize manipulation techniques is more protective than fact-checking specific claims, because techniques generalize while specific claims are infinite.
Intellectual Humility and Epistemic Courage
Mark Leary at Duke University and colleagues have developed the most systematic research program on intellectual humility as a stable psychological trait, distinct from low confidence or self-deprecation. Intellectual humility, in Leary's model, involves recognizing that one's beliefs may be fallible and unintentionally distorted, being open to encountering information that challenges current views, and being genuinely curious about one's own errors.
Research consistently finds that intellectual humility predicts better reasoning outcomes, including less confirmation bias, more accurate self-assessment of knowledge, and greater openness to updating beliefs. It also predicts the quality of interpersonal relationships, because intellectually humble people are easier to have honest conversations with.
The complementary concept is epistemic courage: the willingness to say what you actually believe, to challenge consensus when you have good reason to, and to admit uncertainty or error publicly. Social environments often reward epistemic cowardice, using apparent open-mindedness as cover for avoiding positions that might generate conflict. The philosopher Harry Frankfurt's term "bullshit" covers the category of speech made without regard for its truth value: not necessarily lying, but simply not caring whether what you say is true. Epistemic cowardice produces a lot of bullshit.
Practical Takeaways
Treat your beliefs as probability estimates, not possessions. The habit of expressing your confidence as a number ("I'm about 70 percent sure") rather than a category ("I'm sure" or "I don't know") makes both initial calibration and subsequent updating easier. It also makes it harder to defend a belief as part of your identity rather than evaluate it against the evidence.
Practice active disconfirmation. Before evaluating any significant claim you find appealing, spend five minutes generating the strongest evidence against it. This is the single most effective anti-confirmation-bias practice identified in research.
Steel-man before you critique. If you cannot articulate the strongest version of a position you disagree with, you do not understand it well enough to critique it accurately. This practice also frequently produces genuine updating.
Read laterally. When evaluating any online source, the first step is to open new tabs and search for what others say about it, not to read it more carefully. Credibility is established outside a source, not inside it.
Apply the Dunning-Kruger correction. In domains where you have high confidence but limited track record, actively seek evidence of what you do not know. In domains where you feel uncertain but have substantial experience, that uncertainty may itself be a marker of genuine expertise.
Develop calibration as an explicit goal. Record predictions about significant claims and track your accuracy. The feedback loop is the mechanism by which reasoning actually improves.
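As a concrete starting point, the sketch below works through the simplest version of that feedback loop on a hypothetical prediction log: group recorded predictions by stated confidence and compare each group's hit rate to the confidence that was claimed.

```python
# Minimal calibration check over a hypothetical prediction log.
from collections import defaultdict

# (stated confidence, whether the claim turned out to be true)
log = [(0.9, True), (0.9, True), (0.9, False), (0.7, True), (0.7, False),
       (0.7, True), (0.7, False), (0.6, True), (0.6, False), (0.6, True)]

buckets = defaultdict(list)
for confidence, correct in log:
    buckets[confidence].append(correct)

for confidence in sorted(buckets, reverse=True):
    outcomes = buckets[confidence]
    hit_rate = sum(outcomes) / len(outcomes)
    # Well calibrated when stated confidence and hit rate roughly match.
    print(f"said {confidence:.0%}: right {hit_rate:.0%} of the time (n={len(outcomes)})")
```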
References
- Kahneman, D. (2011). Thinking, Fast and Slow. Farrar, Straus and Giroux.
- Nickerson, R. S. (1998). Confirmation bias: A ubiquitous phenomenon in many guises. Review of General Psychology, 2(2), 175–220.
- Kunda, Z. (1990). The case for motivated reasoning. Psychological Bulletin, 108(3), 480–498.
- Dunning, D., & Kruger, J. (1999). Unskilled and unaware of it: How difficulties in recognizing one's own incompetence lead to inflated self-assessments. Journal of Personality and Social Psychology, 77(6), 1121–1134.
- Tetlock, P. E., & Gardner, D. (2015). Superforecasting: The Art and Science of Prediction. Crown.
- Frederick, S. (2005). Cognitive reflection and decision making. Journal of Economic Perspectives, 19(4), 25–42.
- Haidt, J. (2001). The emotional dog and its rational tail: A social intuitionist approach to moral judgment. Psychological Review, 108(4), 814–834.
- Wineburg, S., & McGrew, S. (2019). Lateral reading and the nature of expertise. Teachers College Record, 121(11), 1–40.
- Cook, J., Lewandowsky, S., & Ecker, U. K. H. (2017). Neutralizing misinformation through inoculation. PLOS ONE, 12(5), e0175799.
- Leary, M. R., et al. (2017). Cognitive and interpersonal features of intellectual humility. Personality and Social Psychology Bulletin, 43(6), 793–813.
- Kahan, D. M. (2013). Ideology, motivated reasoning, and cognitive reflection. Judgment and Decision Making, 8(4), 407–424.
- Wason, P. C. (1960). On the failure to eliminate hypotheses in a conceptual task. Quarterly Journal of Experimental Psychology, 12(3), 129–140.
Frequently Asked Questions
What is critical thinking and why is it hard?
Critical thinking is the disciplined practice of evaluating claims, arguments, and evidence according to standards of logic, consistency, and evidential quality, rather than according to whether the conclusion is comfortable or socially desirable. It is hard for several compounding reasons. The cognitive processes that feel like careful reasoning are often post-hoc rationalization of conclusions reached quickly by intuition. Motivated reasoning, identified by Ziva Kunda in 1990, means that when we want a particular conclusion to be true, we unconsciously search harder for confirming evidence and apply more critical scrutiny to disconfirming evidence. Social pressures reward agreement and punish dissent. And many of the contexts in which reasoning matters most (politics, religion, identity) are precisely the ones where motivated reasoning is strongest.
What cognitive biases most affect our reasoning?
Confirmation bias, the tendency to search for, favor, and recall information that confirms prior beliefs, is the most comprehensively studied and pervasive bias in human reasoning. Raymond Nickerson's 1998 review covered decades of research and concluded it is one of the most significant sources of reasoning failure in everyday life. Availability bias causes us to overweight vivid, memorable, or recent examples when estimating frequency or probability. Anchoring makes the first number we encounter disproportionately influential on all subsequent estimates. The Dunning-Kruger effect describes the tendency for people with limited knowledge in a domain to overestimate their competence in it. Overconfidence bias, the tendency to be more certain of our beliefs than the evidence actually warrants, affects virtually everyone across almost all domains studied.
What is the Dunning-Kruger effect?
The Dunning-Kruger effect, described by David Dunning and Justin Kruger in their 1999 paper in the Journal of Personality and Social Psychology, is the finding that people with limited competence in a domain tend to overestimate their competence, while highly competent people often underestimate theirs. The mechanism Dunning and Kruger proposed is that the skills needed to recognize good performance in a domain are largely the same skills needed to produce good performance: a person who cannot identify logical fallacies also cannot recognize when their own arguments contain them. The effect has been replicated across domains from logical reasoning to chess to medical diagnosis, though some subsequent research has debated its magnitude and exact mechanism.
How do superforecasters think differently?
Philip Tetlock and Dan Gardner's 2015 book Superforecasting identified a set of cognitive habits that separated the top forecasters in Tetlock's Good Judgment Project from the rest. Superforecasters update their beliefs frequently and in small increments in response to new evidence, rather than making large revisions only when forced to. They express beliefs as probabilities rather than categorical yes or no, which forces them to confront uncertainty honestly. They are actively open-minded, genuinely seeking out disconfirming information. They decompose complex questions into tractable sub-questions. And they maintain calibration, the alignment between their confidence and their accuracy rate, as an explicit goal. The thinking style that emerges is roughly Bayesian: treating beliefs as probability estimates and updating them systematically as evidence accumulates.
How do you update your beliefs when you're wrong?
The psychological challenge of belief revision is that the same motivated reasoning that distorts initial belief formation also resists updating. The most evidence-backed approach is to maintain beliefs as explicit probability estimates rather than fixed positions, so that incoming evidence shifts a probability rather than defeating an identity. Philip Tetlock's research shows that people who describe their views in probabilistic terms are significantly better at updating them than people who hold categorical positions. Publicly committing to update when the evidence warrants it, and building relationships with others who hold you to that commitment, also creates social accountability for intellectual consistency.
What is lateral reading and how does it help evaluate sources?
Lateral reading is a technique developed and validated by the Stanford History Education Group, popularized by digital literacy educator Mike Caulfield as part of the SIFT method. Instead of reading deeply within a source to evaluate it (which is how most people were taught to evaluate information), lateral reading involves opening multiple new browser tabs to search for what others say about the source, before committing time to the source itself. Professional fact-checkers, who were shown to be dramatically faster and more accurate than professional historians and college students in evaluating web sources in Stanford studies, all used lateral reading as their primary technique. The key insight is that you cannot reliably evaluate a source from the inside; you need outside perspective.
How do you separate good arguments from bad ones?
The core skills are distinguishing factual claims from evaluative claims, checking whether conclusions actually follow from stated premises, identifying what evidence would be required to establish the claim and whether that evidence is actually present, and recognizing common informal fallacies including ad hominem, strawmanning, and false dichotomy. Steel-manning, the practice of constructing the strongest possible version of an argument you are evaluating or disagreeing with, is one of the most useful practical tools: if you cannot construct the strongest version of the opposing view, you probably do not understand it well enough to critique it. Jonathan Rauch's concept of the constitution of knowledge emphasizes that the quality of an argument is determined by whether it has survived challenge by people who were genuinely trying to find its weaknesses.