In 1987, Philip Tetlock began one of the most ambitious studies ever conducted on expert judgment. He recruited 284 political scientists, economists, and policy analysts — people who made their living thinking about politics and giving advice to governments and institutions — and asked them to make probability forecasts about future political and economic events. They would predict election outcomes, economic turning points, international crises, the stability of governments. The study ran for twenty years. By the time Tetlock analyzed his results, he had collected 82,361 forecasts. His conclusion, published as Expert Political Judgment in 2005, was damning: the experts barely beat chance. They did worse than simple statistical extrapolations. More prominent experts with larger media platforms performed worse than less prominent ones. The more confident an expert, the less reliable their predictions tended to be.

This was not a finding about a fringe of incompetents. These were credentialed professionals at respected institutions, paid for their expertise, cited in policy documents, quoted in newspapers. The finding that they could not reliably predict events in their own domain raises an urgent and practical question: if expert judgment is often unreliable, how should we interpret the persistent, visible, and often acrimonious disagreements among experts that shape public policy on climate, nutrition, economics, and medicine? Is all expert disagreement equally troubling? Are there types that warrant deference and types that warrant skepticism? The stakes are real: how citizens and policymakers interpret expert disagreement determines responses to pandemics, climate legislation, drug regulation, and economic reform.

The answer, it turns out, requires a more sophisticated taxonomy than most people use. Expert disagreement is not a single phenomenon. It encompasses genuine frontier uncertainty, values embedded in empirical questions, motivated reasoning by researchers, and in some cases the deliberate manufacture of the appearance of scientific controversy. Understanding which type you are looking at is the first and most important analytical step. Treating all expert disagreement as equivalent — either dismissing it all as corruption or treating it all as legitimate uncertainty — produces errors in both directions.

"The most important thing I can tell you about being a good forecaster is this: start with a blank slate. Don't just consult the people who agree with you." — Philip Tetlock, Superforecasting (2015)


| Reason Experts Disagree | Domain | Example |
| --- | --- | --- |
| Genuine empirical uncertainty | Science | Optimal macroeconomic policy; nutrition research |
| Value differences behind technical claims | Policy | Cost-benefit analysis with different equity weights |
| Conflicting theoretical frameworks | Social science | Micro vs. macro approaches in economics |
| Incentive and funding bias | Research | Industry-funded studies vs. independent research |
| Different definitions | All fields | What counts as "poverty," "intelligence," or "recession" |
| Complexity and model dependence | Climate, economics | Small parameter changes produce large result differences |

Key Definitions

Manufactured doubt — A deliberate strategy of creating the appearance of scientific uncertainty to delay regulatory or policy action, even when a strong scientific consensus exists. Distinguished from genuine scientific uncertainty by its external commercial funding, political motivation, and focus on amplifying doubt rather than producing evidence. The strategy was documented in detail by Oreskes and Conway in Merchants of Doubt (2010).

Cultural cognition — Dan Kahan's term for the tendency of people to process risk information in ways that reinforce their cultural identity and group memberships, rather than updating beliefs proportionally to the evidence. Higher scientific literacy is associated with greater polarization, not less, on culturally contested questions — because educated people are more effective rationalizers.

Replication crisis — The discovery, brought to a head by the Open Science Collaboration's 2015 Reproducibility Project, that a large proportion of published scientific findings cannot be reproduced using the original methodology. Only 36% of 100 psychology studies selected for replication produced results consistent with the original at the same significance level.

p-hacking — The practice of running multiple statistical tests or analytical variations and reporting only those that achieve conventional significance thresholds (p < 0.05), inflating the false-positive rate of the published literature without any individual act of fraud.

HARKing — Hypothesizing After Results are Known: presenting post-hoc explanations as if they were pre-registered hypotheses, a questionable research practice that is often entirely unconscious and operates through motivated memory.

Hedgehogs vs foxes — Philip Tetlock's adaptation of Isaiah Berlin's distinction: hedgehog experts who know one big thing and apply it confidently everywhere generate compelling media narratives but forecast less accurately than fox experts who draw on many ideas, maintain calibrated uncertainty, and acknowledge the limits of their models.

Pre-registration — A reform practice in which researchers publicly register their hypotheses and planned analytical methods before data collection, preventing post-hoc adjustment and making HARKing detectable.

Superforecasting — The finding, from Philip Tetlock's Good Judgment Project and popularized in Tetlock and Gardner's Superforecasting (2015), that a small subset of non-expert forecasters consistently outperforms expert analysts through disciplined application of probabilistic reasoning, active evidence-seeking, and willingness to update on new information.

Publication bias — The tendency of journals to accept positive results (confirming a hypothesis) over null or negative results, creating a published literature that overstates the prevalence of real effects.

Identity-protective cognition — Kahan's term for the process by which people engage their cognitive capacities to protect culturally important beliefs rather than to update them in response to evidence, explaining why information campaigns on culturally charged scientific topics often backfire.


Types of Expert Disagreement

Not all expert disagreement is alike, and conflating its different varieties is the most common error in public reasoning about science. There are at least four distinct types, each with different implications for how much weight a non-expert should give to the disagreement.

Genuine Frontier Uncertainty

The first and most legitimate form is genuine scientific uncertainty at the frontier of knowledge. When data are sparse, methodologies contested, or causal mechanisms poorly understood, reasonable scientists working in good faith can reach different conclusions from the same evidence. This is not a failure of science — it is science operating normally. The history of science is a history of frontier uncertainty gradually being resolved as evidence accumulates.

The debate about minimum wage employment effects is largely of this type. When David Card and Alan Krueger published their 1994 natural experiment comparing fast-food employment in New Jersey and Pennsylvania after New Jersey raised its minimum wage, they found no employment decrease. Critics attacked the methodology; defenders replicated and extended the findings. David Card received the Nobel Prize in Economics in 2021 partly for this work. The debate continues because the evidence genuinely supports different interpretations under different model assumptions about labor market structure, the magnitude and timing of wage increases, and geographic variation.

Recognizing frontier uncertainty matters because it sets the appropriate epistemic posture: calibrated confidence with acknowledged error bars, not false certainty. The Intergovernmental Panel on Climate Change has developed formal likelihood language for this purpose — "virtually certain" means greater than 99% probability; "likely" means greater than 66% — and when applied rigorously, this precision is a model of honest scientific communication.
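For readers who want the scale made concrete, the sketch below expresses the AR5 calibrated likelihood bands as a simple lookup. It is illustrative only — the helper function is invented for illustration, not an IPCC tool — but the thresholds follow the published guidance.

```python
# Illustrative only: the IPCC AR5 calibrated likelihood scale as a lookup.
# Thresholds are the published lower bounds; the helper name is invented.
IPCC_LIKELIHOOD = [
    ("virtually certain",      0.99),
    ("extremely likely",       0.95),
    ("very likely",            0.90),
    ("likely",                 0.66),
    ("more likely than not",   0.50),
    ("about as likely as not", 0.33),
]

def likelihood_term(probability: float) -> str:
    """Return the strongest IPCC term whose lower bound the probability meets."""
    for term, lower_bound in IPCC_LIKELIHOOD:
        if probability >= lower_bound:
            return term
    return "unlikely"  # below 33%; the finer 'very/extremely unlikely' bands are omitted

print(likelihood_term(0.97))  # extremely likely
print(likelihood_term(0.70))  # likely
```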

Values Embedded in Scientific Questions

The second type of disagreement is often misdiagnosed as factual dispute when it actually reflects disagreement about values. Science can establish facts about probabilities and magnitudes. It cannot, by itself, determine what level of risk is acceptable, how to weigh competing harms, or whose interests should be prioritized when they conflict.

The mammography screening debate illustrates this clearly. The United States Preventive Services Task Force and the American Cancer Society have issued conflicting guidelines on screening frequency and starting age. Both work from largely the same body of evidence; they weight it differently because screening involves a genuine tradeoff: earlier detection of real cancers against the costs of false positives — unnecessary biopsies, overdiagnosis of indolent cancers that would never have caused harm, and radiation exposure. How to weigh the benefit of catching a true cancer against the harm of treating a cancer that would never have progressed involves value judgments, not probability estimates alone. Experts who weight these differently will recommend differently, and both can be scientifically rigorous.

Roger Pielke Jr. argued in The Honest Broker (2007) that this conflation of fact and value is a pervasive problem at the science-policy interface. Scientists who present value-laden recommendations as if they were purely factual conclusions misrepresent both the science and their own role.

Motivated Reasoning Among Researchers

The third form is less comfortable to acknowledge: experts are not immune to the motivated reasoning that affects all human cognition. Scientists have careers, funding dependencies, theoretical commitments, and cultural worldviews that influence which hypotheses they test, which results they publish, and how they interpret ambiguous findings.

The evidence for systematic motivated reasoning among researchers is now extensive. John Ioannidis and colleagues have conducted multiple meta-analyses showing that industry-funded studies of pharmaceuticals, nutrition, and chemicals produce results favorable to the funder at rates far above what chance would predict. A 2003 analysis in the Journal of the American Medical Association found that industry-sponsored clinical trials were significantly more likely to report favorable outcomes than non-industry-sponsored trials. The effect operates not primarily through overt fraud but through subtler mechanisms: choice of comparators, analysis endpoints, publication decisions, and the quiet abandonment of unfavorable trials.

Career incentives produce similar distortions. Uri Simonsohn, Leif Nelson, and Joseph Simmons documented in 2011 what they called "false-positive psychology" — researchers engage in exploratory analyses, select the most interesting findings to report, and frame them as confirmatory while being entirely sincere about their scientific intentions. The aggregate effect is a published literature with much higher error rates than the nominal 5% significance threshold implies.

Manufactured Controversy

The fourth type is the most troubling: the deliberate manufacture of the appearance of scientific uncertainty to serve commercial or political interests.

Naomi Oreskes and Erik Conway documented this in forensic detail in Merchants of Doubt (2010). Their central finding was that the same small group of scientists — many with genuine credentials — and the same network of public relations firms appeared across multiple major scientific controversies over four decades: tobacco and cancer, acid rain, the ozone hole, secondhand smoke, and climate change. The tobacco industry's playbook, developed in the 1950s and documented in internal corporate records obtained through litigation, was explicit: the goal was not to disprove the evidence that smoking caused cancer but to manufacture sufficient public uncertainty to prevent regulatory action. Internal documents put it directly: "Doubt is our product."

The same firms and many of the same individuals subsequently worked on campaigns to manufacture doubt about ozone depletion, acid rain, and climate change. Oreskes and Conway identified signature patterns that distinguish manufactured from genuine controversy: the disagreement exists primarily in media and policy forums rather than in peer-reviewed literature; a small group of contrarian scientists appear across multiple unrelated fields; the contrarians consistently align with regulated industries. These patterns do not reflect scientific debate. They reflect a replicable strategy applied by the same network across different industries facing similar regulatory threats.


Tetlock's Forecasting Research: Foxes and Hedgehogs

Tetlock's twenty-year study was not simply a negative finding about expert failure. It was also a positive finding about what distinguishes better from worse forecasters.

The most accurate forecasters in his dataset — what he called foxes, following Isaiah Berlin's distinction — shared a cluster of characteristics. They drew on many different analytical frameworks rather than one overarching theory. They expressed calibrated uncertainty, using probabilistic language rather than projecting confidence. They were willing to update their forecasts incrementally as new information arrived. They held their own views with intellectual humility, actively seeking disconfirming evidence rather than avoiding it.

The least accurate forecasters — hedgehogs — had one big idea that they applied everywhere with confidence. Hedgehog experts generated compelling media narratives and achieved high public prominence. They also forecast less accurately than foxes, and more prominent hedgehogs performed worse than less prominent ones. The implication is unsettling: the qualities that make an expert quotable in media — confident, articulate, committed to a coherent framework — are negatively correlated with forecasting accuracy.

Tetlock's subsequent work, conducted with co-author Dan Gardner and reported in Superforecasting (2015), went further. In the Good Judgment Project, run as part of a U.S. government-sponsored forecasting tournament, Tetlock identified a subset of non-expert forecasters — educated civilians with no special access to classified information — who consistently outperformed professional intelligence analysts with access to classified data. These "superforecasters" used the same cognitive strategies Tetlock's foxes had used: probabilistic thinking, active evidence-seeking, willingness to update, decomposition of complex questions into components. The finding suggests that forecasting accuracy is substantially a matter of cognitive style and discipline, not just domain knowledge.
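Tournaments of this kind typically score accuracy with the Brier score — the mean squared difference between each probability forecast and what actually happened, where lower is better. A minimal sketch with invented numbers shows why spread-out, calibrated probabilities beat uniform confidence:

```python
def brier_score(forecasts, outcomes):
    """Mean squared error between probability forecasts and binary outcomes.
    0.0 is a perfect score; in this binary form, 1.0 is maximally wrong."""
    return sum((f - o) ** 2 for f, o in zip(forecasts, outcomes)) / len(forecasts)

# Ten hypothetical yes/no events, six of which occurred.
outcomes = [1, 0, 1, 1, 0, 1, 0, 1, 0, 1]

# A 'hedgehog' whose one big theory says every event is 95% likely to happen.
hedgehog = [0.95] * len(outcomes)

# A 'fox' whose probabilities are spread out and roughly calibrated.
fox = [0.80, 0.30, 0.70, 0.60, 0.20, 0.75, 0.40, 0.65, 0.35, 0.70]

print(brier_score(hedgehog, outcomes))  # ~0.36 -- lower is better
print(brier_score(fox, outcomes))       # ~0.10
```

The hedgehog is punished heavily on the four events that failed to occur; the fox's graded probabilities absorb those misses at far lower cost.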


Cultural Cognition: Why More Information Can Deepen Disagreement

Dan Kahan's research at Yale has documented one of the most counterintuitive findings in science communication. On culturally contested topics — climate change, nuclear waste disposal, gun control — cultural worldview is a stronger predictor of risk perception than scientific literacy or numeracy. More troublingly, higher scientific literacy is associated with greater polarization on these topics, not with convergence.

Kahan and colleagues demonstrated this in a series of studies, including a 2012 paper in Nature Climate Change (doi: 10.1038/nclimate1547) in which people with higher science comprehension scores showed wider gaps between cultural groups in climate risk perception, not narrower ones. People with high science literacy are better equipped to find, evaluate, and deploy evidence that confirms their cultural group's position. They are more sophisticated rationalizers, not more rational processors of evidence.

Kahan calls this "identity-protective cognition": because cultural identity is bound up with certain positions on contested empirical questions, receiving scientific information that challenges those positions functions as an identity threat. The response is to engage sophisticated cognitive machinery in defending the position rather than updating it. The result is that information campaigns aimed at correcting scientific misconceptions on culturally charged topics have limited effects and can even backfire, increasing polarization by giving people better tools for motivated reasoning.

This finding does not mean that knowledge is irrelevant or that nothing can be done. Kahan's research suggests that how information is framed, who delivers it, and whether it affirms or threatens cultural identity significantly changes its reception. The naive information-deficit model — that public disagreement about science stems from ignorance and that better science education will fix it — is not supported by the evidence.


The Replication Crisis and What It Means for Consensus

The replication crisis has further complicated the relationship between published literature and reliable knowledge. The Open Science Collaboration's 2015 Reproducibility Project (doi: 10.1126/science.aac4716) attempted to replicate 100 published psychology studies using the original methodology. Only 36% produced results consistent with the original at the same significance level. Subsequent replication efforts in medicine, economics, and nutrition research found similar patterns.

The mechanisms are now well understood. Publication bias means that journals preferentially accept positive results, so the published literature systematically overstates the prevalence of real effects — a null result finding no difference between an intervention and control condition is substantially less likely to be published than a positive finding, even if both were equally well-conducted. P-hacking inflates false-positive rates. HARKing — presenting post-hoc hypotheses as pre-registered — makes exploratory analyses look like confirmatory ones.
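The arithmetic behind p-hacking is easy to check by simulation. The sketch below — illustrative only, assuming NumPy and SciPy are available — analyzes a nonexistent effect five separate ways per "study" and reports a finding whenever any single test clears p < 0.05:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

def phacked_study(n=30, attempts=5):
    """Simulate a study of a nonexistent effect, analyzed five separate ways.
    Returns True if ANY attempt reaches p < 0.05 -- the reported 'finding'."""
    for _ in range(attempts):
        treatment = rng.normal(0, 1, n)  # no real difference between groups
        control = rng.normal(0, 1, n)
        _, p = stats.ttest_ind(treatment, control)
        if p < 0.05:
            return True
    return False

hits = sum(phacked_study() for _ in range(2000))
print(f"False-positive rate: {hits / 2000:.1%}")
# Each individual test holds the nominal 5% rate, but reporting 'any significant
# result' pushes the study-level rate to roughly 1 - 0.95**5, about 23%.
```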

This matters for expert consensus because consensus is often built on published literature. The case of dietary fat illustrates the problem. Ancel Keys's work linking dietary saturated fat to heart disease shaped nutritional science and public health guidelines for decades. Critics like John Yudkin, who argued sugar was the more important culprit, were marginalized partly because of Keys's institutional dominance and partly because of undisclosed sugar industry funding of research designed to deflect attention from sucrose. More recent evidence has produced a more nuanced picture, and the strong anti-fat consensus is now recognized as substantially overstated. The consensus was built on a literature with systematic problems.

Pre-registration and registered reports — in which journals agree to publish a study regardless of results, contingent on pre-specified methodology — are the primary structural reforms proposed and implemented in response to the replication crisis. Evidence from journals that adopted registered reports shows that the proportion of null results rises substantially, confirming that publication bias had been systematically inflating the published literature's positive-result rate.
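Publication bias distorts effect sizes as well as positive-result rates: when only significant findings reach print, published estimates of a small true effect are systematically too large. A minimal simulation, again with invented parameters, makes the point:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
true_effect = 0.2   # small true standardized difference between groups
n = 30              # per-group sample size -- underpowered for an effect this small

all_estimates, published = [], []
for _ in range(5000):
    treatment = rng.normal(true_effect, 1, n)
    control = rng.normal(0, 1, n)
    estimate = treatment.mean() - control.mean()
    _, p = stats.ttest_ind(treatment, control)
    all_estimates.append(estimate)
    if p < 0.05 and estimate > 0:   # only positive, significant results get published
        published.append(estimate)

print(f"True effect:                 {true_effect}")
print(f"Mean across all studies:     {np.mean(all_estimates):.2f}")  # ~0.20
print(f"Mean of published estimates: {np.mean(published):.2f}")      # roughly 3x too large
```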


When Expert Intuition Can Be Trusted

Against this backdrop, Kahneman and Klein's 2009 collaborative paper in Psychological Review (doi: 10.1037/a0016755) asked a pointed question: when should expert intuition be trusted? Coming from intellectual traditions that had reached opposite conclusions about expert judgment — Kahneman emphasizing its failures, Klein emphasizing its successes in naturalistic settings — they converged on a surprisingly clear answer.

Expert intuition is reliable when two conditions are met: the environment must provide regular, clear, and timely feedback on the expert's judgments, and the expert must have had adequate opportunity to accumulate experience with a large number of comparable cases. A chess grandmaster satisfies both conditions — the rules are stable, feedback is immediate and unambiguous, and relevant patterns recur reliably. A firefighter reading a burning building satisfies both as well. A clinical psychologist predicting a patient's long-term outcome satisfies neither: feedback is delayed, confounded by other factors, and often never received. A financial analyst predicting market movements fails even more clearly — markets are too noisy and too adaptive to allow reliable pattern learning.

The domain, not the credential, determines the reliability of expert intuition. This has an important implication: in low-validity environments like geopolitical forecasting, macroeconomic prediction, and long-term clinical prognosis, the confident intuitions of experienced experts are not more reliable than statistical models and may be less reliable. Years of practice in a domain without reliable feedback builds confidence, not accuracy.


How Non-Experts Should Navigate Expert Disagreement

The practical question that follows from all of this is how a non-expert should evaluate expert disagreement. Several principles emerge from the research.

Look at where the disagreement lives. If disagreement exists primarily in peer-reviewed literature, with competing factions producing and responding to evidence, it is likely genuine. If it exists primarily in media, think tanks, and policy debates while the peer-reviewed literature shows substantial consensus, manufactured doubt is more probable. Consensus among major scientific institutions — not unanimity, but the weight of the evidence as expressed in systematic reviews and meta-analyses — carries more epistemic weight than disagreement among media commentators.

Distinguish facts from values. When experts with access to the same evidence recommend different policies, the disagreement may be about values embedded in the scientific question, not about the evidence itself. Identifying which components of a controversy are empirical and which are normative clarifies the nature of the dispute.

Examine incentives and funding. Industry-funded studies in multiple domains show systematic bias toward sponsor-favorable outcomes. Calibrate confidence in individual studies accordingly, and look for independent replication.

Prefer calibrated uncertainty. Experts who express graded confidence — who use probabilistic language and acknowledge what they do not know — are generally more reliable than those who project certainty. The appeal of the confident hedgehog is a cognitive bias, not a guide to truth.

Distinguish domain expertise from adjacent claims. A climate physicist's expertise on the greenhouse effect does not automatically extend to optimal carbon pricing mechanisms. An immunologist's expertise in vaccine efficacy does not automatically extend to vaccine mandate policy. Domain boundaries matter.

For a deeper look at the psychological mechanisms that create and sustain false certainty, see why most published research is wrong. For the social and cultural dynamics that amplify expert disagreement in public discourse, see why political polarization increases. For the cognitive biases that affect non-expert evaluation of evidence, see how to think clearly.


References

  • Tetlock, P. E. (2005). Expert Political Judgment: How Good Is It? How Can We Know? Princeton University Press.
  • Tetlock, P. E., & Gardner, D. (2015). Superforecasting: The Art and Science of Prediction. Crown.
  • Oreskes, N., & Conway, E. M. (2010). Merchants of Doubt. Bloomsbury Press.
  • Kahneman, D., & Klein, G. (2009). Conditions for intuitive expertise: A failure to disagree. Psychological Review, 116(4), 515–526. https://doi.org/10.1037/a0016755
  • Open Science Collaboration. (2015). Estimating the reproducibility of psychological science. Science, 349(6251), aac4716. https://doi.org/10.1126/science.aac4716
  • Kahan, D. M., Peters, E., Wittlin, M., Slovic, P., Ouellette, L. L., Braman, D., & Mandel, G. (2012). The polarizing impact of science literacy and numeracy on perceived climate change risks. Nature Climate Change, 2(10), 732–735. https://doi.org/10.1038/nclimate1547
  • Ioannidis, J. P. A. (2005). Why most published research findings are false. PLOS Medicine, 2(8), e124. https://doi.org/10.1371/journal.pmed.0020124
  • Pielke, R. A. (2007). The Honest Broker: Making Sense of Science in Policy and Politics. Cambridge University Press.
  • Card, D., & Krueger, A. B. (1994). Minimum wages and employment: A case study of the fast-food industry in New Jersey and Pennsylvania. American Economic Review, 84(4), 772–793.

Frequently Asked Questions

Why do scientific experts disagree about established topics?

Expert disagreement has several distinct sources that are important to separate. First, some disagreement reflects genuine scientific uncertainty — frontier areas where the evidence is still accumulating and multiple interpretations are defensible. This is normal science at work. Second, some disagreement reflects values embedded in science: what counts as an acceptable risk, how to weigh competing harms and benefits, what confidence threshold justifies policy action. The mammography screening debate, for instance, involves genuine factual uncertainty about harms versus benefits and genuine value disagreements about how to weigh them. Third, some expert disagreement reflects motivated reasoning: researchers are not immune to funding incentives, career pressures, and confirmation bias. Meta-analyses consistently show that industry-funded studies produce more industry-favorable results, a pattern John Ioannidis and others have documented across pharmaceutical, nutrition, and chemical research. Fourth, some apparent scientific controversy is manufactured: the tobacco industry pioneered the strategy of deliberately creating the appearance of uncertainty about established science to delay regulation. Naomi Oreskes and Erik Conway documented in 'Merchants of Doubt' how the same network of scientists and PR firms manufactured doubt about tobacco, acid rain, the ozone hole, and climate change. Understanding which type of disagreement you are looking at is the most important first step in evaluating any scientific controversy.

What is the difference between genuine scientific uncertainty and manufactured doubt?

Genuine scientific uncertainty is a normal feature of the scientific process. When researchers at the frontier of knowledge disagree, it typically reflects competing interpretations of incomplete data, different methodological choices, or different views about what evidence would be decisive. These disagreements are resolved, eventually, through accumulation of evidence and independent replication. Manufactured doubt is a deliberate strategy to exploit the appearance of scientific debate for commercial or political purposes. The key features that distinguish manufactured doubt include: the controversy exists almost entirely in public and political discourse rather than in peer-reviewed literature; a small number of the same contrarian scientists appear across multiple unrelated controversies; the funding sources behind the contrarian position have financial interests in the outcome; the focus is on generating uncertainty rather than producing alternative positive evidence; and the controversy persists despite overwhelming consensus in the relevant scientific community. Naomi Oreskes identified that the same handful of scientists who denied the harms of tobacco smoke later denied evidence on acid rain, the ozone hole, secondhand smoke, and climate change. This pattern — the same people, the same tactics, the same funders — is the signature of manufactured controversy, distinct from the self-correcting disagreements of genuine science.

How accurate are expert predictions and how should we use expert opinion?

Philip Tetlock's landmark study, published as 'Expert Political Judgment' in 2005, tracked 82,361 forecasts made by 284 experts over two decades. The results were humbling: experts performed barely better than chance, and, strikingly, more famous experts with more media presence performed worse than their less prominent peers. Tetlock identified two types of expert thinkers: 'hedgehogs' who organize their worldview around one big idea and apply it confidently across domains, and 'foxes' who draw on many small ideas and maintain calibrated uncertainty. Foxes outperformed hedgehogs significantly. Subsequent work, including Tetlock's 'Superforecasting' (2015), showed that performance could be improved through explicit training in probabilistic thinking, Bayesian updating, and epistemic humility. For non-experts seeking to use expert opinion well, the key lessons are: prefer experts who express calibrated uncertainty (confidence levels, probability estimates) over those who express absolute certainty; be skeptical of experts who perform well in media but may represent a narrow theoretical tradition; seek consensus views from multi-expert bodies like the IPCC rather than outlier individual opinions; distinguish between an expert's core domain and adjacent questions where their expertise does not directly apply; and track experts' predictions over time to see how well-calibrated their confidence has actually been.

Why does more scientific literacy sometimes make people more — not less — polarized?

Dan Kahan of Yale Law School has produced some of the most counterintuitive findings in science communication. His 'cultural cognition' research shows that on politically contested scientific questions — climate change, nuclear waste disposal, gun control — people's risk perceptions are better predicted by their cultural worldviews than by their scientific literacy or numeracy. More troublingly, Kahan found that people with higher scientific literacy and numeracy showed greater polarization, not less, on culturally contested topics. The mechanism appears to be what Kahan calls 'identity-protective cognition': highly educated people are better equipped to find, evaluate, and deploy evidence that supports their cultural group's position. They are more sophisticated at rationalization, not more rational. This finding has profound implications for science communication strategies. The naive model — that public disagreement about science stems from ignorance and that providing information will resolve it — does not hold for culturally charged topics. What predicts greater convergence is not more scientific information but rather how the scientific information is framed, who delivers it, and whether receiving the information threatens or affirms the recipient's cultural identity. Kahan's work does not mean that knowledge is irrelevant — it means that the mechanism by which knowledge operates is more socially complex than a simple information-deficit model suggests.

What is the replication crisis and how does it affect expert credibility?

The replication crisis refers to the discovery, accelerating from around 2011, that a substantial proportion of published scientific findings cannot be reproduced when independent researchers attempt to repeat the original studies. The Open Science Collaboration's Reproducibility Project (2015) attempted to replicate 100 psychology studies and found that only 36% produced results consistent with the original at conventional significance thresholds. Similar replication failures have been documented in medicine, economics, and nutritional science. The causes are now well understood: publication bias (journals preferentially publish positive results, so the literature systematically overstates effect sizes); p-hacking (researchers unconsciously or deliberately search for statistical significance through multiple analyses); HARKing — Hypothesizing After Results are Known (presenting post-hoc hypotheses as pre-registered); underpowered studies (too few participants to detect effects reliably); and file-drawer effects (negative results never published). The crisis matters for expert credibility because expert consensus is often built on published literature, which these processes systematically distort. Structural responses include pre-registration (registering hypotheses and analysis plans before collecting data), registered reports (peer review before data collection), and data sharing mandates. The crisis does not mean science is broken — it means the incentive structures of academic publishing create systematic biases that the field is now working to correct. Understanding this helps calibrate appropriate trust in single studies versus replicated findings.

How should non-experts decide which experts to trust?

Several evidence-based heuristics help non-experts navigate expert disagreement more effectively. First, look for consensus from multi-expert institutions rather than individual outlier opinions: the IPCC, the National Academies of Sciences, major professional associations represent aggregated expert judgment rather than individual viewpoint. Second, examine the structure of the disagreement: is it primarily in peer-reviewed literature or primarily in public and political debate? Controversies that exist mainly in media but not in scientific journals are more likely to reflect manufactured doubt or motivated reasoning. Third, ask about incentives and funding: industry-funded research shows systematic pro-industry bias in multiple meta-analyses. This does not automatically invalidate industry-funded studies but should calibrate your confidence. Fourth, assess whether the expert is speaking within or outside their core domain: a climate scientist's views on climate physics are more credible than their views on optimal climate policy, which involves value choices beyond their expertise. Fifth, favor experts who express calibrated uncertainty, acknowledge limitations, and update their views in response to evidence over those who express absolute certainty and never change their minds. Sixth, be aware of the 'contrarian expert' pattern: in almost every domain, a handful of credentialed contrarians achieve disproportionate media visibility by taking heterodox positions; this does not make them wrong, but their visibility is not evidence of a genuine scientific controversy.

When is expert intuition reliable and when is it not?

Daniel Kahneman and Gary Klein, despite representing different intellectual traditions, collaborated on a 2009 paper that synthesized their views on when expert intuition can be trusted. Their conclusion: expert intuition is reliable when two conditions are met. First, the environment must provide regular, clear, and timely feedback — the expert must be able to learn from experience because the feedback loop is tight enough to allow calibration. Second, there must be adequate opportunity to practice and accumulate experience with cases. A chess grandmaster meets both conditions: the feedback is immediate and unambiguous, and the rule structure is stable. A firefighter reading structural conditions also largely meets these criteria. By contrast, a clinical psychologist predicting long-term patient outcomes, a financial analyst forecasting stock prices, or a political expert predicting election outcomes in novel geopolitical contexts all fail the first criterion: feedback is delayed, ambiguous, or confounded by events outside the expert's control. This framework helps explain a counterintuitive finding from Tetlock's forecasting research: domain experts did not outperform non-experts significantly, and more media-prominent experts performed worse, because political and economic prediction involves environments that do not give the clear feedback needed to calibrate expert intuition. The lesson is not that expertise is worthless but that its reliability is highly domain-specific and depends critically on the structure of the feedback environment.