In January 1986, seven astronauts died when the Space Shuttle Challenger broke apart 73 seconds after launch. The cause was a failed O-ring seal — a known problem. Engineers at Morton Thiokol had explicitly warned NASA management the night before that low temperatures would compromise the seals. They were overruled. The decision to launch was made not because the engineering data supported it, but because of schedule pressure, organizational hierarchy, and the deeply human tendency to filter out information that contradicts what you want to be true.

The Challenger disaster is an extreme example of a universal problem: the gap between how people think they make decisions and how they actually do. Most people believe they weigh evidence rationally and choose the best available option. The research on human judgment — from Kahneman and Tversky's foundational work in the 1970s to the forecasting studies of Philip Tetlock and the organizational decision research of Gary Klein — paints a more complicated picture.

Good decisions do not require certainty, which rarely exists for decisions that matter. They require a process that is systematically better than intuition alone. Several specific tools have strong evidence behind them.


Why Decisions Go Wrong

Before discussing what to do, it is worth understanding why smart people make bad decisions. The causes are largely predictable.

Overconfidence: People consistently overestimate the accuracy of their own judgments. In calibration studies conducted by Baruch Fischhoff, Paul Slovic, and Sarah Lichtenstein (1977), events assigned a 90 percent probability of occurring happened only about 70 to 75 percent of the time. Experts are not reliably better calibrated than informed non-experts; in some domains they are worse, because domain expertise can increase confidence faster than it increases accuracy. Philip Tetlock's landmark study of 284 expert political forecasters (Expert Political Judgment, 2005) found that expert forecasts were barely more accurate than chance in many domains, while their stated confidence vastly exceeded their actual accuracy.

The inside view: When planning or evaluating a decision, people naturally focus on the details of the specific case — the particular team, the particular market, the particular circumstances. This inside view produces optimistic estimates because the vivid details of "why this case is different" crowd out the base rate evidence on how similar cases have actually turned out.

Confirmation bias: The systematic tendency to seek, interpret, and remember information in ways that confirm existing beliefs. Once a conclusion has formed, the mind preferentially processes evidence that supports it and discounts evidence that challenges it. Psychologists Peter Wason and Jonathan Evans documented this in early laboratory studies in the 1960s-70s, and decades of subsequent research have confirmed it as one of the most robust cognitive biases in human reasoning.

The planning fallacy: Kahneman and Tversky's term for the systematic tendency to underestimate the time, costs, and risks of future actions while overestimating benefits. Bent Flyvbjerg's research (2003, Megaprojects and Risk) found that across thousands of large infrastructure projects globally, cost overruns averaged 45% and benefit shortfalls were common. Projects almost always take longer and cost more than planned. This is not because people are dishonest; it is because the inside view drives planning.

"People make predictions by constructing scenarios of how the future will unfold, not by estimating base rates of similar previous cases. These scenarios are typically optimistic." — Daniel Kahneman, Thinking Fast and Slow (2011)

Sunk cost fallacy: The tendency to continue investing in a course of action because of resources already spent, even when the marginal expected value of continuing is negative. Behavioral economists Hal Arkes and Catherine Blumer (1985, Organizational Behavior and Human Decision Processes) documented this in controlled studies: people who had paid more for theater tickets were more likely to attend in bad weather, even though the ticket cost was sunk regardless of attendance. In organizational decisions, sunk cost thinking sustains failing projects and bad acquisitions long after the rational decision would be to exit.

Groupthink: The tendency for cohesive groups to prioritize harmony and consensus over critical evaluation. Irving Janis (1982) documented this pattern in examining high-profile policy failures including the Bay of Pigs invasion, where the Kennedy administration's decision-making process was distorted by excessive conformity pressure. Groupthink is the organizational version of confirmation bias, in which dissenting information gets filtered out to preserve group cohesion.


Expected Value Thinking

The most fundamental tool in decision-making under uncertainty is expected value — the probability-weighted average of possible outcomes.

The expected value of a decision is calculated by: identifying all plausible outcomes, estimating the probability of each, estimating the value of each (positive or negative), multiplying probability by value for each outcome, and summing the results.

Example: You are considering a business investment that will return $200,000 with a 25 percent probability, $50,000 with a 50 percent probability, and lose $30,000 with a 25 percent probability.

Expected value = (0.25 x $200,000) + (0.50 x $50,000) + (0.25 x -$30,000) = $50,000 + $25,000 - $7,500 = $67,500

The expected value is positive, which is relevant information — but it is not the only relevant information. The distribution matters too. The same expected value can arise from a low-variance distribution (highly predictable) or a high-variance distribution (lots of uncertainty about which outcome occurs). Risk tolerance and the ability to absorb the downside scenario should factor into the decision alongside expected value.
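As a minimal sketch of the calculation above (plain Python, no external libraries, numbers taken from the example), including the standard deviation that distinguishes a predictable distribution from a volatile one:

```python
# Sketch: expected value and spread for a set of outcomes.
# Each outcome is a (probability, value) pair; probabilities should sum to 1.

def expected_value(outcomes):
    return sum(p * v for p, v in outcomes)

def std_deviation(outcomes):
    ev = expected_value(outcomes)
    variance = sum(p * (v - ev) ** 2 for p, v in outcomes)
    return variance ** 0.5

# The business investment example from the text.
investment = [(0.25, 200_000), (0.50, 50_000), (0.25, -30_000)]

print(expected_value(investment))         # 67500.0
print(round(std_deviation(investment)))   # spread around the expected value
```

Two options can print the same expected value while the second line differs sharply, which is exactly the variance point made above.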

Where Expected Value Thinking Is Most Useful

Expected value thinking is most valuable when:

  • Decisions are repeated (over many similar decisions, outcomes converge toward their expected values)
  • Outcomes are measurable enough to assign rough values
  • You are choosing between multiple options, not just accepting or rejecting one

It is least reliable when probabilities are deeply uncertain (not just unknown but unknowable), when outcomes are non-linear (catastrophic downside scenarios that might not be survivable), or when values are genuinely incommensurable (comparing financial outcomes with moral obligations, for example).

The Kelly Criterion and Bet Sizing

For repeated decisions with known probability and value parameters, mathematician J.L. Kelly Jr. (1956) derived the Kelly Criterion — the fraction of a bankroll to bet on a favorable proposition that maximizes long-term growth rate. The formula is:

Fraction = (bp - q) / b

Where b is the net odds received, p is the probability of winning, and q is the probability of losing (1-p).

Professional gamblers, traders, and investors use the Kelly Criterion or fractional-Kelly variants to size positions. The key insight: even when expected value is positive, betting too large a fraction of resources risks ruin; betting too small sacrifices compounding. Kelly sizing threads the needle. This principle generalizes beyond gambling: position sizing, business investment decisions, and resource allocation all benefit from similar logic even when the exact formula is not applicable.
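A minimal sketch of the formula above in Python; the even-money odds and the 55 percent win probability are illustrative, not drawn from the text:

```python
def kelly_fraction(b, p):
    """Fraction of bankroll to stake: (b*p - q) / b, where q = 1 - p.

    b: net odds received on the wager (1.0 means win $1 per $1 staked)
    p: probability of winning
    Returns 0 if the edge is negative (never bet an unfavorable proposition).
    """
    q = 1 - p
    f = (b * p - q) / b
    return max(f, 0.0)

# Illustrative: a 55% chance of winning at even odds suggests staking about
# 10% of bankroll; a "half-Kelly" variant stakes 5% to reduce volatility.
print(kelly_fraction(b=1.0, p=0.55))        # ~0.10
print(kelly_fraction(b=1.0, p=0.55) / 2)    # fractional-Kelly position
```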


Base Rates and the Outside View

The most consistent finding in forecasting research is that base rates — the historical frequency of an outcome across a reference class of similar cases — are systematically underweighted in favor of the specific details of the case at hand.

This pattern was documented extensively by Philip Tetlock in his long-running research on expert political forecasting, later described in Superforecasting (2015, with Dan Gardner). The forecasters who performed best — those Tetlock called "superforecasters" — shared a specific habit: they began with the base rate and then adjusted for specific factors, rather than beginning with the specific case and ignoring the base rate.

Reference class forecasting is the formalization of this approach, developed by Kahneman and Amos Tversky and later systematized by Bent Flyvbjerg for large infrastructure projects. The procedure:

  1. Identify the reference class — the population of similar past cases (e.g., "comparable IT system implementations," "similarly sized business expansions")
  2. Identify the base rate for the outcome in question in that reference class (e.g., "what fraction of comparable IT projects were delivered on time and budget?")
  3. Use this base rate as the starting estimate
  4. Adjust based on specific features of the current case that are genuinely different from the reference class average

The adjustment in step 4 should be modest. Most people believe their case is more exceptional than it actually is. The base rate is usually a better estimate than the inside view, even after adjustment.
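As a rough sketch of the procedure, with a heavy weight on the base rate to reflect how modest the adjustment in step 4 should be; the weighting scheme and the numbers are assumptions for illustration, not part of Kahneman's or Flyvbjerg's method:

```python
def reference_class_forecast(base_rate, inside_view_estimate, base_rate_weight=0.8):
    """Blend the reference-class base rate with the inside-view estimate.

    base_rate: historical frequency or average for the reference class
    inside_view_estimate: the case-specific estimate
    base_rate_weight: how strongly to anchor on the base rate (illustrative default)
    """
    return base_rate_weight * base_rate + (1 - base_rate_weight) * inside_view_estimate

# Illustrative: comparable projects in the reference class average a 45% cost
# overrun, but the team's inside-view plan assumes only 10%. Anchoring on the
# base rate keeps the estimate closer to history than to optimism.
print(reference_class_forecast(base_rate=0.45, inside_view_estimate=0.10))  # ~0.38
```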

Flyvbjerg's research on large capital projects found that the base rate for transportation infrastructure is a cost overrun of roughly 45 percent on average, with some categories running higher. Projects whose optimistic inside-view estimates assumed on-budget delivery routinely overran by 50 percent or more. Had planners started with the base rate, their estimates would have been more accurate and their contingency planning more appropriate.

Decision Domain | Common Reference Class | Typical Base Rate Finding
Software projects | IT system implementations | 70%+ exceed budget or timeline (Standish Group, 2020)
New products | New product launches | ~80-85% fail within 3 years (McKinsey data)
Small businesses | Startup survival | ~50% fail within 5 years (Bureau of Labor Statistics)
Restaurant industry | Restaurant openings | ~60% close within 3 years (industry surveys)
Mergers/acquisitions | Corporate M&A | 50-60% destroy acquirer value (academic meta-analyses)

The Pre-Mortem

The pre-mortem is a technique developed by psychologist Gary Klein and popularized by Kahneman. It is one of the most reliably useful interventions available for improving decision quality in groups.

The standard forward-looking risk analysis asks: "What could go wrong?" This question is addressed in an atmosphere where the plan has already achieved some social momentum — where optimism is the default and raising risks can feel like opposing the team. The result is that known risks are often discussed superficially and novel risks are rarely surfaced.

The pre-mortem reframes the question. At the outset of implementation — before commitment is locked in — the group is told:

"Imagine it is one year from today. We implemented this plan, and it failed. The failure was significant. I want each of you to write down, independently, the most plausible reasons the failure occurred."

Several features make this more effective than standard risk review:

  • The failure is assumed, not speculated: This bypasses the tendency to defend the plan. You are not asked "could this fail?" but "why did it fail?"
  • Independent generation before sharing: Prevents anchoring on the first risk raised and suppresses social conformity effects
  • Prospective hindsight: Research by Deborah Mitchell, Jay Russo, and Nancy Pennington (1989, Organizational Behavior and Human Decision Processes) found that people identify more reasons for an event when asked to explain it after imagining it has already occurred than when asked to explain it prospectively

The pre-mortem is particularly effective for surfacing implementation risks — things that seem unlikely given optimistic assumptions but become plausible when execution complexity is considered honestly.

What to Do With Pre-Mortem Output

The goal is not to abandon the plan; it is to identify the top two or three risks and ask: what would we do differently given these? Can any of these risks be mitigated at acceptable cost? If one of the pre-mortem scenarios is highly plausible and the mitigation is prohibitively expensive, that is important information about whether to proceed at all.

A structured approach: rank the pre-mortem scenarios by probability and severity. For the top three, assign ownership — a person responsible for monitoring the risk indicator and triggering a response if early warning signs emerge. This converts the pre-mortem from a planning exercise into an early warning system.
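A minimal sketch of that ranking step; the scenarios, scores, and owner names are placeholders:

```python
# Hypothetical pre-mortem output: each scenario gets a rough probability
# and a severity score (1-5), and the top items get an owner.

scenarios = [
    {"risk": "key vendor misses integration deadline", "probability": 0.30, "severity": 4},
    {"risk": "adoption lags because training is underfunded", "probability": 0.50, "severity": 3},
    {"risk": "regulatory review adds a six-month delay", "probability": 0.10, "severity": 5},
]

# Rank by expected impact (probability x severity) and assign ownership to the top risks.
ranked = sorted(scenarios, key=lambda s: s["probability"] * s["severity"], reverse=True)

for owner, scenario in zip(["A. Ortiz", "B. Chen", "C. Patel"], ranked):
    scenario["owner"] = owner  # responsible for monitoring the early-warning indicator
    print(f'{scenario["owner"]}: {scenario["risk"]} '
          f'(expected impact {scenario["probability"] * scenario["severity"]:.1f})')
```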


Type 1 vs Type 2 Decisions: The Bezos Framework

Jeff Bezos's framework for categorizing decisions, first described in his 2015 letter to Amazon shareholders, has become one of the most widely cited frameworks in management because it addresses a specific failure mode of mature organizations.

Bezos distinguishes between:

Type 1 decisions (one-way doors): Decisions that are consequential, difficult to reverse, and require careful analysis before commitment. Once you walk through the door, returning to the prior state is expensive or impossible. Examples: shutting down a product line, making a large acquisition, entering a new country market, changing a fundamental organizational structure.

Type 2 decisions (two-way doors): Decisions that are reversible, with low irreversibility cost. If you walk through the door and it is the wrong call, you can walk back. Examples: trying a new product feature for a quarter, changing an internal process, hiring for a new role on a provisional basis, running a new marketing channel.

The failure mode Bezos identified is that large organizations apply Type 1 deliberation processes to Type 2 decisions. Multiple approval layers, extensive documentation requirements, cross-functional sign-off, long timelines — these make sense for decisions that cannot be undone but are pure overhead for decisions that can be reversed quickly if they do not work.

The result is that organizations slow down their learning cycles, because the fastest way to learn in complex environments is often to try something, observe the result, and adjust. Type 1 processes applied to Type 2 decisions remove the ability to run fast experiments.

Decision Type | Reversibility | Recommended Process | Examples
Type 1 (one-way door) | Low to none | High deliberation: extensive analysis, diverse perspectives, pre-mortem, scenario planning | Acquisitions, major restructuring, platform architecture choices
Type 2 (two-way door) | High | Low deliberation: decide with available information, act quickly, monitor and adjust | Feature tests, process experiments, tactical resource allocation

The challenge in applying this framework is that the line between Type 1 and Type 2 is not always obvious in advance. Decisions that seem reversible sometimes are not; decisions that seem irreversible sometimes have more flexibility than initially apparent. And there is organizational incentive to misclassify decisions as Type 1 when the actual motivation is risk aversion or bureaucratic caution. The framework requires honest assessment of actual reversibility, not desired reversibility.

A related concept from psychology: Kahneman distinguishes between System 1 thinking (fast, intuitive, automatic) and System 2 thinking (slow, deliberate, analytical). Type 1 decisions require System 2; Type 2 decisions can productively rely on System 1. The skill is engaging the right system for the decision at hand.


Separating Decision Quality from Outcome Quality

One of the most important — and least practiced — habits in decision improvement is evaluating decisions on the quality of their process rather than the quality of their outcomes.

A decision can be excellent and still produce a bad outcome, because outcomes under uncertainty are partly determined by factors outside the decision-maker's control. A decision can be poor — based on inadequate analysis, biased information, and wishful thinking — and still produce a good outcome by chance. Evaluating only outcomes provides distorted feedback.

Annie Duke, former professional poker player and author of Thinking in Bets (2018), uses the term "resulting" for the error of judging the quality of a decision by its outcome. Resulting produces systematic mislearning: you reinforce the habits that happened to work in specific cases rather than the habits that have positive expected value over many cases.

"When we evaluate decisions based on outcomes, we get a biased sample of our decision-making quality. The feedback is systematically misleading because luck contaminates outcomes. We need to evaluate our reasoning at the time of the decision, before the outcome was known." — Annie Duke, Thinking in Bets

The antidote is prospective documentation: writing down the reasoning behind decisions before the outcome is known, including:

  • The options considered and rejected, and why
  • The key uncertainties and how they were estimated
  • The expected distribution of outcomes
  • The confidence level in the assessment

Reviewing these notes after outcomes are known allows honest assessment of where the reasoning was sound and where it failed — independently of whether the outcome was good or bad. This feedback loop, maintained consistently over time, is the most reliable mechanism for genuine improvement in decision quality.
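One way to keep such a record is a small structured log. A minimal sketch follows; the fields mirror the list above, and everything else (names, dates, scenarios) is an illustrative assumption:

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class DecisionRecord:
    """Prospective documentation written before the outcome is known."""
    title: str
    decided_on: date
    options_considered: list[str]
    options_rejected_because: dict[str, str]
    key_uncertainties: list[str]
    expected_outcomes: str            # rough distribution, not a point estimate
    confidence: float                 # stated probability of the favorable case
    outcome_notes: str = ""           # filled in later, at review time

record = DecisionRecord(
    title="Expand into adjacent market",
    decided_on=date(2024, 3, 1),
    options_considered=["expand now", "partner first", "wait a year"],
    options_rejected_because={"wait a year": "competitor entry likely"},
    key_uncertainties=["regulatory timeline", "customer acquisition cost"],
    expected_outcomes="60% modest growth, 25% break-even, 15% loss",
    confidence=0.6,
)
```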

This practice is used formally in professional settings: medical teams conduct structured morbidity and mortality reviews to assess decision quality independently of patient outcomes; aviation uses accident investigation methodologies designed to analyze process quality rather than assign blame based on outcomes alone.


Calibration: Knowing What You Know

Calibration is the alignment between confidence and accuracy: a well-calibrated person who says they are 80 percent confident in a prediction is right approximately 80 percent of the time. A poorly calibrated person who says they are 90 percent confident may be right only 60 percent of the time.

Calibration is trainable. The research from Tetlock's Good Judgment Project — a large-scale forecasting tournament run from 2011-2015 — found that individuals who practiced probabilistic forecasting improved their calibration substantially over time. The practice required:

  • Expressing beliefs as numbers ("70 percent likely" rather than "probably")
  • Tracking predictions against outcomes in a structured log
  • Decomposing predictions into component estimates where possible
  • Updating estimates as new information became available

The superforecasters in Tetlock's project were 30% more accurate than intelligence analysts with access to classified information when forecasting geopolitical events. Their advantage was methodological, not informational: they used structured probability estimation, updated on evidence, and maintained calibrated uncertainty rather than expressing false confidence.

The benefits extended beyond the specific domains being forecasted. Calibration training appears to generalize, making reasoners better at knowing the limits of their knowledge across domains.

A practical starting point: for decisions with uncertain outcomes, practice stating your confidence as a probability. If you think a project will finish on time, how confident are you — 60 percent? 80 percent? Track whether your 80 percent predictions actually occur 80 percent of the time. Most people will initially find they are overconfident. The feedback is uncomfortable and productive.
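A minimal sketch of that tracking habit: log each prediction with its stated confidence, then check the hit rate per confidence bucket. The data here is invented for illustration:

```python
from collections import defaultdict

# Each entry: (stated confidence, whether the predicted event actually happened).
predictions = [
    (0.9, True), (0.9, True), (0.9, False), (0.9, True),
    (0.7, True), (0.7, False), (0.7, False),
    (0.6, True), (0.6, False),
]

buckets = defaultdict(list)
for confidence, happened in predictions:
    buckets[confidence].append(happened)

# Well-calibrated forecasters see hit rates close to their stated confidence.
for confidence in sorted(buckets, reverse=True):
    hits = buckets[confidence]
    print(f"stated {confidence:.0%}: actual {sum(hits) / len(hits):.0%} "
          f"over {len(hits)} predictions")
```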

Tools like Metaculus, Manifold Markets, and PredictionBook allow anyone to practice probabilistic forecasting and receive feedback on calibration over time.


Debiasing Techniques That Work

Beyond the specific tools described above, a body of research identifies interventions that measurably improve decision quality.

Consider the Opposite

Research by Charles Lord, Mark Lepper, and Elizabeth Preston (1984) demonstrated that asking people to consider the opposite — to actively generate arguments against their preferred conclusion before committing — significantly reduced confirmation bias. The instruction "consider whether your conclusion might be wrong, and list your best reasons for thinking so" improved decision quality more reliably than general advice to "be objective."

Red Teams and Dissent

Assigning someone specifically to challenge a plan — a red team or designated devil's advocate — counteracts groupthink and confirmation bias. Research by Charlan Nemeth and colleagues (1995) found that authentic minority dissent (someone who genuinely disagrees) improved group decision quality more than instructed devil's advocacy (someone assigned to argue the other side), but both improved outcomes relative to uncontested consensus.

Structured Decision Analysis

For high-stakes decisions, formal decision analysis frameworks — mapping out decision trees, probability distributions, and value functions — reduce the influence of intuitive biases by forcing explicit specification of assumptions. The act of writing down probability estimates and values exposes unreasonable assumptions to scrutiny in a way that purely mental deliberation does not.
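As a sketch of what forcing explicit specification can look like in the simplest case, here is a toy decision tree in which each option is a set of probability-weighted outcomes and the tree reduces to expected values. The options and numbers are placeholders, not a prescribed methodology:

```python
# Hypothetical decision tree: each option maps to (probability, value) branches.
# Writing the assumptions down like this is the point: they become inspectable.

options = {
    "build in-house": [(0.6, 400_000), (0.4, -150_000)],
    "buy vendor solution": [(0.9, 150_000), (0.1, -50_000)],
}

def expected_value(branches):
    return sum(p * v for p, v in branches)

for name, branches in options.items():
    print(f"{name}: expected value {expected_value(branches):,.0f}")

best = max(options, key=lambda name: expected_value(options[name]))
print(f"highest expected value: {best}")
```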

The consulting and intelligence communities have developed structured analytical techniques for exactly this purpose. The US Intelligence Community's Analytic Standards require analysts to use specific methodologies (Analysis of Competing Hypotheses, Key Assumptions Check) that institutionalize the outside view and adversarial challenge.

Sleep and Time

Research on decision quality under fatigue consistently finds that sleep deprivation degrades decision-making in specific ways: increased risk-seeking behavior, reduced consideration of alternatives, and greater susceptibility to cognitive biases. For important decisions, ensuring adequate sleep before the decision point is not a soft suggestion — it is a meaningful quality intervention. Studies by Harrison and Horne (2000, Neuropsychologia) found that 24 hours without sleep produced decision-making impairments comparable to legal blood alcohol intoxication.

Similarly, creating time between generating options and selecting among them — the "sleep on it" principle — consistently improves decision quality in research settings, because unconscious processing integrates information in ways that rushed deliberation does not.


A Framework for High-Stakes Decisions

Combining these tools, a practical decision process for decisions that genuinely matter looks like this:

  1. Identify the reference class and find the base rate for the outcome you care about in that class. Start with the outside view.

  2. Apply expected value thinking: map out the plausible scenarios, estimate probabilities and values, compute expected value, and note the uncertainty around the estimates.

  3. Run a pre-mortem: assume failure and generate reasons why. Address the top risks before commitment.

  4. Classify the decision: is it Type 1 (low reversibility, high deliberation warranted) or Type 2 (high reversibility, faster process appropriate)?

  5. Challenge your reasoning: actively consider the opposite. Assign a red team or devil's advocate if the stakes are high and the group may be subject to groupthink.

  6. Document the reasoning: write it down before the outcome is known.

  7. Review after outcomes are known: evaluate the reasoning quality independently of the outcome.

This process will not eliminate bad outcomes. Uncertainty is real, and some fraction of good-process decisions will still produce bad outcomes. But over time, decisions made through this process will outperform decisions made by intuition alone, because the process is designed to counteract the specific biases — optimism, base rate neglect, confirmation bias, outcome dependency — that most predictably lead smart people to wrong conclusions.

The Challenger disaster was preventable by this logic. The engineering information existed. The base rate for O-ring performance at low temperatures was available. The pre-mortem would have immediately surfaced the seal failure risk. The decision was Type 1 — launching humans on a vehicle is irreversible. What was missing was not information or intelligence; it was a process that protected relevant information from being filtered out by organizational pressure. That is what good decision process provides: a structure that keeps the right information in the room until the decision is made.

Frequently Asked Questions

What is expected value thinking in decision making?

Expected value thinking involves estimating the probability of different outcomes and multiplying each by its value, then summing across scenarios to produce an overall expected value for each option. A decision with a 30 percent chance of gaining $100 and a 70 percent chance of losing $20 has an expected value of (0.3 x $100) + (0.7 x -$20) = $30 - $14 = $16. This framework shifts focus from 'what will happen' to 'what is the probability-weighted outcome,' which is more useful under genuine uncertainty.

What is a pre-mortem?

A pre-mortem, introduced by psychologist Gary Klein, is a technique in which a team imagines that a project or decision has already failed and works backward to identify why. Unlike traditional risk analysis, which asks 'what could go wrong?', the pre-mortem assumes failure has occurred and asks 'what happened?' This shifts the cognitive framing and reliably surfaces failure modes that forward-looking risk analysis misses, because it bypasses the planning fallacy and overcomes the social pressure to be optimistic.

What are base rates and why do they matter?

A base rate is the historical frequency of an outcome across a reference class of similar cases. If 60 percent of small businesses fail within five years, that is the base rate for any specific small business, regardless of how exceptional its founder believes it to be. People systematically underweight base rates in favor of specific case details (the planning fallacy). Deliberately starting with the base rate and then adjusting for specific factors — rather than ignoring the base rate — consistently improves prediction accuracy.

What is the difference between Type 1 and Type 2 decisions?

Jeff Bezos's Type 1/Type 2 framework distinguishes between one-way doors (Type 1 decisions that are difficult or impossible to reverse and require extensive deliberation) and two-way doors (Type 2 decisions that are easily reversible and should be made quickly with less process). Most large organizations make the mistake of treating Type 2 decisions like Type 1 decisions, creating bottlenecks and slowing down learning cycles. The framework prescribes matching the level of deliberation to the reversibility of the decision.

How can you improve decision quality without more information?

Several techniques improve decision quality without requiring more data: explicitly considering base rates before diving into case details, using the outside view (asking how similar situations turned out) before the inside view, conducting a pre-mortem to identify blind spots, separating the decision from its outcome to evaluate decision quality on process rather than results, and writing down the reasoning before the outcome is known to enable honest retrospective analysis.