On March 19, 1628, William Harvey published Exercitatio Anatomica de Motu Cordis et Sanguinis in Animalibus, his account of the circulation of blood. Before Harvey, the dominant explanation for how blood moved through the body was the Galenic model: blood was produced in the liver from food, consumed by the body's tissues as fuel, and continuously replenished. Harvey recognized the problem with this model: given the amount of blood the heart moves with each beat, the body would have to produce and consume more than its own weight in blood every hour. A single assumption, that blood circulates rather than being consumed, eliminated this absurdity and explained the observed anatomy with far fewer required processes.
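The arithmetic behind this objection can be checked with rough modern textbook values. The figures below (about 70 mL per beat, 70 beats per minute, a 70 kg adult, blood density near 1.06 kg/L) are illustrative assumptions and are not Harvey's own estimates.

```python
# Rough modern textbook values; assumptions for illustration, not Harvey's figures.
STROKE_VOLUME_ML = 70          # blood ejected per heartbeat
HEART_RATE_BPM = 70            # resting beats per minute
BLOOD_DENSITY_KG_PER_L = 1.06  # approximate density of whole blood
BODY_MASS_KG = 70              # reference adult


def blood_pumped_per_hour_kg(stroke_ml, bpm, density_kg_per_l):
    """Mass of blood passing through the heart in one hour."""
    litres_per_hour = stroke_ml * bpm * 60 / 1000
    return litres_per_hour * density_kg_per_l


pumped_kg = blood_pumped_per_hour_kg(
    STROKE_VOLUME_ML, HEART_RATE_BPM, BLOOD_DENSITY_KG_PER_L
)
print(f"Blood through the heart per hour: {pumped_kg:.0f} kg")
print(f"Multiple of body mass: {pumped_kg / BODY_MASS_KG:.1f}x")
```

Under these assumptions the heart moves several times the body's mass in blood each hour, which is only plausible if the same blood is passing through repeatedly.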

Harvey did not invoke Occam's Razor by name. But the logic was identical: the simpler model, requiring fewer assumptions and fewer impossible quantities of blood production, was to be preferred.

This is Occam's Razor in its scientific application: when two explanations account equally for the observed facts, prefer the one that requires fewer assumptions. Not because simpler explanations are necessarily correct, but because each additional assumption is an additional point of failure — an additional way the explanation can be wrong.

William of Ockham: The Origin

William of Ockham (also spelled Occam) was born around 1287 in the village of Ockham in Surrey, England. He studied at Oxford and became a significant figure in Franciscan scholastic philosophy and theology. In the philosophical debate between realism (the view that universals have independent existence) and nominalism (the view that universals are names for patterns, not independently real entities), he was a nominalist. His teachings drew charges of heresy, and his defense of Franciscan poverty put him in open conflict with Pope John XXII; he fled Avignon in 1328 and spent the rest of his life in exile.

Ockham's formulation was more precise and more limited than the popular version suggests. He wrote, in Latin: "Pluralitas non est ponenda sine necessitate" — "plurality must not be posited without necessity." And: "Frustra fit per plura quod potest fieri per pauciora" — "it is futile to do with more things that which can be done with fewer."

These are methodological principles about intellectual economy, not empirical claims that the world is simple. Ockham was not saying that simple explanations are more likely to be true; he was saying that introducing unnecessary entities or assumptions is a kind of intellectual waste. The "razor" metaphor — shaving away unnecessary assumptions — came later; Ockham himself did not use that image.

The principle in the form "the simplest explanation is usually right" is a stronger empirical claim added by later thinkers, particularly scientists in the 17th century and beyond, who observed that parsimonious theories tended to be more robust and better supported.

The Formal Statement

The principle is most precisely stated as: among competing hypotheses that equally fit the available evidence, prefer the one that requires the fewest assumptions.

Three qualifications are embedded in this formulation:

"Among competing hypotheses." Occam's Razor is a tie-breaker, not a truth criterion. It applies when evidence does not clearly favor one explanation over another. When evidence strongly favors a complex explanation over a simple one, the evidence wins.

"That equally fit the available evidence." The competing explanations must actually account for the known facts. A simpler explanation that fails to account for important observations is not preferred — it is simply wrong.

"Fewest assumptions." This is the operative criterion. An assumption is anything the explanation requires to be true that is not already established by evidence. More assumptions mean more ways to be wrong.

Occam's Razor does not say nature is simple. It says that when you cannot yet tell which of two explanations is correct, you should invest your credence in the one that requires you to be wrong about fewer things.

Applications in Science

Physics and Cosmology

Isaac Newton explicitly invoked parsimony in his scientific method. In Principia Mathematica (1687), he listed as his first "Rule of Reasoning in Philosophy": "We are to admit no more causes of natural things than such as are both true and sufficient to explain their appearances." This is Occam's Razor in Newton's language.

In modern cosmology, parsimony has both successes and failures. The standard cosmological model (Lambda-CDM) is not simple — it requires dark matter, dark energy, inflation, and a particular initial state of the universe, none of which have been directly detected. But it accounts for an extraordinary range of observations with these few components. Competing theories that attempt to explain away dark matter by modifying gravity (MOND — Modified Newtonian Dynamics, proposed by Mordehai Milgrom in 1983) are in some senses simpler but fit the full range of cosmological data less well. Parsimony here yields to evidential fit.

The Higgs boson discovery at CERN (2012) resolved a long-standing tension. The Standard Model of particle physics predicts the Higgs; removing it would have required more complex explanations of how particles acquire mass. The observed particle confirmed the parsimonious choice: accept the Higgs rather than proliferate alternative mass mechanisms.

| Principle | Formulation | Application | Limit |
| --- | --- | --- | --- |
| Occam's Razor | "Prefer fewest assumptions when evidence is equal" | Choose simpler hypothesis as starting point | Evidence can override; complex reality exists |
| Hickam's Dictum | "Patients can have as many diseases as they damn well please" | Multiple concurrent diagnoses are common | Don't overcorrect to needless complexity |
| Horses, not zebras | "Common diseases are common" | Start with the probable, not the rare | Patients with rare diseases get misdiagnosed |

Medicine: Occam's Razor and Hickam's Dictum

Medical diagnosis is perhaps the domain where Occam's Razor is most explicitly taught and most actively debated. The principle is applied as a heuristic: when a patient presents with multiple symptoms, prefer the diagnosis that explains all symptoms with a single cause over multiple concurrent diagnoses.

This medical Occam's Razor is contested by Hickam's Dictum, attributed to physician John Hickam (Duke University, mid-20th century): "Patients can have as many diseases as they damn well please." The dictum reflects a genuine clinical reality: multiple concurrent conditions are common, especially in elderly patients. Applying Occam's Razor too aggressively — insisting that all symptoms must have one cause — causes clinicians to miss genuine complexity.

The "horses, not zebras" heuristic (attributed to internist Theodore Woodward, University of Maryland, 1940s) represents Occam's Razor applied to base rates: when you hear hoofbeats, think horses before zebras. Common diseases are common; prefer the explanation invoking common conditions over the one invoking rare ones. This is statistically sound as a starting point, but generates systematic harm for patients with rare diseases whose presentations are explained away by forcing a common diagnosis.

Evolutionary Biology

In phylogenetics — the reconstruction of evolutionary relationships from genetic or morphological data — maximum parsimony is a formal analytical method. Given the observed genetic sequences of a set of species, the parsimony method selects the evolutionary tree that requires the minimum number of evolutionary changes (mutations, insertions, deletions) to explain the data.
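The scoring step can be sketched with Fitch's small-parsimony algorithm, which counts the minimum number of state changes a fixed tree needs at one site. The four-species alignment and the names `fitch_score` and `tree_score` are invented for illustration; real analyses search far larger tree spaces with dedicated software.

```python
def fitch_score(tree, states):
    """Minimum number of state changes on a rooted binary tree for one site.

    tree   : a leaf name (str) or a 2-tuple (left_subtree, right_subtree)
    states : dict mapping each leaf name to its observed character state
    """
    def visit(node):
        if isinstance(node, str):          # leaf: state set is the observation
            return {states[node]}, 0
        left_set, left_cost = visit(node[0])
        right_set, right_cost = visit(node[1])
        common = left_set & right_set
        if common:                         # children can agree: no extra change
            return common, left_cost + right_cost
        # children cannot agree: one change is needed on this branch pair
        return left_set | right_set, left_cost + right_cost + 1

    return visit(tree)[1]


def tree_score(tree, alignment):
    """Sum the Fitch score over every site (column) of the alignment."""
    n_sites = len(next(iter(alignment.values())))
    return sum(
        fitch_score(tree, {name: seq[i] for name, seq in alignment.items()})
        for i in range(n_sites)
    )


# Hypothetical alignment, invented for illustration.
alignment = {"A": "AAGT", "B": "AAGC", "C": "GTGC", "D": "GTCC"}

# The three possible groupings of four species.
candidates = {
    "((A,B),(C,D))": (("A", "B"), ("C", "D")),
    "((A,C),(B,D))": (("A", "C"), ("B", "D")),
    "((A,D),(B,C))": (("A", "D"), ("B", "C")),
}
scores = {name: tree_score(tree, alignment) for name, tree in candidates.items()}
best = min(scores, key=scores.get)
print(scores, "->", best)
```

For this toy data the grouping of A with B and C with D needs four changes, while the other two groupings need six each, so parsimony selects the first.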

Parsimony analysis was the dominant method in phylogenetics from the 1960s through the 1990s, growing out of the cladistic framework set out by Willi Hennig (in Phylogenetic Systematics, 1966). It has since been largely supplanted by maximum likelihood and Bayesian approaches, which can better model how evolutionary change actually occurs. The shift illustrates a general point: parsimony is a useful heuristic that sometimes yields to more explicit probabilistic models when the process generating the data is well understood.

Applications in Machine Learning and AI

Overfitting and Regularization

In machine learning, Occam's Razor is institutionalized in the form of regularization: penalties that reduce model complexity to improve generalization. A model can be made arbitrarily complex to fit any finite training dataset perfectly — but a complex model that fits training data perfectly typically fails to generalize to new data. The additional complexity captures noise in the training data rather than underlying patterns.

Regularization methods (L1/Lasso, L2/Ridge, dropout in neural networks) implement the parsimony principle by penalizing complexity: models with more parameters, larger parameter values, or more connections pay a cost in the optimization objective. The preferred model is the simplest one that fits the data adequately, not the complex one that fits it perfectly.
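The effect of these penalties can be seen in a minimal sketch: a one-feature linear model with no intercept, chosen because its penalized solution has an exact closed form. The data points and the function name `fit_slope` are invented for illustration.

```python
def fit_slope(xs, ys, l2=0.0, l1=0.0):
    """Slope w minimising sum((y - w*x)**2) + l2*w**2 + l1*abs(w).

    One-feature model with no intercept, so the minimiser is exact:
    w = sign(Sxy) * max(|Sxy| - l1/2, 0) / (Sxx + l2)
    """
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    shrunk = max(abs(sxy) - l1 / 2, 0.0)   # L1 pulls the fit toward exactly zero
    sign = 1.0 if sxy >= 0 else -1.0
    return sign * shrunk / (sxx + l2)      # L2 shrinks the fit smoothly


# Invented data lying close to y = 2x.
xs = [1, 2, 3, 4]
ys = [2.1, 3.9, 6.2, 7.8]

w_plain = fit_slope(xs, ys)                # ordinary least squares
w_ridge = fit_slope(xs, ys, l2=10)         # L2 / Ridge penalty
w_lasso = fit_slope(xs, ys, l1=10)         # L1 / Lasso penalty
w_lasso_strong = fit_slope(xs, ys, l1=200) # penalty strong enough to drop the feature
print(w_plain, w_ridge, w_lasso, w_lasso_strong)
```

Both penalties pull the slope below the unpenalized estimate, and a strong enough L1 penalty sets it to exactly zero, which is how Lasso removes features from a model altogether.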

Minimum Description Length

Jorma Rissanen, a Finnish mathematician and information theorist, developed the Minimum Description Length (MDL) principle in the 1970s as a formal mathematical implementation of Occam's Razor applied to model selection. MDL holds that the best model for a dataset is the one that minimizes the total description length of the model plus the description length of the data given the model — in information-theoretic terms, the most compressed total representation.

Rissanen's formalization connects Occam's Razor to Shannon's information theory: simpler models require fewer bits to describe, and the minimum-description-length criterion quantifies exactly what "simpler" means in terms of information content. This formalization has been influential in statistical learning theory and Bayesian model selection.
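A minimal sketch of the two-part idea compares two ways of describing a sequence of coin flips. The parameter cost of half of log2(n) bits is a common crude approximation, used here as an assumption rather than Rissanen's exact construction, and the sequences are invented.

```python
import math


def entropy_bits(p):
    """Shannon entropy of a coin with P(1) = p, in bits per flip."""
    if p in (0.0, 1.0):
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def description_lengths(bits):
    """Total description length in bits under two candidate models."""
    n = len(bits)
    k = sum(bits)
    # Model 1: fair coin. Nothing to describe in the model; 1 bit per flip.
    fair = float(n)
    # Model 2: biased coin. Pay to state the bias, then encode the data with it.
    model_cost = 0.5 * math.log2(n)        # crude cost of one real parameter
    data_cost = n * entropy_bits(k / n)    # data encoded using the fitted bias
    return {"fair coin": fair, "biased coin": model_cost + data_cost}


balanced = [1] * 52 + [0] * 48   # invented: close to 50/50
skewed = [1] * 90 + [0] * 10     # invented: clearly lopsided

for name, data in (("balanced", balanced), ("skewed", skewed)):
    lengths = description_lengths(data)
    print(name, lengths, "->", min(lengths, key=lengths.get))
```

For the near-balanced sequence the extra parameter costs more bits than it saves, so the simpler model gives the shorter total description; for the lopsided sequence the parameter pays for itself many times over.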

Large Language Models and Simplicity

A recurring debate in AI research concerns whether large language models, with billions of parameters, violate Occam's Razor by being maximally complex. Defenders argue that a sufficiently large model that generalizes well is parsimonious at the level of its function even if complex in its parameters — it requires only a few principles (predict the next token; generalize across contexts) rather than many explicit rules. Critics argue that the opacity of large models, and their tendency to produce confident errors, reflects the epistemic costs of complexity that Occam's Razor warns against.

When Occam's Razor Fails

The World Is Not Obligated to Be Simple

The single most important limitation of Occam's Razor is that nature does not guarantee simple explanations are available. Einstein's general theory of relativity replaced Newton's simpler gravitational model precisely because general relativity accounts for observations — the precession of Mercury's orbit, gravitational lensing, time dilation near massive objects — that Newton's simpler model cannot. The complex model won on evidence.

Quantum mechanics is not simple. The standard model of particle physics is not simple. The genome is not simple. The Razor provides no guarantee that the world's fundamental workings are accessible to simple explanation; it only provides a default in the absence of evidence.

Simplicity Is Context-Dependent

What counts as "simpler" is not always obvious. A model with fewer parameters may require a more complex prior; a model with more parameters may require a simpler structure. Different formalizations of simplicity — parameter count, description length, number of assumptions, prior probability — do not always agree. The Razor provides an orientation, not an algorithm.

Premature Simplicity in Medicine and Science

Multiple case studies in the history of medicine involve parsimonious explanations that were wrong and harmful. Peptic ulcers were attributed for decades to stress and lifestyle, a simple psychosomatic explanation, until Barry Marshall and Robin Warren (Nobel Prize, 2005) demonstrated that the majority of cases were caused by Helicobacter pylori infection. The simpler explanation (stress) was wrong; the correct explanation required positing a bacterium that could survive gastric acid, which seemed implausible enough that the medical community resisted it for years.

KISS (Keep It Simple, Stupid) is Occam's Razor applied to design and engineering. Attributed to Kelly Johnson, lead engineer at Lockheed's Skunk Works division in the 1960s, KISS holds that systems work best when they are simple, and that unnecessary complexity should be avoided in design. Where Occam's Razor is epistemic (about choosing explanations), KISS is practical (about building systems). Both express the same underlying intuition: complexity has costs, and those costs should not be incurred without necessity.

A formulation widely attributed to Einstein captures the critical qualification: "Everything should be made as simple as possible, but not simpler." The Razor justifies choosing simplicity among equally supported options; it does not justify oversimplifying to the point of losing accuracy. The limit of acceptable simplification is the point at which the simpler model fails to account for important observations.

Parsimony is the formal scientific name for the underlying principle: preference for explanations that make fewer assumptions, posit fewer entities, or require fewer resources. The law of parsimony and Occam's Razor are used interchangeably in philosophy of science and statistics.

Research on Simplicity

Columbia University statistician Andrew Gelman and colleagues have written extensively on the tension between simplicity and fit in statistical modeling, arguing in papers including "Bayesian Workflow" (arXiv, 2020) that the practical challenge is not applying Occam's Razor but operationalizing simplicity in ways that match scientific goals. The razor points in the right direction without specifying how far to walk.

Psychologists have studied human preference for simple explanations under the label of explanatory virtues. Research by Igor Douven (Sorbonne University, Paris) and Jonah Schupbach published in Cognition (2015) found that explanatory considerations shape how people update their beliefs, beyond what the evidence alone would warrant. This suggests that Occam's Razor describes a genuine cognitive preference, not just a methodological prescription. That preference can be an asset (correctly favoring simpler explanations as priors) or a liability (anchoring on simple explanations even when evidence demands more complex ones).

Bayesian Formalization

The most rigorous modern treatment of Occam's Razor situates it within Bayesian probability theory. Bayesian model selection penalizes complexity through the marginal likelihood, the probability a model assigns to the observed data after averaging over its parameters. A model with more free parameters spreads its probability over a wider range of possible datasets, so any specific dataset receives lower probability under the complex model than under a simpler model that makes sharper predictions.

This is Occam's Razor formalized: simpler models make stronger predictions about the data, and when those predictions are confirmed, the evidence more strongly supports the simpler model. Complex models that could fit almost any data are hard to confirm or disconfirm — they explain everything but predict nothing. The Bayesian framework shows why this is a genuine epistemic deficiency rather than mere aesthetic preference for simplicity.
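A minimal worked case makes the effect concrete. The sketch below compares a fair-coin model (no free parameters) with a model that allows any bias under a uniform prior. The model pair, the prior, and the flip counts are illustrative assumptions.

```python
from math import factorial


def evidence_fair(heads, tails):
    """Probability of one specific flip sequence under a fair coin."""
    return 0.5 ** (heads + tails)


def evidence_any_bias(heads, tails):
    """Probability of the same sequence when the bias p is unknown.

    Averages p**heads * (1-p)**tails over a uniform prior on p, which
    integrates to heads! * tails! / (heads + tails + 1)!.
    """
    return factorial(heads) * factorial(tails) / factorial(heads + tails + 1)


def bayes_factor(heads, tails):
    """Evidence ratio: values above 1 favour the simpler fair-coin model."""
    return evidence_fair(heads, tails) / evidence_any_bias(heads, tails)


print(bayes_factor(5, 5))   # balanced data
print(bayes_factor(9, 1))   # lopsided data
```

With five heads and five tails the ratio is about 2.7 in favor of the fair coin, even though the flexible model can fit that data equally well: it is penalized for having also allowed every other outcome. With nine heads and one tail the ratio drops to about 0.11, and the evidence overrides the preference for simplicity.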

William Jefferys (University of Texas) and James Berger (then at Purdue University) published an influential 1992 paper in American Scientist titled "Ockham's Razor and Bayesian Analysis," demonstrating formally that Bayesian inference automatically implements a version of Occam's Razor without any need for explicit penalization of complexity. The simplicity preference emerges from the mathematics of probability given the structure of how priors are distributed across model classes.

This formalization addresses a longstanding philosophical objection: why should we think simpler explanations are more likely to be true? The Bayesian answer is that simpler explanations make more specific predictions, and specific predictions that are confirmed constitute stronger evidence. The Razor is not an arbitrary aesthetic preference — it follows from the logic of evidence.

Occam's Razor in Business and Decision-Making

Business strategy is not physics, but parsimony applies with real force. Strategies that require many things to go right simultaneously are less robust than strategies that require fewer assumptions. A business model that depends on particular customer behavior, a specific competitive response, a regulatory environment remaining stable, and a technology maturing on schedule is fragile — any single assumption can break it. A model that depends on fewer assumptions is more durable.

This is Occam's Razor applied to strategic planning: among strategies that project similar outcomes, prefer the one that requires fewer assumptions to be true. The discipline of strategy involves explicitly listing the assumptions a plan depends on and asking which are least certain — a direct application of parsimony logic.
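The fragility of a long assumption chain can be sketched numerically. The probabilities below are invented, and treating the assumptions as independent is itself a simplifying assumption made for illustration.

```python
from math import prod


def joint_probability(assumptions):
    """Chance that every assumption holds, treating them as independent."""
    return prod(assumptions.values())


def weakest_assumption(assumptions):
    """The assumption the plan is least sure of."""
    return min(assumptions, key=assumptions.get)


# Invented probabilities for two hypothetical plans.
plan_many = {
    "customers behave as expected": 0.90,
    "competitor responds as predicted": 0.80,
    "regulation stays stable": 0.85,
    "technology matures on schedule": 0.70,
}
plan_few = {
    "customers behave as expected": 0.90,
    "competitor responds as predicted": 0.80,
}

for name, plan in (("four assumptions", plan_many), ("two assumptions", plan_few)):
    print(name, round(joint_probability(plan), 4), "| weakest:", weakest_assumption(plan))
```

Each assumption in the four-part plan looks reasonably safe on its own, yet the chance that all four hold together is below one half, while the two-part plan holds about 72% of the time.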

Amazon's approach to strategy illustrates this. Jeff Bezos has repeatedly described Amazon's strategy as built on things that won't change: customers will always want low prices, fast delivery, and wide selection. This is a parsimonious strategic foundation — three stable assumptions rather than many assumptions about how markets will evolve. Strategies built on stable assumptions compound over time; strategies built on contingent assumptions require continuous adaptation.

Investment analysis applies Occam's Razor through skepticism of complex stories. Charlie Munger and Warren Buffett of Berkshire Hathaway have repeatedly noted their preference for businesses they can understand simply — businesses whose value does not require a complex chain of assumptions to justify. The investment thesis "this business sells consumer staples at a premium because of brand loyalty built over decades" is parsimonious and robust. The thesis "this business will be worth forty times revenue in five years because it will capture 12% of an emerging market that will grow 30% annually driven by demographic shifts that will produce changes in consumer behavior consistent with our model" requires many things to be true. The Razor suggests the simpler thesis is more likely to be correct as a description of how value will actually accrue.

Common Misapplications

Confusing Simplicity with Familiarity

A common error is equating simple with familiar. A geocentric model of the solar system, with the Earth at the center and celestial bodies moving in complex epicycles, was the familiar model in medieval Europe. The heliocentric model initially seemed more complex to many observers because it required abandoning familiar assumptions. In fact, it explained the same observations with a simpler underlying structure: retrograde motion followed directly from the Earth's own orbit rather than from large epicycles, and Kepler's elliptical orbits later removed the smaller epicycles that Copernicus had retained. Simplicity is a structural property of the explanation, not a measure of how comfortable or familiar it feels.

Applying the Razor to Settled Questions

Occam's Razor applies when evidence does not clearly resolve the question. When evidence does resolve it, the Razor is irrelevant. The existence of atoms, viruses, and dark matter in galaxies is not debated on grounds of simplicity — the evidence supports these entities directly or indirectly with high confidence. Appeals to Occam's Razor to deny well-evidenced complex explanations represent a misapplication of the principle.

Using It to Avoid Investigation

"The simple explanation is probably right" can become a justification for stopping inquiry prematurely. Medical history is full of cases where the simple explanation (psychosomatic cause, malingering, normal variation) was accepted without adequate investigation and the complex real explanation (rare disease, structural problem, medication interaction) was missed. The Razor is a prior for beginning inquiry, not a conclusion for ending it.

Occam's Razor and Artificial Intelligence Development

The relationship between Occam's Razor and AI is a live area of research and debate, with implications for how we understand what machine learning systems are doing and what risks they carry.

Solomonoff induction, developed by Ray Solomonoff in the 1960s, is a formal theory of inductive inference based on Occam's Razor and Kolmogorov complexity. Solomonoff's formulation assigns each hypothesis a prior probability that falls off exponentially with its description length in a universal programming language: a program of length L bits receives weight proportional to 2^-L, so shorter programs get higher prior probability. This is Occam's Razor mathematically formalized as a learning algorithm. Because it is uncomputable, Solomonoff induction cannot be run in practice, but it provides a theoretical ideal against which practical machine learning methods can be compared.
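The flavor of a length-weighted prior can be shown with a toy stand-in. The sketch below is not Solomonoff induction: instead of all programs for a universal machine, its hypotheses are only "repeat this bit pattern forever", each weighted by 2 to the power of minus its length. The hypothesis class, the length cap, and the name `predict_next_zero` are assumptions made for illustration.

```python
from itertools import product


def predict_next_zero(observed, max_len=7):
    """P(next bit is '0') under a length-weighted mixture of repeating patterns.

    Toy hypothesis class: every bit pattern of length 1..max_len, repeated
    forever. Keep max_len >= len(observed) so at least one pattern fits.
    """
    weight_zero = 0.0
    weight_total = 0.0
    for length in range(1, max_len + 1):
        for bits in product("01", repeat=length):
            pattern = "".join(bits)
            # Keep only hypotheses that reproduce everything seen so far.
            if any(observed[i] != pattern[i % length] for i in range(len(observed))):
                continue
            weight = 2.0 ** -length        # shorter description, larger prior weight
            weight_total += weight
            if pattern[len(observed) % length] == "0":
                weight_zero += weight
    return weight_zero / weight_total


p_zero = predict_next_zero("010101")
print(f"P(next bit is 0 | 010101) = {p_zero:.4f}")
```

After seeing 010101 the mixture puts a probability of 43/44 on the next bit being 0. The only surviving hypothesis that predicts a 1 is a seven-bit pattern, and its long description gives it very little weight.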

Regularization in neural networks implements Occam's Razor practically. Techniques including dropout (randomly deactivating neurons during training), weight decay (penalizing large parameter magnitudes), and early stopping (halting training before the model overfits training data) all implement versions of the preference for simpler models. The field of neural architecture search attempts to find the simplest architecture sufficient for a given task — automated Occam's Razor applied to network design.

Interpretability research in AI is partially motivated by Occam's Razor considerations. A model that can be explained simply — where the reasoning process is transparent and the decision factors are explicit — provides better epistemic grounds for trust than a model whose complexity makes explanation impossible. The black-box nature of large neural networks represents a failure of parsimony at the level of explanation even when the models perform well.

Relationship to Adjacent Concepts

Hanlon's Razor is a domain-specific application: among explanations for bad human behavior, prefer the simpler one (incompetence) over the more complex one (coordinated malicious intent) unless evidence tips the balance.

The Feynman Technique can be understood as Occam's Razor applied to understanding: if you cannot explain something simply, you may be relying on more assumptions than you realize. Genuine understanding produces simple explanations; apparent understanding relying on technical vocabulary often conceals hidden complexity and gaps.

The Cobra Effect involves a failure of parsimony in policy design: designers who do not model the simplest behavioral response to an incentive — how a rational actor will respond to what they are actually being paid for — produce interventions that backfire. The simple model of behavior ("people will optimize for what they are rewarded for") is the parsimonious prior and is more often correct than optimistic models of how people will respond to incentives.

Second-order thinking can appear to conflict with Occam's Razor — it adds complexity by requiring analysis of downstream effects. The resolution is that second-order thinking is not adding unnecessary complexity; it is building a more accurate model of how complex systems actually work. When the system is complex, the simple model that ignores second-order effects is not parsimonious — it is wrong. The genuinely parsimonious model is the simplest one that accurately accounts for the system's behavior.

References

  • Ockham, William of. Summa Logicae (c. 1323). Various translations.
  • Newton, Isaac. Philosophiae Naturalis Principia Mathematica. 1687. Trans. I. Bernard Cohen and Anne Whitman. University of California Press, 1999.
  • Hennig, Willi. Phylogenetic Systematics. Trans. D. Dwight Davis and Rainer Zangerl. University of Illinois Press, 1966.
  • Rissanen, Jorma. "Modeling by Shortest Data Description." Automatica, vol. 14, 1978, pp. 465-471.
  • Harvey, William. Exercitatio Anatomica de Motu Cordis et Sanguinis in Animalibus. Frankfurt, 1628.
  • Marshall, Barry J., and J. Robin Warren. "Unidentified Curved Bacilli in the Stomach of Patients with Gastritis and Peptic Ulceration." The Lancet, vol. 323, no. 8390, 1984, pp. 1311-1315.
  • Douven, Igor, and Jonah Schupbach. "The Role of Explanatory Considerations in Updating." Cognition, vol. 142, 2015, pp. 299-311.
  • Jefferys, William H., and James O. Berger. "Ockham's Razor and Bayesian Analysis." American Scientist, vol. 80, no. 1, 1992, pp. 64-72.
  • Gelman, Andrew, et al. "Bayesian Workflow." arXiv, 2020. arXiv:2011.01808.

Frequently Asked Questions

What is Occam's Razor?

Occam's Razor is the principle that among competing explanations or hypotheses, the one that makes the fewest assumptions should be preferred, all else being equal. It is often stated as 'entities should not be multiplied beyond necessity', a later paraphrase attributed to William of Ockham that does not appear in this form in his surviving writings. It does not say the simplest explanation is always correct; it says that simplicity is a reason to prefer one hypothesis over another when evidence is otherwise equal.

Who was William of Ockham?

William of Ockham (c. 1287-1347) was an English Franciscan friar, philosopher, and theologian. He studied and taught at Oxford and became one of the major figures of medieval scholastic philosophy. His application of parsimony to philosophical argument — preferring explanations with fewer ontological commitments — became associated with his name after his death. The 'razor' metaphor suggests cutting away unnecessary assumptions.

Did William of Ockham actually say 'the simplest explanation is usually right'?

No. Ockham's formulation was more technical: 'plurality must not be posited without necessity' (pluralitas non est ponenda sine necessitate). He did not say simpler explanations are more likely to be true; he said unnecessary complexity should be avoided as a matter of intellectual discipline. The stronger empirical claim (simpler explanations are more often correct) was added by later scientists and philosophers.

How is Occam's Razor used in science?

Scientists use parsimony to choose among competing hypotheses when experimental evidence does not definitively settle the question. In phylogenetics (evolutionary biology), parsimony analysis selects the evolutionary tree requiring the fewest evolutionary changes to explain observed genetic data. In physics, simpler mathematical models are preferred when they fit observations as well as complex alternatives. Isaac Newton explicitly invoked parsimony in his 'Principia Mathematica,' stating he would not multiply causes beyond what observations required.

How does Occam's Razor apply to machine learning and AI?

In machine learning, Occam's Razor underlies regularization techniques that penalize model complexity to prevent overfitting. A model with fewer parameters that fits training data reasonably well is generally preferred over a highly complex model that fits training data perfectly but fails to generalize. The Minimum Description Length (MDL) principle, developed by Jorma Rissanen in the 1970s, is a formalization of Occam's Razor applied to model selection.

When does Occam's Razor fail?

Occam's Razor fails when the simpler explanation is actually wrong. Nature is not obligated to be simple. Quantum mechanics, general relativity, and the Standard Model of particle physics are not simple — they are the correct descriptions of a complex reality. In medicine, rare diseases are real even when common explanations fit better. The aphorism 'when you hear hoofbeats, think horses not zebras' (common diseases first) is Occam's Razor applied medically, but zebras exist and patients with rare conditions are harmed by systematic exclusion of complex explanations.

What is the KISS principle and how does it relate to Occam's Razor?

KISS (Keep It Simple, Stupid) is an engineering and design principle attributed to Kelly Johnson of Lockheed's Skunk Works division in the 1960s. It expresses a similar intuition to Occam's Razor applied to design: systems work better when they are simple. Where Occam's Razor is epistemological (about choosing explanations), KISS is practical (about designing systems). Both express a general preference for simplicity, but in different domains.

Is Occam's Razor the same as the law of parsimony?

Essentially yes. The 'law of parsimony' is the formal scientific version of the same principle: when competing hypotheses fit the evidence equally well, prefer the one that makes fewer assumptions or posits fewer entities. The terms are used interchangeably in philosophy of science, though 'law of parsimony' tends to appear in more technical philosophical and scientific contexts.

What is Einstein's version of Occam's Razor?

Einstein is often quoted as saying 'Everything should be made as simple as possible, but not simpler.' This captures the crucial qualification: Occam's Razor justifies preferring simplicity among equally supported hypotheses, but it does not justify oversimplifying to the point of losing accuracy. The razor cuts unnecessary complexity; it does not demand false simplicity.