Interpreting Data Correctly: Avoiding Common Analysis Mistakes

In 1854, physician John Snow removed the handle from the Broad Street water pump in London's Soho district. Cholera deaths in the neighborhood dropped dramatically. Snow had traced the epidemic to contaminated water---not "bad air" (miasma), the prevailing medical theory---by meticulously mapping deaths and interviewing families about their water sources.

Snow did not have a p-value. He did not run a regression. He interpreted data correctly by understanding context, questioning assumptions, and following evidence rather than convention. His work is considered the founding moment of epidemiology and a masterclass in data interpretation.

Today, organizations have access to more data than Snow could have imagined. Yet misinterpretation remains epidemic. Analysts confuse correlation with causation, ignore sample sizes, cherry-pick favorable results, and present conclusions that crumble under scrutiny. The data is not the problem. The interpretation is.


The Interpretation Framework: Context Before Calculation

Before performing any calculation, a competent analyst asks five questions:

1. How was this data collected?

The collection method determines what the data can and cannot tell you. A voluntary customer survey captures opinions of people who fill out surveys---not customers generally. Web analytics track JavaScript-enabled browsers with cookies---not all visitors.

Example: In 2016, pre-election polls in key US states consistently predicted a Clinton victory. Post-election analysis revealed systematic non-response bias: Trump supporters were less likely to participate in polls, and many state polls did not weight for education. The data was meticulously collected and analyzed. The sample was unrepresentative.

2. What population does this data represent?

Every dataset is a sample from some population. Understanding that population---and its boundaries---prevents overgeneralization.

A study of Stanford computer science students tells you about Stanford CS students. It does not tell you about programmers generally, tech workers broadly, or Americans at large. Yet findings from elite university studies are routinely generalized to populations they do not represent.

3. What is missing from this data?

The most important question in data interpretation. Missing data is rarely random. People who cancel subscriptions do not fill out exit surveys. Patients who die are excluded from recovery statistics. Companies that fail disappear from industry benchmarks.

4. What timeframe does this data cover?

A company showing 40% revenue growth is impressive---unless you discover the comparison period was a COVID lockdown trough. Time period selection dramatically affects interpretation.

5. What external factors might influence these numbers?

Data does not exist in a vacuum. A spike in website traffic might correlate with a marketing campaign, a competitor outage, a viral social media mention, or seasonal patterns.


Correlation, Causation, and the Space Between

The distinction between correlation and causation is the single most important concept in data interpretation. Despite being taught in every statistics course, it is routinely ignored in practice.

The Three Correlation Traps

Confounding variables: Both observed variables are driven by a third, unobserved factor.

Example: Cities with more police officers have higher crime rates. Does hiring police cause crime? Obviously not. Both are driven by city size and urbanization. But this exact logical error appears in business analytics regularly: "Countries where we advertise more have higher sales" might mean advertising drives sales, or it might mean you advertise more in countries where demand already exists.
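A small simulation (all numbers invented) makes the trap concrete: two variables driven by a shared confounder correlate strongly, and conditioning on the confounder collapses the correlation:

```python
import random

random.seed(0)

# City size drives BOTH police headcount and crime counts. Neither causes
# the other, yet they correlate strongly until you hold size constant.
sizes = [random.uniform(1, 100) for _ in range(5_000)]   # population, arbitrary units
police = [2 * s + random.gauss(0, 10) for s in sizes]    # bigger city, more police
crime = [3 * s + random.gauss(0, 10) for s in sizes]     # bigger city, more crime

def corr(xs, ys):
    """Pearson correlation, computed from scratch."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

# Condition on the confounder: look only at cities of similar size.
band = [i for i, s in enumerate(sizes) if 48 <= s <= 52]
police_band = [police[i] for i in band]
crime_band = [crime[i] for i in band]

print(f"overall correlation:         {corr(police, crime):+.2f}")       # strong
print(f"within similar-size cities:  {corr(police_band, crime_band):+.2f}")  # near zero
```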

Reverse causation: The direction of causation is opposite to what is assumed.

Example: Companies with strong cultures have higher revenue. Does culture cause revenue? Or does revenue enable investment in culture? Likely both, but the causal arrow matters for decision-making. If you are a struggling startup, copying Google's perks will not give you Google's revenue.

Selection effects: The way data was selected creates artificial correlations.

Example: Among NBA players, height and basketball skill are weakly or even negatively correlated. This seems counterintuitive, since taller people tend to be better at basketball. But the selection effect is strong: to make the NBA, short players must be extraordinarily skilled. The filtering process creates an artificial negative correlation between height and skill within the selected population.
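The same filtering effect can be reproduced in a few lines: start with independent height and skill, keep only players above a combined cutoff, and a negative correlation appears (simulated data, arbitrary units):

```python
import random

random.seed(1)

# In the general population, height and skill are independent (corr ~ 0).
population = [(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(100_000)]

def corr(pairs):
    """Pearson correlation between the two coordinates of each pair."""
    hs = [h for h, _ in pairs]
    ss = [s for _, s in pairs]
    n = len(pairs)
    mh, ms = sum(hs) / n, sum(ss) / n
    cov = sum((h - mh) * (s - ms) for h, s in pairs)
    vh = sum((h - mh) ** 2 for h in hs)
    vs = sum((s - ms) ** 2 for s in ss)
    return cov / (vh * vs) ** 0.5

# "Making the league" requires a high combined score, so short players
# survive the cut only if they are extraordinarily skilled.
league = [(h, s) for h, s in population if h + s > 2.5]

print(f"population correlation: {corr(population):+.2f}")  # ~ 0
print(f"league correlation:     {corr(league):+.2f}")      # clearly negative
```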

Establishing Causation

True causal inference requires more than correlation:

  1. Temporal precedence -- The cause must precede the effect
  2. Covariation -- Changes in the cause correspond to changes in the effect
  3. Elimination of alternatives -- Confounding factors are ruled out

The gold standard is the randomized controlled trial (RCT): randomly assign subjects to treatment and control groups, intervene on the treatment group only, and measure the difference. A/B testing in tech is the business equivalent.
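A minimal sketch of how an A/B test readout is evaluated, using a two-proportion z-test on made-up counts (normal approximation):

```python
from math import erf, sqrt

def ab_test(conv_a, n_a, conv_b, n_b):
    """Two-sided two-proportion z-test (normal approximation)."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Convert |z| to a two-sided p-value via the normal CDF.
    p = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return z, p

# Hypothetical readout: control converts 500/10,000, treatment 580/10,000.
z, p = ab_test(500, 10_000, 580, 10_000)
print(f"z = {z:.2f}, p = {p:.4f}")
```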

When RCTs are not possible, quasi-experimental methods like difference-in-differences, regression discontinuity, and instrumental variables offer weaker but useful causal evidence.



Statistical Significance: The Most Misunderstood Concept in Data

What P-Values Actually Mean

A p-value is the probability of observing results at least as extreme as the data, assuming the null hypothesis is true. A p-value of 0.03 means: "If there were truly no effect, we would see results this extreme or more extreme 3% of the time by chance."

What p-values do not mean:

  • The probability that the null hypothesis is true (a common and dangerous misinterpretation)
  • The probability that the result will replicate
  • That the effect is meaningful or important
  • That the result is practically significant
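The definition can be sanity-checked by simulation: when the null hypothesis is true, "significant" results still appear at roughly the advertised rate. A sketch using a fair coin as the true null:

```python
import random

random.seed(2)

# 20,000 experiments on a truly fair coin. The two-sided rejection rule
# "60+ heads or 40- heads in 100 flips" has a false-positive rate of
# roughly 5.7% -- not zero -- even though no effect exists.
trials = 20_000
rejections = 0
for _ in range(trials):
    heads = sum(random.random() < 0.5 for _ in range(100))
    if heads >= 60 or heads <= 40:
        rejections += 1

rate = rejections / trials
print(f"false-positive rate under a true null: {rate:.3f}")
```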

The Significance Trap

Statistical significance is not practical significance. With a large enough sample, even trivially small effects become statistically significant.

Example: An A/B test with 10 million visitors per variant detects a 0.01% improvement in conversion rate with p < 0.001. Statistically significant? Yes. Practically meaningful? Almost certainly not. The cost of implementing the change likely exceeds the value of a 0.01% improvement.

Conversely, practically large effects can be statistically non-significant with small samples. A test showing 20% conversion improvement with 50 visitors per variant might have p = 0.15. Not statistically significant, but potentially a real and valuable effect that the sample was too small to detect.
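Both failure modes fall out of a single calculation: run the same 20% relative lift (10% to 12% conversion) at two sample sizes (illustrative numbers, normal-approximation z-test):

```python
from math import erf, sqrt

def p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided two-proportion z-test (normal approximation)."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = abs(p_b - p_a) / se
    return 2 * (1 - 0.5 * (1 + erf(z / sqrt(2))))

# Identical 20% relative lift, two very different verdicts:
p_small = p_value(5, 50, 6, 50)          # 50 visitors per variant
p_large = p_value(500, 5000, 600, 5000)  # 5,000 visitors per variant

print(f"n=50:    p = {p_small:.3f}")   # far from significant
print(f"n=5000:  p = {p_large:.4f}")   # highly significant
```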

Better Practices

  • Report effect sizes alongside p-values (how big is the difference?)
  • Report confidence intervals (what range of effects is plausible?)
  • Calculate statistical power before running tests (can this test detect the effect we care about?)
  • Distinguish practical from statistical significance explicitly
  • Consider Bayesian approaches that provide more intuitive probability statements
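The first two practices can be combined by reporting the effect with its confidence interval rather than a bare p-value (illustrative counts, normal approximation):

```python
from math import sqrt

def diff_ci(conv_a, n_a, conv_b, n_b, z=1.96):
    """95% confidence interval for a difference in conversion rates."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    se = sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    diff = p_b - p_a
    return diff - z * se, diff + z * se

# Hypothetical test: 5.0% control vs 5.8% treatment conversion.
lo, hi = diff_ci(500, 10_000, 580, 10_000)
print(f"effect: +0.80 percentage points, 95% CI [{lo:+.4f}, {hi:+.4f}]")
```

The interval communicates both the size of the effect and the uncertainty around it, which a p-value alone cannot.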

Simpson's Paradox: When Aggregation Lies

Simpson's Paradox occurs when a trend present in several groups of data reverses when the groups are combined.

A Real-World Case Study

A hospital compares surgery success rates between two surgeons:

Surgeon      Easy-Case Success   Hard-Case Success   Overall Success
Dr. Adams    95% (95/100)        70% (7/10)          92.7% (102/110)
Dr. Baker    98% (49/50)         75% (75/100)        82.7% (124/150)

Dr. Adams has a higher overall success rate (92.7% vs 82.7%). But Dr. Baker performs better in both easy cases (98% vs 95%) and hard cases (75% vs 70%).

The paradox arises because Dr. Baker takes on far more difficult cases. When aggregated, the imbalance in case difficulty hides the fact that Dr. Baker is the superior surgeon in every category.
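The arithmetic can be verified directly from the surgeon counts:

```python
# Success counts from the surgeon example: (successes, cases).
adams = {"easy": (95, 100), "hard": (7, 10)}
baker = {"easy": (49, 50), "hard": (75, 100)}

def rate(successes, cases):
    return successes / cases

def overall(surgeon):
    wins = sum(s for s, _ in surgeon.values())
    total = sum(c for _, c in surgeon.values())
    return wins / total

# Baker wins in every category...
assert rate(*baker["easy"]) > rate(*adams["easy"])   # 98% > 95%
assert rate(*baker["hard"]) > rate(*adams["hard"])   # 75% > 70%
# ...but loses in aggregate, because most of his cases are the hard ones.
assert overall(adams) > overall(baker)

print(f"Adams: {overall(adams):.1%}  Baker: {overall(baker):.1%}")
```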

Implications for Business Analytics

Simpson's Paradox appears anywhere data is aggregated across unbalanced groups:

  • Overall conversion rates can decline even when every channel's conversion rate improves (if traffic shifts toward lower-converting channels)
  • Average revenue per customer can increase while revenue per customer in every segment decreases (if customer mix shifts toward higher-spending segments)
  • Overall employee satisfaction can improve while satisfaction in every department worsens (if growing departments had higher baseline satisfaction)

Defense

Always examine data at multiple levels of aggregation. When the story changes between aggregate and segmented views, understand why. The segmented view is usually more truthful, but the aggregation is not necessarily wrong---it reflects a real compositional change.


The Perils of Averages

Mean vs. Median: A Critical Distinction

The mean (arithmetic average) is distorted by outliers. The median (middle value) is resistant to outliers.

Example: In a room of 10 people each worth $50,000, the mean and median are both $50,000. Jeff Bezos, with a net worth of roughly $110 billion, walks in. The mean jumps to about $10 billion. The median remains $50,000.

When income distribution, company revenue, or user engagement data is skewed (as most real-world data is), the mean is misleading. Report the median and percentiles instead.
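The Bezos example in miniature (the ~$110 billion net-worth figure is assumed for illustration):

```python
# Ten people worth $50,000 each; mean and median agree.
values = [50_000] * 10

def mean(xs):
    return sum(xs) / len(xs)

def median(xs):
    s = sorted(xs)
    mid = len(s) // 2
    return s[mid] if len(s) % 2 else (s[mid - 1] + s[mid]) / 2

print(mean(values), median(values))  # both 50,000

# One extreme outlier joins the room.
values.append(110_000_000_000)
print(f"mean:   ${mean(values):,.0f}")    # roughly $10 billion
print(f"median: ${median(values):,.0f}")  # still $50,000
```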

Other Average Traps

Ecological fallacy: Drawing conclusions about individuals from group averages. The average income in a wealthy zip code does not mean every resident is wealthy.

Aggregation over time: Monthly averages can hide weekly patterns. A restaurant averaging 100 covers per day might have 150 on weekends and 70 on weekdays. Staffing to the average fails both peaks and troughs.

Weighted vs. unweighted averages: Averaging department satisfaction scores gives equal weight to a department of 5 and a department of 500. Weight by headcount for an accurate organizational picture.
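The difference is easy to see with two hypothetical departments:

```python
# Satisfaction scores (out of 10) for departments of very different size.
departments = [
    {"name": "Research", "headcount": 5,   "score": 9.0},
    {"name": "Support",  "headcount": 500, "score": 6.0},
]

# Unweighted: each department counts equally, regardless of size.
unweighted = sum(d["score"] for d in departments) / len(departments)

# Weighted: each employee counts equally.
weighted = (sum(d["score"] * d["headcount"] for d in departments)
            / sum(d["headcount"] for d in departments))

print(f"unweighted: {unweighted:.2f}")  # 7.50 -- flatters the organization
print(f"weighted:   {weighted:.2f}")    # 6.03 -- what a typical employee reports
```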


Handling Missing Data: What You Do Not See Matters

Missing data is one of the most under-discussed problems in practical analytics. Data is missing from virtually every real-world dataset, and the way it is missing determines the appropriate handling strategy.

Types of Missingness

Missing Completely at Random (MCAR): The probability of missing data is unrelated to any variable. A sensor fails due to hardware defect, producing random gaps. Dropping incomplete records introduces no bias.

Missing at Random (MAR): Missingness is related to observed variables but not the missing values themselves. Younger respondents skip income questions more often than older respondents. Once you account for age, income missingness is random.

Missing Not at Random (MNAR): Missingness is related to the missing value itself. High earners refuse to report income. Depressed patients skip wellbeing surveys. This is the most dangerous type---and the most common in practice.
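A small simulation (hypothetical income distribution and response rule) shows how MNAR data silently biases even a simple mean:

```python
import random

random.seed(3)

# True incomes drawn from a right-skewed (lognormal) distribution.
true_incomes = [random.lognormvariate(11, 0.5) for _ in range(10_000)]

# MNAR rule (invented for illustration): high earners respond less often.
reported = [x for x in true_incomes
            if random.random() < (0.9 if x < 80_000 else 0.4)]

true_mean = sum(true_incomes) / len(true_incomes)
observed_mean = sum(reported) / len(reported)

print(f"true mean:     {true_mean:,.0f}")
print(f"observed mean: {observed_mean:,.0f}")  # biased low, with no warning sign
```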

Handling Strategies

  1. Complete case analysis -- Use only records with no missing values. Simple but reduces sample size and may introduce bias.
  2. Mean/median imputation -- Replace missing values with the average. Preserves sample size but understates variance.
  3. Multiple imputation -- Create multiple completed datasets with different imputed values, analyze each, and combine results. Statistically rigorous.
  4. Model-based imputation -- Predict missing values using other variables. Powerful but requires modeling assumptions.
  5. Indicator variables -- Add a flag indicating whether data was missing, then use the flag in analysis.
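Strategies 1, 2, and 5 above can be sketched in a few lines on a toy dataset (values invented):

```python
# Hypothetical survey incomes with gaps (None = missing).
incomes = [52_000, None, 61_000, 48_000, None, 75_000, 58_000]

observed = [x for x in incomes if x is not None]
n_missing = len(incomes) - len(observed)

# 1. Complete-case analysis: drop the gaps (unbiased only under MCAR).
complete_case = observed

# 2. Mean imputation: preserves sample size but understates variance.
mean_income = sum(observed) / len(observed)
imputed = [x if x is not None else mean_income for x in incomes]

# 5. Indicator variable: keep a flag so "was missing" can enter the model.
missing_flag = [int(x is None) for x in incomes]

# Always report the missingness rate alongside results.
print(f"missing: {n_missing}/{len(incomes)} ({n_missing / len(incomes):.0%})")
```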

The most critical step: always report how much data is missing and how you handled it. Silent deletion of missing records is a form of cherry-picking.


The Perils of Extrapolation

Extrapolation extends observed trends beyond the range of available data. It is seductive and dangerous.

Example: Extrapolating the iPhone's explosive early growth rates forward would have predicted Apple selling billions of units per quarter by 2020. Growth curves flatten. Markets saturate. Trends that hold for five years rarely hold for twenty.
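A toy calculation (all numbers made up) shows how quickly compounding extrapolation detaches from reality:

```python
# Hypothetical product doubling every year for its first four years.
units_year_0 = 1_000_000
growth = 2.0  # 100% year-over-year growth

observed = [units_year_0 * growth ** t for t in range(4)]  # years 0-3
extrapolated_year_13 = units_year_0 * growth ** 13         # same rate, 13 years out

print(f"year 3 (within observed range): {observed[3]:,.0f}")           # 8 million
print(f"year 13 (extrapolated):         {extrapolated_year_13:,.0f}")  # 8.2 billion
```

Eight billion units per year exceeds the human population: the mechanism (market saturation) guarantees the trend breaks long before the forecast horizon.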

Safe vs. Dangerous Extrapolation

Relatively safe: Short-term extrapolation in stable environments with well-understood mechanisms. A retailer forecasting next month's sales based on seasonal patterns and recent trends.

Dangerous: Long-term extrapolation of growth rates, extrapolation outside the range of observed conditions, and extrapolation without understanding the underlying mechanism.

The Takeaway

If you understand why a trend exists, you can extrapolate cautiously. If you only observe that a trend exists, extrapolation is speculation wearing the costume of analysis.



Building Interpretation Discipline

The Pre-Analysis Checklist

Before presenting any conclusion:

  1. Is the sample representative of the population I care about?
  2. Is the sample large enough to support this conclusion?
  3. Have I checked for confounding variables?
  4. Am I confusing correlation with causation?
  5. Does the conclusion hold when I disaggregate the data?
  6. Am I using the right summary statistic (mean vs. median)?
  7. How much data is missing, and could missingness bias results?
  8. Am I extrapolating beyond the data's range?
  9. Has someone outside the project reviewed my interpretation?
  10. What is the simplest alternative explanation for this pattern?

The Red Team Approach

Before making decisions based on data analysis, assign someone to argue against the conclusion. Their job:

  • Find alternative explanations
  • Identify missing data
  • Challenge sample representativeness
  • Check for cherry-picking
  • Verify statistical methodology

This adversarial approach catches interpretation errors that collaborative analysis misses.

Data interpretation is ultimately about intellectual honesty: the willingness to follow evidence to unwelcome conclusions, to acknowledge uncertainty, and to say "I don't know" when the data does not support a clear answer. Organizations that cultivate this honesty make better decisions---not because they have better data, but because they read it more truthfully.


References

  • Snow, John. On the Mode of Communication of Cholera. John Churchill, 1855. https://www.ph.ucla.edu/epi/snow/snowbook.html
  • Kahneman, Daniel. Thinking, Fast and Slow. Farrar, Straus and Giroux, 2011.
  • Wasserstein, Ronald L. and Lazar, Nicole A. "The ASA Statement on p-Values." The American Statistician, 2016. https://www.tandfonline.com/doi/full/10.1080/00031305.2016.1154108
  • Pearl, Judea. The Book of Why: The New Science of Cause and Effect. Basic Books, 2018.
  • Rubin, Donald B. Multiple Imputation for Nonresponse in Surveys. Wiley, 1987.
  • Simpson, Edward H. "The Interpretation of Interaction in Contingency Tables." Journal of the Royal Statistical Society, Series B, 1951.
  • Silver, Nate. The Signal and the Noise. Penguin Books, 2012.
  • Tufte, Edward. The Visual Display of Quantitative Information. Graphics Press, 2001.
  • Wheelan, Charles. Naked Statistics. W.W. Norton, 2013.
  • Angrist, Joshua D. and Pischke, Jörn-Steffen. Mostly Harmless Econometrics. Princeton University Press, 2009.