Data science was named "the sexiest job of the 21st century" by Harvard Business Review in 2012, and the title stuck. Over the decade that followed, companies scrambled to hire data scientists, universities launched dedicated programmes, and bootcamps promised six-figure salaries to anyone who could learn Python and statistics. What the hype obscured was a messier, more interesting reality: data scientists spend most of their time cleaning data, arguing about definitions with stakeholders, and building models that turn out to be less accurate than a simple average. The gap between the glamour of the title and the texture of the work is significant, and understanding it honestly is the starting point for anyone seriously considering the career.

This article exists because the question "what does a data scientist do?" has a genuinely complicated answer that varies enormously by company size, industry, and seniority level. A data scientist at a 15-person startup is doing something quite different from a principal data scientist at a large technology company, and both are doing something different from a data scientist at an insurance company, a hospital, or a government agency. The common thread is the application of statistical and computational methods to extract meaning from data — but the specific work, the tools, the pace, and the business context differ widely.

We will cover the day-to-day reality of the role, the skills and tools required, salary ranges by level and country (with sources), how the data scientist title relates to adjacent roles like data analyst and machine learning engineer, and the career path from junior to principal level. If you are deciding whether to pursue this path or are trying to understand what you are actually hiring for, this guide provides the honest picture.

"The ability to take data — to be able to understand it, to process it, to extract value from it, to visualise it, to communicate it — that's going to be a hugely important capability in the next decades." — Hal Varian, Chief Economist at Google, 2009


Key Definitions

Data science: The interdisciplinary field combining statistics, computer science, and domain knowledge to extract insights and build predictive models from structured and unstructured data.

Machine learning (ML): A subset of artificial intelligence in which models learn patterns from data rather than being explicitly programmed with rules. Most ML work in industry involves supervised learning — training models on labelled examples.

Feature engineering: The process of selecting and transforming raw data variables into inputs that improve model performance. Often the most impactful and time-consuming part of building a useful model.

Data pipeline: The automated system that moves data from source to storage to analysis or model serving. Data scientists often build or depend heavily on data pipelines maintained by data engineers.

A/B testing: A controlled experiment in which two versions of a product or intervention are shown to randomly assigned groups of users to measure the causal effect of a change. Statistical rigour in A/B testing is a core data science skill in product-driven companies.


What a Data Scientist Does: The Real Day-to-Day

The most honest description of a data scientist's day is: more time wrangling messy data than training models, more time in meetings than running code, and more time explaining results than generating them.

A 2020 survey by Anaconda of 2,360 data professionals found that respondents spent 45% of their time on data preparation tasks — locating data, cleaning it, and transforming it into usable form. Only 19% of time went to model building and training. This ratio surprises people entering the field from coursework, where clean, well-structured datasets are provided and the interesting part begins immediately.

A Typical Day at a Mid-Size Tech Company

The morning often begins with checking whether any overnight jobs failed. Data pipelines break: databases time out, upstream schemas change without notice, scheduled jobs run out of memory. A data scientist who owns production models or dashboards spends real time on monitoring and maintenance.

By mid-morning, the work might shift to a current project: perhaps building a churn prediction model for the customer success team. This means querying the data warehouse (using SQL) to pull a training dataset, examining the distribution of the target variable, engineering features from raw event logs, and running initial model experiments in a Jupyter notebook. The first model will almost certainly not be the last — iteration is the core loop.
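The feature-engineering step described above can be sketched in a few lines. This is a minimal illustration, not a production pattern: the event log, the churn labels, the snapshot date, and the `build_features` helper are all hypothetical, and the standard library stands in for the warehouse query and pandas work that would normally do this job.

```python
from collections import defaultdict
from datetime import date

# Hypothetical raw event log: (user_id, event_date, event_type).
EVENTS = [
    ("u1", date(2024, 1, 2), "login"),
    ("u1", date(2024, 1, 20), "purchase"),
    ("u1", date(2024, 1, 28), "login"),
    ("u2", date(2024, 1, 5), "login"),
    ("u3", date(2024, 1, 10), "login"),
    ("u3", date(2024, 1, 11), "support_ticket"),
]

# Hypothetical labels: 1 = churned in the following month, 0 = retained.
CHURNED = {"u1": 0, "u2": 1, "u3": 1}

SNAPSHOT = date(2024, 1, 31)  # features may only use events on or before this date


def build_features(events, snapshot):
    """Aggregate one row of features per user from raw events."""
    rows = defaultdict(lambda: {"n_events": 0, "n_purchases": 0, "last_seen": None})
    for user_id, day, kind in events:
        if day > snapshot:
            continue  # skip later events so the outcome cannot leak into the features
        row = rows[user_id]
        row["n_events"] += 1
        row["n_purchases"] += kind == "purchase"
        if row["last_seen"] is None or day > row["last_seen"]:
            row["last_seen"] = day
    return {
        user_id: {
            "n_events": row["n_events"],
            "n_purchases": row["n_purchases"],
            "days_since_last_event": (snapshot - row["last_seen"]).days,
        }
        for user_id, row in rows.items()
    }


features = build_features(EVENTS, SNAPSHOT)
churn_rate = sum(CHURNED.values()) / len(CHURNED)  # distribution of the target

print(features["u1"])
print(f"churn rate: {churn_rate:.2f}")
```

Checking the churn rate before modelling matters because a heavily imbalanced target changes which metrics and baselines are meaningful.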

Afternoons frequently involve meetings. A data scientist in a product-embedded role might attend sprint planning, a stakeholder review of recent A/B test results, or a cross-functional discussion about what metrics to track for an upcoming feature launch. Communication skills matter enormously: the ability to explain a confidence interval, or why a model performs well on average but poorly in a specific segment, to a non-technical audience is what separates scientists who create impact from those who produce reports nobody reads.
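The point about a model that performs well on average but poorly in one segment is easier to explain with numbers in hand. The sketch below uses invented predictions for two hypothetical customer segments ("enterprise" and "trial"); the `accuracy_by_segment` helper is an illustrative name, not part of any library.

```python
from collections import defaultdict

# Hypothetical (segment, actual, predicted) triples for a churn classifier.
PREDICTIONS = (
    [("enterprise", 0, 0)] * 45 + [("enterprise", 1, 1)] * 45
    + [("enterprise", 1, 0)] * 5 + [("enterprise", 0, 1)] * 5   # 90 of 100 correct
    + [("trial", 1, 1)] * 5 + [("trial", 0, 0)] * 5
    + [("trial", 1, 0)] * 8 + [("trial", 0, 1)] * 2             # 10 of 20 correct
)


def accuracy_by_segment(rows):
    """Return {segment: (accuracy, n_rows)} for labelled predictions."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for segment, actual, predicted in rows:
        total[segment] += 1
        correct[segment] += actual == predicted
    return {seg: (correct[seg] / total[seg], total[seg]) for seg in total}


overall = sum(actual == predicted for _, actual, predicted in PREDICTIONS) / len(PREDICTIONS)
by_segment = accuracy_by_segment(PREDICTIONS)

print(f"overall: {overall:.2f}")
for seg, (acc, n) in by_segment.items():
    print(f"{seg}: {acc:.2f} (n={n})")
```

In this made-up data the overall figure is dominated by the larger segment, so the headline accuracy hides the fact that the model is no better than a coin flip for trial users.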

Late in the day, there may be code review — reviewing a colleague's analysis for statistical errors, or checking whether the logic in a data pipeline matches the business definition of a metric. Senior data scientists mentor junior colleagues, review their code, and are expected to catch subtle errors in methodology.

What Varies by Company Type

At large technology companies (Google, Meta, Amazon, Spotify), data scientists often sit within product teams and focus heavily on experiment design and analysis. The infrastructure is mature, data is abundant, and the primary work is measuring the impact of product changes accurately.

At startups, data scientists are often more generalist: building dashboards, writing ETL jobs, running ad hoc analyses, and sometimes doing work that at larger companies would belong to a data engineer or analyst. The breadth of work is greater; the supporting infrastructure is thinner.

In traditional industries (retail, insurance, banking, healthcare), data scientists often work on classical predictive modelling such as credit risk models, demand forecasting, and fraud detection, with longer project cycles and more regulatory considerations.


Required Skills

Technical Skills

Python is the primary language of modern data science. Proficiency includes not just syntax but the scientific Python ecosystem: NumPy and pandas for data manipulation, scikit-learn for machine learning, matplotlib and seaborn for visualisation, and increasingly PyTorch or TensorFlow for deep learning work.

SQL is non-negotiable. Virtually every organisation stores data in relational databases or data warehouses (Snowflake, BigQuery, Redshift). A data scientist who cannot write efficient SQL joins, window functions, and aggregations is unable to access the data needed for any analysis. Ironically, SQL fluency often matters more in daily work than ML knowledge.
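As a rough illustration of the three SQL patterns named above, the sketch below runs a join, an aggregation, and a window function against an in-memory SQLite database through Python's built-in `sqlite3` module. The `customers` and `orders` tables and their contents are invented for the example, and window functions assume the bundled SQLite is version 3.25 or newer; warehouse dialects such as BigQuery or Snowflake differ in detail.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, plan TEXT);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER,
                         order_date TEXT, amount REAL);
    INSERT INTO customers VALUES (1, 'pro'), (2, 'free');
    INSERT INTO orders VALUES
        (1, 1, '2024-01-05', 40.0),
        (2, 1, '2024-02-10', 60.0),
        (3, 2, '2024-01-20', 15.0),
        (4, 1, '2024-03-01', 50.0);
    """
)

# Join + aggregation: total spend per plan.
spend_by_plan = conn.execute(
    """
    SELECT c.plan, SUM(o.amount) AS total_spend
    FROM orders AS o
    JOIN customers AS c ON c.customer_id = o.customer_id
    GROUP BY c.plan
    ORDER BY c.plan
    """
).fetchall()

# Window function: running spend per customer, ordered by date.
running_spend = conn.execute(
    """
    SELECT customer_id, order_date,
           SUM(amount) OVER (
               PARTITION BY customer_id ORDER BY order_date
           ) AS running_total
    FROM orders
    ORDER BY customer_id, order_date
    """
).fetchall()

print(spend_by_plan)
print(running_spend)
```

The window function keeps one output row per order while still computing a per-customer cumulative total, which is the kind of query that is awkward to express with `GROUP BY` alone.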

Statistics and probability form the theoretical foundation. This means understanding distributions, hypothesis testing, confidence intervals, regression assumptions, and experimental design. Without this foundation, it is easy to build models that appear to work but are actually measuring noise.
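To make "measuring noise" concrete, here is a minimal sketch of a two-sided two-proportion z-test with a confidence interval, the kind of check often applied to A/B test conversion rates. The function name `two_proportion_test` and the experiment counts are hypothetical, and the normal approximation it relies on is only reasonable for large samples.

```python
import math
from statistics import NormalDist


def two_proportion_test(conv_a, n_a, conv_b, n_b, alpha=0.05):
    """Two-sided z-test and confidence interval for p_b - p_a."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    diff = p_b - p_a

    # Pooled standard error under the null hypothesis of no difference.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se_null = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = diff / se_null
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))

    # Unpooled standard error for the interval around the observed difference.
    se_diff = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    ci = (diff - z_crit * se_diff, diff + z_crit * se_diff)
    return diff, p_value, ci


# Hypothetical experiment: 10,000 users per arm, 5.0% vs 5.6% conversion.
diff, p_value, ci = two_proportion_test(conv_a=500, n_a=10_000, conv_b=560, n_b=10_000)

print(f"difference: {diff:.4f}")
print(f"p-value: {p_value:.3f}")
print(f"95% CI: ({ci[0]:.4f}, {ci[1]:.4f})")
```

With these invented counts, the variant shows a 12% relative lift, yet the p-value lands slightly above 0.05 and the interval includes zero, so the result is compatible with no real effect.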

Machine learning knowledge includes understanding the tradeoffs between model types (linear models, decision trees, gradient boosting, neural networks), knowing when each applies, and understanding common pitfalls like overfitting, data leakage, and distributional shift.
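Overfitting is easiest to recognise by comparing performance on training data with performance on held-out data. The sketch below is a deliberately extreme illustration under stated assumptions: the data is synthetic noise with coin-flip labels, and a hand-written 1-nearest-neighbour classifier (chosen here only because it memorises its training set) stands in for a real model.

```python
import random

random.seed(0)


def make_noise_dataset(n):
    """Two random features and a coin-flip label: there is nothing to learn."""
    return [((random.random(), random.random()), random.randint(0, 1)) for _ in range(n)]


def predict_1nn(train, point):
    """Predict the label of the closest training point (a 1-nearest-neighbour model)."""
    nearest = min(
        train,
        key=lambda row: (row[0][0] - point[0]) ** 2 + (row[0][1] - point[1]) ** 2,
    )
    return nearest[1]


def accuracy(train, rows):
    """Share of rows whose label matches the 1-NN prediction."""
    return sum(predict_1nn(train, x) == y for x, y in rows) / len(rows)


train = make_noise_dataset(200)
test = make_noise_dataset(200)

train_acc = accuracy(train, train)  # the model has memorised every training row
test_acc = accuracy(train, test)    # held-out rows expose the lack of real signal

print(f"train accuracy: {train_acc:.2f}")
print(f"test accuracy:  {test_acc:.2f}")
```

Training accuracy is perfect because each training point is its own nearest neighbour, while held-out accuracy hovers around chance. A gap of that shape, even when less dramatic, is the usual warning sign that a model has fitted noise rather than signal.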

Data visualisation — the ability to turn analysis into clear charts and communicate findings visually — is underrated in most technical curricula but critical in practice.

Soft Skills

Communication is the meta-skill. Data science impact is zero unless the results are understood and acted on by decision-makers. This requires the ability to present statistical findings without jargon, to frame analysis in business terms, and to push back when stakeholders want to misuse or over-interpret results.

Intellectual honesty — the willingness to say "the data does not support that conclusion" or "our model has a significant limitation here" — is rarer and more valuable than technical brilliance.


Salary Ranges by Level and Country

The following figures represent total compensation (base salary plus bonus; equity not included unless noted) as of 2024, drawing on Levels.fyi, Glassdoor, and LinkedIn Salary data.

United States

Level | Title | Total Compensation (USD)
Entry (0-2 years) | Junior / Associate Data Scientist | $90,000 - $130,000
Mid (2-5 years) | Data Scientist | $130,000 - $175,000
Senior (5-8 years) | Senior Data Scientist | $160,000 - $220,000
Staff / Principal (8+ years) | Staff / Principal Data Scientist | $220,000 - $350,000+

At FAANG-tier companies, equity typically adds a further 30-70% to these figures at senior and principal levels.

United Kingdom: Entry-level roles pay GBP 35,000-50,000. Senior data scientists earn GBP 70,000-100,000. London rates are 20-30% higher than the national average.

Germany: Entry roles pay EUR 45,000-65,000. Senior roles pay EUR 75,000-100,000. Berlin and Munich are the highest-paying markets.

Canada: Entry roles pay CAD 75,000-100,000. Senior roles pay CAD 120,000-160,000.

India: Entry roles (especially in Bangalore, Hyderabad, or Mumbai at large technology companies) pay INR 800,000-1,500,000. Senior roles at top companies can reach INR 3,000,000-6,000,000.

Australia: Entry roles pay AUD 80,000-110,000. Senior roles pay AUD 130,000-180,000.


Data Scientist vs Data Analyst vs Machine Learning Engineer

These three titles are frequently confused, and the confusion is compounded by different companies using them inconsistently.

Data analyst roles focus on describing the past. The primary tools are SQL, Excel, and BI dashboards (Tableau, Looker, Power BI). Analysts answer questions like "how did revenue change last quarter?" and "which customer segments are churning?" They surface patterns; they typically do not build predictive models.

Data scientist roles focus on prediction and inference. They use statistical modelling and machine learning to answer questions like "which customers are most likely to churn next month?" and "what would have happened to revenue if we had made this product change six months ago?" The scope includes A/B test design, model building, and statistical analysis that goes beyond what basic BI tools support.

Machine learning engineer roles focus on production systems. They take models that data scientists develop and build the software infrastructure to serve those models reliably at scale — handling deployment, versioning, monitoring for drift, and maintaining latency requirements. This role requires stronger software engineering skills and less statistical depth than a data scientist role.

In practice, the lines blur. Many data scientists deploy their own models at small companies. Many analysts do light predictive work. The distinction is most meaningful at companies large enough to have all three roles as separate functions.


Career Path: Junior to Principal

Junior / Associate Data Scientist (0-2 years): Works on well-scoped problems with significant guidance. Learns the data stack, gets comfortable with the codebase and infrastructure, and builds foundational skills in SQL, Python, and basic modelling. Impact is largely through completing assigned analyses accurately and on time.

Data Scientist (2-5 years): Works more independently, scopes own analyses, and starts identifying problems worth solving rather than just answering questions asked. Begins influencing product or business decisions directly through analysis. May mentor junior team members.

Senior Data Scientist (5-8 years): Leads significant projects end-to-end, influences team direction, and is trusted to handle ambiguous, high-stakes problems with minimal guidance. Often the primary data science voice in cross-functional discussions. Mentors junior and mid-level scientists.

Staff Data Scientist (8-12 years): Works across multiple teams or a major product area. Identifies opportunities that individual teams are not seeing, shapes methodology standards across the organisation, and contributes to hiring and team building. Comparable to a senior engineering lead in organisational influence.

Principal Data Scientist (12+ years): Company-wide technical leadership. Sets the direction for how data science is practised, identifies the most important problems for the organisation to apply data science to, and is typically a known expert outside the company in their domain. A rare role that requires combining deep technical expertise with strategic business thinking.


How to Get Started

For career changers: A bootcamp or self-study programme covering Python, SQL, and basic statistics is a viable entry point. The key is building a portfolio of real projects — not toy datasets, but work that demonstrates you can frame a business question, source data, analyse it rigorously, and communicate findings clearly.

For recent graduates: A degree in statistics, computer science, mathematics, economics, or a quantitative social science is the standard background. Domain knowledge — healthcare, finance, climate — increasingly differentiates candidates for specialised roles.

For analysts looking to transition: The jump from data analyst to data scientist is achievable and common. The gap is usually in machine learning skills and statistical depth. Targeted study of ML fundamentals (Andrew Ng's courses on Coursera remain a solid foundation) combined with project work closes that gap over 6-12 months for motivated practitioners.


Pros and Cons

Pros: High salary ceiling, strong job market, intellectually stimulating work, applicable across almost every industry, remote-friendly in most organisations.

Cons: Much of the work is unglamorous data cleaning, not sophisticated modelling. Results are often ambiguous or ignored. Career progression can stall if you do not develop stakeholder management skills. The field moves fast and requires continuous learning investment to remain current.


Practical Takeaways

The most important skill difference between data scientists who advance quickly and those who plateau is not technical — it is the ability to identify which questions are worth answering. Building a perfect model for the wrong problem is a common failure mode. Senior practitioners spend proportionally more time deciding what to work on than executing the work itself.

If you are entering the field, invest heavily in SQL and communication before optimising for deep learning knowledge. The majority of data science work at most companies does not use neural networks, but all of it uses SQL and all of it requires persuading someone to act on the results.


References

  1. Davenport, T. H., & Patil, D. J. "Data Scientist: The Sexiest Job of the 21st Century." Harvard Business Review, October 2012.
  2. Anaconda. "2020 State of Data Science Report." Anaconda, Inc., 2020.
  3. Bureau of Labor Statistics, US Department of Labor. "Occupational Outlook Handbook: Data Scientists." BLS.gov, 2023-24 edition.
  4. Levels.fyi. "Data Science Salary Data." Levels.fyi, accessed 2024.
  5. Glassdoor. "Data Scientist Salaries." Glassdoor.com, 2024.
  6. LinkedIn Economic Graph. "Jobs on the Rise 2024." LinkedIn, 2024.
  7. Provost, F., & Fawcett, T. "Data Science for Business." O'Reilly Media, 2013.
  8. Ng, A. "Machine Learning Specialization." Coursera / DeepLearning.AI, 2022 edition.
  9. Grus, J. "Data Science from Scratch." O'Reilly Media, 2nd edition, 2019.
  10. Kaggle. "State of Data Science and Machine Learning Survey." Kaggle, 2023.
  11. McKinsey Global Institute. "The Age of Analytics: Competing in a Data-Driven World." McKinsey & Company, 2016.
  12. VanderPlas, J. "Python Data Science Handbook." O'Reilly Media, 2016.

Frequently Asked Questions

What does a data scientist do day to day?

Day-to-day work varies by company and seniority but typically involves querying databases to extract data, cleaning and transforming messy datasets, building statistical or machine learning models, and presenting findings to stakeholders. A significant portion, often around half of working time (45% in Anaconda's 2020 survey), is spent on data wrangling rather than modelling.

What is the difference between a data scientist and a data analyst?

Data analysts focus on describing what happened using SQL, dashboards, and reports. Data scientists go further by building predictive models and running experiments to understand why something happened and what will happen next. In practice the boundary is blurry and job titles vary widely by company.

What skills do you need to become a data scientist?

Core technical skills include Python or R, SQL, statistics and probability, and familiarity with machine learning libraries such as scikit-learn, TensorFlow, or PyTorch. Strong communication is equally important — data scientists must translate complex results into clear business recommendations.

How much do data scientists earn?

In the United States, entry-level data scientists earn roughly $90,000-$130,000 per year in base salary plus bonus. Senior data scientists earn $160,000-$220,000, and principal or staff-level roles at major tech companies can exceed $300,000 in total compensation including equity. Salaries outside the US are typically 30-60% lower.

How is a data scientist different from a machine learning engineer?

Data scientists focus on analysis, experimentation, and model development. Machine learning engineers take those models and build the production infrastructure to serve them at scale — handling deployment, monitoring, and reliability. At smaller companies, one person often does both jobs.