A Day in the Life of a Data Scientist

The popular image of data science - elegant statistical models surfacing hidden insights from vast datasets, changing business strategy overnight - bears little resemblance to the typical data scientist's Tuesday.

The gap between the marketing materials and the lived experience of the job is wider in data science than in almost any other technology role, and that gap has real consequences: people enter the field with wrong expectations, get frustrated quickly, and either leave or struggle to perform.

Understanding what data scientists actually do on a typical day - including the parts that nobody mentions in recruiting videos - matters enormously for career decisions. If you love modelling and find data cleaning tedious, you should know that your day will be dominated by cleaning.

If you dislike long stakeholder meetings, you should know that the senior versions of this role involve more communication and less coding, not the other way around.

This article covers what data scientists actually do across a typical week, how the work differs by seniority, how the daily experience varies dramatically between startups, large tech companies, and consulting environments, and what the realistic project cycle looks like from discovery to deployment.

"If I had to describe data science in one sentence, I'd say it's mostly being a janitor who occasionally gets to do science." - Hilary Mason, founder of Fast Forward Labs and former Chief Scientist at Bitly, 2015 interview

Key Definitions

Data cleaning: The process of identifying and correcting errors, inconsistencies, null values, duplicates, and formatting issues in raw data before analysis or modelling. This typically consumes the majority of a data scientist's working time.

Stakeholder management: The ongoing process of communicating with business partners, product managers, and executives about analytical findings, project status, and data limitations. A critical skill that grows in importance with seniority.

Exploratory data analysis (EDA): The initial phase of any analysis project - understanding data structure, distributions, relationships, and anomalies before formulating hypotheses or building models.

Technical debt: Accumulated shortcuts in data pipelines, model implementations, or code quality that must eventually be addressed. Data scientists working in resource-constrained environments accumulate significant technical debt.

Model monitoring: Ongoing tracking of deployed model performance to detect data drift, degraded accuracy, or unexpected behaviour in production. An often-overlooked part of the job after initial model deployment.

Feature engineering: The process of using domain knowledge to transform raw data into informative inputs for machine learning models. Often more consequential for model performance than algorithm selection.

Data drift: The phenomenon in which the statistical properties of model inputs change over time relative to the training data, causing degraded model performance in production without any change to the model itself.
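To make the data drift and model monitoring definitions concrete, here is a minimal sketch of one common drift check, the Population Stability Index (PSI), which compares the binned distribution of a feature at training time with its distribution in production. The function name, the choice of ten equal-width bins, and the small floor used to avoid log(0) are illustrative assumptions, not something the sources above prescribe.

```python
import math
from typing import List, Sequence


def population_stability_index(
    expected: Sequence[float], actual: Sequence[float], bins: int = 10
) -> float:
    """PSI between a training-time sample (expected) and a production sample (actual)."""
    lo, hi = min(expected), max(expected)
    if hi == lo:
        raise ValueError("expected sample has no spread, cannot build bins")
    width = (hi - lo) / bins

    def proportions(values: Sequence[float]) -> List[float]:
        counts = [0] * bins
        for v in values:
            # Values outside the training range are clamped into the edge bins.
            idx = min(max(int((v - lo) / width), 0), bins - 1)
            counts[idx] += 1
        # Floor each proportion so empty bins do not produce log(0).
        return [max(c / len(values), 1e-6) for c in counts]

    p_exp = proportions(expected)
    p_act = proportions(actual)
    return sum((a - e) * math.log(a / e) for e, a in zip(p_exp, p_act))


# Hypothetical feature values: identical samples versus a shifted sample.
training = [i / 100 for i in range(1000)]
shifted = [x + 3 for x in training]
psi_same = population_stability_index(training, training)
psi_shifted = population_stability_index(training, shifted)
```

A frequently quoted rule of thumb treats PSI below 0.1 as stable and above 0.25 as a shift worth investigating, but those cut-offs are conventions rather than guarantees and should be tuned per feature.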

The 80% That Nobody Talks About

The most consistent finding across surveys of data scientists is the time allocation breakdown.

The 2022 Kaggle State of Data Science survey (covering 23,997 respondents across 167 countries) found that respondents spend an average of 26% of their time on data collection, 19% on data cleaning, 18% on building and selecting models, 11% on visualising data, and 9% on putting models into production.

Data collection and data cleaning together account for approximately 45% of time (26% plus 19%) - and that is before you count the debugging and investigation required when data pipelines break.

A 2016 survey by CrowdFlower famously found that data scientists spend 60% of their time cleaning and organising data, and reported it as their least favourite part of the work. Years later, little has changed structurally.

The tools have improved - dbt, Great Expectations, and modern data warehouses make data quality management more tractable - but the underlying problem remains: real-world data is messy in ways that cannot be fully automated away.

What does this look like in practice? It looks like spending half a day tracing why a conversion metric in the analytics table suddenly changed, only to discover that a product team changed how they log events three weeks ago without updating the documentation.

It looks like a dataset that has customer_id in three different formats depending on which source system it came from. It looks like revenue figures that do not reconcile between the finance system and the analytics warehouse.
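As a sketch of what the customer_id problem can look like in code, the snippet below maps three hypothetical formats (a bare number, a "CUST-" prefix with zero padding, and a "c_" prefix) onto one canonical string. The formats and the function name are assumptions made for illustration; real source systems will differ, and the approach only works when each ID contains a single numeric run.

```python
import re
from typing import Optional

_DIGITS = re.compile(r"\d+")


def normalise_customer_id(raw: Optional[str]) -> Optional[str]:
    """Return the numeric part of an ID without leading zeros, or None if absent."""
    if raw is None:
        return None
    match = _DIGITS.search(raw.strip())
    if match is None:
        return None
    # int() drops zero padding such as "0012345" -> 12345.
    return str(int(match.group()))


# Hypothetical raw values from three source systems, plus two unusable ones.
raw_ids = ["12345", "CUST-0012345", " c_12345 ", "", None]
clean_ids = [normalise_customer_id(value) for value in raw_ids]
```

Returning None for unusable values, rather than raising, keeps the bad rows visible so they can be counted and reported instead of silently dropped.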

DJ Patil and Thomas Davenport, writing in Harvard Business Review (2012), described data scientists as needing to be simultaneously comfortable with statistical modelling, programming, and business communication - a combination they called "the unicorn skill set." What they did not emphasise is how much of that unicorn's day is spent debugging Python environments and chasing schema changes.

How Time Is Actually Spent

Activity	Expected Allocation	Actual Allocation (Survey Data)
Data cleaning and preparation	10-20%	45-60%
Exploratory data analysis	20-30%	15-20%
Model building and tuning	30-40%	10-15%
Communication and presentation	10-15%	5-10%
Infrastructure and pipeline work	5-10%	20-30%
Model monitoring and maintenance	0-5%	5-10%

The monitoring and maintenance row deserves particular attention. Chip Huyen's Designing Machine Learning Systems (O'Reilly, 2022) documents extensively that model maintenance - keeping deployed models performing as expected as data distributions shift - is one of the most consistently underestimated time commitments in the field.

A model shipped to production is not finished work; it requires ongoing attention that junior data scientists rarely plan for and often find themselves responsible for.

A Junior Data Scientist's Typical Day

Junior data scientists (roughly 0-3 years of experience) spend most of their time executing defined tasks with close supervision. The day typically looks something like this:

Morning (9:00-11:00): Team standup (15-30 minutes), followed by hands-on work - likely data exploration or cleaning on an assigned project. A junior DS might spend this time investigating why a model's output changed after a recent data pipeline update, writing SQL queries to understand the data, and documenting findings.

Late morning (11:00-12:00): Collaboration time - a one-on-one with a senior data scientist to discuss approach, or a cross-functional meeting with a product manager who has requested an analysis.

Afternoon (13:00-16:00): Execution work. Feature engineering, model training runs, writing code, reviewing output. Jupyter notebooks are common at this stage, though companies with better engineering culture encourage moving to proper Python scripts with tests.

Late afternoon (16:00-17:30): Documentation, code reviews, updating project trackers, reviewing feedback on a presentation or analysis shared earlier in the week.

The coding-to-meeting ratio at junior level is typically higher than at later career stages - perhaps 60-70% heads-down technical work, 30-40% communication and collaboration. The technical work often feels less glamorous than expected: more debugging, more data investigation, less clever model building.

The First Year Surprise

Surveys of first-year data scientists consistently surface a common theme: the gap between expectation and reality is largest in the first 6-12 months.

A 2023 survey by the Data Science Weekly newsletter found that 61% of respondents in their first data science role said they spent "significantly more time than expected" on data quality issues, and 44% said their first year involved "little to no machine learning work." The majority of first-year time, across most company types, is spent building the foundational familiarity with the company's data systems, pipelines, and domain that enables more advanced work later.

This is not a failure of the role or the individual - it is the nature of joining any complex technical organisation. The data scientist who understands this expectation is far better positioned to make productive use of that first year than one who is frustrated by it.

A Mid-Level Data Scientist's Typical Day

Mid-level data scientists (3-7 years) have moved beyond purely executing assigned work and are beginning to define their own analytical agenda in collaboration with stakeholders. They run projects end to end, manage junior team members on specific tasks, and contribute significantly to technical design decisions.

Morning (9:00-10:00): Review overnight model training results and any automated monitoring alerts. Write a brief Slack update to stakeholders on the status of an ongoing project.

Mid-morning (10:00-12:00): Lead a requirements meeting with a product manager for a new analytical project. Convert the discussion into a scoped project brief with defined deliverables and timeline.

Afternoon (13:00-15:00): Deep technical work - building and iterating on a recommendation system, or constructing a statistical test design for an upcoming A/B experiment.

Mid-afternoon (15:00-16:30): Code review for a junior data scientist's pipeline implementation. Review a pull request, leave structured feedback, and explain the reasoning behind suggested changes.

Late afternoon (16:30-17:30): Write a project memo summarising findings from a completed analysis, framed for a non-technical audience with clear recommendations.
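The A/B test design work mentioned in the afternoon block often starts with a sample size estimate. The sketch below uses the standard normal approximation for a two-sided comparison of two conversion rates; the function name and the example rates (10% baseline, 12% target) are hypothetical, and a real design would also account for things like multiple metrics or sequential peeking.

```python
import math
from statistics import NormalDist


def sample_size_per_group(
    p_control: float, p_treatment: float, alpha: float = 0.05, power: float = 0.80
) -> int:
    """Approximate users needed per arm to detect p_control vs p_treatment."""
    if p_control == p_treatment:
        raise ValueError("the two rates must differ to define an effect size")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance level
    z_beta = NormalDist().inv_cdf(power)           # desired statistical power
    variance = p_control * (1 - p_control) + p_treatment * (1 - p_treatment)
    effect = p_treatment - p_control
    return math.ceil((z_alpha + z_beta) ** 2 * variance / effect ** 2)


# Hypothetical experiment: lift conversion from 10% to 12%.
n_per_group = sample_size_per_group(0.10, 0.12)
```

Halving the detectable effect roughly quadruples the required sample, which is why scoping the minimum effect worth detecting with the product manager comes before any code is written.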

The mid-level role is arguably the most technically demanding stage. The expectation is full project ownership with less scaffolding than at junior level, but the organisational leverage and influence of senior roles has not yet been built.

Many practitioners describe this stage as the most technically rich period of a data science career.

A Senior Data Scientist's Typical Day

Senior data scientists (roughly 7+ years) have a fundamentally different day. The shift is from executing tasks to defining problems, influencing decisions, and enabling others.

Many senior data scientists report that they write substantially less code than they did at mid-level, and substantially more documents, presentations, and design reviews.

Morning (9:00-10:30): Two or three back-to-back meetings. A project kickoff with a product team discussing analytical requirements. A review of a junior DS's model design with detailed feedback. A leadership review meeting presenting results from a recent pricing analysis.

Mid-morning (10:30-12:00): Deep work on a complex analysis or strategy document - the kind of work that requires sustained focus and is impossible to do well in 30-minute fragments. Calendar blocking is essential at senior levels.

Afternoon (13:00-15:00): Continued deep work if the schedule allows, or more meetings. Review of a junior team member's code. A conversation with a data engineer about pipeline requirements for an upcoming project.

Late afternoon (15:00-17:30): Writing - a project postmortem, a recommendation memo, a one-page summary of findings for executive consumption. Responding to async questions from across the team.

The meeting load at senior level is substantial. Three to five hours per day in meetings is not uncommon, and poor calendar management can effectively eliminate all productive technical work. The most effective senior data scientists are disciplined about protecting deep work time through explicit calendar blocking.

"The transition from mid-level to senior data scientist is fundamentally about learning to communicate upward. You stop getting credit for what you build and start getting credit for what you change. That requires a completely different skill set than modelling." - Eugene Yan, Applied Scientist at Amazon, ApplyingML Newsletter, 2023

Startup vs Big Tech vs Consulting: How the Day Differs

Environment	Breadth	Depth	Infrastructure	Impact Visibility	Autonomy
Early-stage startup	Very high	Lower	Build it yourself	High	Very high
Mid-stage startup	High	Moderate	Developing	Moderate	High
Large tech company	Low (specialised)	High	World-class	Low (one of many)	Low-moderate
Consulting firm	High (client variety)	Low	Client-dependent	Moderate	Moderate
Financial services	Moderate	High (domain)	Variable	Moderate	Low
Healthcare/Pharma	Moderate	High (domain)	Variable	High (clinical)	Low-moderate

Early-Stage Startup (under 200 employees)

At a startup, the data scientist is often the entire data function. This means building infrastructure that a big-company data scientist would never touch - setting up the data warehouse, writing the logging code, creating the first dashboards from scratch.

The breadth is genuinely exciting for people who like owning things end to end, but it is relentlessly demanding.

A typical startup data scientist day involves context switching between writing a SQL data model in dbt in the morning, debugging a broken Airflow pipeline after lunch, and presenting a cohort retention analysis to the leadership team at 4pm.

The variety is high; the depth on any one thing is lower than a more specialised role would allow.

The startup data scientist also faces the specific challenge of working with data that was not designed with analytics in mind. Event logging schemas change without notice. Key metrics have no agreed definition. Business stakeholders interpret the same number differently because nobody has formally defined the metric.

Monica Rogati's 2017 "AI Hierarchy of Needs" framework placed data collection and infrastructure at the base of the pyramid precisely because startups routinely attempt to build sophisticated models on a foundation that does not yet reliably exist.

Large Technology Company (FAANG tier or comparable)

At a large tech company, roles are highly specialised. A data scientist at a company like Google typically does not build pipelines - a data engineer does that. The modelling work can be genuinely sophisticated, the data volume is enormous, and the infrastructure is world-class.

The tradeoffs are different. Big-tech data scientists often report feeling removed from business impact - they are one of many contributors to a metric that is itself one of many metrics on a dashboard.

The organisational overhead is high: reviews, design documents, alignment meetings, and approval processes that can slow a project from conception to production significantly.

A 2021 study by MIT Sloan Management Review found that large-company data scientists reported an average project cycle from conception to production deployment of 8-12 months, compared to 2-4 months at startups with fewer than 500 employees.

The compensation at large tech companies is substantially higher than at most other environments, and the quality of the technical infrastructure means the actual modelling work is rarely impeded by broken pipelines or data access issues.

For data scientists who want to work on genuinely large-scale problems with world-class tooling, large tech is the right environment.

Consulting

Consulting data scientists (at firms like McKinsey, BCG, Deloitte, or boutique analytics shops) work across multiple client engagements, often in 3-6 month project rotations. The pace is high, the client variety is stimulating, and the business impact is often visible quickly.

The downsides are equally significant: the technical depth is often limited because projects are too short to build truly sophisticated systems, the data is frequently incomplete or poorly documented, and the hours can be punishing during crunch phases.

Hugo Bowne-Anderson's 2018 Harvard Business Review article, based on conversations with 35 data scientists, found that consulting data scientists were the most likely to report feeling that they "rarely use advanced modelling techniques" in their actual work - largely because client engagements are scoped for deliverable speed rather than methodological sophistication.

Project Phases and What Each Feels Like

Phase	Typical Duration	Primary Activity	How It Feels
Discovery	1-2 weeks	Stakeholder meetings, requirement scoping	Engaging, light on coding
Data exploration	1-4 weeks	SQL, EDA, data quality investigation	Often frustrating
Modelling	1-3 weeks	Feature engineering, training, evaluation	The "fun" phase
Review and refinement	1-3 weeks	Stakeholder iteration, validation	Slow, process-heavy
Communication and deployment	1-2 weeks	Presentations, handoff to engineering	Underestimated in time
Post-deployment monitoring	Ongoing	Performance tracking, drift detection	Frequently neglected

The gap between expected and actual time in the discovery and exploration phases is one of the biggest surprises for new data scientists. The modelling phase - which people imagine as core data science work - is typically shorter than the data wrangling that precedes it.

The post-deployment monitoring row is almost universally underestimated. Sculley et al.'s landmark 2015 paper Hidden Technical Debt in Machine Learning Systems (NIPS Proceedings) documented that the ratio of supporting code to model code in production ML systems is typically greater than 20:1.

The model itself is a small part of a complex system of data pipelines, feature generation, serving infrastructure, and monitoring - and maintaining all of it falls partly on data scientists in most organisations.

Tools and Technologies in the Real Day-to-Day

Understanding what tools actually appear in a typical day demystifies the skill requirements considerably:

SQL: Used daily by the large majority of data scientists regardless of seniority. The most consistently valuable technical skill in applied data science. According to the 2024 Stack Overflow Developer Survey, SQL was used by 71% of data science respondents - more than Python (68%).

Python: The dominant programming language for data science, used for EDA, modelling, and pipeline scripting. The pandas library for data manipulation and scikit-learn for machine learning are the core toolkit for most practitioners.

Jupyter notebooks: Standard for exploratory work, data investigation, and sharing findings. Widely criticised for poor reproducibility and version control, but universally used regardless.

dbt (data build tool): Increasingly standard for defining data transformations in the warehouse. Combines SQL with software engineering best practices (version control, testing, documentation).

Tableau, Looker, or Power BI: Business intelligence tools used for dashboards and ad-hoc stakeholder reporting. Most data scientists spend more time in BI tools than modelling frameworks.

Git: Version control. Essential for collaborative work and deployment. Still inconsistently used among data scientists with purely academic backgrounds.

Spark or BigQuery: For large-scale data processing. More relevant at large tech companies and data-mature enterprises than at small startups.

The skill gap that most surprises new data scientists is software engineering fundamentals: writing modular, testable code; using version control properly; understanding deployment basics. Most academic data science training emphasises modelling and statistics while underweighting software craft - but the practical job requires both.
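A small illustration of what "modular, testable code" can mean in practice: the cleaning step below is a pure function with no file or database access, so it can be checked with a plain assertion-based test before it ever touches a pipeline. The record fields (customer_id, updated_at, plan) and function names are hypothetical, and the comparison assumes ISO-formatted date strings, which sort correctly as text.

```python
from typing import Dict, Iterable, List

Record = Dict[str, str]


def latest_per_customer(records: Iterable[Record]) -> List[Record]:
    """Keep only the most recently updated record for each customer_id."""
    latest: Dict[str, Record] = {}
    for rec in records:
        key = rec["customer_id"]
        # ISO dates such as "2024-03-01" compare correctly as strings.
        if key not in latest or rec["updated_at"] > latest[key]["updated_at"]:
            latest[key] = rec
    # Sort by ID so the output order is deterministic and easy to test.
    return sorted(latest.values(), key=lambda r: r["customer_id"])


def test_latest_per_customer() -> None:
    rows = [
        {"customer_id": "a", "updated_at": "2024-01-01", "plan": "free"},
        {"customer_id": "a", "updated_at": "2024-03-01", "plan": "pro"},
        {"customer_id": "b", "updated_at": "2024-02-01", "plan": "free"},
    ]
    result = latest_per_customer(rows)
    assert [r["plan"] for r in result] == ["pro", "free"]


test_latest_per_customer()
```

The same logic buried in the middle of a notebook cell is much harder to verify; extracting it into a function is usually the first step toward version-controlled, reviewable pipeline code.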

What Changes with Company Data Maturity

One dimension that rarely gets discussed honestly is the relationship between data scientists and their data.

At companies with immature data infrastructure, data scientists spend enormous effort simply gaining access to the data they need - navigating permissions, working around missing documentation, and dealing with pipelines that break regularly.

At mature data organisations, the infrastructure is reliable enough that the modelling work can genuinely be the focus.

This structural difference makes the maturity of a company's data culture one of the most important factors to evaluate when considering a role. An impressive-sounding title at a company where data is not taken seriously means spending most of your time fighting infrastructure problems.

Interview questions that reveal data maturity: "What does the lineage of your core business metrics look like - who owns the definition and who maintains the pipeline?" "How are data quality issues typically discovered?" "What fraction of the data team's time is spent on maintenance versus new projects?" Honest answers to these questions are the most reliable predictors of what your actual day will look like.

Career Progression and the Paths Forward

Data science career ladders split at the senior level into two primary tracks: the individual contributor (IC) track, which continues toward staff and principal data scientist roles with growing technical scope, and the management track, which moves toward leading teams of data scientists and eventually heading analytics functions.

The IC track suits practitioners who want to remain close to the technical work. The management track suits those who prefer developing people and influencing organisational priorities.

The choice is not permanent, but the skills required diverge significantly after the senior level, and companies rarely allow lateral movement between tracks without explicit justification.

A third path - moving into applied research - exists at the largest technology companies and at research-focused organisations. Applied researchers sit between academic research and production data science, working on problems with longer time horizons and publishing results.

The compensation is typically slightly below production data science, offset by the intellectual freedom.

The 2024 Kaggle survey found that the most common next roles for experienced data scientists were: senior data scientist (35%), ML engineer (22%), data science manager (18%), and independent consultant (12%).

The diversification of exit paths reflects how broad the data scientist skill set has become - practitioners who can build models, handle data infrastructure, communicate findings, and understand business context have transferable value across many adjacent roles.

Practical Takeaways

If you are entering data science, mentally prepare to spend significantly more time on data cleaning and less time on modelling than you expect. This is not a temporary inefficiency you will work your way out of - it is the nature of the work.

Evaluate potential employers not just on their role title and compensation but on the maturity of their data infrastructure. Ask in interviews: "What does the data pipeline look like? Who maintains it? How often do pipelines break?" The answers reveal the true daily experience.

Senior data scientists who want to stay technical need to actively protect their time. Calendar blocking for deep work is not optional at senior levels - it is how you continue to do the work that makes you effective.

The communication skills that nobody emphasises in school - writing clearly, presenting to non-technical audiences, structuring ambiguous problems - matter more at every career stage than most data scientists expect when they start out.

The most influential data scientists in organisations are not necessarily the best modellers; they are the practitioners who can translate technical findings into decisions that non-technical people will act on.

Sources & Further Reading

Kaggle. (2022). State of Data Science and Machine Learning Survey. Kaggle.
Kaggle. (2024). State of Data Science and Machine Learning Survey. Kaggle.
CrowdFlower (Figure Eight). (2016). Data Science Report: Data Preparation and Cleaning.
Mason, H. (2015). Data Science in Practice. Interview transcript, O'Reilly Strata Conference.
Lorica, B. (2019). The Data Scientist's Reality: What We Do and What We Want. O'Reilly Media.
Huyen, C. (2022). Designing Machine Learning Systems. O'Reilly Media.
Yan, E. (2023). Applied Scientist Reflections: First Year to Staff Level. ApplyingML Newsletter.
Bowne-Anderson, H. (2018). What Data Scientists Really Do, According to 35 Data Scientists. Harvard Business Review.
Rogati, M. (2017). The AI Hierarchy of Needs. Hackernoon.
Stack Overflow. (2024). Developer Survey: Daily Work Experience Section.
Davenport, T. and Patil, D. (2012). Data Scientist: The Sexiest Job of the 21st Century. Harvard Business Review.
Sculley, D., et al. (2015). Hidden Technical Debt in Machine Learning Systems. NIPS Proceedings.
Kleppmann, M. (2017). Designing Data-Intensive Applications. O'Reilly Media.
MIT Sloan Management Review. (2021). Data Science Project Lifecycle Benchmarks. sloanreview.mit.edu
Data Science Weekly. (2023). First-Year Data Scientist Experience Survey. datascienceweekly.org
Cady, F. (2017). The Data Science Handbook. Wiley.

Frequently Asked Questions

How much time do data scientists spend cleaning data?

Industry surveys consistently show data scientists spend 45-60% of their time on data collection, cleaning, and preparation - leaving only 20-40% for the modelling and analysis work most people imagine when they picture the role.

How many meetings do data scientists have?

Most data scientists spend 2-4 hours per day in meetings. Senior data scientists typically have heavier meeting loads - sometimes 3-5 hours daily - across requirements discussions, stakeholder reviews, and team standups.

Is data science more creative or analytical?

Both, at different stages. Problem framing and feature engineering require creativity; model evaluation and statistical testing require analytical rigour. The ratio shifts depending on the project phase.

How does a startup data scientist day differ from big tech?

Startup data scientists own a much broader range of tasks - pipelines, dashboards, modelling, and stakeholder work. Big tech data scientists have specialised roles with better infrastructure support but often less direct visibility into business impact.

What is the most frustrating part of being a data scientist?

Data quality and data access issues are the primary frustration cited by most data scientists - spending weeks cleaning data or waiting for engineering to fix pipelines before the actual analytical work can begin.
