Data Scientist Interview Questions

This guide is for hiring managers, engineering leads, and recruiters who need to run a structured data scientist interview — and want to go beyond generic question lists. It covers the five domains that matter most when evaluating data science candidates: statistics, experiment design, modeling judgment, SQL fluency, and business communication.

Each question includes what a strong answer should demonstrate and what to listen for — so you can evaluate candidates consistently, not just pick the one who sounded most confident.

Questions like these inform how we screen data scientist candidates as part of our direct-hire recruiting process. Learn about our data scientist recruiting.

On this page

1. Statistics and probability
2. Experimentation and A/B testing
3. Machine learning and modeling
4. SQL and analytical thinking
5. Business reasoning and communication
6. Common interview mistakes
7. Data scientist vs. ML engineer
8. FAQ

Statistics and probability questions

Statistical reasoning is the foundation of data science. These questions test whether a candidate truly understands the tools they use — not just whether they can name them.

1. Walk me through the assumptions behind a t-test and when you would choose a different test.

What a strong answer demonstrates: Understanding of normality, independence, and variance equality — and when non-parametric alternatives or bootstrapping are appropriate.

Listen for: Candidates who explain the conditions rather than just naming the test. Weak candidates treat statistical tests as black boxes. Strong ones know when the assumptions break and what to do about it.
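
For interviewers who want a concrete reference point, the sketch below shows one alternative a candidate might propose when the normality assumption looks doubtful: a permutation test on the difference in means. It is a minimal illustration, not a recommended default. The function name, the data, and the number of permutations are all assumptions made for this example.

```python
import random
from statistics import mean

def permutation_test(a, b, n_permutations=5000, seed=0):
    """Two-sided permutation test for a difference in means.

    Assumes only that observations are exchangeable under the null
    hypothesis; it does not assume normality or equal variances.
    """
    rng = random.Random(seed)
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    n_a = len(a)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            extreme += 1
    # Add-one correction keeps the estimated p-value strictly above zero.
    return (extreme + 1) / (n_permutations + 1)

# Hypothetical, heavily skewed data (one large outlier in the control group).
control = [12, 15, 11, 14, 13, 240, 12, 16, 14, 13]
treatment = [13, 14, 12, 15, 13, 14, 12, 16, 15, 13]
p_value = permutation_test(control, treatment)
print(f"permutation p-value: {p_value:.3f}")
```

A useful follow-up is to ask what the permutation test still assumes (exchangeability and independent observations) and what it cannot fix.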

2. What is the difference between Type I and Type II error? How do you decide where to set your significance threshold?

What a strong answer demonstrates: Conceptual clarity on false positive vs. false negative, and the ability to reason about the cost of each error in a business context.

Listen for: Whether the candidate connects the significance threshold to real consequences — not just restates textbook definitions. A strong candidate might say the threshold depends on whether the cost of a false positive (shipping a bad feature) outweighs the cost of a false negative (missing a real improvement).

3. Explain the central limit theorem in plain language. Describe a situation where it matters in practice.

What a strong answer demonstrates: The ability to communicate a statistical concept clearly and connect it to a concrete scenario — a core skill for a stakeholder-facing data scientist.

Listen for: A clear, jargon-free explanation followed by a relevant example (e.g., why the sample mean of an A/B test metric is approximately normally distributed even when the underlying per-user distribution is skewed, provided the sample is large enough). Candidates who can only produce a technical definition are harder to deploy in business-facing roles.
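
The example a candidate gives can be checked with a short simulation. The sketch below draws repeated samples from a right-skewed exponential distribution and compares the spread of the sample means with what the central limit theorem predicts. The distribution, the sample sizes, and the seed are assumptions chosen for illustration.

```python
import random
from statistics import mean, stdev

def sample_means(n_per_sample, n_samples, seed=0):
    """Means of repeated samples from a right-skewed exponential distribution."""
    rng = random.Random(seed)
    return [
        mean([rng.expovariate(1.0) for _ in range(n_per_sample)])
        for _ in range(n_samples)
    ]

n = 200
means = sample_means(n_per_sample=n, n_samples=2000)

# Exponential(1) has mean 1 and standard deviation 1, so the CLT predicts
# sample means centred near 1 with a standard error of about 1 / sqrt(n).
predicted_se = 1 / n ** 0.5
within = sum(abs(m - 1.0) <= 1.96 * predicted_se for m in means) / len(means)

print(f"mean of sample means: {mean(means):.3f}")
print(f"spread of sample means: {stdev(means):.3f} (CLT predicts {predicted_se:.3f})")
print(f"share within 1.96 standard errors: {within:.3f}")
```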

4. How do you handle multiple comparisons when running many simultaneous tests?

What a strong answer demonstrates: Awareness of the multiple testing problem and familiarity with correction methods (Bonferroni, Benjamini-Hochberg) — and the judgment to know when to apply them.

Listen for: Whether the candidate understands the trade-off between controlling false positives (the family-wise error rate for Bonferroni, the false discovery rate for Benjamini-Hochberg) and losing statistical power. Red flag: candidates who run twenty sub-group tests and pick the significant ones without acknowledging the problem.
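
The two corrections named above can be compared on a small set of p-values. This is a minimal sketch with hand-written helper functions and made-up p-values; a candidate would more likely reach for a statistics library, but the logic is the same.

```python
def bonferroni(p_values, alpha=0.05):
    """Reject when p <= alpha / m (controls the family-wise error rate)."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]

def benjamini_hochberg(p_values, alpha=0.05):
    """Benjamini-Hochberg step-up procedure (controls the false discovery rate)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k (1-based) with p_(k) <= (k / m) * alpha.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            cutoff_rank = rank
    rejected = [False] * m
    for idx in order[:cutoff_rank]:
        rejected[idx] = True
    return rejected

# Hypothetical p-values from ten simultaneous sub-group tests.
p_values = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]

print("uncorrected rejections:", sum(p <= 0.05 for p in p_values))
print("Bonferroni rejections:", sum(bonferroni(p_values)))
print("Benjamini-Hochberg rejections:", sum(benjamini_hochberg(p_values)))
```

On this data the uncorrected approach rejects five hypotheses, Benjamini-Hochberg rejects two, and Bonferroni rejects one, which illustrates the power trade-off the question is probing.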

5. What does a confidence interval actually tell you — and what does it not tell you?

What a strong answer demonstrates: Precise statistical thinking and the ability to distinguish frequentist confidence intervals from Bayesian credible intervals.

Listen for: The common misconception that a 95% CI means "there's a 95% probability the true value is in this interval." A strong candidate will explain the repeated sampling interpretation and acknowledge why that matters for how results are reported.
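
The repeated sampling interpretation can be made concrete with a simulation: run many hypothetical experiments, build an interval from each, and count how often the interval contains the true value. The sketch below assumes normally distributed data and uses a normal-approximation interval; the parameter values are arbitrary.

```python
import random
from statistics import NormalDist, mean, stdev

def ci_coverage(true_mean=10.0, sigma=2.0, n=50, n_experiments=2000, seed=0):
    """Share of simulated experiments whose 95% interval contains true_mean."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(0.975)  # about 1.96
    covered = 0
    for _ in range(n_experiments):
        sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
        centre = mean(sample)
        half_width = z * stdev(sample) / n ** 0.5
        if centre - half_width <= true_mean <= centre + half_width:
            covered += 1
    return covered / n_experiments

coverage = ci_coverage()
print(f"share of intervals covering the true mean: {coverage:.3f}")
```

The 95% describes the procedure across repeated experiments. Any single interval either contains the true value or does not.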

Experimentation and A/B testing questions

Experiment design is often the highest-leverage skill a data scientist brings to a product team. These questions test whether a candidate can design experiments that produce trustworthy conclusions, not just run a significance test.

6. Walk me through how you would design an A/B test for a new checkout flow when traffic is limited.

What a strong answer demonstrates: Power analysis, metric selection, minimum detectable effect, and sample size trade-offs — with awareness of the practical constraints of low-traffic scenarios.

Listen for: Whether the candidate starts with the business goal and minimum detectable effect, or jumps straight to running a test. The best answers acknowledge that small sample sizes increase the risk of underpowered tests and discuss options like longer run windows or variance reduction techniques.
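
A strong answer usually includes a rough sample size calculation. The sketch below uses the standard normal-approximation formula for comparing two conversion rates. The baseline rate, the minimum detectable effect, and the function name are assumptions for illustration.

```python
from math import ceil
from statistics import NormalDist

def sample_size_per_arm(baseline_rate, mde_abs, alpha=0.05, power=0.8):
    """Approximate users needed per arm for a two-sided two-proportion test.

    mde_abs is the minimum detectable effect as an absolute difference
    in conversion rate (0.02 means two percentage points).
    """
    p1 = baseline_rate
    p2 = baseline_rate + mde_abs
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_beta) ** 2 * variance / mde_abs ** 2)

# Hypothetical checkout flow: 10% baseline conversion.
n_large_effect = sample_size_per_arm(0.10, 0.02)
n_small_effect = sample_size_per_arm(0.10, 0.01)
print(f"per-arm sample for a 2-point lift: {n_large_effect}")
print(f"per-arm sample for a 1-point lift: {n_small_effect}")
```

Halving the minimum detectable effect roughly quadruples the required sample, which is why the choice of effect size dominates the conversation when traffic is limited.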

7. Your A/B test shows statistical significance after two weeks, but the effect size is tiny. What do you do?

What a strong answer demonstrates: The distinction between statistical significance and practical significance — and the ability to advise stakeholders on whether to ship a change.

Listen for: Candidates who consider implementation cost, opportunity cost, and long-term effects — not just the p-value. A strong answer connects the effect size to business impact (e.g., "a 0.1% lift on a high-volume surface may still be worth shipping; on a low-volume feature it probably isn't").
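
The gap between statistical and practical significance can be shown with a few lines of arithmetic. The sketch below runs a pooled two-proportion z-test on a very large hypothetical sample, then compares the estimated value of the lift with the cost of the change. Every number here is invented for illustration, and the lift is treated as an absolute difference of 0.1 percentage points.

```python
from statistics import NormalDist

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (absolute lift, two-sided p-value) using a pooled z-test."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = (pooled * (1 - pooled) * (1 / n_a + 1 / n_b)) ** 0.5
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return p_b - p_a, p_value

# Hypothetical result: 10.0% vs 10.1% conversion on two million users per arm.
lift, p_value = two_proportion_z_test(200_000, 2_000_000, 202_000, 2_000_000)

# Hypothetical business inputs, purely for illustration.
annual_sessions = 50_000_000
value_per_conversion = 4.0
annual_cost_of_change = 120_000

annual_value = lift * annual_sessions * value_per_conversion
worth_shipping = annual_value > annual_cost_of_change
print(f"lift: {lift:.4f}, p-value: {p_value:.4f}")
print(f"estimated annual value: {annual_value:,.0f}, worth shipping: {worth_shipping}")
```

The same lift on a surface with a fraction of the traffic would fall below the cost line, which is the reasoning the question is designed to surface.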

8. How do you design an experiment when full randomization is not feasible?

What a strong answer demonstrates: Familiarity with quasi-experimental methods — difference-in-differences, synthetic control, regression discontinuity — and causal inference under non-random assignment.

Listen for: This is a strong signal for senior data scientist candidates. Candidates who only know randomized A/B testing will struggle in environments where true randomization is impossible (e.g., market-level rollouts, enterprise products). Listen for methodological flexibility and an honest acknowledgment of the assumptions each approach requires.
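
Of the methods listed above, difference-in-differences is the simplest to sketch. The example below computes the estimate from group means before and after a rollout. It assumes the treated and control groups would have followed parallel trends without the change, which is exactly the assumption a strong candidate should name. The data is invented.

```python
from statistics import mean

def diff_in_diff(treated_pre, treated_post, control_pre, control_post):
    """Change in the treated group minus change in the control group."""
    treated_change = mean(treated_post) - mean(treated_pre)
    control_change = mean(control_post) - mean(control_pre)
    return treated_change - control_change

# Hypothetical weekly orders per market, before and after a market-level rollout.
treated_pre = [100, 102, 98, 100]
treated_post = [112, 110, 114, 112]
control_pre = [90, 92, 88, 90]
control_post = [95, 96, 94, 95]

effect = diff_in_diff(treated_pre, treated_post, control_pre, control_post)
print(f"estimated effect: {effect}")
```

The treated markets rose by 12 and the control markets by 5, so the estimate attributes 7 to the rollout. A good follow-up is to ask how the candidate would check the parallel trends assumption.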

9. How would you handle network effects or spillover between treatment and control groups?

What a strong answer demonstrates: Understanding of experiment interference and cluster-based or geo-based randomization strategies for social or marketplace products.

Listen for: Whether the candidate recognizes the problem without prompting. The most common setup error in marketplace experiments is contamination — treatment users affect control users. Strong candidates propose design solutions (cluster randomization, holdout cells) rather than just noting the issue.
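
Cluster randomization can be sketched as assigning whole clusters, rather than individual users, to an arm. The example below hashes a cluster identifier so that every user in the same cluster receives the same treatment. The cluster definition (a city), the experiment name, and the function names are hypothetical.

```python
import hashlib

def assign_cluster(cluster_id, experiment, treatment_share=0.5):
    """Deterministically assign a whole cluster to one arm."""
    key = f"{experiment}:{cluster_id}".encode("utf-8")
    digest = hashlib.sha256(key).hexdigest()
    bucket = int(digest[:8], 16) / 16 ** 8  # value in [0, 1)
    return "treatment" if bucket < treatment_share else "control"

def assign_user(user_id, user_to_cluster, experiment):
    """A user inherits the arm of the cluster they belong to."""
    return assign_cluster(user_to_cluster[user_id], experiment)

# Hypothetical mapping of users to city-level clusters.
user_to_cluster = {"u1": "berlin", "u2": "berlin", "u3": "madrid", "u4": "madrid"}
arms = {u: assign_user(u, user_to_cluster, "checkout_v2") for u in user_to_cluster}
print(arms)
```

Randomizing at the cluster level reduces contamination but shrinks the effective sample size to the number of clusters, so the analysis has to account for within-cluster correlation.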

Machine learning and modeling questions

Data scientists are expected to build and evaluate models, but the interview signal here is modeling judgment, not implementation fluency. These questions test whether a candidate can reason about models in context.

10. How do you decide between a simple model and a complex one for a business prediction task?

What a strong answer demonstrates: Awareness of interpretability, maintenance cost, data requirements, and business constraints — not just model performance metrics.

Listen for: Red flag: candidates who default to the most powerful model without considering the use case. Strong candidates ask questions like: Who will use this model? Does it need to be explainable to regulators or stakeholders? How often will it be retrained? A simple model that works reliably often beats a complex model that requires constant care.

11. Walk me through how you would evaluate a classification model before recommending it to stakeholders.

What a strong answer demonstrates: Knowing which metrics matter for different business problems — precision vs. recall, AUC-ROC, calibration — and how to translate model performance into business language.

Listen for: Whether the candidate goes beyond accuracy and asks about class imbalance, threshold selection, and the cost of different error types. Strong candidates also mention how they would communicate the evaluation to a non-technical audience.
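
The link between threshold selection and error costs can be illustrated on a toy example. The sketch below computes precision and recall at a given threshold, then picks the threshold with the lowest total cost when a false negative is assumed to be ten times as expensive as a false positive. The labels, scores, and costs are invented.

```python
def confusion_counts(y_true, scores, threshold):
    """Return (tp, fp, fn, tn) when predicting positive for score >= threshold."""
    tp = fp = fn = tn = 0
    for label, score in zip(y_true, scores):
        predicted = score >= threshold
        if predicted and label:
            tp += 1
        elif predicted and not label:
            fp += 1
        elif label:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn

def precision_recall(y_true, scores, threshold):
    tp, fp, fn, _ = confusion_counts(y_true, scores, threshold)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

def expected_cost(y_true, scores, threshold, cost_fp, cost_fn):
    _, fp, fn, _ = confusion_counts(y_true, scores, threshold)
    return fp * cost_fp + fn * cost_fn

# Hypothetical labels and model scores.
y_true = [1, 1, 1, 0, 1, 0, 0, 1, 0, 0]
scores = [0.95, 0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.35, 0.2, 0.1]

precision, recall = precision_recall(y_true, scores, 0.5)
candidates = [0.3, 0.5, 0.75]
best = min(candidates, key=lambda t: expected_cost(y_true, scores, t, 1, 10))
print(f"precision at 0.5: {precision:.2f}, recall at 0.5: {recall:.2f}")
print(f"lowest-cost threshold: {best}")
```

With these costs the lowest threshold wins because missing a positive is so expensive. Reversing the cost ratio would favor a stricter threshold.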

12. Your model performs well in training but degrades quickly after deployment. What do you investigate first?

What a strong answer demonstrates: Systematic diagnostic thinking — feature drift, training-serving skew, data pipeline issues, label leakage, and concept drift are all relevant.

Listen for: Whether the candidate has a structured diagnostic process or just guesses. Strong data scientists decompose the problem: Is the feature distribution shifting? Is there a data pipeline bug? Did the real-world relationship between features and labels change? Each of these requires a different response.
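
One way a candidate might check the first of those questions (is the feature distribution shifting?) is the population stability index, which compares the binned distribution of a feature at training time with its distribution in production. The sketch below is a hand-rolled version under simplifying assumptions: equal-width bins taken from the training data, a small floor to avoid taking the log of zero, and simulated data.

```python
import random
from math import log

def population_stability_index(expected, actual, n_bins=10):
    """PSI between a reference sample and a new sample of one numeric feature.

    Assumes the reference sample is not constant (otherwise bin width is zero).
    """
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / n_bins

    def proportions(values):
        counts = [0] * n_bins
        for v in values:
            idx = int((v - lo) / width)
            idx = min(max(idx, 0), n_bins - 1)  # clamp out-of-range values
            counts[idx] += 1
        # Floor each share so empty bins do not produce log(0).
        return [max(c / len(values), 1e-6) for c in counts]

    e, a = proportions(expected), proportions(actual)
    return sum((ai - ei) * log(ai / ei) for ei, ai in zip(e, a))

rng = random.Random(0)
training = [rng.gauss(0.0, 1.0) for _ in range(5000)]
serving_stable = [rng.gauss(0.0, 1.0) for _ in range(5000)]
serving_shifted = [rng.gauss(1.0, 1.0) for _ in range(5000)]

print(f"PSI, stable feature: {population_stability_index(training, serving_stable):.3f}")
print(f"PSI, shifted feature: {population_stability_index(training, serving_shifted):.3f}")
```

A drift score only addresses feature drift. Pipeline bugs and concept drift need separate checks, which is why the structured decomposition matters.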

13. Describe a situation where the model you built was not the right choice for the business — even if it performed well technically.

What a strong answer demonstrates: Maturity and business judgment — the ability to recognize when the right technical answer is not the right business answer.

Listen for: Candidates who have actually shipped things and learned from deployment failures, stakeholder feedback, or misaligned incentives. This question surfaces senior-level judgment that purely technical interviews miss.

14. How do you approach feature selection for a model where interpretability is important?

What a strong answer demonstrates: Knowledge of feature importance methods (permutation importance, SHAP), correlation analysis, and the business reasoning behind feature inclusion — not just algorithmic filters.

Listen for: Whether the candidate considers regulatory constraints, stakeholder trust, and the risk of proxy variables. In regulated industries or HR contexts, a model using certain features may be technically accurate but legally or ethically problematic.
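
Permutation importance, one of the methods mentioned above, can be sketched without any library: shuffle one feature at a time and measure how much accuracy drops. The model here is a hypothetical stand-in (a fixed rule that only looks at the first feature), and the data is simulated. SHAP is not shown.

```python
import random

def permutation_importance(predict, X, y, n_repeats=20, seed=0):
    """Average drop in accuracy when each feature column is shuffled."""
    rng = random.Random(seed)

    def accuracy(rows):
        return sum(predict(r) == t for r, t in zip(rows, y)) / len(y)

    baseline = accuracy(X)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            shuffled = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - accuracy(shuffled))
        importances.append(sum(drops) / n_repeats)
    return importances

# Simulated data: the label depends on feature 0 only; feature 1 is noise.
data_rng = random.Random(1)
X = [[data_rng.random(), data_rng.random()] for _ in range(200)]
y = [int(row[0] > 0.5) for row in X]

def predict(row):
    """Hypothetical stand-in for a trained model."""
    return int(row[0] > 0.5)

importances = permutation_importance(predict, X, y)
print(importances)
```

An importance score says which features the model relies on, not whether it is appropriate to use them. The proxy-variable and regulatory questions still need human judgment.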

SQL and analytical thinking questions

SQL fluency matters less than analytical clarity. These questions test whether a candidate can pull the right data, validate it, and reason about what it means — not just write syntactically correct queries.

15. A key business metric dropped 20% overnight. Walk me through how you investigate.

What a strong answer demonstrates: A systematic investigation framework — ruling out instrumentation issues, isolating the affected segment, checking for external factors, and forming and testing hypotheses.

Listen for: Whether the candidate checks logging and data quality before assuming the business changed. This is the classic "metric investigation" question — strong candidates check for measurement artifacts first, then segment by platform, geography, user cohort, and product area before drawing conclusions.

16. How would you write a query to calculate 7-day rolling retention for a user cohort?

What a strong answer demonstrates: Window function fluency, cohort construction logic, and the ability to define "retention" precisely before writing a query.

Listen for: Whether the candidate clarifies the definition of retention (does a user need to be active on day 7, or any day in days 1–7?) before writing code. The best answers also include validation steps — checking for duplicates, null user IDs, and timezone handling.
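
The difference between the two retention definitions is easy to see in a small query. The sketch below uses an in-memory SQLite database from Python and computes both "active on day 7" and "active on any of days 1 to 7" per signup cohort. The table and column names are hypothetical, dates are assumed to be stored in a single timezone, and this formulation uses a CTE with conditional counts rather than window functions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (user_id INTEGER PRIMARY KEY, signup_date TEXT)")
conn.execute("CREATE TABLE events (user_id INTEGER, event_date TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)", [
    (1, "2024-01-01"), (2, "2024-01-01"), (3, "2024-01-01"), (4, "2024-01-02"),
])
conn.executemany("INSERT INTO events VALUES (?, ?)", [
    (1, "2024-01-08"),  # user 1 active exactly 7 days after signup
    (1, "2024-01-08"),  # duplicate event, must not be double counted
    (2, "2024-01-03"),  # user 2 active on day 2 only
    (3, "2024-01-20"),  # user 3 active on day 19, outside the window
    (4, "2024-01-09"),  # user 4 active exactly 7 days after signup
])

QUERY = """
WITH activity AS (
    SELECT DISTINCT
        u.user_id,
        CAST(julianday(e.event_date) - julianday(u.signup_date) AS INTEGER) AS day_n
    FROM users u
    JOIN events e ON e.user_id = u.user_id
)
SELECT
    u.signup_date AS cohort,
    COUNT(DISTINCT u.user_id) AS cohort_size,
    COUNT(DISTINCT CASE WHEN a.day_n = 7 THEN u.user_id END) AS active_on_day_7,
    COUNT(DISTINCT CASE WHEN a.day_n BETWEEN 1 AND 7 THEN u.user_id END) AS active_within_7_days
FROM users u
LEFT JOIN activity a ON a.user_id = u.user_id
GROUP BY u.signup_date
ORDER BY u.signup_date
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

For the first cohort the two definitions give one retained user versus two, which is why the definition has to be agreed before anyone writes the query.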

17. How would you detect whether a metric drop is a real product change versus a data pipeline issue?

What a strong answer demonstrates: Operational awareness of data infrastructure — event tracking, ingestion delays, schema changes, and the telltale signs of upstream data problems.

Listen for: Candidates who distinguish between a real-world change and a measurement artifact. Good heuristics: check whether the drop affects many unrelated metrics at once (which points to a pipeline issue) or one specific metric (which more likely reflects a real change); check whether raw event counts dropped; cross-reference with engineering deployment logs.

18. Given a table of user events, how would you identify users who are at high risk of churning?

What a strong answer demonstrates: The ability to translate a business problem into a data problem — defining churn, selecting leading indicators, and describing a predictive or rule-based approach.

Listen for: Whether the candidate starts by defining churn precisely (inactivity for N days? subscription cancellation?) before proposing a model or rule. Strong candidates also discuss the asymmetry between the cost of a false churn prediction versus missing a real churner.
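
A rule-based starting point makes the definition question concrete. The sketch below flags users whose most recent event is at least N days old. The event format, the 14-day default, and the function name are assumptions for illustration; a real answer would justify the threshold from the product's usage pattern.

```python
from datetime import date

def at_risk_users(events, today, inactive_days=14):
    """Return user IDs whose most recent event is at least inactive_days old."""
    last_active = {}
    for user_id, event_date in events:
        if user_id not in last_active or event_date > last_active[user_id]:
            last_active[user_id] = event_date
    return sorted(
        user_id
        for user_id, last in last_active.items()
        if (today - last).days >= inactive_days
    )

# Hypothetical event log as (user_id, event_date) pairs.
events = [
    ("a", date(2024, 3, 1)), ("a", date(2024, 3, 28)),
    ("b", date(2024, 3, 2)),
    ("c", date(2024, 3, 20)),
]

flagged = at_risk_users(events, today=date(2024, 3, 30))
flagged_strict = at_risk_users(events, today=date(2024, 3, 30), inactive_days=7)
print(flagged, flagged_strict)
```

Changing the threshold from 14 days to 7 changes who is flagged, so the choice of N is itself a trade-off between false alarms and missed churners.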

Business reasoning and communication questions

Data scientists who cannot communicate findings clearly or push back constructively on stakeholder pressure are limited in their impact. These questions surface the communication and judgment skills that technical assessments miss.

19. How do you communicate a null result to a stakeholder who expected a positive outcome?

What a strong answer demonstrates: Communication maturity, statistical honesty, and the ability to present uncertainty without undermining trust in the process.

Listen for: Whether the candidate frames the null result as information (not failure), acknowledges what the test was and was not powered to detect, and suggests next steps. Red flag: candidates who reanalyze the data until they find something — a common source of p-hacking in practice.

20. A senior leader wants to ship a product change based on intuition. Your analysis says it is unlikely to work. How do you handle that?

What a strong answer demonstrates: Intellectual confidence combined with organizational judgment — the ability to advocate for data without being combative.

Listen for: Whether the candidate proposes a structured middle ground — running a small experiment, defining success criteria in advance, setting a review checkpoint — rather than either capitulating or refusing. This is a high-signal question for senior data scientist candidates who will operate with significant autonomy.

21. Describe how you translate a vague business question into a measurable analytical problem.

What a strong answer demonstrates: Problem decomposition, metric definition, and the ability to scope analysis before writing a single line of code.

Listen for: A structured approach: identifying the decision being made, what data would change that decision, what metric proxies the outcome, and what assumptions are embedded in that metric. Strong candidates ask clarifying questions rather than starting with the data.

22. How do you decide which analyses are worth doing versus which to deprioritize?

What a strong answer demonstrates: Strategic prioritization — expected impact, reversibility of the decision, and the cost of analysis relative to the cost of the decision being made.

Listen for: Whether the candidate thinks about analysis as a cost-benefit trade-off. A three-week deep dive into a decision that will be revisited in a month is often the wrong investment. Strong data scientists match analytical rigor to decision stakes.

Common data scientist interview mistakes

Even well-intentioned interviews produce unreliable signal when the process is not structured well. These are the most common mistakes hiring teams make when running data science interviews.

Testing statistics but skipping experiment design

Many interviews over-index on textbook statistics and ignore a candidate's ability to design experiments, reason about causality, or handle quasi-experimental scenarios. Experiment design is where most data scientists create business value.

No rubric for communication quality

How a candidate explains a null result or presents uncertainty to a stakeholder is as important as statistical accuracy. Without a rubric, interviewers default to liking candidates who sound confident, which is not the same as being correct.

Using ML engineer questions for data scientist roles

These roles have different orientations. Using a heavy ML systems or infrastructure question set for a data science candidate will screen out great analytical thinkers who are not pipeline engineers, and vice versa.

Skipping the business reasoning layer

Data scientists operate at the interface between data and decisions. Interviews that never ask candidates to frame a business problem, prioritize analyses, or communicate trade-offs will miss a critical dimension of job performance.

Not evaluating judgment on ambiguous problems

Textbook questions with clear answers test recall, not judgment. Include at least one open-ended scenario where there is no single right answer and evaluate how the candidate reasons through it.

Hiring for a data scientist role?

We recruit data scientists for direct-hire roles — sourcing and screening candidates with statistical rigor, experimentation depth, and business judgment.

Data scientist recruiting overview

Ready to start a search?

Submit your open role and we will follow up within one business day to discuss whether the search is a fit.

Submit a Role Book an Intake Call

How interviewing a data scientist differs from interviewing an ML engineer

These titles are often conflated — but the roles have different orientations, and the right interview questions to ask differ significantly.

Data Scientist

- Oriented toward insight, decisions, and analytical methodology
- Heavy emphasis on experiment design, causal inference, and statistical rigor
- Regular stakeholder communication: translating findings for non-technical audiences
- Modeling work tends to be exploratory: selecting and evaluating models for specific business problems
- Interview should test: statistics, experimentation, analytical communication, business framing

ML Engineer

- Oriented toward building, deploying, and scaling ML systems
- Heavy emphasis on pipeline architecture, feature serving, and model lifecycle
- More engineering-adjacent: latency, throughput, and reliability matter significantly
- Modeling work tends to be operational: shipping and maintaining models in production
- Interview should test: MLOps, system design, training pipelines, deployment patterns

See the ML engineer recruiting overview for the equivalent question set and evaluation framework.


Frequently asked questions

What should data scientist interview questions focus on?

A strong data science interview covers at least four areas: statistical reasoning, experiment design, modeling judgment, and communication. Many hiring teams over-index on coding or ML technique and underweight the analytical and stakeholder-facing skills that separate strong data scientists from average ones. The best interviews include questions that require candidates to reason out loud, not just recall facts.

How many interview rounds is typical for a data scientist?

Most structured data science interviews run three to five rounds: an initial screen, a technical assessment or take-home, one or two structured technical interviews, and a final stakeholder or leadership round. The number varies by company size and seniority level, but the core evaluation should cover statistics, experiment design, and business communication in some combination.

Should I use a take-home assignment for data scientist candidates?

Take-home assignments can work well if they are realistic, time-boxed, and reviewed consistently. The risk is that open-ended assignments favor candidates with more free time and can introduce evaluation bias. If you use one, define the rubric in advance and debrief candidates live on their submission — the debrief often reveals more signal than the assignment itself.

How do I tell a strong data scientist candidate from an average one?

Strong candidates reason about uncertainty honestly, push back on poorly defined problems, and connect their analysis to business outcomes. Average candidates tend to answer technical questions correctly but struggle when asked to justify a decision, communicate a null result, or design an analysis from scratch. Watch for candidates who ask clarifying questions before answering; that is usually a good sign.

What SQL level should I expect from a data scientist candidate?

Mid-to-senior data scientists should be comfortable with window functions, CTEs, and cohort-style queries. You should not expect them to optimize query execution plans or tune indexes — that is closer to data engineering. A practical test is asking a candidate to write or describe a query that calculates rolling retention or funnel conversion, then asking how they would validate the result.

How is interviewing a data scientist different from interviewing a data analyst?

Data analysts typically focus on reporting, dashboards, and descriptive analysis. Data scientists are expected to design experiments, build predictive models, and own the analytical methodology behind business decisions. The interview questions for these roles overlap but diverge significantly in depth — data science interviews should probe causal inference, model selection trade-offs, and statistical rigor at a level that most analyst interviews do not require.

More questions? See the full FAQ or contact us.
