Mastering Data Science Interview Loops
This guide documents summaries and lessons learned from personal experience interviewing with
both small startups and several top-tier companies, including Microsoft, Twitter, Airbnb,
Branch, Apple, and NVIDIA, and covers the most efficient ways to ace data science interviews.
We’ve also gathered knowledge from various data scientists who have interviewed
hundreds of candidates at those companies.
We seek to provide a full picture and a comprehensive summary of the questions, and of
what to expect during the rounds of data science interviews at a small company
or a FAANG.
Every battle is won or lost before it's even fought: Sun Tzu
Table of Contents
Examples: Data Scientist, Analytics at Airbnb; Data Scientist at Lyft; Data Scientist at
Facebook; Product Analyst at Google
Requirements: knowledge of machine learning (not only how to use it but also the
underlying math and theory); strong coding ability
Requirements: end to end data scientists with data engineering skills; knowledge of
distributed systems; MapReduce and Spark; practical experience working with Spark;
strong coding ability
Many companies consider three tracks of data science work to meet the needs of the
business — Analytics, Inference, and Algorithms.
So before starting to search for a role, it’s important to determine what flavor of data
science appeals to you. Based on your response to that, what you study and what
questions you’ll be asked will vary.
Despite the differences in the types, generally speaking, they’ll follow a similar interview
loop although the particular questions asked may vary.
Data science interviews, like other technical interviews, require plenty of preparation.
There are a number of subjects that need to be covered in order to ensure you are ready
for back-to-back questions on statistics, programming, and machine learning.
How do I prepare for the specific content that comes up in the interviews for the product-
and metric-driven track?
These interviews focus more on asking product questions like what kind of metrics you
would use to show what you should improve in a product. These are often paired with
SQL and some Python questions.
Airbnb — Product heavy, metrics diagnostics, metrics creation, A/B testing, tons of
behavioral questions, and take-home material.
Netflix — Product-sense questions, A/B testing, experimental design, metric design
Microsoft — Programming heavy, binary tree traversal, SQL, machine learning
Expedia — Product, programming, SQL, product sense, machine learning questions about
SVM, regression and decision tree
For senior-level candidates, we’d expect them to have relevant industry experience
and in-depth statistics knowledge to not only answer questions, but also drive
towards solutions that create the optimized level of member experience and business
impact.
Develop product sense, create metrics, and design robust A/B tests.
Define, measure, and report the key performance indicators/ metrics for the
product:
What are some of the key metrics that the product might want to optimize?
How to design robust A/B tests
Understand what experimental design is.
What is applied A/B testing, and how do you interpret important results
statistically?
How do you validate a new feature using current users’ behavior?
An important metric goes down. How would you dig into the causes?
What metrics would you use to quantify the success of YouTube ads? (This could
also be extended to other products like Snapchat filters, Twitter live-streaming,
new Fortnite features, etc.)
How do you measure the success or failure of a product or product feature?
Google has released a new version of its search algorithm, for which they used
A/B testing. During the testing process, engineers realized that the new
algorithm was not implemented correctly and returned less relevant results.
Two things happened during testing:
People in the treatment group performed more queries than the control group.
Advertising revenue was higher in the treatment group as well.
What may be the cause of people in the treatment group performing more
searches than the control group? There are different possible answers here.
Given there are no metrics being tracked for Google Docs, a product manager
comes to you and asks what are the top five metrics you would implement?
In addition, let’s say there’s a dip in the engagement metric of Google Docs.
What would you investigate?
Let’s say we want to implement a notification system for reminding nurses to
discharge patients at a hospital. How would you implement it?
Let’s say at LinkedIn we want to implement a green dot for an “active user” on
the new messaging platform. How would you analyze the effectiveness of it for
roll out?
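To make the "interpret results statistically" part of the A/B testing questions above concrete, here is a minimal sketch of a two-sided two-proportion z-test on conversion counts. The function name, the inputs, and the choice of a pooled z-test are assumptions made for illustration; real experiments may call for a different test, a correction for multiple metrics, or a power analysis up front.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference in conversion rates (hypothetical helper).

    conv_a / n_a: conversions and users in control.
    conv_b / n_b: conversions and users in treatment.
    Returns (z statistic, p-value).
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that both groups share one rate.
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("standard error is zero; rates are all 0 or all 1")
    z = (p_b - p_a) / se
    # Two-sided p-value: 2 * (1 - Phi(|z|)) equals erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Illustrative numbers only: 10% vs 15% conversion with 1,000 users per arm.
z, p = two_proportion_z_test(100, 1000, 150, 1000)
print(f"z = {z:.2f}, p = {p:.4f}")
```

In an interview, the code matters less than the framing around it: state the null hypothesis, the metric, and the significance threshold before looking at the result.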
When you’re doing a coding challenge, it’s important to keep in mind that companies
aren’t always looking for the ‘correct’ solution. They may also be looking for code
readability, good design, or even a specific optimal solution.
Example:
1. Fizzbuzz
2. Given a list of timestamps in sequential order, return a list of lists grouped by weekly
aggregation.
3. Given a list of characters, a list of prior probabilities for each character, and a matrix
of probabilities for each character combination, return the sequence with the
highest probability.
4. Given a log file with rows featuring a date, a number, and then a string of names, parse
the log file and return the count of unique names aggregated by month.
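As an illustration of the last example above, here is a minimal sketch of the log-parsing task. The row layout is an assumption, since the question does not specify one: each row is taken to be `YYYY-MM-DD <number> <comma-separated names>`. The function name is hypothetical.

```python
from collections import defaultdict


def unique_names_by_month(lines):
    """Count distinct names per 'YYYY-MM' month.

    Assumes rows look like: '2021-03-01 7 alice,bob' (assumed format).
    """
    names_by_month = defaultdict(set)
    for line in lines:
        # Split into date, number, and the remaining names string.
        parts = line.strip().split(maxsplit=2)
        if len(parts) < 3:
            continue  # skip blank or malformed rows
        date_str, _number, names_str = parts
        month = date_str[:7]  # 'YYYY-MM-DD' -> 'YYYY-MM'
        for name in names_str.split(","):
            name = name.strip()
            if name:
                names_by_month[month].add(name)
    # Sets deduplicate names; report only the counts, ordered by month.
    return {month: len(names) for month, names in sorted(names_by_month.items())}


log = [
    "2021-03-01 7 alice,bob",
    "2021-03-15 3 bob, carol",
    "2021-04-02 1 alice",
]
print(unique_names_by_month(log))  # {'2021-03': 3, '2021-04': 1}
```

Before coding, it is worth asking the interviewer how malformed rows, name casing, and whitespace should be treated; this sketch skips malformed rows and treats names as case-sensitive.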
This type of question is not as common as the other types. They ask candidates to carry
out data processing and transformations without using SQL or any data analysis library
such as pandas. Instead, candidates are only allowed to use a programming language of
choice to solve the problems.
In essence, this type of interview question, data manipulation, expects candidates to have a
high-level understanding of logic.
Come up with a solution and then try to find a more efficient algorithm
The most challenging part of data manipulation is to understand the logic behind the
question. Even the most sophisticated programming comes down to Python fundamentals,
e.g., string access and manipulation, different data types, and for and while loops, etc.
Representing two datasets as dictionaries, and joining them together on some given
key values.
Given a dictionary of dictionaries representing a JSON blob, doing some basic parsing
to extract particular entries.
Kennedy Wangari ||Data Scientist || AI Community Lead
Writing a function that is similar to the “spread” or “gather” functions in R’s tidyr
package, and testing it using a dataset.
Parsing event logs and returning the count of unique strings by day/month/year.
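The first task in the list above, joining two datasets on a key without SQL or pandas, can be sketched as follows. Representing each dataset as a list of row dictionaries is an assumption, as are the function name and the sample data; an interviewer may use a different layout.

```python
def inner_join(left, right, key):
    """Inner-join two lists of row dictionaries on `key` (hypothetical helper)."""
    # Index the right-hand rows by key so each lookup is O(1) on average.
    index = {}
    for row in right:
        index.setdefault(row[key], []).append(row)

    joined = []
    for row in left:
        # Rows without a match on the right are dropped (inner-join semantics).
        for match in index.get(row[key], []):
            # If both rows share a non-key column, the right-hand value wins.
            joined.append({**row, **match})
    return joined


users = [{"id": 1, "name": "Ada"}, {"id": 2, "name": "Bo"}]
orders = [{"id": 1, "total": 30}, {"id": 1, "total": 5}, {"id": 3, "total": 9}]
print(inner_join(users, orders, "id"))
# [{'id': 1, 'name': 'Ada', 'total': 30}, {'id': 1, 'name': 'Ada', 'total': 5}]
```

Building the index first keeps the join at roughly O(len(left) + len(right)) plus the size of the output, compared with the O(len(left) * len(right)) cost of a nested loop, which is the kind of "more efficient algorithm" follow-up these questions tend to probe.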
Some additional topics such as Linked Lists and Graphs (Depth First Search or Breadth-First
Search) are less likely to occur during this type of interview.
Typically, multiple questions will be asked about a single scenario, ranging from simple to
hard. Each question may cover a unique data structure or algorithm. Here is an example of
a classic problem that revolves around finding the median of a list of numbers:
Part 1: Find the median using any method. Candidates can use a built-in sorting
function and simply return the median after sorting.
Part 2: The interviewer now asks for a more optimized version of finding the median. In
this setting, knowledge of common algorithms, such as quickselect, will come in
handy.
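Both parts of the median problem can be sketched in a few lines. The version below is one possible answer, not the only accepted one: Part 1 sorts (O(n log n)), and Part 2 uses quickselect with a random pivot (O(n) on average). Function names are illustrative.

```python
import random


def median_sorted(values):
    """Part 1: sort, then pick the middle element(s)."""
    if not values:
        raise ValueError("median of an empty list is undefined")
    ordered = sorted(values)
    mid = len(ordered) // 2
    if len(ordered) % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def quickselect(values, k):
    """Return the k-th smallest element (0-indexed) without a full sort."""
    items = list(values)
    while True:
        pivot = random.choice(items)
        lows = [x for x in items if x < pivot]
        pivots = [x for x in items if x == pivot]
        if k < len(lows):
            items = lows  # answer lies among the smaller elements
        elif k < len(lows) + len(pivots):
            return pivot  # answer is the pivot value itself
        else:
            # Discard lows and pivots, and shift k accordingly.
            k -= len(lows) + len(pivots)
            items = [x for x in items if x > pivot]


def median_quickselect(values):
    """Part 2: average O(n) median using quickselect."""
    n = len(values)
    if n == 0:
        raise ValueError("median of an empty list is undefined")
    if n % 2 == 1:
        return quickselect(values, n // 2)
    return (quickselect(values, n // 2 - 1) + quickselect(values, n // 2)) / 2


print(median_sorted([4, 1, 3, 2]), median_quickselect([4, 1, 3, 2]))  # 2.5 2.5
```

It is worth mentioning to the interviewer that quickselect is O(n) only on average; the worst case is O(n^2), which the random pivot makes unlikely.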
This type of question may also appear as an applied business problem. For such
questions, the candidate is expected to code up a solution to a hypothetical applied
problem, which is usually related to the company’s business model. These questions are
easy to medium in the level of difficulty (based on the categorization of Leetcode). The key
here is to understand the business scenario and exact requirements before coding.
Communication is still king! Ask clarifying questions before proceeding to
the coding part.
Typically, a data scientist on the Analytics team doesn’t focus the majority of their
time on modeling work. However, we have found that having knowledge of common
machine learning algorithms and knowing how to apply them to specific business
contexts are key to the success of a data scientist’s career.
Without the proper understanding in this area, it’s easy to cause incorrect
interpretation of data, which may lead to imperfect decision-making or worse
outcomes. The constant debates around “correlation versus causation” and “wrong
data is worse than no data” are all good examples.
What is also really important is how the candidate picks the right machine learning
algorithm (knowing the pros and cons of each, e.g., logistic regression, linear
regression, decision trees, deep learning, etc.) for the type of business problem they
are solving.
Modelling questions are more frequent nowadays, but the interview questions for a
product data scientist are mainly geared toward how to apply those models rather than the
underlying math and theory.
Machine learning questions are all over the board: from practical problems (how would
you go about setting up the data, cross-validation, modelling, and performance monitoring
for a given scenario) to generic questions (how do you deal with a categorical variable with
high cardinality) to more theoretical ones.
Bootstrapping
Confidence Intervals and their significance
How oversampling works
Significance of the ROC curve and how to interpret it
How random forest works
Practical experience with overfitting and underfitting
Practical experience with variable selection
Basics of Logistic Regression
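The first two topics in the list above, bootstrapping and confidence intervals, fit together in one short example. The sketch below computes a percentile bootstrap confidence interval using only the standard library; the function name, defaults, and sample data are assumptions for illustration, and the percentile method is only one of several bootstrap interval constructions.

```python
import random
import statistics


def bootstrap_ci(data, stat=statistics.mean, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for stat(data) (hypothetical helper)."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(data)
    # Resample with replacement, recompute the statistic each time, then sort.
    estimates = sorted(
        stat(rng.choices(data, k=n)) for _ in range(n_resamples)
    )
    # Take the empirical alpha/2 and 1 - alpha/2 quantiles of the estimates.
    lo_idx = int((alpha / 2) * n_resamples)
    hi_idx = int((1 - alpha / 2) * n_resamples) - 1
    return estimates[lo_idx], estimates[hi_idx]


sample = list(range(1, 101))  # illustrative data only
low, high = bootstrap_ci(sample)
print(f"95% CI for the mean: ({low:.1f}, {high:.1f})")
```

A useful point to make aloud: the interval describes uncertainty in the estimate of the statistic under resampling, not the range in which individual data points fall.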
Example: How does the linear regression algorithm determine what the best coefficient
values are?
The point is to see how deeply you understand linear regression, which is critical because
in many data science roles you won’t just work with algorithms in a black box; you’ll
actually put them into action. This category of question tests how much you know about
what's actually happening beneath the surface.
So this is one of those "show your work" moments. Trace out every step of your thinking
and write down the equations. As you’re writing out the solution, describe your thought
process so the interviewer can see your mathematical logic at work.
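One way to "show your work" on the linear regression example is to write out the least-squares solution for a single feature. Ordinary least squares picks the intercept and slope that minimize the sum of squared residuals; setting the two partial derivatives to zero gives slope = sum((x - x_mean) * (y - y_mean)) / sum((x - x_mean)^2) and intercept = y_mean - slope * x_mean. The sketch below implements exactly that; the function name and data are illustrative, and libraries typically solve the multi-feature case with matrix methods or gradient-based optimizers instead.

```python
def fit_simple_ols(xs, ys):
    """Closed-form ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally long sequences with at least 2 points")
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Co-variation of x and y, and variation of x, around their means.
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return intercept, slope


# Points lying exactly on y = 1 + 2x recover those coefficients.
print(fit_simple_ols([0, 1, 2, 3], [1, 3, 5, 7]))  # (1.0, 2.0)
```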
It goes without saying that a strong grasp of statistics is important for solving
different data science problems. Chances are you’ll be tested on your ability to reason
statistically and your knowledge of statistical theory.
Proving your mettle requires showing you understand the fundamentals of statistics. But
more than that, interviewers also want to see whether you're capable of using the
technical language and logic of statistics to grapple with ideas you may not often
approach that way—and still communicate them clearly. So be no-nonsense in your
response. Use the relevant statistical knowledge to arrive at your answer, but be as direct
as possible about whatever you're asked to define.
These questions are meant to see how you envision your work delivering products or
services from end to end. Scenario questions don’t test for knowledge in every field;
they're meant to explore a product's life cycle from beginning to delivery and see what
limits the candidate might have at each stage of that process. But these questions also
evaluate holistic knowledge—for instance, what it takes to manage a team to deliver a
final product—to determine how candidates perform in team situations.
The capabilities assessed here include the ability to solve a business case with the
right analytical approach and reasonable data intuition, as well as the ability to make
relevant and actionable recommendations based on data insights.
The case studies could be from business domains like products, marketing, or sales,
which are all based on what you would experience on a daily basis at work.
The scenario-based questions are designed to test your experience and knowledge in
different fields of data science, to find out the practical limits of your abilities.
Demonstrate your applied knowledge as thoroughly as you can, and you’ll come off well
in any case analysis.
Example: If you were a data scientist at a web company that sells shoes, how would you
build a system that recommends shoes to visitors?
This is your chance to fully demonstrate the business sense you’ve acquired. It also helps
the company identify the sub-teams within Analytics that a candidate would be best
allocated to.
Be honest about where you can add a lot of value, but don’t be shy about where you
expect to get a little bit of help from your teammates. Try to relate how your technical
knowledge can help with business outcomes, and always explain the thought process
behind your choices and the assumptions that guide them.
Presentation
That sounds great, but how do you do that? My main recommendation is to think
through all the details, from high-level goals and success metrics, through ETL and modeling
implementation details, to deployment, monitoring, and improvement. It is the little things,
rather than one big idea, that add up to make a great presentation. Here are a few questions
worth thinking through to help reach your ideal presentation:
What were the goal and the success metric of the project?
How do you decide to launch the project?
How do you know whether customers are benefiting from this project? By how
much?
How do you test it out? How do you design your A/B test?
What was the biggest challenge?
When presenting a project, you want to engage the audience. To make my presentations
interesting, I often share interesting findings and the biggest challenges of the project.
But the best way to make sure you are engaging is practice. Practice and practice out loud.
Understand the company’s mission and core values. This greatly helps when answering
these questions.
Why us? / What do you value most in a job?
What have you liked and disliked about your previous position?
Introduce yourself / Why are you leaving your current job?
The biggest success/failure/challenge in your career. Other versions: Tell me about a
time you resolved a conflict or you’ve had to convince your manager or a PM on
something.
This part of the interview is just as, if not more, important than the others. It could be the
person conducting this side of the interview has the last say. Have a look over some
example Competency based questions such as ‘what’s your biggest weakness’, ‘what are
your own standards of success’ or one of my favourites — ‘name a time when things
haven’t gone your way and how did you overcome it’. The interviewers will be looking for
specific examples describing exactly what you did in certain situations, not what the
team’s role as a whole was, or what you would do in a hypothetical situation.
They will be interested in the outcome of the situation, whether there was anything you
learned from the experience, and so on. The interviewers want a lot more than one-word
answers or statements. This is a real opportunity to display character, grit, and what you’re
made of as a human being. It may not be your favourite part of the interview, but do
embrace it and treat it with as much respect as your technical tests.
At the end of the day — the business has seen your profile, they know you’ve got a Degree
in Comp Science / Stats and they know you can use Python or R — now they need to
know who you are.
This section documents best practices and actionable, game-changing tips on how to prepare
effectively for the various rounds of data science interviews.
1. Prepare adequately for the “Why Us” and “What feature or product would
you add” questions
Most companies will ask you why you’re interested in them. They may also ask what feature
you would add to their offerings, or what your favorite feature of their products is.
These questions test your knowledge of, and interest in, the company.
Come with prepared answers to both before every interview. Use the company’s product
before the interview if you can.
2. Use the company’s products/services
Go deep into the product beforehand to help yourself stand out. You can show up with a
The first allows you to play offense a bit. It shows you’re confident in yourself. The second
tends to be one they don’t have a prepared answer for, and usually teaches you something
valuable about the company and the job. Even if you don’t end up working at that
company, you tend to get some good advice for wherever you do end up working!
There are 4 major qualities you want to convey during your interview.
Logical Reasoning:
The interviewer wishes to see candidates make logical connections between the
information provided and the ultimate answer. You should therefore describe clearly what
is needed for the computation and how you would write the code to solve the problem,
before diving into the actual coding.
Communication:
The interviewer will also evaluate your overall code quality. While the standard
expectations in a DS interview would not be as high as those in a software engineering
interview, candidates should still focus on several aspects:
General coding best practice, e.g. modularity, handling of edge cases, naming
conventions, etc.
Proficiency:
You've already done the hard weightlifting work to get to this step, so now it's time to
finish strong.
This list contains top mistakes that are stumbling blocks during the various rounds of
interviews.
Having projects in your portfolio serves as a major safety net for "how would you" type
interview questions. Instead of speaking in hypotheticals, you'll be able to point to
concrete examples of how you handled certain situations.
In addition, many hiring managers will specifically look for your ability to be self-sufficient
because data science roles naturally include elements of project management. That means
you should understand the entire data science workflow and know how to piece
everything together.
Complete end-to-end projects that allow you to practice every major step (i.e. Data
Cleaning, Model Training, etc.).
Organize your methodology. Data science should be deliberate, not haphazard.
Technical skills and machine learning knowledge are the basic prerequisites for landing a
data science position. However, to truly stand out above the competition, you should
learn more about the specific industry you'll be applying your skills to.
If you're interviewing for a position at a bank, brush up on some basic finance concepts.
If you're interviewing for a strategy position at a Fortune 500, practice a few case
interviews and learn about drivers of profitability.
If you're interviewing for a startup, learn about its market and try to discern how it will
gain a competitive edge.
In short, taking a little bit of extra initiative here can pay big dividends!
Currently, in most organizations, data science teams are still very small compared to
developer teams or analyst teams. So while an entry-level software engineer will often be
managed by a senior engineer, data scientists tend to work in more cross-functional settings.
Interviewers will look for your ability to communicate with colleagues of various technical
and mathematical backgrounds.
When people come across case studies, guesstimates, and puzzles during their data
science interview, their first instinct is to jump to the answer. Little thought goes into
structuring the response, which is a big no-go for an interviewer.
Example: Suppose you have recently joined a transport company as the CEO. The
company has been posting heavy losses recently. How would you go about turning
around the situation?
Most interviewees start by listing off ideas, like “analyze the pricing”, “look at the overall
costs”, “look at the route planning”, etc. That is absolutely the wrong way to go about
things, and it often leads to a rejection.
Lay down a framework using pen and paper (or a whiteboard). Putting a structure to your
thoughts showcases your thinking ability – a must-have skill in data science.
Focusing too much on the answer and not on the process is a sure-fire way to fail
your data science interview.
Think about it for a moment: the aim of an interview is to help the interviewer
understand the thought and reasoning behind your problem solving. Right? The
interviewer does not care much about whether your answer is precise to the decimal point.
There might not even be a right or wrong answer in the first place! So make sure you
communicate your thought process to the interviewer, including the assumptions you are
making. It’s a win for both sides.
When you are given a case study, you often have an advantage you can capitalize on: you
choose the model(s) to use. That means that you can anticipate some of the questions
interviewers might ask you!
For example, if you end up using an XGBClassifier for your task, try to understand how it
works, as deeply as you can. Everyone knows it’s based on decision trees, but which other
“ingredients” do you need for it? Do you know how XGBoost handles missing values?
Could you explain Bagging and Boosting in layman’s terms?
Even if you end up using linear regression, you should have a clear idea about what is
happening under the hood, and the meaning behind the parameters you set. If you say “I
set the learning rate to X”, and somebody follows with “What’s a learning rate?”, it’s quite
bad if you cannot at least spend a few words on it.
You must always make sure to consider the business impact of your model. I believe this
mistake is especially common among Data Scientists because they focus solely on the model
and its performance, but fail to mention how the business was impacted.
You want to highlight your results in a way that is accurate but, more
importantly, impactful. You most likely have impactful results, but you may have failed to
make that clear to the interviewers. You can phrase your answers like this:
“I worked on a Decision Tree model that automated a manual process, cutting
its time and cost by 50% and freeing up time and money to improve
the product.”
If you refer to your 99% accuracy constantly but fail to mention its impact, the interviewers
may conclude that you do not know how to work within a business and are
more academically oriented. Sometimes even lower accuracy is better if the overall process
is faster and more impactful in some way.
Pretend like you were hiring someone — you would want to know they can help your
business.
Strive to include a discussion around the Data Engineering and Machine Learning
components that happen before and after a main Data Science project.
These points may not be something you have performed yourself, and the interviewers
know that not every Data Scientist is also a Data Engineer, Machine Learning Engineer, or
Software Engineer. What they are testing is whether you were aware of the
whole process from start to finish, and who worked on what. If you answer this question
well, the company will see that you fit a more specialized Data Science role
and could learn the beginning or end parts of the Data Science process.
No one wants to work with an overconfident know-it-all. Similarly, no one wants to
work with someone who is not interested in the company, its goals, or its Data
Science projects. Most importantly, when you do not ask enough questions, it sounds like
you did not care enough to listen. Ultimately, failing to ask the interviewers
questions signals that you would not be a good person to collaborate
with.
What you are doing with interviewers is just that: storytelling. A common mistake I have
made is assuming the interviewers knew some information about the background of my
project.
They most likely will know nothing — some will not even read your resume.
You’ll need to set the scene when answering a question by providing basic information for
the Data Science project you performed. This type of explanation will show the interviewers
that you can work with stakeholders and other non-Data Science people.
Once you outline your past projects in this format, it will better paint the picture of your
answer.
Don’t Try to Fumble Out Answers for a Topic you haven’t Studied Before
This is a very common mistake people make. There will be certain questions you won’t
know the answer to. That’s OK; it’s human to not know everything. But candidates still try
to answer these questions by making up answers on the spot. This isn’t a great look.
Let the interviewer know that you aren’t an expert on the topic. Highlight a way to solve
the problem which you already know. For example, if you don’t know how a boosting
algorithm works, you could solve the same problem using a technique you do know. And
later, learn boosting and get back to the interviewer.
There are several layers to a data science interview. The process isn’t one-dimensional!
Just preparing for questions around tools and techniques will not land you the role. Of
course these technical skills matter, but there are other equally important topics you will
be judged on.
There will be an interaction with the project team, a case study related to the domain, role
plays, and much more. You should prepare for all these formats.
Expect some questions regarding A/B Testing, questions regarding which metrics would
be best to optimize, and questions about how to best evaluate your experimental results.
Doing deep dives to understand more about how users use your product?
Expect questions that test your ability to carry a data project from end to end and to
effectively and faithfully communicate your findings. Expect to discuss projects from
previous experience or your education, and to communicate what you did and
what you were able to find.
Doing applied research on inference, prediction, or optimization problems?
For example, Uber’s Surge Pricing feature or LinkedIn’s People You May Know feature.
Depending on your specific role, you may get a traditional software engineering interview
with a focus on processing large amounts of data, or be asked about your previous
experience solving large-scale, difficult, and custom data problems.
There are many more data scientist roles. Do your research on both the product and
the role before you set foot in the interview room.
The key question to be asking yourself is: within my role at the company, what is the
best way to understand and improve the product and the business using data?
Sample:
I want to work for you long term: sell yourself when answering this question. Give reasons
why your passions and skills align with the company’s business.
Data Science is growing. So am I: be prepared to explain how you are keeping up with the
latest insights in Data Science, and where you think the field will be in 3 years.
Read into the backgrounds of the people that will be interviewing you so that you can
know what perspective they have on the job/ task.
“You don’t just find and get a great job. You find and win a great job against a pool of very
competitive candidates who may want that job as much, if not more, than you do. Finding
and winning a great job is a competitive sport that requires as much career athleticism and
perseverance as making it to the Olympics. You must be in the finest career shape possible
in order to win.”