This repository contains useful articles and papers for the (aspiring) unicorn data scientist. Unlike other awesome-xyz repositories, this one does not collect software tools or libraries, only reading materials.
Pull requests are welcome! See Contributing.
- Rules of ML
- Engineers Shouldn’t Write ETL: A Guide to Building a High Functioning Data Science Department
- Machine Learning: The High-Interest Credit Card of Technical Debt
- Production-Level-Deep-Learning
- End-to-end Machine Learning with TFX on TensorFlow 2.x
- Monitoring Machine Learning Models in Production
- ML engineering best practices
- Google MLOps whitepaper
- Ways I Use Testing as a Data Scientist
- MLOps Principles
- Continuous delivery and automation pipelines in machine learning
- No, you don't need MLOps
- Python dependency management is a dumpster fire
- Detecting Interference: An A/B Test of A/B Tests
- Switchback Tests and Randomized Experimentation Under Network Effects at DoorDash
- Innovating Faster on Personalization Algorithms at Netflix Using Interleaving
- Python Design Patterns
- Awesome Prometheus Alerts
- Refactoring Guru
- Choose boring technology
- The Beginner's Guide to Databases
- System Design 101
- How to do a code review
- Probability Distribution Explorer
- Common probability distributions
- William Chen probability cheat sheet
- KDE visualisation
- Modeling conversion rates and saving millions of dollars using Kaplan-Meier and gamma distributions
- Robust Statistical Distances for Machine Learning
- How Not To Sort By Average Rating
- Common statistical tests are linear models
- Bayesian Optimization
- On Average, You’re Using the Wrong Average: Geometric & Harmonic Means in Data Analysis
- Stop aggregating away the signal in your data
- Inferring Concept Drift Without Labeled Data
- The hacker's guide to uncertainty estimates
- How Walmart Leverages CUPED and Reduces Experimentation Lifecycle
- A One-Page Primer on: Statistical Power
- The Illustrated Machine Learning website
- Feature Engineering A-Z
- Applied Machine Learning for Tabular Data
- Interpretable ML Book
- Machine Learning Visualized
- Google Recommendation Systems Crash Course
- Deep Neural Networks for YouTube Recommendations
- Deep density networks and uncertainty in recommender systems
- Microsoft Recommenders
- aman.ai recsys
- The Best Way to Use Text Embeddings Portably is With Parquet and Polars
- AI Engineer 2025 - Improving RecSys & Search with LLM techniques
- Bandit Algorithms
- A Contextual-Bandit Approach to Personalized News Article Recommendation
- Python Causality Handbook
- A Recipe for Training Neural Networks
- Numerically stable and computationally efficient log-sum-exp
- Interpreting loss curves
- Visualizing the Loss Landscape of a Neural Network
- AI Content Generation Tools
- Google research tuning playbook
- The Illustrated Transformer
- Transformer Explainer
- Understanding Large Language Models
- LLM101n - Andrej Karpathy
- A Visual Guide to Quantization
- Anti-hype LLM reading list
- Understanding LLMs from Scratch Using Middle School Math
- awesome-chatgpt-prompts
- Prompt Engineering Guide
- Anthropic prompt engineering overview
- Prompt Engineering Roadmap
- OpenAI Cookbook
- Google Prompt Design Strategies
- Zero to One: Learning Agentic Patterns
- A practical guide to building agents - OpenAI
- Building effective agents - Anthropic
- How to Build an Agent
- 12 factor agents
- Lessons on building an AI data analyst
- Building a Generative AI Platform
- Emerging Architectures for LLM Applications
- Open source LLM tools
- rerankers
- ML and LLM system design: 450 case studies to learn from
- Introducing Contextual Retrieval
- Vector Databases Are the Wrong Abstraction
- Here’s how I use LLMs to help me write code
- AI in organizations: Some tactics
- Improving Recommendation Systems & Search in the Age of LLMs
- Synthetic Consumers
- LLMOps Is About People Too: The Human Element in AI Engineering
- Field Notes From Shipping Real Code With Claude
- Data Analytics Design Patterns
- How AI will Disrupt BI As We Know It
- Coding for Economists
- Practical advice for analysis of large, complex data sets
- What is an analytics engineer?
- The Modern Data Stack
- Emerging architectures for modern data infrastructure
- Highly Opinionated Integrations
- Choosing a Product Analytics Tool
- 2022 ETL Buyer’s Guide: How to Pick the Right Tool for Your Analytics Stack
- Modern Data Stack in a Box with DuckDB
- Simple ML for Sheets
- Data Pipeline Design Patterns
- The Analytics Development Lifecycle
- The Rise of the Declarative Data Stack
- Nomadic Infrastructure Design for AI Workloads
- DataHub: Popular metadata architectures explained
- Biases in AI Systems
- How to Move Beyond a Monolithic Data Lake to a Distributed Data Mesh
- Apple's Human Interface Guidelines for Charts
- Better dashboards align with the scales of business decisions
- Hashicorp manager charter
- Good DS vs. Bad DS
- Open decision making
- North star metrics
- Agile analytics
- Modern data culture stack
- So You Want to Become a Data Science Manager?
- Mochary Method Curriculum
- Gitlab Data Team Handbook
- The Great CEO Within
- Coordination Headwind
- The Art of Onboarding
- Don't be Frupid
- 7 Mindsets That Are Slowing Down Your Career Growth
- 3 Critical Skills You Need to Grow Beyond Senior Levels in Engineering
- Best Practice vs. Fit Practice
- Cargo Culting
- The Best Programmers I Know
- How to Consistently Hire Remarkable Data Scientists
- How Coursera Competes Against Google and Facebook for the Best Talent
- How to hire smarter than the market: a toy model
- Interviewing is a noisy prediction problem
- How to set compensation using commonsense principles
- VP of Engineering hiring cheatsheet
- Quantifying the statistical skills needed to be a Google Data Scientist
- The Data Science Interview Book
- Conor Dewey's list of data science interview resources
- 101 Data Science Interview Questions, Answers, and Key Concepts
- How I negotiated a $300,000 job offer in Silicon Valley
- How to Make Your Data Science Job Application Stand Out
- You’re probably answering these 5 common interview questions wrong
- 6 red flags I saw while doing 60+ technical interviews in 30 days
- Google interview warmup
- Red Flags to Look Out for When Joining a Data Team
- Data Scientist Handbook
- A guide to passing the A/B test interview question in tech companies
- Top 50 Large Language Model (LLM) Interview Questions