Welcome to the official code repository for the book "Build DeepSeek from Scratch" by Dr. Raj Dandekar, Dr. Rajat Dandekar, Dr. Sreedath Panat, and Naman Dwivedi of Vizuara AI Labs.
This book and repository provide a hands-on guide to understanding and implementing the key architectural innovations behind DeepSeek.
This book is accompanied by our viral "Build DeepSeek from Scratch" YouTube playlist, which has helped researchers, developers, and entrepreneurs worldwide. We highly recommend watching the videos alongside reading the chapters for a comprehensive learning experience.
➡️ Watch the full playlist on YouTube
DeepSeek LLM marks a pivotal moment for open-source AI as one of the first fully open-weights models to achieve state-of-the-art performance comparable to closed-source giants. This book democratizes the knowledge behind that breakthrough, teaching you the nuts and bolts of how every aspect of DeepSeek was built from the ground up.
You will learn to implement and extend DeepSeek's core modules from scratch, including:
- Multi-Head Latent Attention (MLA)
- Mixture-of-Experts (MoE), sketched briefly after this list
- Multi-Token Prediction (MTP)
- Advanced Training and Fine-Tuning Pipelines (FP8, SFT, RL, Distillation)
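To give a flavor of what "from scratch" means here, below is a minimal top-k MoE routing sketch. It is purely illustrative and not a listing from the book; the class names, dimensions, and the choice of PyTorch are all assumptions made for this example.

```python
# Illustrative sketch only (not a listing from the book): a minimal
# top-k Mixture-of-Experts layer. All names and sizes are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoE(nn.Module):
    def __init__(self, d_model, d_hidden, n_experts, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, n_experts, bias=False)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_hidden), nn.GELU(),
                          nn.Linear(d_hidden, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x):                      # x: (tokens, d_model)
        scores = self.router(x)                # (tokens, n_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)   # normalize over chosen experts
        out = torch.zeros_like(x)
        for k in range(self.top_k):            # send each token to its experts
            for e in range(len(self.experts)):
                mask = idx[:, k] == e
                if mask.any():
                    out[mask] += weights[mask, k:k+1] * self.experts[e](x[mask])
        return out

x = torch.randn(8, 64)
print(MoE(64, 256, n_experts=4)(x).shape)  # torch.Size([8, 64])
```

In practice, production MoE layers dispatch tokens in batches rather than looping over experts; the loop here is kept only for readability.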
The repository is organized by chapter. Each chXX/ directory contains a README.md with a chapter summary and links to the relevant videos, plus a subdirectory with that chapter's code listings (.ipynb notebooks).
- /ch01/: Introduction to DeepSeek.
- /ch02/: The Road to MLA: Understanding the KV Cache Bottleneck (see the back-of-the-envelope sketch after this list).
- /ch03/: The DeepSeek Breakthrough: Multi-Head Latent Attention (MLA).
- /ch04/: Mixture-of-Experts (MoE) in DeepSeek.
- /ch05/: Multi-Token Prediction and FP8 Quantization.
- /ch06/: The DeepSeek Training Pipeline.
- /ch07/: Post-Training: SFT and Reinforcement Learning.
- /ch08/: Knowledge Distillation.
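As a taste of Chapter 2's starting point, here is the kind of back-of-the-envelope KV-cache arithmetic the chapter works through. Every model and serving number below is an illustrative assumption, not DeepSeek's actual configuration:

```python
# Rough KV-cache memory estimate for a hypothetical decoder-only model.
# Every number here is an assumption chosen for illustration.
n_layers, n_heads, head_dim = 32, 32, 128   # hypothetical model shape
seq_len, batch_size = 4096, 8               # hypothetical serving setup
bytes_per_value = 2                          # fp16/bf16 cache entries

# Per token, each layer caches one K and one V vector for every head.
kv_bytes = (2 * n_layers * n_heads * head_dim
            * seq_len * batch_size * bytes_per_value)
print(f"KV cache: {kv_bytes / 2**30:.1f} GiB")  # 16.0 GiB for these numbers
```

The cache grows linearly in layers, heads, head size, sequence length, and batch size; Multi-Head Latent Attention (Chapter 3) shrinks it by caching a small shared latent and reconstructing keys and values from it.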
We hope you find this resource valuable on your journey to mastering modern LLM architecture!