Build DeepSeek from Scratch

Welcome to the official code repository for the book "Build DeepSeek from Scratch" by Dr. Raj Dandekar, Dr. Rajat Dandekar, Dr. Sreedath Panat, and Naman Dwivedi of Vizuara AI Labs.

This book and repository provide a hands-on guide to understanding and implementing the key architectural innovations behind DeepSeek.

Official YouTube Series

This book is accompanied by our viral "Build DeepSeek from Scratch" YouTube playlist, which has helped researchers, developers, and entrepreneurs worldwide. We highly recommend watching the videos alongside reading the chapters for a comprehensive learning experience.

➡️ Watch the full playlist on YouTube

About This Book

DeepSeek LLM represents a pivotal moment in open-source AI as the first fully open-weights model to achieve state-of-the-art performance comparable to closed-source giants. This book democratizes the knowledge behind that breakthrough, teaching you the nuts and bolts of how every aspect of DeepSeek was built from the ground up.

You will learn to implement and extend DeepSeek's core modules from scratch, including:

  • Multi-Head Latent Attention (MLA)
  • Mixture-of-Experts (MoE) (see the routing sketch after this list)
  • Multi-Token Prediction (MTP)
  • Advanced Training and Fine-Tuning Pipelines (FP8, SFT, RL, Distillation)

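To give a concrete flavor of what "from scratch" means here, the sketch below shows top-k expert routing, the idea at the heart of the MoE chapter. It is a minimal illustration only: the class name ToyMoE, the layer sizes, and the routing details are our own simplifications (assuming PyTorch is installed), not code from the book or from DeepSeek's actual implementation.

```python
# Minimal sketch of top-k expert routing, the core idea behind
# Mixture-of-Experts layers. Illustrative only; all names and
# hyperparameters here are hypothetical.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyMoE(nn.Module):
    def __init__(self, d_model=64, n_experts=4, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.gate = nn.Linear(d_model, n_experts)  # router: scores each expert per token
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, 4 * d_model),
                          nn.GELU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x):                               # x: (tokens, d_model)
        scores = F.softmax(self.gate(x), dim=-1)        # routing probabilities
        weights, idx = scores.topk(self.top_k, dim=-1)  # keep top-k experts per token
        weights = weights / weights.sum(dim=-1, keepdim=True)  # renormalize
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, k] == e                   # tokens routed to expert e
                if mask.any():
                    out[mask] += weights[mask, k:k+1] * expert(x[mask])
        return out

tokens = torch.randn(8, 64)
print(ToyMoE()(tokens).shape)  # torch.Size([8, 64])
```

The point of the design: only top_k of the n_experts MLPs run for any given token, so an MoE layer can grow total parameter count without a matching increase in per-token compute.
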
Repository Structure

The repository is organized by chapter. Each chXX/ directory contains a README.md with a summary and links to the relevant videos, plus a subdirectory with the code listings (.ipynb notebooks) for that chapter.

  • /ch01/: Introduction to DeepSeek.
  • /ch02/: The Road to MLA: Understanding the KV Cache Bottleneck.
  • /ch03/: The DeepSeek Breakthrough: Multi-Head Latent Attention (MLA).
  • /ch04/: Mixture-of-Experts (MoE) in DeepSeek.
  • /ch05/: Multi-Token Prediction and FP8 Quantization.
  • /ch06/: The DeepSeek Training Pipeline.
  • /ch07/: Post-Training: SFT and Reinforcement Learning.
  • /ch08/: Knowledge Distillation.

We hope you find this resource valuable on your journey to mastering modern LLM architecture!
