nanoMoE

(Figure: MoE loss curve)

Overview

The Mixture of Experts (MoE) architecture has gained significant traction in the LLM community for good reason: by activating only a subset of parameters per token, MoE models effectively maintain the scaling laws established for dense models while keeping inference costs relatively manageable, a compelling advantage in an era of ever-growing language models.

To better understand how MoE architectures actually work under the hood, I extended Andrej Karpathy's excellent nanoGPT repository to support the MoE architecture. The implementation prioritizes clarity and educational value over performance optimizations.

Key Insights

I've documented my findings, experiments, and observations in a detailed blog post: nanoMoE: Extending NanoGPT with Mixture of Experts

The post covers:

  • Implementation details of MoE in a minimalist framework
  • Experimental results comparing MoE and dense models
  • Practical insights about expert load balancing and training stability (a sketch of the load-balancing loss follows this list)
  • Visualizations of various MoE scaling patterns
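
The load-balancing point above is worth making concrete. Below is a minimal sketch of a Switch-Transformers-style auxiliary loss; the function name, signature, and tensor shapes are illustrative assumptions, not this repository's actual interface.

```python
import torch
import torch.nn.functional as F

def load_balance_loss(router_logits: torch.Tensor,
                      expert_indices: torch.Tensor,
                      num_experts: int) -> torch.Tensor:
    """Switch-Transformers-style auxiliary loss (illustrative sketch, not the repo's code).

    router_logits:  (num_tokens, num_experts) raw router scores
    expert_indices: (num_tokens,) index of the expert each token was dispatched to
    """
    # P_i: mean router probability assigned to each expert
    router_probs = F.softmax(router_logits, dim=-1)
    mean_prob_per_expert = router_probs.mean(dim=0)            # (num_experts,)

    # f_i: fraction of tokens actually dispatched to each expert
    tokens_per_expert = F.one_hot(expert_indices, num_experts).float().mean(dim=0)

    # Minimized when both distributions are uniform (1 / num_experts each)
    return num_experts * torch.sum(tokens_per_expert * mean_prob_per_expert)
```

In the Switch Transformers paper this term is scaled by a small coefficient (around 0.01) and added to the language-modeling loss.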

Implementation Details

The core modifications to nanoGPT include:

  • moe_model.py: Inherits from model.py and implements the core MoE layers
  • train.py: Minor modifications to accommodate the MoE model architecture and track the load-balance loss
  • config/train_gpt2_moe.py*: Configuration files for various MoE model variants (an illustrative sketch follows this list)
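
As a rough illustration of what such a config variant might contain, here is a hypothetical sketch in the style of nanoGPT's config files; the MoE-specific parameter names are assumptions for illustration and may not match the ones used in this repo.

```python
# Hypothetical MoE config variant (illustrative only)
n_layer = 12
n_head = 12
n_embd = 768

n_expert = 8            # experts per MoE layer
expert_top_k = 2        # each token is routed to its top-k experts
aux_loss_weight = 0.01  # weight on the load-balancing auxiliary loss
```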

Everything else is forked from nanoGPT; refer to Andrej's README for details on the base implementation.

Limitations

This implementation is designed as a learning tool to understand MoE architecture. It uses intuitive approaches (like for-loops) that aren't necessarily optimal for GPU computation. For production-scale implementations, more sophisticated techniques like Block Sparse MoE or Expert Parallelism would be required.
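
To make the for-loop point concrete, here is a minimal sketch of a loop-based top-k MoE feed-forward layer in PyTorch; the class and its details are an illustrative assumption, not this repository's actual code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LoopMoE(nn.Module):
    """Illustrative top-k MoE feed-forward layer using a plain Python loop over experts."""

    def __init__(self, n_embd: int, n_expert: int = 8, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(n_embd, n_expert, bias=False)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(n_embd, 4 * n_embd), nn.GELU(), nn.Linear(4 * n_embd, n_embd))
            for _ in range(n_expert)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        B, T, C = x.shape
        tokens = x.view(-1, C)                                   # (B*T, C)
        probs = F.softmax(self.router(tokens), dim=-1)           # (B*T, n_expert)
        topk_probs, topk_idx = probs.topk(self.top_k, dim=-1)    # (B*T, top_k)

        out = torch.zeros_like(tokens)
        # Looping over experts is easy to read, but it serializes work a fused kernel could batch
        for i, expert in enumerate(self.experts):
            token_ids, slot = (topk_idx == i).nonzero(as_tuple=True)
            if token_ids.numel() == 0:
                continue
            weight = topk_probs[token_ids, slot].unsqueeze(-1)   # routing weight for each selected token
            out[token_ids] += weight * expert(tokens[token_ids])
        return out.view(B, T, C)
```

Production implementations avoid this loop by grouping tokens per expert and dispatching them with block-sparse kernels or across devices via expert parallelism.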

Acknowledgments

This project builds directly on Andrej Karpathy's nanoGPT and draws inspiration from Google's Switch Transformers paper.
