This repository hosts EleutherAI's library for training large-scale language models on GPUs. Our current framework is based on NVIDIA's Megatron Language Model and has been augmented with techniques from DeepSpeed as well as some novel optimizations. We aim to make this repo a centralized and accessible place to gather techniques for training large-scale autoregressive language models, and to accelerate research into large-scale training.
For those looking for a TPU-centric codebase, we recommend Mesh Transformer JAX.
If you are not looking to train models with billions of parameters from scratch, this is likely the wrong library to use. For generic inference needs, we recommend the Hugging Face transformers library instead, which supports GPT-NeoX models.
GPT-NeoX
Implementation of model parallel autoregressive transformers on GPUs