
DeepSpeed

DeepSpeed empowers developers to streamline distributed training and inference, making it easier to scale AI models efficiently while minimizing costs and operational complexity.

Training advanced deep learning models is challenging. Beyond model design, model scientists also need to set up state-of-the-art training techniques such as distributed training, mixed precision, gradient accumulation, and checkpointing. Even then, they may not achieve the desired system performance and convergence rate. Large models are even more challenging: a large model easily runs out of memory under pure data parallelism, and model parallelism is difficult to use. DeepSpeed addresses these challenges to accelerate model development and training. It enables the world’s most powerful language models, such as MT-530B and BLOOM. It is an easy-to-use deep learning optimization software suite that powers unprecedented scale and speed for both training and inference.
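As a rough illustration of how several of the techniques named above (mixed precision, gradient accumulation, and memory partitioning across data-parallel workers) are usually expressed together, here is a minimal sketch of a DeepSpeed-style configuration built in Python. The helper name `build_ds_config` and all numeric values are assumptions chosen for illustration; they do not come from this page. The sketch only constructs a dictionary and does not import DeepSpeed.

```python
import json


def build_ds_config(micro_batch_per_gpu: int, grad_accum_steps: int, world_size: int) -> dict:
    """Return a DeepSpeed-style config dict (hypothetical helper, illustrative values)."""
    return {
        # Effective global batch = micro batch x accumulation steps x number of GPUs.
        "train_batch_size": micro_batch_per_gpu * grad_accum_steps * world_size,
        "train_micro_batch_size_per_gpu": micro_batch_per_gpu,
        "gradient_accumulation_steps": grad_accum_steps,
        # Mixed-precision training with fp16.
        "fp16": {"enabled": True},
        # ZeRO stage 2: partition optimizer states and gradients across workers.
        "zero_optimization": {"stage": 2},
    }


# Assumed example setup: 16 GPUs, 4 samples per GPU per step, 8 accumulation steps.
config = build_ds_config(micro_batch_per_gpu=4, grad_accum_steps=8, world_size=16)
print(json.dumps(config, indent=2))
```

A dictionary like this is typically passed to `deepspeed.initialize` through its `config` argument, or saved as a JSON file. Sensible values depend on the model and hardware, so treat the numbers above as placeholders.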

DeepSpeed was contributed by Microsoft to the Linux Foundation in January 2025.