This repository contains example scripts for deep learning, including pretraining configurations for Large Language Models (LLMs) and Multimodal Models.
- NeMo LLM Pretraining Scripts: Example scripts for pretraining LLMs with the NeMo Framework, adapted from NeMo-Framework-Launcher.
- Megatron-LM LLM Pretraining Scripts: Example scripts for pretraining LLMs, adapted from Megatron-LM.
- Megatron-DeepSpeed LLM Pretraining Scripts: Example scripts for pretraining LLMs, adapted from Megatron-DeepSpeed.
- training/nemo/neva: Scripts for pretraining the multimodal NeVA (LLaVA) model with the recommended configuration (from NeMo-Framework-Launcher) on NVIDIA H100 GPUs, in fp16, running on the NeMo Framework.
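For orientation, the sketch below shows how these script groups could be laid out. Only training/nemo/neva and launcher_scripts/k8s are named explicitly in this README; the other paths are assumptions inferred from the descriptions above.

```
deep_learning_examples/
├── training/
│   ├── nemo/                  # NeMo LLM pretraining scripts (neva/ holds the NeVA/LLaVA configs)
│   ├── megatron-lm/           # Megatron-LM pretraining scripts (assumed path)
│   └── megatron-deepspeed/    # Megatron-DeepSpeed pretraining scripts (assumed path)
└── launcher_scripts/
    └── k8s/                   # Kubernetes PyTorchJob launcher scripts
```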
Before running the examples, ensure the following:
- Container: Use the ScitiX NeMo container (registry-ap-southeast.scitix.ai/hpc/nemo:24.07) or the NGC NeMo container (nemo:24.07). If using NGC, clone this repository into the container or into shared storage accessible by the distributed worker containers, as sketched below.
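A minimal setup sketch follows. The image path is taken from this README; the repository URL and shared-storage mount point are placeholders, not values from this repository.

```bash
# Pull the ScitiX NeMo container (image path from this README).
docker pull registry-ap-southeast.scitix.ai/hpc/nemo:24.07

# If using the NGC nemo:24.07 image instead, clone this repository into
# shared storage reachable by every distributed worker container.
# The URL and mount point below are placeholders.
git clone https://github.com/<org>/deep_learning_examples.git \
  /shared/deep_learning_examples
export DEEP_LEARNING_EXAMPLES_DIR=/shared/deep_learning_examples
```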
- Datasets: Refer to the README.md under deep_learning_examples/training for dataset preparation.
  - For LLM pretraining based on NeMo or Megatron-LM, mock data can be used (a hedged example follows this list).
  - For ScitiX SiFlow or CKS, preset datasets are available.
- Pretrained Models: Prepare the corresponding pretrained models for fine-tuning and multimodal pretraining. Preset models are available for ScitiX SiFlow or CKS.
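To use mock data with the NeMo or Megatron-LM scripts, the relevant switches look roughly like the sketch below. Option names vary across framework versions, so treat these as assumptions and check each script's README for the exact flag.

```bash
# Hedged sketch; exact option names depend on the framework version.

# Megatron-LM: recent versions accept --mock-data, which synthesizes
# training samples so no dataset files are needed.
python pretrain_gpt.py --mock-data ...  # plus the usual model/parallelism args

# NeMo: GPT pretraining can select a mock data implementation through a
# Hydra override, so data path arguments can be omitted.
python megatron_gpt_pretraining.py model.data.data_impl=mock ...
```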
Refer to the README.md under deep_learning_examples/training for detailed instructions.
Using the PyTorchJob Operator
Scripts for launching PyTorch jobs on a Kubernetes cluster are located in launcher_scripts/k8s.
For example, to launch LLaMA2-13B pretraining, run the following commands:

```bash
cd ${DEEP_LEARNING_EXAMPLES_DIR}/launcher_scripts/k8s/training/llm
./launch_nemo_llama2_13b_bf16.sh
```
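Once submitted, the resulting PyTorchJob can be inspected with standard kubectl commands. The job name and namespace below are placeholders; use the values printed by the launch script.

```bash
# List PyTorchJob resources created by the launcher.
kubectl get pytorchjobs -n <namespace>

# Inspect status and events for a specific job (placeholder name).
kubectl describe pytorchjob llama2-13b-pretrain -n <namespace>

# Follow training logs from the master replica's pod.
kubectl logs -f llama2-13b-pretrain-master-0 -n <namespace>
```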