This repository contains the following libraries, which can be installed independently of each other:
- ort_moe: Mixture of Experts implementation in PyTorch
- torch_ort: ONNX Runtime package that accelerates PyTorch models
- torch_ort_infer: ONNX Runtime package that accelerates inference for PyTorch models
The Mixture of Experts (MoE) layer implementation is available in the ort_moe folder.
- ort_moe/docs/moe.md provides a brief overview of the implementation.
- A simple MoE tutorial is provided here.
- Note: ONNX Runtime and the prerequisites that follow are not required to run the MoE layer; it is implemented in standalone PyTorch (see the sketch below).
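For intuition, here is a minimal, generic top-1 mixture-of-experts layer in plain PyTorch. It illustrates the technique only; it is not the ort_moe API, and the class and sizes are hypothetical (see ort_moe/docs/moe.md for the actual implementation).

```python
import torch

# Generic top-1 MoE layer for illustration only (not the ort_moe API).
class TinyMoE(torch.nn.Module):
    def __init__(self, d_model, num_experts):
        super().__init__()
        self.gate = torch.nn.Linear(d_model, num_experts)
        self.experts = torch.nn.ModuleList(
            [torch.nn.Linear(d_model, d_model) for _ in range(num_experts)]
        )

    def forward(self, x):  # x: (tokens, d_model)
        probs = torch.softmax(self.gate(x), dim=-1)  # routing probabilities
        top_p, top_e = probs.max(dim=-1)             # top-1 expert per token
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            mask = top_e == e                        # tokens routed to expert e
            if mask.any():
                out[mask] = top_p[mask].unsqueeze(1) * expert(x[mask])
        return out

moe = TinyMoE(d_model=16, num_experts=4)
y = moe(torch.randn(8, 16))  # one expert applied per token
```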
To build the ort_moe package from source:

```bash
cd ort_moe
pip install build  # install the PyPA build frontend
python -m build
```
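By default, `python -m build` writes the sdist and wheel to the `dist/` directory; the wheel can then be installed with `pip`.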
ONNX Runtime for PyTorch accelerates PyTorch model training using ONNX Runtime.
This repository contains the source code for the package, as well as instructions for running the package.
You need a machine with at least one NVIDIA or AMD GPU to run ONNX Runtime for PyTorch.
You can install and run torch-ort in your local environment, or with Docker.
By default, torch-ort depends on PyTorch 1.9.0, ONNX Runtime 1.9.0 and CUDA 10.2.
- Install CUDA 10.2
- Install cuDNN 7.6
- Install torch-ort

  ```bash
  pip install torch-ort
  ```

- Run the post-installation script for ORTModule

  ```bash
  python -m torch_ort.configure
  ```
Get install instructions for other combinations in the Get Started Easily section at https://www.onnxruntime.ai/ under the Optimize Training tab.
- Clone this repo

  ```bash
  git clone [email protected]:pytorch/ort.git
  ```

- Install extra dependencies

  ```bash
  pip install wget pandas scikit-learn transformers
  ```

- Run the training script

  ```bash
  python ./ort/tests/bert_for_sequence_classification.py
  ```
```python
from torch_ort import ORTModule

model = ORTModule(model)

# PyTorch training script follows
```
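For example, a minimal end-to-end sketch of training through ORTModule (the network, sizes, and data below are hypothetical placeholders; a CUDA device is assumed):

```python
import torch
from torch_ort import ORTModule

# Placeholder two-layer network; any torch.nn.Module works the same way.
model = torch.nn.Sequential(
    torch.nn.Linear(784, 128),
    torch.nn.ReLU(),
    torch.nn.Linear(128, 10),
).to("cuda")
model = ORTModule(model)  # forward and backward now run through ONNX Runtime

optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
inputs = torch.randn(64, 784, device="cuda")
targets = torch.randint(0, 10, (64,), device="cuda")

optimizer.zero_grad()
loss = torch.nn.functional.cross_entropy(model(inputs), targets)
loss.backward()
optimizer.step()
```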
torch-ort also provides a fused Adam optimizer, FusedAdam. A runnable version of the sample is below; the NeuralNet definition and input sizes are placeholders.

```python
import torch
from torch_ort.optim import FusedAdam

# Placeholder network; substitute your own model.
class NeuralNet(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = torch.nn.Linear(32, 8)

    def forward(self, x):
        return self.fc(x)

# Only supports GPU currently.
device = "cuda"
model = NeuralNet().to(device)
ort_fused_adam_optimizer = FusedAdam(
    model.parameters(), lr=1e-3, betas=(0.9, 0.999), weight_decay=0.01, eps=1e-8
)

loss = model(torch.randn(4, 32, device=device)).sum()
loss.backward()
ort_fused_adam_optimizer.step()
ort_fused_adam_optimizer.zero_grad()
```
For detailed documentation, see FusedAdam.
For a full working example, see the FusedAdam Test Example.
torch-ort also provides LoadBalancingDistributedSampler. A runnable sketch follows; the dataset, collate function, complexity function, and model are minimal placeholders, and it assumes `complexity_fn` receives an individual dataset sample.

```python
import torch
from torch.utils.data import DataLoader
from torch_ort.utils.data import LoadBalancingDistributedSampler

# Placeholder dataset; substitute your own.
class MyDataset(torch.utils.data.Dataset):
    def __init__(self, samples, labels):
        self.samples, self.labels = samples, labels

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, idx):
        return self.samples[idx], self.labels[idx]

def collate_fn(data):
    samples = torch.stack([sample for sample, _ in data])
    label_list = torch.tensor([label for _, label in data])
    return samples, label_list

# Per-sample complexity estimate used to balance work across ranks.
def complexity_fn(sample):
    return sample[0].numel()

samples = [torch.randn(8) for _ in range(16)]
labels = [i % 2 for i in range(16)]
dataset = MyDataset(samples, labels)
data_sampler = LoadBalancingDistributedSampler(
    dataset, complexity_fn=complexity_fn, world_size=2, rank=0, shuffle=False
)
train_dataloader = DataLoader(dataset, batch_size=2, sampler=data_sampler, collate_fn=collate_fn)

model = torch.nn.Linear(8, 2)  # placeholder model and optimizer
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)

for batched_data, batched_labels in train_dataloader:
    optimizer.zero_grad()
    loss = torch.nn.functional.cross_entropy(model(batched_data), batched_labels)
    loss.backward()
    optimizer.step()
```
For detailed documentation, see LoadBalancingDistributedSampler.
For a full working example, see the LoadBalancingDistributedSampler Test Example.
To see torch-ort in action, see https://github.com/microsoft/onnxruntime-training-examples, which shows you how to train the most popular HuggingFace models.
ONNX Runtime for PyTorch is now extended to support PyTorch model inference using ONNX Runtime.
It is available via the torch-ort-infer python package. This preview package enables the OpenVINO™ Execution Provider for ONNX Runtime by default, accelerating inference on various Intel® CPUs, Intel® integrated GPUs, and Intel® Movidius™ Vision Processing Units (VPUs).
This repository contains the source code for the package, as well as instructions for running the package.
- Ubuntu 18.04, 20.04
- Python* 3.7, 3.8 or 3.9
By default, torch-ort-infer depends on PyTorch 1.12 and ONNX Runtime OpenVINO EP 1.12.
- Install torch-ort-infer with OpenVINO dependencies

  ```bash
  pip install torch-ort-infer[openvino]
  ```

- Run the post-installation script

  ```bash
  python -m torch_ort.configure
  ```
Once you have created your environment, execute the following steps to validate that your installation is correct.
- Clone this repo

  ```bash
  git clone [email protected]:pytorch/ort.git
  ```

- Install extra dependencies

  ```bash
  pip install wget pandas transformers
  ```

- Run the inference script

  ```bash
  python ./ort/torch_ort_inference/tests/bert_for_sequence_classification.py
  ```
```python
from torch_ort import ORTInferenceModule

model = ORTInferenceModule(model)

# PyTorch inference script follows
```
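A minimal end-to-end sketch of inference through ORTInferenceModule (the model and input sizes below are hypothetical placeholders):

```python
import torch
from torch_ort import ORTInferenceModule

# Placeholder classifier; substitute your own model.
model = torch.nn.Sequential(
    torch.nn.Linear(784, 128),
    torch.nn.ReLU(),
    torch.nn.Linear(128, 10),
)
model.eval()
model = ORTInferenceModule(model)  # inference now runs through ONNX Runtime

with torch.no_grad():
    logits = model(torch.randn(1, 784))
    prediction = logits.argmax(dim=-1)
```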
Users can configure different options for a given Execution Provider to run inference. As an example, OpenVINO™ Execution Provider options can be configured as shown below:
```python
from torch_ort import ORTInferenceModule, OpenVINOProviderOptions

provider_options = OpenVINOProviderOptions(backend="GPU", precision="FP16")
model = ORTInferenceModule(model, provider_options=provider_options)

# PyTorch inference script follows
```
If no provider options are specified by the user, the OpenVINO™ Execution Provider is enabled with the following defaults:

```python
backend = "CPU"
precision = "FP32"
```
For more details on APIs, see usage.md.
Support for the Intel® MyriadX VPU is experimental in this preview.
This project has an MIT license, as found in the LICENSE file.