Faster inference using ONNX
Open Neural Network Exchange (ONNX) is an open format that enables faster inference for trained models. The `optimum` library builds on this by providing easy ONNX exports for Hugging Face models, and even for entire pipelines. The usage and implementation are straightforward, as you will see:
- The first thing to do is install the `optimum` and `onnxruntime` libraries:

```bash
$ pip install optimum[onnxruntime]
```
- The next step is to load the pipeline using the `optimum` pipeline:

```python
from optimum.pipelines import pipeline

pipe = pipeline(
    "text-classification",
    "cardiffnlp/twitter-xlm-roberta-base-sentiment",
    accelerator="ort",
)
```

- Two types of accelerators exist: ONNX Runtime (ORT) and BetterTransformer. ORT is used for the ONNX export of the model. For this specific example, we picked a multilingual sentiment analysis model based on XLM-RoBERTa. Now that it has been converted, you can easily run the pipeline, as sketched below.
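As a concrete illustration, here is a minimal sketch of calling the converted pipeline. The example sentences and the printed fields are assumptions for demonstration purposes; the ORT-backed pipeline is invoked like any standard transformers pipeline:

```python
# Minimal usage sketch: the example inputs are illustrative.
results = pipe([
    "I love this library!",             # English
    "No me gusta nada este producto.",  # Spanish
])

# Each result is a dict with a predicted label and a confidence score,
# e.g. {'label': 'positive', 'score': 0.97}.
for result in results:
    print(result["label"], round(result["score"], 4))
```

If you prefer to perform the ONNX export explicitly rather than relying on the `accelerator` argument, `optimum` also provides ORT model classes that plug into the regular transformers pipeline. The following is a sketch assuming a recent `optimum` version; older releases use `from_transformers=True` in place of `export=True`:

```python
from optimum.onnxruntime import ORTModelForSequenceClassification
from transformers import AutoTokenizer, pipeline

model_id = "cardiffnlp/twitter-xlm-roberta-base-sentiment"

# Export the PyTorch checkpoint to ONNX and load it with ONNX Runtime.
ort_model = ORTModelForSequenceClassification.from_pretrained(model_id, export=True)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# The ORT model drops into the standard transformers pipeline.
onnx_pipe = pipeline("text-classification", model=ort_model, tokenizer=tokenizer)
print(onnx_pipe("Exported once, reusable for fast inference."))
```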