
export llama to onnx

Export LLaMA to ONNX files without modifying transformers' modeling_llama.py

Supports exporting llama_hf models (Alpaca, etc.) and Alibaba Qwen via export_llama.py

Supports exporting ChatGLM2 via export_chatglm2.py

Please use PyTorch 2.1 (or the newest nightly build if 2.1 has not been released yet) when exporting ChatGLM2. You can refer to the demo infer_glm2_by_onnx.py for running inference with the exported ChatGLM2 ONNX model.

Supports exporting BLOOM via export_bloom.py

Models to export

For LLaMA, we export four ONNX files, one for each of the following sub-models (see the sketch after this list):

LlamaForCausalLM.lm_head

LlamaModel.embed_tokens

LlamaModel.layers

LlamaModel.norm
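
As a rough illustration only (this is not the repo's export_llama.py, and "model_dir" and the file/tensor names are placeholders), the four sub-modules can be pulled out of a Hugging Face LlamaForCausalLM and exported individually with torch.onnx.export:

# Minimal sketch, assuming a standard Hugging Face LLaMA checkpoint in "model_dir".
import torch
from transformers import LlamaForCausalLM

model = LlamaForCausalLM.from_pretrained("model_dir", torch_dtype=torch.float16)
model.eval()

lm_head = model.lm_head                  # LlamaForCausalLM.lm_head
embed_tokens = model.model.embed_tokens  # LlamaModel.embed_tokens
layers = model.model.layers              # LlamaModel.layers (ModuleList of decoder layers)
final_norm = model.model.norm            # LlamaModel.norm

# Example: export only the embedding sub-module to its own ONNX file
input_ids = torch.ones(1, 8, dtype=torch.int64)
torch.onnx.export(
    embed_tokens,
    (input_ids,),
    "embed_tokens.onnx",
    input_names=["input_ids"],
    output_names=["inputs_embeds"],
    dynamic_axes={"input_ids": {0: "batch", 1: "seq"}},
    opset_version=17,
)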

It is also easy to combine all of these sub-models into a single ONNX model; this is shown in export_chatglm2.py and export_llama_single.py.

Usage example

convert llama_hf:

python export_llama.py -m model_dir --dtype fp16 # convert model to multi onnx files
# python export_llama_single.py -m model_dir --dtype fp16 # convert model to single onnx file

convert Qwen:

python export_llama.py -m model_dir --dtype fp16 --model_type Qwen

Before converting Qwen, it is better to replace the rearrange ops in modeling_qwen.py to simplify the exported ONNX models (see https://blog.csdn.net/u013701860/article/details/132123476).

convert chatglm2:

python export_chatglm2.py -m model_dir --dtype fp16 # [--add_topk_warper 1]

Other arguments can be used to configure the export, such as the ONNX opset version and the output directory.
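
After exporting, the ONNX files can be run with onnxruntime; the repo's infer_glm2_by_onnx.py is a complete demo for ChatGLM2. The following is only a generic sketch (the file name and the input/output tensor names are placeholders and depend on how you ran the export):

# Generic sketch, not the repo's demo: load one exported file and list its I/O names.
import numpy as np
import onnxruntime as ort

session = ort.InferenceSession("exported_model.onnx", providers=["CPUExecutionProvider"])
print([i.name for i in session.get_inputs()])   # names to use as feed-dict keys
print([o.name for o in session.get_outputs()])  # names of the produced tensors

# Example of feeding a dummy tensor, assuming an "input_ids" input exists:
# outputs = session.run(None, {"input_ids": np.ones((1, 8), dtype=np.int64)})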

Note

Please uninstall or disable FlashAttention (and possibly xformers) before model conversion.

For the kv_cache, some models use the layout [batch, head, seq, hidden], while others use [batch, seq, head, hidden]. The [batch, seq, head, hidden] layout is more deployment-friendly, since the memory written for each newly appended cache entry is contiguous.
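
A small illustration of the difference (not code from this repo): with a preallocated cache buffer, the slot for a newly appended token is one contiguous block in the [batch, seq, head, hidden] layout, but a strided, per-head scattered region in [batch, head, seq, hidden].

# Sketch only: compare where the next token's cache entry lives in the two layouts.
import torch

batch, heads, max_seq, hidden = 1, 8, 16, 64
cur_len = 10  # number of tokens already cached

cache_bshd = torch.zeros(batch, max_seq, heads, hidden)  # [batch, seq, head, hidden]
cache_bhsd = torch.zeros(batch, heads, max_seq, hidden)  # [batch, head, seq, hidden]

new_bshd = cache_bshd[:, cur_len]     # slot for the next token, shape [batch, head, hidden]
new_bhsd = cache_bhsd[:, :, cur_len]  # same slot in the other layout

print(new_bshd.is_contiguous())  # True: one contiguous block per batch element
print(new_bhsd.is_contiguous())  # False: one strided slice per head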

The project (all versions) and its developers are not responsible for the correctness of the exported models, nor for any consequences arising from the use of the project or the exported models.
