
Deploying VITA-1.5 Multimodal Model with ExecuTorch #10757


Open
jordanqi opened this issue May 7, 2025 · 7 comments
Labels
module: llm Issues related to LLM examples and apps, and to the extensions/llm/ code · triaged This issue has been looked at by a team member, and triaged and prioritized into an appropriate module

Comments

@jordanqi

jordanqi commented May 7, 2025

🚀 The feature, motivation and pitch

I’m trying to deploy a VITA-1.5 multimodal model (supports audio, vision, and text) using ExecuTorch.

The tokenizer is in Hugging Face tokenizer.json format, and I’d like to ask:

  1. Is there any suggested way to convert the model into .pte format for ExecuTorch?
  2. Since this is a new architecture, is there any guidance or examples for adding custom models?
  3. Can I still use the LlamaDemo Android app with this multimodal model?

Alternatives

No response

Additional context

No response

RFC (Optional)

No response

cc @larryliu0820 @mergennachin @cccclai @helunwencser @jackzhxng

@Jack-Khuu
Contributor

cc: @larryliu0820 for MM
@kirklandsign for Android

@Jack-Khuu Jack-Khuu added triaged This issue has been looked at by a team member, and triaged and prioritized into an appropriate module module: llm Issues related to LLM examples and apps, and to the extensions/llm/ code labels May 9, 2025
@github-project-automation github-project-automation bot moved this to To triage in ExecuTorch Core May 9, 2025
@Jack-Khuu
Contributor

Hi @jordanqi, if you haven't joined our discord channel, we would love to have you on there :)

@jordanqi
Author

jordanqi commented May 9, 2025

Hi @jordanqi, if you haven't joined our discord channel, we would love to have you on there :)

I haven't joined the Discord yet; please send me the channel link. Thanks!

@kirklandsign
Contributor

kirklandsign commented May 12, 2025

Is there any suggested way to convert the model into .pte format for ExecuTorch?
Since this is a new architecture, is there any guidance or examples for adding custom models?

Is this model from HF? @guangy10 may know.

Can I still use the LlamaDemo Android app with this multimodal?

If it can run with the desktop llama_runner out of the box, then it can run with the LlamaDemo Android app, but I'm not sure about the image processing and prompt format parts.

@kirklandsign
Contributor

We support the Hugging Face tokenizer.json format right now.
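For reference, a quick way to sanity-check that a tokenizer is in the expected tokenizer.json format is to round-trip it through the Hugging Face `tokenizers` library. This sketch builds a toy word-level tokenizer as a stand-in (the real file would come from the VITA-1.5 repo on Hugging Face), saves it, and reloads it:

```python
from tokenizers import Tokenizer
from tokenizers.models import WordLevel
from tokenizers.pre_tokenizers import Whitespace

# Toy vocabulary; the real VITA-1.5 tokenizer.json replaces this file.
vocab = {"[UNK]": 0, "hello": 1, "world": 2}
tok = Tokenizer(WordLevel(vocab, unk_token="[UNK]"))
tok.pre_tokenizer = Whitespace()

# Save in the same tokenizer.json format the runtime's loader consumes.
tok.save("toy_tokenizer.json")

# Reload and encode to confirm the file round-trips cleanly.
reloaded = Tokenizer.from_file("toy_tokenizer.json")
ids = reloaded.encode("hello world").ids
print(ids)  # → [1, 2]
```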

@larryliu0820
Contributor

larryliu0820 commented May 12, 2025

Hi @jordanqi, thanks for your interest. A lot of the features you are asking about are under development, so there may not be convenient ways to do them yet, but I believe that with a bit of work you can make these work!

Is there any suggested way to convert the model into .pte format for ExecuTorch?

Yes, I would suggest taking a look at the Llava example for multimodal LLMs: https://github.com/pytorch/executorch/blob/main/examples/models/llava/export_llava.py

We are actively working on a more general API that works on models with the same architecture.

Since this is a new architecture, is there any guidance or examples for adding custom models?

Again, please refer to the Llava example. We are open to taking a new model under the examples directory, as long as it works.

Can I still use the LlamaDemo Android app with this multimodal?

Most likely you can use the demo app for image prefill, but supporting audio will need a lot of work. Also keep in mind that right now the model and the runner are coupled, so it's really a case-by-case situation.
