Research: How to integrate VITA 1.5 for multi-modal GGUF deployment? #13520

Open
5 tasks
jordanqi opened this issue May 14, 2025 · 0 comments

Research Stage

  • Background Research (Let's try to avoid reinventing the wheel)
  • Hypothesis Formed (How do you think this will work and its effect?)
  • Strategy / Implementation Forming
  • Analysis of results
  • Debrief / Documentation (So people in the future can learn from us)

Previous existing literature and research

I'm trying to deploy a multi-modal model based on VITA-1.5, where:

  • The text backbone is the same as Qwen2.
  • The vision tower is InternViT-300M-448px from OpenGVLab.

Yesterday I noticed that convert_hf_to_gguf.py added a new class:

class InternVisionModel(VisionModel)

which is the same model used in VITA's vision part.

However:

  • There's no corresponding tensor name mapping in constants.py under MODEL_TENSORS.
  • There's no build function in llama-model.cpp (e.g., no build_internvit()).
  • I'm not sure how to combine the vision and text parts into a single GGUF model so that llama.cpp can run inference with both modalities.
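
To make the first gap concrete, here is a minimal sketch of what a tensor name mapping for the vision tower could look like. Everything in it is an assumption for illustration: the Hugging Face-side names are what I would expect from an InternViT-style checkpoint, the GGUF-side names are placeholders, and `map_vision_tensor_name` is a hypothetical helper, not something that exists in constants.py or convert_hf_to_gguf.py.

```python
import re
from typing import Optional

# ASSUMPTION: both sides of these tables are illustrative placeholders.
# They are not copied from constants.py or any llama.cpp mapping table.
VISION_NAME_MAP = {
    "embeddings.patch_embedding": "v.patch_embd",
    "embeddings.position_embedding": "v.position_embd",
    "embeddings.class_embedding": "v.class_embd",
}

# Per-block names; "{bid}" is replaced by the layer index.
VISION_BLOCK_MAP = {
    "attn.qkv": "v.blk.{bid}.attn_qkv",
    "attn.proj": "v.blk.{bid}.attn_out",
    "norm1": "v.blk.{bid}.ln1",
    "norm2": "v.blk.{bid}.ln2",
    "mlp.fc1": "v.blk.{bid}.ffn_up",
    "mlp.fc2": "v.blk.{bid}.ffn_down",
}

_BLOCK_RE = re.compile(r"^encoder\.layers\.(\d+)\.(.+)$")
_SUFFIXES = (".weight", ".bias")


def map_vision_tensor_name(hf_name: str) -> Optional[str]:
    """Return a GGUF-style name for hf_name, or None if it is unmapped."""
    base, suffix = hf_name, ""
    # Peel off ".weight" / ".bias" so the tables only need base names.
    for s in _SUFFIXES:
        if hf_name.endswith(s):
            base, suffix = hf_name[: -len(s)], s
            break
    if base in VISION_NAME_MAP:
        return VISION_NAME_MAP[base] + suffix
    m = _BLOCK_RE.match(base)
    if m and m.group(2) in VISION_BLOCK_MAP:
        return VISION_BLOCK_MAP[m.group(2)].format(bid=m.group(1)) + suffix
    return None


if __name__ == "__main__":
    for name in ("encoder.layers.3.attn.qkv.weight", "embeddings.class_embedding"):
        print(name, "->", map_vision_tensor_name(name))
```

Returning `None` for unmapped names makes it easy to list which tensors in the real checkpoint still need an entry before conversion can succeed.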

My goal:
To deploy VITA-1.5 via llama.cpp and run image+text inference (similar to LLaVA / MobileVLM).

Questions:

  • What is the recommended way to combine Qwen2 text + InternViT vision into one GGUF model?
  • Will InternVisionModel support GGUF inference soon, or should I write the corresponding GGML graph manually?
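
For context on the first question: the existing LLaVA-style flow in llama.cpp loads the language model and a separate vision/projector file rather than one merged file, so a possible first step is to partition the checkpoint by tensor-name prefix. The sketch below shows only that partitioning. The prefixes and the `split_state_dict` helper are hypothetical; the real VITA-1.5 key names need to be checked against the checkpoint, and this is not a confirmed recipe for VITA-1.5.

```python
from typing import Any, Dict, Tuple

# ASSUMPTION: placeholder prefixes. Inspect the real checkpoint keys
# before relying on them.
VISION_PREFIXES = ("model.vision_tower.", "model.mm_projector.")


def split_state_dict(
    state: Dict[str, Any],
) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Partition a state dict into (text_part, vision_part) by key prefix."""
    text_part: Dict[str, Any] = {}
    vision_part: Dict[str, Any] = {}
    for name, tensor in state.items():
        # str.startswith accepts a tuple of candidate prefixes.
        if name.startswith(VISION_PREFIXES):
            vision_part[name] = tensor
        else:
            text_part[name] = tensor
    return text_part, vision_part


if __name__ == "__main__":
    # Dummy values stand in for tensors so the sketch runs without torch.
    demo = {
        "model.layers.0.self_attn.q_proj.weight": 0,
        "model.vision_tower.encoder.layers.0.attn.qkv.weight": 1,
        "model.mm_projector.0.weight": 2,
    }
    text, vision = split_state_dict(demo)
    print(sorted(text))
    print(sorted(vision))
```

Each half could then be converted separately: the text part through the existing Qwen2 path, the vision part once a tensor mapping and graph for InternViT exist.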

Hypothesis

No response

Implementation

No response

Analysis

No response

Relevant log output
