
mtmd : support InternVL 3 38B and 78B mmproj #13443


Merged — 3 commits merged into ggml-org:master from city96:internvl_mmproj on May 11, 2025

Conversation

city96 (Contributor) commented May 10, 2025

This is an attempt to add support for InternVL3-38B and InternVL3-78B. The PR at #13422 already works with the smaller models.

According to the readme, the mmproj for these is InternViT-6B-448px-V2_5 instead of InternViT-300M-448px-V2_5. The only difference seems to be the lack of q/k/v biases and the addition of q/k attention norms.
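
For illustration, here is a minimal sketch of how the Q projection could branch between the two ViT variants in a ggml-style graph. The vit_layer struct and build_q helper are hypothetical names, not the actual clip.cpp code; the K path would branch the same way, and V keeps a bias only on the 300M ViT:

#include "ggml.h"

// Minimal stand-in for the per-layer tensors; field names are illustrative.
struct vit_layer {
    struct ggml_tensor * q_w;      // query projection weight
    struct ggml_tensor * q_b;      // query bias (300M ViT only; NULL on the 6B ViT)
    struct ggml_tensor * q_norm_w; // query attention-norm weight (6B ViT only)
};

static struct ggml_tensor * build_q(struct ggml_context * ctx0,
                                    struct ggml_tensor  * cur,
                                    const struct vit_layer * layer,
                                    float eps) {
    struct ggml_tensor * q = ggml_mul_mat(ctx0, layer->q_w, cur);
    if (layer->q_b) {
        q = ggml_add(ctx0, q, layer->q_b);      // 300M ViT: biased q/k/v projections
    }
    if (layer->q_norm_w) {
        q = ggml_rms_norm(ctx0, q, eps);        // 6B ViT: extra q/k attention norm
        q = ggml_mul(ctx0, q, layer->q_norm_w); // (RMS, per the norm discussion below)
    }
    return q;
}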

github-actions bot added the examples and python (python script changes) labels on May 10, 2025
city96 changed the title from "Support InternVL 3 38B and 78B mmproj" to "mtmd : support InternVL 3 38B and 78B mmproj" on May 11, 2025
ngxson (Collaborator) commented May 11, 2025

Nice, thanks. I ran the test to make sure this doesn't break anything:

OK:   llama-mtmd-cli ggml-org/SmolVLM-500M-Instruct-GGUF:Q8_0
OK:   llama-mtmd-cli ggml-org/SmolVLM2-2.2B-Instruct-GGUF:Q4_K_M
OK:   llama-mtmd-cli ggml-org/SmolVLM2-500M-Video-Instruct-GGUF:Q8_0
OK:   llama-mtmd-cli ggml-org/gemma-3-4b-it-GGUF:Q4_K_M
OK:   llama-mtmd-cli THUDM/glm-edge-v-5b-gguf:Q4_K_M
OK:   llama-mtmd-cli second-state/Llava-v1.5-7B-GGUF:Q2_K
OK:   llama-mtmd-cli cjpais/llava-1.6-mistral-7b-gguf:Q3_K
OK:   llama-mtmd-cli ibm-research/granite-vision-3.2-2b-GGUF:Q4_K_M
OK:   llama-mtmd-cli second-state/MiniCPM-Llama3-V-2_5-GGUF:Q2_K
OK:   llama-mtmd-cli openbmb/MiniCPM-V-2_6-gguf:Q2_K
OK:   llama-mtmd-cli openbmb/MiniCPM-o-2_6-gguf:Q4_0
OK:   llama-mtmd-cli bartowski/Qwen2-VL-2B-Instruct-GGUF:Q4_K_M
OK:   llama-mtmd-cli ggml-org/Qwen2.5-VL-3B-Instruct-GGUF:Q4_K_M
OK:   llama-mtmd-cli ggml-org/InternVL2_5-1B-GGUF:Q8_0
OK:   llama-mtmd-cli ggml-org/InternVL3-1B-Instruct-GGUF:Q8_0

ngxson merged commit 3eac209 into ggml-org:master on May 11, 2025
46 checks passed
city96 (Contributor, Author) commented May 11, 2025

There's one thing I'm not super sure about, which is the norm type.

The config at InternViT-6B-448px-V2_5/config.json doesn't include any info about the norms and only has layer_norm_eps, so I assumed it was layer norm like the smaller models. But I just checked, and the config included with the LLM has "norm_type": "rms_norm" set. I think that ViT config is relying on configuration_intern_vit.py having rms norm as the default value.

It also seems to just hardcode the eps to config.layer_norm_eps, despite config.rms_norm_eps also existing in the other config (both are the same value, so no issues there).

The current version works fine for OCR and descriptions, so I'm not sure which value is correct; I might just be misreading it this early in the morning lol. I'm also not super sure how this would be handled if the norm type does differ. I don't see any examples of the norm type being stored in the gguf metadata, so the only idea I have is something jank like this in clip.cpp:

// jank heuristic: the 6B ViT is the only one with n_embd == 3200
norm_type norm_t = ctx->vision_model.hparams.n_embd == 3200 ? NORM_TYPE_RMS : NORM_TYPE_NORMAL;

city96 deleted the internvl_mmproj branch on May 11, 2025, 11:14
ngxson (Collaborator) commented May 11, 2025

We don't have metadata for the norm type, because most of the time one arch uses only one type of norm.

Tbh I don't think InternViT-6B-448px-V2_5 is the right place to look. Each InternVL model has its own config.json and configuration_intern_vit.py, which actually handle the config.

It seems like the 38B uses RMS norm, so potentially we should update it everywhere. But to be extra safe, can you also look at the other config files to see whether they all use RMS norm or only the 38B does?

city96 (Contributor, Author) commented May 11, 2025

Looking at the config files, both 38B and 78B have the same line in the config, so those are definitely RMS norm. 14B and below have it set to layer norm.

I also checked InternVL 2.5; for that one, 26B, 38B and 78B all have RMS norm, while 8B and below all have layer norm.

So based on that, everything that uses the bigger ViT is RMS norm in the config, and everything that uses the smaller ViT is layer norm.

ngxson (Collaborator) commented May 11, 2025

Ok thanks. Then I think we can add a heuristic check to switch to RMS norm based on n_embd (as you said), and on n_layers if needed. Feel free to open a new PR for that.
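
For reference, a minimal sketch of what that heuristic could look like, reusing the norm_type values from the snippet above; the clip_hparams stand-in and the pick_vit_norm_type helper are illustrative, not the actual clip.cpp definitions:

// Stand-ins for illustration; the real definitions live in clip.cpp.
enum norm_type { NORM_TYPE_NORMAL, NORM_TYPE_RMS };
struct clip_hparams { int n_embd; int n_layer; };

// Per the configs checked above: models built on InternViT-6B (n_embd == 3200)
// use RMS norm, while models built on InternViT-300M use layer norm.
// An n_layer check could be added as a second condition to be extra safe.
static enum norm_type pick_vit_norm_type(const struct clip_hparams * hp) {
    if (hp->n_embd == 3200) {
        return NORM_TYPE_RMS;    // InternViT-6B: InternVL 2.5 26B+, InternVL3 38B/78B
    }
    return NORM_TYPE_NORMAL;     // InternViT-300M: the smaller variants
}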
