
mtmd : support InternVL 3 38B and 78B mmproj #13443


Merged — 3 commits merged into ggml-org:master from city96:internvl_mmproj on May 11, 2025

Conversation

city96 (Contributor) commented May 10, 2025

This is an attempt to add support for InternVL3-38B and InternVL3-78B. The PR at #13422 already works with the smaller models.

According to the readme, the mmproj for these is InternViT-6B-448px-V2_5 instead of InternViT-300M-448px-V2_5. The only difference seems to be the lack of q/k/v biases and the addition of q/k attention norms.
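
For illustration, here is a minimal sketch of how the Q projection could branch between the two ViT variants in a ggml-style graph. The vit_layer struct and build_q helper are hypothetical names, not the actual clip.cpp code; the K path would branch the same way, and V keeps a bias only on the 300M ViT:

#include "ggml.h"

// Minimal stand-in for the per-layer tensors; field names are illustrative.
struct vit_layer {
    struct ggml_tensor * q_w;      // query projection weight
    struct ggml_tensor * q_b;      // query bias (300M ViT only; NULL on the 6B ViT)
    struct ggml_tensor * q_norm_w; // query attention-norm weight (6B ViT only)
};

static struct ggml_tensor * build_q(struct ggml_context * ctx0,
                                    struct ggml_tensor  * cur,
                                    const struct vit_layer * layer,
                                    float eps) {
    struct ggml_tensor * q = ggml_mul_mat(ctx0, layer->q_w, cur);
    if (layer->q_b) {
        q = ggml_add(ctx0, q, layer->q_b);      // 300M ViT: biased q/k/v projections
    }
    if (layer->q_norm_w) {
        q = ggml_rms_norm(ctx0, q, eps);        // 6B ViT: extra q/k attention norm
        q = ggml_mul(ctx0, q, layer->q_norm_w); // (RMS, per the norm discussion below)
    }
    return q;
}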

github-actions bot added the examples and python (python script changes) labels on May 10, 2025
city96 changed the title from "Support InternVL 3 38B and 78B mmproj" to "mtmd : support InternVL 3 38B and 78B mmproj" on May 11, 2025
ngxson (Collaborator) commented May 11, 2025

Nice, thanks. I ran the test to make sure this doesn't break anything:

OK:   llama-mtmd-cli ggml-org/SmolVLM-500M-Instruct-GGUF:Q8_0
OK:   llama-mtmd-cli ggml-org/SmolVLM2-2.2B-Instruct-GGUF:Q4_K_M
OK:   llama-mtmd-cli ggml-org/SmolVLM2-500M-Video-Instruct-GGUF:Q8_0
OK:   llama-mtmd-cli ggml-org/gemma-3-4b-it-GGUF:Q4_K_M
OK:   llama-mtmd-cli THUDM/glm-edge-v-5b-gguf:Q4_K_M
OK:   llama-mtmd-cli second-state/Llava-v1.5-7B-GGUF:Q2_K
OK:   llama-mtmd-cli cjpais/llava-1.6-mistral-7b-gguf:Q3_K
OK:   llama-mtmd-cli ibm-research/granite-vision-3.2-2b-GGUF:Q4_K_M
OK:   llama-mtmd-cli second-state/MiniCPM-Llama3-V-2_5-GGUF:Q2_K
OK:   llama-mtmd-cli openbmb/MiniCPM-V-2_6-gguf:Q2_K
OK:   llama-mtmd-cli openbmb/MiniCPM-o-2_6-gguf:Q4_0
OK:   llama-mtmd-cli bartowski/Qwen2-VL-2B-Instruct-GGUF:Q4_K_M
OK:   llama-mtmd-cli ggml-org/Qwen2.5-VL-3B-Instruct-GGUF:Q4_K_M
OK:   llama-mtmd-cli ggml-org/InternVL2_5-1B-GGUF:Q8_0
OK:   llama-mtmd-cli ggml-org/InternVL3-1B-Instruct-GGUF:Q8_0

ngxson merged commit 3eac209 into ggml-org:master on May 11, 2025
46 checks passed
city96 (Contributor, Author) commented May 11, 2025

There's one thing I'm not super sure about, which is the norm type.

The config at InternViT-6B-448px-V2_5/config.json doesn't include any info about the norms and only has layer_norm_eps, so I assumed it was layer norm like the smaller models. But I just checked, and the config included with the LLM has "norm_type": "rms_norm" set. I think that ViT config is relying on configuration_intern_vit.py having rms norm as the default value.

It also seems to just hardcode the eps to config.layer_norm_eps, despite config.rms_norm_eps also existing in the other config (both are the same value, so no issues there).

The current version works fine for OCR and descriptions, so I'm not sure which value is correct; I might just be misreading it this early in the morning lol. I'm also not super sure how this would be handled if the norm type does differ. I don't see any examples of the norm type being stored in the gguf metadata, so the only idea I have is something jank like this in clip.cpp:

// jank heuristic: the 6B ViT is the only one with n_embd == 3200
norm_type norm_t = ctx->vision_model.hparams.n_embd == 3200 ? NORM_TYPE_RMS : NORM_TYPE_NORMAL;

city96 deleted the internvl_mmproj branch on May 11, 2025, 11:14
ngxson (Collaborator) commented May 11, 2025

We don't have metadata for the norm type, because most of the time one arch uses only one type of norm.

Tbh I don't think InternViT-6B-448px-V2_5 is the right place to look. Each InternVL model has its own config.json and configuration_intern_vit.py, which actually handle the config.

It seems like the 38B uses RMS norm, so potentially we should update it everywhere. But to be extra safe, can you also look at the other config files to see whether they all use RMS norm or only the 38B does?

city96 (Contributor, Author) commented May 11, 2025

Looking at the config files, both 38B and 78B have the same line in the config, so those are definitely RMS norm. 14B and below have it set to layer norm.

I also checked InternVL 2.5; for that one, 26B, 38B and 78B all have RMS norm, while 8B and below all have layer norm.

So based on that, everything that uses the bigger ViT is RMS norm in the config, and everything that uses the smaller ViT is layer norm.

ngxson (Collaborator) commented May 11, 2025

Ok thanks. Then I think we can add a heuristic check to switch to RMS norm based on n_embd (as you said), and on n_layers if needed. Feel free to open a new PR for that.
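
For reference, a minimal sketch of what that heuristic could look like, reusing the norm_type values from the snippet above; the clip_hparams stand-in and the pick_vit_norm_type helper are illustrative, not the actual clip.cpp definitions:

// Stand-ins for illustration; the real definitions live in clip.cpp.
enum norm_type { NORM_TYPE_NORMAL, NORM_TYPE_RMS };
struct clip_hparams { int n_embd; int n_layer; };

// Per the configs checked above: models built on InternViT-6B (n_embd == 3200)
// use RMS norm, while models built on InternViT-300M use layer norm.
// An n_layer check could be added as a second condition to be extra safe.
static enum norm_type pick_vit_norm_type(const struct clip_hparams * hp) {
    if (hp->n_embd == 3200) {
        return NORM_TYPE_RMS;    // InternViT-6B: InternVL 2.5 26B+, InternVL3 38B/78B
    }
    return NORM_TYPE_NORMAL;     // InternViT-300M: the smaller variants
}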
