mtmd : support InternVL 2.5 and 3 #13422


Merged: 9 commits into ggml-org:master on May 10, 2025

Conversation

ngxson (Collaborator) commented May 10, 2025

WIP

Tested with:

  • InternVL 3: 1B, 2B, 8B, 14B
  • InternVL 2.5: 1B, 4B (note: for certain sizes, conversion fails due to a broken tokenizer)
# InternVL 2.5 and 3
(tool_name) -hf ggml-org/InternVL2_5-1B-GGUF
(tool_name) -hf ggml-org/InternVL2_5-2B-GGUF
(tool_name) -hf ggml-org/InternVL3-1B-Instruct-GGUF
(tool_name) -hf ggml-org/InternVL3-2B-Instruct-GGUF
(tool_name) -hf ggml-org/InternVL3-4B-Instruct-GGUF
(tool_name) -hf ggml-org/InternVL3-14B-Instruct-GGUF
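
For reference, a typical way to run one of the downloaded models against an image looks something like this (the image path and prompt are placeholders):

llama-mtmd-cli -hf ggml-org/InternVL3-2B-Instruct-GGUF --image ./test.jpg -p "Describe this image."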

Test result:

(NOTE: the MobileVLM test has been removed, see comment below)

OK:   llama-mtmd-cli ggml-org/SmolVLM-500M-Instruct-GGUF:Q8_0
OK:   llama-mtmd-cli ggml-org/SmolVLM2-2.2B-Instruct-GGUF:Q4_K_M
OK:   llama-mtmd-cli ggml-org/SmolVLM2-500M-Video-Instruct-GGUF:Q8_0
OK:   llama-mtmd-cli ggml-org/gemma-3-4b-it-GGUF:Q4_K_M
OK:   llama-mtmd-cli THUDM/glm-edge-v-5b-gguf:Q4_K_M
OK:   llama-mtmd-cli second-state/Llava-v1.5-7B-GGUF:Q2_K
OK:   llama-mtmd-cli cjpais/llava-1.6-mistral-7b-gguf:Q3_K
OK:   llama-mtmd-cli ibm-research/granite-vision-3.2-2b-GGUF:Q4_K_M
OK:   llama-mtmd-cli second-state/MiniCPM-Llama3-V-2_5-GGUF:Q2_K
OK:   llama-mtmd-cli openbmb/MiniCPM-V-2_6-gguf:Q2_K
OK:   llama-mtmd-cli openbmb/MiniCPM-o-2_6-gguf:Q4_0
OK:   llama-mtmd-cli bartowski/Qwen2-VL-2B-Instruct-GGUF:Q4_K_M
OK:   llama-mtmd-cli ggml-org/Qwen2.5-VL-3B-Instruct-GGUF:Q4_K_M
OK:   llama-mtmd-cli ggml-org/InternVL2_5-1B-GGUF:Q8_0
OK:   llama-mtmd-cli ggml-org/InternVL3-1B-Instruct-GGUF:Q8_0
OK:   llama-mtmd-cli ggml-org/pixtral-12b-GGUF:Q4_K_M
OK:   llama-mtmd-cli ggml-org/Mistral-Small-3.1-24B-Instruct-2503-GGUF
OK:   llama-mtmd-cli ggml-org/Qwen2-VL-2B-Instruct-GGUF:Q4_K_M
OK:   llama-mtmd-cli ggml-org/Qwen2-VL-7B-Instruct-GGUF:Q4_K_M
OK:   llama-mtmd-cli ggml-org/Qwen2.5-VL-3B-Instruct-GGUF:Q4_K_M
OK:   llama-mtmd-cli ggml-org/Qwen2.5-VL-7B-Instruct-GGUF:Q4_K_M
OK:   llama-mtmd-cli ggml-org/InternVL3-8B-Instruct-GGUF:Q4_K_M
OK:   llama-mtmd-cli ggml-org/InternVL3-14B-Instruct-GGUF:Q4_K_M

github-actions bot added the examples and python (python script changes) labels on May 10, 2025
ngxson changed the title from mtmd : support InternVL 2 and 3 to mtmd : support InternVL 2.5 and 3 on May 10, 2025
github-actions bot added the documentation (Improvements or additions to documentation) label on May 10, 2025
Comment on lines +3081 to +3083
if (ctx->vision_model.mm_glm_tok_boi) {
n_patches += 2; // for BOI and EOI token embeddings
}
ngxson (Collaborator, Author)

Note: this change breaks MobileVLM in the test. But after closer inspection, I think MobileVLM has had a broken chat template from the beginning, which made it pass the test only by coincidence.

I decided to remove the test for MobileVLM because it seems no one uses that model anymore; it is simply not practically usable without a proper chat template.

ngxson marked this pull request as ready for review on May 10, 2025 at 13:27
ngxson requested a review from ggerganov on May 10, 2025 at 13:28
Comment on lines +3529 to +3536
// sanity check (only support batch size of 1 for now)
const int n_tokens_out = embeddings->ne[1];
const int expected_n_tokens_out = clip_n_output_tokens(ctx, imgs.entries[0].get());
if (n_tokens_out != expected_n_tokens_out) {
LOG_ERR("%s: expected %d tokens, got %d\n", __func__, expected_n_tokens_out, n_tokens_out);
GGML_ABORT("Invalid number of output tokens");
}

ngxson (Collaborator, Author)

This sanity check should prevent problems similar to #13381


// projector (always using GELU activation)
{
cur = build_norm(cur, model.mm_0_w, model.mm_0_b, NORM_TYPE_NORMAL, 1e-5, -1);

Member

Is this norm epsilon hardcoded to 1e-5, or is it a parameter from the model config?

ngxson (Collaborator, Author)

The original code uses the LayerNorm default value, which is 1e-5 according to PyTorch's docs.

I'm adding a comment now:

Suggested change
cur = build_norm(cur, model.mm_0_w, model.mm_0_b, NORM_TYPE_NORMAL, 1e-5, -1);
// projector LayerNorm uses pytorch's default eps = 1e-5
// ref: https://huggingface.co/OpenGVLab/InternVL3-8B-Instruct/blob/a34d3e4e129a5856abfd6aa6de79776484caa14e/modeling_internvl_chat.py#L79
cur = build_norm(cur, model.mm_0_w, model.mm_0_b, NORM_TYPE_NORMAL, 1e-5, -1);
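
As a quick cross-check of the claim above (a minimal sketch, assuming PyTorch is installed), nn.LayerNorm created without an explicit eps reports the 1e-5 default:

import torch

ln = torch.nn.LayerNorm(1024)  # no eps passed, so the library default applies
print(ln.eps)                  # prints 1e-05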

ngxson merged commit 053367d into ggml-org:master on May 10, 2025
46 of 47 checks passed
city96 (Contributor) commented May 10, 2025

Are there plans to support the larger 38B and 78B models as well? According to the Hugging Face model card, the larger models use InternViT-6B-448px-V2_5 instead of InternViT-300M-448px-V2_5.

The actual keys for the vision model are slightly different, so the current conversion script fails with:

ValueError: Can not map tensor 'vision_tower.vision_model.encoder.layers.0.attn.k_norm.weight'


city96 (Contributor) commented May 10, 2025

Actually, I think I got it working, will post a PR in a bit.

mingyi456

Hi @ngxson, there is also a 9B version of InternVL 3, could you please test that as well? I find it interesting because the text model is based on InternLM 3 instead of Qwen 2.5.

nicoboss (Contributor) commented May 13, 2025

Hi @ngxson, there is also a 9B version of InternVL 3, could you please test that as well? I find it interesting because the text model is based on InternLM 3 instead of Qwen 2.5.

I just tested all the mainline InternVL models. @mingyi456 All 9B versions of InternVL 3 failed. At first they failed due to a missing preprocessor_config.json, but you can just take that file from any other InternLM 3 model. However, even after doing so, conversion still fails with IndexError: piece id is out of range because of a broken tokenizer. @ngxson mentioned in the initial post that some InternVL 2.5 based models have this issue, but apparently it affects InternVL3-9B, InternVL3-9B-Instruct and InternVL3-9B-Pretrained as well.
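
For illustration, the preprocessor_config.json workaround can be scripted roughly like this (a minimal sketch, not the exact steps used; the donor repo and local directory names are just examples):

# minimal sketch: borrow preprocessor_config.json from a model repo that ships it
from huggingface_hub import hf_hub_download
import shutil

src = hf_hub_download(repo_id="OpenGVLab/InternVL3-8B-Instruct",  # example donor repo
                      filename="preprocessor_config.json")
shutil.copy(src, "InternVL3-9B-Instruct/preprocessor_config.json")  # hypothetical local model dir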

Models with missing preprocessor_config.json (can be fixed by using the preprocessor_config.json from a similar model):

  • InternVL3-9B
  • InternVL3-9B-Instruct
  • InternVL3-9B-Pretrained
  • InternVL2_5-78B
  • InternVL2_5-78B-MPO
  • InternVL2-40B

Models with broken tokenizer (unfixable):

  • InternVL3-9B
  • InternVL3-9B-Instruct
  • InternVL3-9B-Pretrained
  • InternVL2_5-2B
  • InternVL2_5-8B
  • InternVL2_5-26B
  • InternVL2_5-2B-MPO
  • InternVL2_5-8B-MPO
  • InternVL2_5-26B-MPO
  • InternVL2-8B-MPO
  • InternVL2-2B
  • InternVL2-8B
  • InternVL2-26B
  • Mini-InternVL-Chat-2B-V1-5
  • InternVL-Chat-V1-5

Incompatible V1, V1.5 and V2 models:
Note: none of these are officially supported by this PR; all models not listed here either work or fail because of the issues above.

  • InternVL2-4B
  • InternVL2-40B
  • InternVL2-Llama3-76B
  • Mini-InternVL-Chat-4B-V1-5
  • InternVL-Chat-V1-1
  • InternVL-Chat-V1-2
  • InternVL-Chat-V1-2-Plus

All InternVL2_5 and InternLM 3 main series models not mentioned above worked without any issues.

Labels: documentation, examples, python