mtmd : support InternVL 2.5 and 3 #13422

Conversation
    if (ctx->vision_model.mm_glm_tok_boi) {
        n_patches += 2; // for BOI and EOI token embeddings
    }
Note: this change breaks MobileVLM in the test. But after closer inspection, I think MobileVLM has had a broken chat template from the beginning, which made it "randomly" pass the test by coincidence.
I decided to remove the test for MobileVLM because it seems like no one uses that model anymore; it's just not practically usable without a proper chat template.
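For illustration, here is a minimal sketch of how the output token count can be derived so that the two extra BOI/EOI embeddings are accounted for. The helper and parameter names are hypothetical, not the actual clip.cpp API:

    // Hypothetical sketch, not the real clip.cpp code: predicted number of
    // output token embeddings for a single image.
    int example_n_output_tokens(int image_size, int patch_size,
                                int scale_factor, bool has_boi_eoi) {
        // base patch grid produced by the ViT
        int n_patches = (image_size / patch_size) * (image_size / patch_size);
        // pixel-shuffle style projectors reduce the count by scale_factor^2
        n_patches /= scale_factor * scale_factor;
        if (has_boi_eoi) {
            n_patches += 2; // BOI and EOI token embeddings, as in the diff above
        }
        return n_patches;
    }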
    // sanity check (only support batch size of 1 for now)
    const int n_tokens_out = embeddings->ne[1];
    const int expected_n_tokens_out = clip_n_output_tokens(ctx, imgs.entries[0].get());
    if (n_tokens_out != expected_n_tokens_out) {
        LOG_ERR("%s: expected %d tokens, got %d\n", __func__, expected_n_tokens_out, n_tokens_out);
        GGML_ABORT("Invalid number of output tokens");
    }
This sanity check should prevent problems similar to #13381
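To make the failure mode concrete, here is a rough caller-side sketch (setup omitted; assuming the clip.h API as used in this PR):

    // Sketch only: the text-side buffer is sized from the predicted count,
    // so a graph that outputs a different number of embeddings now aborts
    // early instead of filling the buffer inconsistently.
    const int n_tokens = clip_n_output_tokens(ctx, img);
    std::vector<float> embd((size_t) n_tokens * clip_n_mmproj_embd(ctx));
    // encoding now hits GGML_ABORT on a token-count mismatch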
    // projector (always using GELU activation)
    {
        cur = build_norm(cur, model.mm_0_w, model.mm_0_b, NORM_TYPE_NORMAL, 1e-5, -1);
Is this norm epsilon hardcoded to 1e-5, or is it a parameter from the model config?
The original code uses LayerNorm's default value, which is 1e-5 per pytorch's docs.
I'm adding a comment now:
Suggested change:

    -cur = build_norm(cur, model.mm_0_w, model.mm_0_b, NORM_TYPE_NORMAL, 1e-5, -1);
    +// projector LayerNorm uses pytorch's default eps = 1e-5
    +// ref: https://huggingface.co/OpenGVLab/InternVL3-8B-Instruct/blob/a34d3e4e129a5856abfd6aa6de79776484caa14e/modeling_internvl_chat.py#L79
    +cur = build_norm(cur, model.mm_0_w, model.mm_0_b, NORM_TYPE_NORMAL, 1e-5, -1);
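For context, eps is the small constant added to the variance inside the LayerNorm denominator for numerical stability (per the pytorch docs, default 1e-5):

    $$ y = \frac{x - \mathrm{E}[x]}{\sqrt{\mathrm{Var}[x] + \epsilon}} \cdot \gamma + \beta $$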
Are there plans to support the larger 38B and 78B models as well? According to the huggingface model card, the larger models use InternViT-6B-448px-V2_5 instead of InternViT-300M-448px-V2_5. The actual keys for the vision model are slightly different, so the current conversion script fails on them.
Actually, I think I got it working, will post a PR in a bit.
Hi @ngxson, there is also a 9B version of InternVL 3; could you please test that as well? I find it interesting because the text model is based on InternLM 3 instead of Qwen 2.5.
I just tested all the mainline InternVL models. @mingyi456 All 9B versions of InternVL 3 failed. At first they failed due to a missing preprocessor_config.json, but you can just get the file from any other InternLM 3 model. However, even once you have done so, they still fail.

Models with missing preprocessor_config.json (can be fixed by using preprocessor_config.json from a similar model):
Models with broken tokenizer (unfixable):
Incompatible V1, V1.5 and V2 models:
All InternVL2_5 and InternLM 3 main series models not mentioned above worked without any issues.
WIP
Tested with:
    # InternVL 2.5 and 3
    (tool_name) -hf ggml-org/InternVL2_5-1B-GGUF
    (tool_name) -hf ggml-org/InternVL2_5-2B-GGUF
    (tool_name) -hf ggml-org/InternVL3-1B-Instruct-GGUF
    (tool_name) -hf ggml-org/InternVL3-2B-Instruct-GGUF
    (tool_name) -hf ggml-org/InternVL3-4B-Instruct-GGUF
    (tool_name) -hf ggml-org/InternVL3-14B-Instruct-GGUF
Test result:
(NOTE: the MobileVLM test was removed, see comment above)