mtmd : support InternVL 3 38B and 78B mmproj #13443
Nice, thanks. I ran the test to make sure this doesn't break anything:
There's one thing I'm not super sure about, which is the norm type. The config at InternViT-6B-448px-V2_5/config.json doesn't include any info about the norms and only has

It also seems to just hardcode the eps to

The current version works fine for OCR and descriptions, so I'm not sure which value is correct. I might just be misreading it this early in the morning lol.

Also not super sure how this would be handled if it is different. I don't see any examples of norm type being stored in the gguf metadata, so the only idea I have is something jank like this in clip.cpp:

norm_type norm_t = ctx->vision_model.hparams.n_embd == 3200 ? NORM_TYPE_RMS : NORM_TYPE_NORMAL;
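For anyone following along, here is a minimal self-contained sketch of the width-based check suggested above. The struct `vision_hparams_sketch` and the helper `pick_norm_type` are hypothetical names made up for this illustration, not the real clip.cpp definitions, and the value 3200 is simply the width that the one-liner above tests for.

```cpp
#include <cstdint>

// Illustrative stand-in for the norm enum referenced in the comment above.
enum norm_type {
    NORM_TYPE_NORMAL, // standard layer norm
    NORM_TYPE_RMS,    // RMS norm
};

// Hypothetical, trimmed-down hparams struct; only the field the check needs.
struct vision_hparams_sketch {
    int32_t n_embd; // embedding width of the vision tower
};

// Assumption (from the comment above): a width of 3200 marks the larger ViT,
// which is treated as RMS norm; every other width falls back to layer norm.
inline norm_type pick_norm_type(const vision_hparams_sketch & hp) {
    return hp.n_embd == 3200 ? NORM_TYPE_RMS : NORM_TYPE_NORMAL;
}
```

The downside of this approach is the one already mentioned: it infers the norm from a shape rather than from stored metadata, so it would silently pick the wrong norm for any future model that reuses the same width with a different norm.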
We don't have metadata for norm type, because most of the time one arch uses only one type of norm.

Tbh I don't think InternViT-6B-448px-V2_5 is a good place to look. Each InternVL model has its own

It seems like the 38B uses RMS norm, so potentially we should update it everywhere. But to be extra safe, can you also look at the other config files to see if they all use RMS norm or only the 38B uses that?
Looking at the config files, both 38B and 78B have the same line in the config, so those are definitely RMS norm then. 14B and below have it set to layer norm. I also checked InternVL 2.5: for that one, 26B, 38B and 78B all have RMS norm, while 8B and below all have layer norm. So based on that, everything that uses the bigger ViT is RMS norm, and everything that uses the smaller ViT is layer norm in the config.
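To make the distinction concrete, the sketch below contrasts the two norms using their textbook definitions. It is not the ggml implementation: the learned weight and bias are left out, `eps` is whatever the caller passes, and the function names are chosen only for this example.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Layer norm: subtract the mean, then divide by the standard deviation.
// Assumes x is non-empty. Learned scale and bias are omitted.
std::vector<float> layer_norm(const std::vector<float> & x, float eps) {
    float mean = 0.0f;
    for (float v : x) mean += v;
    mean /= static_cast<float>(x.size());

    float var = 0.0f;
    for (float v : x) var += (v - mean) * (v - mean);
    var /= static_cast<float>(x.size());

    const float scale = 1.0f / std::sqrt(var + eps);
    std::vector<float> y(x.size());
    for (std::size_t i = 0; i < x.size(); ++i) y[i] = (x[i] - mean) * scale;
    return y;
}

// RMS norm: no mean subtraction, divide by the root mean square only.
// Assumes x is non-empty. Learned scale is omitted.
std::vector<float> rms_norm(const std::vector<float> & x, float eps) {
    float ms = 0.0f;
    for (float v : x) ms += v * v;
    ms /= static_cast<float>(x.size());

    const float scale = 1.0f / std::sqrt(ms + eps);
    std::vector<float> y(x.size());
    for (std::size_t i = 0; i < x.size(); ++i) y[i] = x[i] * scale;
    return y;
}
```

The two only coincide when the input already has zero mean, so the norm type has to match what each model was trained with.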
Ok thanks, then I think we can add a heuristic check to switch to RMS norm based on
This is an attempt to add support for InternVL3-38B and InternVL3-78B. The PR at #13422 already works with the smaller models.
According to the readme, the mmproj for these is InternViT-6B-448px-V2_5 instead of InternViT-300M-448px-V2_5. The only difference seems to be the lack of q/k/v biases and the addition of q/k attention norms.