convert : register UMT5Model architecture for T5 conversion #17160
When trying to convert `google/umt5-xl` to GGUF, I encountered "Model UMT5Model is not supported".

The model repository doesn't have safetensors files, so I downloaded the model using `AutoModel.from_pretrained()`. When `AutoModel` loads a model, it uses the base model class (`UMT5Model`) rather than the task-specific variant (`UMT5ForConditionalGeneration`), and that is the class name written to `config.json` when calling `save_pretrained()`. While `UMT5ForConditionalGeneration` was already registered, the base `UMT5Model` class was not.

Adding `@ModelBase.register("UMT5Model")` allows the conversion to work for models downloaded with `AutoModel`, similar to how other base model classes like `BloomModel`, `BertModel`, and `RobertaModel` are registered alongside their task-specific variants.

I tested the resulting GGUF models (F32, F16, and Q8_0) and verified that they produce embeddings and encoder weights identical to those of the original PyTorch model (max difference: 0.0, mean difference: 0.0).
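To illustrate why registering the extra name is enough, here is a simplified sketch of the name-based registry pattern that `convert_hf_to_gguf.py` uses to map the `architectures` string from `config.json` to a converter class. This is an illustrative toy, not the actual llama.cpp code; the class and function names below are hypothetical:

```python
# Hypothetical sketch of an architecture-name registry, mirroring the
# decorator-based registration pattern used by convert_hf_to_gguf.py.
_registry: dict[str, type] = {}

def register(*names: str):
    """Class decorator: map one or more architecture names to a converter class."""
    def wrapper(cls: type) -> type:
        for name in names:
            _registry[name] = cls
        return cls
    return wrapper

# Registering both the base class name and the task-specific variant means
# models saved via AutoModel (base class in config.json) also resolve.
@register("UMT5Model", "UMT5ForConditionalGeneration")
class UMT5Converter:
    pass

def converter_for(arch: str) -> type:
    """Look up the converter for an architecture name from config.json."""
    try:
        return _registry[arch]
    except KeyError:
        # Unregistered names fail, which is the error this PR fixes for UMT5Model.
        raise NotImplementedError(f"Model {arch} is not supported")
```

With only `"UMT5ForConditionalGeneration"` registered, `converter_for("UMT5Model")` would raise the "not supported" error quoted above; registering both names makes either `config.json` variant convert.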
(Sorry for the duplicate PRs, I thought this was a misconfiguration on my part so I deleted my initial fork)
test_umt5_encoding.py
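A weight comparison like the one described above boils down to an elementwise absolute-difference check between the GGUF tensors and the original PyTorch tensors. A minimal sketch (hypothetical helper, not the attached script itself, assuming both sides are available as NumPy arrays):

```python
import numpy as np

def max_mean_diff(a: np.ndarray, b: np.ndarray) -> tuple[float, float]:
    """Return (max, mean) absolute elementwise difference between two tensors."""
    d = np.abs(a.astype(np.float64) - b.astype(np.float64))
    return float(d.max()), float(d.mean())

# Identical tensors report (0.0, 0.0), matching the result quoted above.
x = np.arange(12, dtype=np.float32).reshape(3, 4)
print(max_mean_diff(x, x.copy()))  # → (0.0, 0.0)
```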