Add Qwen 2.5 #2088
Conversation
Thanks for the PR! Before review, could you please do a forward pass and match the output with HF's Qwen? Also, let's make it a draft PR until then.
Took a cursory glance. Let's do the weight conversion and numerics check first!
To fix the code format error.
@shivance - let us know when this PR is ready for review. Thanks!
@abheesht17 I have the tokenizer working now; I am currently working on matching the output of the HF model and the Keras model.
Great, no hurry. Was just checking. Do ping if you hit any blockers :)
@abheesht17 I see that the newer checkpoint conversion scripts use a new method instead of the old kernel assign. Has the API changed for assigning the bias as well? Why was the new method created, and what is the difference?
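For reference, a minimal sketch of the older `assign`-based style the comment refers to, using a plain Keras `Dense` layer with made-up shapes; the newer helper itself is not shown in the thread:

```python
import numpy as np
import keras

# Hypothetical layer and shapes, for illustration only.
dense = keras.layers.Dense(512, use_bias=True)
dense.build((None, 512))

# HF torch.nn.Linear stores weights as (out_features, in_features).
hf_weight = np.random.rand(512, 512).astype("float32")
hf_bias = np.random.rand(512).astype("float32")

# Older style: assign the kernel and bias variables directly.
# Note the transpose: Keras Dense kernels are (in, out).
dense.kernel.assign(hf_weight.T)
dense.bias.assign(hf_bias)
```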
@abheesht17 upon weight loading, the outputs look like this! The comparison succeeds at an absolute tolerance of 1e-3. I am testing at fp32, since it's a 0.5B model.
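A minimal sketch of the kind of numerics check being described, assuming `keras_output` and `hf_output` are matching (batch, seq_len, hidden_dim) arrays from the two models; the names are placeholders:

```python
import numpy as np

def check_numerics(keras_output, hf_output, atol=1e-3):
    """Compare Keras and HF outputs elementwise at a given tolerance."""
    keras_output = np.asarray(keras_output, dtype="float32")
    hf_output = np.asarray(hf_output, dtype="float32")
    max_delta = np.max(np.abs(keras_output - hf_output))
    print(f"max |delta| = {max_delta:.2e}")
    np.testing.assert_allclose(keras_output, hf_output, atol=atol)
    print(f"outputs match within atol={atol}")
```

Per the review comment further down, the eventual targets would be roughly 1e-5 for fp32 under the torch backend and 1e-2 for bf16.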
@abheesht17 I have marked this PR as ready for review.
Great. Were you able to bring the difference in numerics down to 1e-5? Might be worth checking layer-by-layer which one's causing an issue.
Edit: never mind, the conversion script is part of the PR. |
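A hedged sketch of one way to do the layer-by-layer check suggested above, assuming the keras-hub backbone exposes `token_embedding` and a `transformer_layers` list (as other keras-hub decoder backbones do) and that the HF model accepts `output_hidden_states=True`; the argument names are assumptions:

```python
import keras
import numpy as np

def compare_per_layer(keras_backbone, hf_model, token_ids, padding_mask, hf_inputs):
    """Print the max absolute delta after each transformer layer."""
    # HF side: one forward pass that returns every layer's hidden state.
    hf_hidden = hf_model(**hf_inputs, output_hidden_states=True).hidden_states

    # Keras side: walk the backbone manually, comparing as we go.
    x = keras_backbone.token_embedding(token_ids)
    for i, layer in enumerate(keras_backbone.transformer_layers):
        x = layer(x, decoder_padding_mask=padding_mask)
        hf_x = hf_hidden[i + 1].detach().float().numpy()
        delta = np.max(np.abs(keras.ops.convert_to_numpy(x) - hf_x))
        print(f"layer {i:02d}: max |delta| = {delta:.2e}")
```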
@abheesht17 here is the Colab version of the conversion script.
@abheesht17 did you get a chance to inspect the delta in the output?
Thanks! Just some initial comments and questions.
```python
misc_special_tokens -= {eos_token}

# Add misc special tokens
for i, token in enumerate(misc_special_tokens):
```
What are these used for? I don't see them used anywhere. A lot of tokenizers have reserved, unused tokens (e.g. for BERT, the first thousand, I think); we don't generally give them special treatment.
I just followed the llama3 tokenizer!
I think it's necessary to check in detail where the error is. As far as possible, we should ensure that the fp32 error is around 1e-5 under the torch backend, and the maximum bf16 error should not exceed 1e-2.
@mattdangerw / @abheesht17 / @divyashreepathihalli How do you completely disable the MPS backend with Keras? Please take a look at the latest conversion script: even though I am moving the model to the CPU using keras.device, and also moving the inputs, the reversible embedding call step exits with a stacktrace. The stacktrace indicates that an allocation is still happening on MPS, which I have already disabled!
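One workaround sketch, assuming the torch backend: hide MPS from torch before Keras is imported, so device placement falls back to CPU. Monkeypatching `torch.backends.mps.is_available` is a community trick rather than an official API:

```python
import os

os.environ["KERAS_BACKEND"] = "torch"

import torch

# Community workaround: pretend MPS is unavailable so torch (and Keras
# on top of it) places all tensors on CPU instead.
torch.backends.mps.is_available = lambda: False
torch.backends.mps.is_built = lambda: False  # some code paths check this instead

import keras  # noqa: E402  # import after patching so nothing lands on MPS
```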
Awesome work! This is overall looking great. Still a few comments to work through.
Thanks! LGTM. Will merge after the final nit on exporting the attention layer.
```python
        return config

    @staticmethod
    def get_layout_map(
```
Have we tested this at all? If not, maybe leave it as a follow-up.
Nope, didn't test this.
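For reference, a hypothetical sketch of how such a layout map would typically be exercised with the Keras 3 distribution API (supported on the JAX backend); the class name `QwenBackbone`, the import path, and the preset name are all assumptions, and the pattern mirrors what other keras-hub backbones document for `get_layout_map`:

```python
import keras
from keras_hub.models import QwenBackbone  # hypothetical import path

# Build a 1 x 8 mesh: no data sharding, model weights sharded over 8 devices.
devices = keras.distribution.list_devices()
mesh = keras.distribution.DeviceMesh(
    shape=(1, 8), axis_names=("batch", "model"), devices=devices
)

# Ask the backbone for its layout map and install it as the global
# distribution before any weights are created.
layout_map = QwenBackbone.get_layout_map(mesh)
distribution = keras.distribution.ModelParallel(layout_map=layout_map)
keras.distribution.set_distribution(distribution)

backbone = QwenBackbone.from_preset("qwen2.5_0.5b_en")  # hypothetical preset
```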
```python
print("\n-> Huggingface model and tokenizer loaded")

# === Check that the models and tokenizers outputs match ===
```
You probably want to save the actual model from this (so we can upload the artifacts to Kaggle when we convert). You could also add some fancier print output; see keras-hub/tools/checkpoint_conversion/convert_gemma_checkpoints.py, lines 262 to 266 at 6b76c07:

```python
print("✅ Output validated")
keras_model.save_to_preset(preset)
keras_tokenizer.save_to_preset(preset)
print(f"🏁 Preset saved to ./{preset}")
```
This can be a follow-up, though!
@mattdangerw addressed all the comments!
Thank you! Once this is green I will pull.
Closes #2078

References:
- Qwen 2.5 uses the Qwen2 backbone from Hugging Face Transformers
- HF Config path
- HF Source Code