Add Qwen 2.5 #2088


Merged: 25 commits into keras-team:master on Mar 17, 2025

Conversation

kanpuriyanawab (Collaborator) commented Feb 9, 2025

Closes #2078

References:

- Qwen 2.5 uses the Qwen2 backbone from Hugging Face Transformers
- HF Config path
- HF Source Code

@kanpuriyanawab kanpuriyanawab self-assigned this Feb 9, 2025
@kanpuriyanawab kanpuriyanawab changed the title Add Qwen 2.5 [WIP] Add Qwen 2.5 Feb 9, 2025
abheesht17 (Collaborator) commented Feb 10, 2025

Thanks for the PR! Before review, could you please do a forward pass and match the output with HF's Qwen? Also, let's make this a draft PR till then.

abheesht17 (Collaborator) left a comment

Took a cursory glance. Let's do the weight conversion and numerics check first!

@kanpuriyanawab kanpuriyanawab marked this pull request as draft February 10, 2025 10:00
divyashreepathihalli (Collaborator) commented Feb 12, 2025

To fix the code format error, you will need to run shell/api_gen.sh at the repo root. If you don't have ruff, install it with pip install ruff, and then run shell/format.sh at the root.

abheesht17 (Collaborator) commented:
@shivance - let us know when this PR is ready for review. Thanks!

kanpuriyanawab (Collaborator, Author) commented Feb 18, 2025

@abheesht17 I have the tokenizer working now; currently I am working on matching the output of the HF model and the Keras model.
Thanks for your patience!

[Screenshot 2025-02-18 at 10 23 36 PM]

abheesht17 (Collaborator) commented:
Great, no hurry. Was just checking. Do ping if you hit any blockers :)

kanpuriyanawab (Collaborator, Author) commented Feb 18, 2025

@abheesht17 I see that in the newer checkpoint conversion scripts we use the set_weights method, e.g.

keras_hub_model.transformer_layers[i]._self_attention_layer._query_dense.set_weights(
    [
        hf_model.model.layers[i]
        .self_attn.q_proj.weight.T.reshape(
            config.hidden_size,
            config.num_attention_heads,
            config.hidden_size // config.num_attention_heads,
        )
        .detach()
        .cpu()
        .float()
        .numpy()
    ]
)

instead of the old kernel assign:

keras_hub_model.get_layer(f"f_net_layer_{i}")._intermediate_dense.kernel.assign(
    hf_wts[f"encoder.layer.{i}.intermediate.dense.weight"]
    .transpose(1, 0)
    .numpy()
)

Has the API changed for assigning the bias as well? Why was the new method created, and what is the difference?
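
For reference, a minimal sketch of copying a bias with both APIs. The layer path mirrors the q_proj example above, but the exact attribute names and shapes here are assumptions, not the PR's actual conversion code:

# Build the kernel array exactly as in the set_weights snippet above.
kernel = (
    hf_model.model.layers[i]
    .self_attn.q_proj.weight.T.reshape(
        config.hidden_size,
        config.num_attention_heads,
        config.hidden_size // config.num_attention_heads,
    )
    .detach()
    .cpu()
    .float()
    .numpy()
)
# Qwen2's q/k/v projections carry biases; reshape to (heads, head_dim).
bias = (
    hf_model.model.layers[i]
    .self_attn.q_proj.bias.reshape(
        config.num_attention_heads,
        config.hidden_size // config.num_attention_heads,
    )
    .detach()
    .cpu()
    .float()
    .numpy()
)
dense = keras_hub_model.transformer_layers[i]._self_attention_layer._query_dense

# Old style: assign each variable individually.
dense.kernel.assign(kernel)
dense.bias.assign(bias)

# New style: set_weights replaces every weight of the layer at once,
# in the order of dense.weights (kernel first, then bias here).
dense.set_weights([kernel, bias])

Both write the same values; set_weights just operates at the layer level, so all of the layer's weights have to be passed together.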

kanpuriyanawab (Collaborator, Author) commented:
[Screenshot 2025-02-19 at 12 37 18 AM]

@abheesht17 upon weight loading, the outputs look like this! There is still some delta here, but

np.testing.assert_allclose(
    keras_hub_logits, hf_output_logits, atol=1e-3
)

succeeds, i.e. absolute tolerance 1e-3.

I am testing at fp32, since it's a 0.5B model.

@kanpuriyanawab kanpuriyanawab changed the title [WIP] Add Qwen 2.5 Add Qwen 2.5 Feb 18, 2025
@kanpuriyanawab kanpuriyanawab marked this pull request as ready for review February 18, 2025 19:20
kanpuriyanawab (Collaborator, Author) commented:
@abheesht17 I have marked this PR as ready for review.

abheesht17 (Collaborator) commented Feb 20, 2025

> @abheesht17 I have marked this PR as ready for review.

Great. Were you able to bring the difference in numerics down to 1e-5? Might be worth checking layer by layer which one is causing the issue.
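
For illustration, one way to do that layer-by-layer check (a sketch only; hf_model / keras_hub_model are assumed to be loaded with converted weights, and the keras-hub argument names here are assumptions, not the PR's actual code):

import numpy as np
from keras import ops

# HF returns hidden states as (embeddings, after layer 1, ..., after layer N).
hf_states = hf_model(hf_input_ids, output_hidden_states=True).hidden_states

# Walk the Keras model manually: embedding first, then each decoder layer.
x = keras_hub_model.token_embedding(keras_inputs["token_ids"])
for i, layer in enumerate(keras_hub_model.transformer_layers):
    x = layer(x, decoder_padding_mask=keras_inputs["padding_mask"])
    ref = hf_states[i + 1].detach().cpu().float().numpy()
    delta = np.max(np.abs(ops.convert_to_numpy(x) - ref))
    print(f"layer {i}: max abs delta = {delta:.2e}")

The first layer whose delta jumps by orders of magnitude is the one to inspect.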

abheesht17 (Collaborator) commented Feb 21, 2025

@shivance - can you please share the weight conversion Colab as well?

Edit: never mind, the conversion script is part of the PR.

kanpuriyanawab (Collaborator, Author) commented:
@abheesht17 here is the Colab version of the conversion script.

kanpuriyanawab (Collaborator, Author) commented:
@abheesht17 did you get a chance to inspect the delta in output?

mattdangerw (Member) left a comment

Thanks! Just some initial comments and questions.

misc_special_tokens -= {eos_token}

# Add misc special tokens
for i, token in enumerate(misc_special_tokens):

mattdangerw (Member) commented:

What are these used for? I don't see them used anywhere. A lot of tokenizers have reserved, unused tokens (e.g., for BERT, the first thousand, I think); we don't generally give them special treatment.

kanpuriyanawab (Collaborator, Author) commented:

I just followed the Llama 3 tokenizer!

pass-lin (Contributor) commented Mar 1, 2025

I think it's necessary to check in detail where the error is. As much as possible, we should ensure that the fp32 error is around 1e-5 under the torch backend, and that the maximum bf16 error does not exceed 1e-2.
I've implemented a Keras model with a similar error before, and that level of error caused a significant decrease in inference performance, as well as repetition.
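
Expressed as concrete checks (a sketch; the logits variables are hypothetical outputs of the two models at each precision):

import numpy as np

# fp32 should match to ~1e-5, bf16 to at most 1e-2, per the comment above.
np.testing.assert_allclose(keras_logits_fp32, hf_logits_fp32, atol=1e-5)
np.testing.assert_allclose(keras_logits_bf16, hf_logits_bf16, atol=1e-2)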

kanpuriyanawab (Collaborator, Author) commented Mar 8, 2025

@mattdangerw / @abheesht17 / @divyashreepathihalli How do you completely disable the MPS backend with Keras?

Please take a look at the latest conversion script: even though I am moving the model to the CPU using keras.device, and moving the inputs as well, the reversible embedding call step exits with the following.

Stack trace:

-> Keras 3 model and tokenizer loaded.
Traceback (most recent call last):
  File "/Users/anshuman/.pyenv/versions/3.11.10/lib/python3.11/runpy.py", line 198, in _run_module_as_main
    return _run_code(code, main_globals, None,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/anshuman/.pyenv/versions/3.11.10/lib/python3.11/runpy.py", line 88, in _run_code
    exec(code, run_globals)
  File "/Users/anshuman/.cursor/extensions/ms-python.debugpy-2024.6.0-darwin-arm64/bundled/libs/debugpy/adapter/../../debugpy/launcher/../../debugpy/__main__.py", line 39, in <module>
    cli.main()
  File "/Users/anshuman/.cursor/extensions/ms-python.debugpy-2024.6.0-darwin-arm64/bundled/libs/debugpy/adapter/../../debugpy/launcher/../../debugpy/../debugpy/server/cli.py", line 430, in main
    run()
  File "/Users/anshuman/.cursor/extensions/ms-python.debugpy-2024.6.0-darwin-arm64/bundled/libs/debugpy/adapter/../../debugpy/launcher/../../debugpy/../debugpy/server/cli.py", line 284, in run_file
    runpy.run_path(target, run_name="__main__")
  File "/Users/anshuman/.cursor/extensions/ms-python.debugpy-2024.6.0-darwin-arm64/bundled/libs/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_runpy.py", line 321, in run_path
    return _run_module_code(code, init_globals, run_name,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/anshuman/.cursor/extensions/ms-python.debugpy-2024.6.0-darwin-arm64/bundled/libs/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_runpy.py", line 135, in _run_module_code
    _run_code(code, mod_globals, init_globals,
  File "/Users/anshuman/.cursor/extensions/ms-python.debugpy-2024.6.0-darwin-arm64/bundled/libs/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_runpy.py", line 124, in _run_code
    exec(code, run_globals)
  File "/Users/anshuman/Desktop/Projects/keras-hub/tools/checkpoint_conversion/convert_qwen_checkpoints.py", line 353, in <module>
    app.run(main)
  File "/Users/anshuman/.pyenv/versions/3.11.10/envs/qwen/lib/python3.11/site-packages/absl/app.py", line 308, in run
    _run_main(main, args)
  File "/Users/anshuman/.pyenv/versions/3.11.10/envs/qwen/lib/python3.11/site-packages/absl/app.py", line 254, in _run_main
    sys.exit(main(argv))
             ^^^^^^^^^^
  File "/Users/anshuman/Desktop/Projects/keras-hub/tools/checkpoint_conversion/convert_qwen_checkpoints.py", line 336, in main
    test_model(keras_hub_model, keras_hub_tokenizer, hf_model, hf_tokenizer)
  File "/Users/anshuman/Desktop/Projects/keras-hub/tools/checkpoint_conversion/convert_qwen_checkpoints.py", line 228, in test_model
    keras_hub_output = keras_hub_model(keras_hub_inputs)
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/anshuman/.pyenv/versions/3.11.10/envs/qwen/lib/python3.11/site-packages/keras/src/utils/traceback_utils.py", line 122, in error_handler
    raise e.with_traceback(filtered_tb) from None
  File "/Users/anshuman/.pyenv/versions/3.11.10/envs/qwen/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/anshuman/.pyenv/versions/3.11.10/envs/qwen/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1750, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/anshuman/.pyenv/versions/3.11.10/envs/qwen/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/anshuman/.pyenv/versions/3.11.10/envs/qwen/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1750, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/anshuman/Desktop/Projects/keras-hub/keras_hub/src/layers/modeling/reversible_embedding.py", line 129, in call
    return super().call(inputs)
           ^^^^^^^^^^^^^^^^^^^^
  File "/Users/anshuman/.pyenv/versions/3.11.10/envs/qwen/lib/python3.11/site-packages/torch/nn/functional.py", line 2516, in embedding
    return handle_torch_function(
           ^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/anshuman/.pyenv/versions/3.11.10/envs/qwen/lib/python3.11/site-packages/torch/overrides.py", line 1720, in handle_torch_function
    result = mode.__torch_function__(public_api, types, args, kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/anshuman/.pyenv/versions/3.11.10/envs/qwen/lib/python3.11/site-packages/torch/utils/_device.py", line 104, in __torch_function__
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/Users/anshuman/.pyenv/versions/3.11.10/envs/qwen/lib/python3.11/site-packages/torch/nn/functional.py", line 2551, in embedding
    return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: Exception encountered when calling ReversibleEmbedding.call().

Placeholder storage has not been allocated on MPS device!

Arguments received by ReversibleEmbedding.call():
  • inputs=torch.Tensor(shape=torch.Size([1, 5]), dtype=int32)
  • reverse=False


The stack trace indicates that an allocation is still happening on MPS somewhere, even though I have already disabled it!
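
For what it's worth, one way to try forcing everything onto the CPU under the torch backend. This is a sketch, not a verified fix for this script: torch.set_default_device and keras.device are real APIs, but whether they catch the allocation in question here is untested:

import os

os.environ["KERAS_BACKEND"] = "torch"  # must be set before importing keras

import torch
import keras

# Route all new torch tensor allocations to the CPU so nothing lands on MPS.
torch.set_default_device("cpu")

# Scope the forward pass to the CPU device as well.
with keras.device("cpu"):
    keras_hub_output = keras_hub_model(keras_hub_inputs)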

mattdangerw (Member) left a comment

Awesome work! This is overall looking great. Still a few comments to work through.

mattdangerw (Member) left a comment

Thanks! LGTM. Will merge after the final nit on exporting the attention layer.

return config

@staticmethod
def get_layout_map(

mattdangerw (Member) commented:

Have we tested this at all? If not, maybe leave it as a follow-up.

kanpuriyanawab (Collaborator, Author) commented:

Nope, didn't test this.
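
As a possible follow-up, a sketch of how the layout map could be exercised, following the model-parallel pattern other keras-hub backbones document for the JAX backend. The class name, mesh shape, and preset string here are assumptions:

import keras
import keras_hub

# Build a 1x8 device mesh: no data parallelism, 8-way model parallelism.
devices = keras.distribution.list_devices()
device_mesh = keras.distribution.DeviceMesh(
    (1, 8), ["batch", "model"], devices=devices
)
layout_map = keras_hub.models.QwenBackbone.get_layout_map(device_mesh)
distribution = keras.distribution.ModelParallel(
    layout_map=layout_map, batch_dim_name="batch"
)
keras.distribution.set_distribution(distribution)
backbone = keras_hub.models.QwenBackbone.from_preset("<qwen_preset>")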


print("\n-> Huggingface model and tokenizer loaded")

# === Check that the models and tokenizers outputs match ===

mattdangerw (Member) commented:

You probably want to save the actual model from this (so we can upload the artifacts to Kaggle when we convert). You could also add some fancier print output; see:

print("✅ Output validated")
keras_model.save_to_preset(preset)
keras_tokenizer.save_to_preset(preset)
print(f"🏁 Preset saved to ./{preset}")

This can be a follow up though!

kanpuriyanawab (Collaborator, Author) commented:
@mattdangerw addressed all the comments!

@mattdangerw mattdangerw added the kokoro:force-run Runs Tests on GPU label Mar 17, 2025
mattdangerw (Member) commented:
Thank you! Once this is green I will pull.

@kokoro-team kokoro-team removed the kokoro:force-run Runs Tests on GPU label Mar 17, 2025
@mattdangerw mattdangerw enabled auto-merge (squash) March 17, 2025 20:41
@mattdangerw mattdangerw disabled auto-merge March 17, 2025 20:42
@mattdangerw mattdangerw merged commit 3061892 into keras-team:master Mar 17, 2025
10 checks passed