Add Qwen 2.5 #2088


Merged: 25 commits into keras-team:master on Mar 17, 2025

Conversation

kanpuriyanawab (Collaborator) commented Feb 9, 2025

Closes #2078

References:

- Qwen 2.5 uses the Qwen2 backbone from Hugging Face Transformers
- HF Config path
- HF Source Code

@kanpuriyanawab kanpuriyanawab self-assigned this Feb 9, 2025
@kanpuriyanawab kanpuriyanawab changed the title Add Qwen 2.5 [WIP] Add Qwen 2.5 Feb 9, 2025
abheesht17 (Collaborator) commented Feb 10, 2025

Thanks for the PR! Before review, could you please do a forward pass and match the output with HF's Qwen? Also, let's make this a draft PR till then.

abheesht17 (Collaborator) left a comment

Took a cursory glance. Let's do the weight conversion and numerics check first!

@kanpuriyanawab kanpuriyanawab marked this pull request as draft February 10, 2025 10:00
divyashreepathihalli (Collaborator) commented Feb 12, 2025

To fix the code format error, you will need to run shell/api_gen.sh at the repo root. If you don't have ruff, install it with pip install ruff, and then run shell/format.sh at the root.

abheesht17 (Collaborator) commented:
@shivance - let us know when this PR is ready for review. Thanks!

kanpuriyanawab (Collaborator, Author) commented Feb 18, 2025

@abheesht17 I have the tokenizer working now; currently I am working on matching the output of the HF model and the Keras model.
Thanks for your patience!

[Screenshot 2025-02-18 at 10 23 36 PM]

abheesht17 (Collaborator) commented:
Great, no hurry. Was just checking. Do ping if you hit any blockers :)

kanpuriyanawab (Collaborator, Author) commented Feb 18, 2025

@abheesht17 I see that in the newer checkpoint conversion scripts we use the set_weights method, e.g.

keras_hub_model.transformer_layers[i]._self_attention_layer._query_dense.set_weights(
    [
        hf_model.model.layers[i]
        .self_attn.q_proj.weight.T.reshape(
            config.hidden_size,
            config.num_attention_heads,
            config.hidden_size // config.num_attention_heads,
        )
        .detach()
        .cpu()
        .float()
        .numpy()
    ]
)

instead of the old kernel assign:

keras_hub_model.get_layer(f"f_net_layer_{i}")._intermediate_dense.kernel.assign(
    hf_wts[f"encoder.layer.{i}.intermediate.dense.weight"]
    .transpose(1, 0)
    .numpy()
)

Has the API changed for assigning the bias as well? Why was the new method created, and what is the difference?
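
For reference, a minimal sketch of copying a bias with both APIs. The layer path mirrors the q_proj example above, but the exact attribute names and shapes here are assumptions, not the PR's actual conversion code:

# Build the kernel array exactly as in the set_weights snippet above.
kernel = (
    hf_model.model.layers[i]
    .self_attn.q_proj.weight.T.reshape(
        config.hidden_size,
        config.num_attention_heads,
        config.hidden_size // config.num_attention_heads,
    )
    .detach()
    .cpu()
    .float()
    .numpy()
)
# Qwen2's q/k/v projections carry biases; reshape to (heads, head_dim).
bias = (
    hf_model.model.layers[i]
    .self_attn.q_proj.bias.reshape(
        config.num_attention_heads,
        config.hidden_size // config.num_attention_heads,
    )
    .detach()
    .cpu()
    .float()
    .numpy()
)
dense = keras_hub_model.transformer_layers[i]._self_attention_layer._query_dense

# Old style: assign each variable individually.
dense.kernel.assign(kernel)
dense.bias.assign(bias)

# New style: set_weights replaces every weight of the layer at once,
# in the order of dense.weights (kernel first, then bias here).
dense.set_weights([kernel, bias])

Both write the same values; set_weights just operates at the layer level, so all of the layer's weights have to be passed together.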

kanpuriyanawab (Collaborator, Author) commented:
[Screenshot 2025-02-19 at 12 37 18 AM]

@abheesht17 upon weight loading, the outputs look like this! There is still some delta here, but

np.testing.assert_allclose(
    keras_hub_logits, hf_output_logits, atol=1e-3
)

succeeds, i.e. absolute tolerance 1e-3.

I am testing at fp32, since it's a 0.5B model.

@kanpuriyanawab kanpuriyanawab changed the title [WIP] Add Qwen 2.5 Add Qwen 2.5 Feb 18, 2025
@kanpuriyanawab kanpuriyanawab marked this pull request as ready for review February 18, 2025 19:20
kanpuriyanawab (Collaborator, Author) commented:
@abheesht17 I have marked this PR as ready for review.

abheesht17 (Collaborator) commented Feb 20, 2025

> @abheesht17 I have marked this PR as ready for review.

Great. Were you able to bring the difference in numerics down to 1e-5? Might be worth checking layer by layer which one is causing the issue.
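
For illustration, one way to do that layer-by-layer check (a sketch only; hf_model / keras_hub_model are assumed to be loaded with converted weights, and the keras-hub argument names here are assumptions, not the PR's actual code):

import numpy as np
from keras import ops

# HF returns hidden states as (embeddings, after layer 1, ..., after layer N).
hf_states = hf_model(hf_input_ids, output_hidden_states=True).hidden_states

# Walk the Keras model manually: embedding first, then each decoder layer.
x = keras_hub_model.token_embedding(keras_inputs["token_ids"])
for i, layer in enumerate(keras_hub_model.transformer_layers):
    x = layer(x, decoder_padding_mask=keras_inputs["padding_mask"])
    ref = hf_states[i + 1].detach().cpu().float().numpy()
    delta = np.max(np.abs(ops.convert_to_numpy(x) - ref))
    print(f"layer {i}: max abs delta = {delta:.2e}")

The first layer whose delta jumps by orders of magnitude is the one to inspect.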

abheesht17 (Collaborator) commented Feb 21, 2025

@shivance - can you please share the weight conversion Colab as well?

Edit: never mind, the conversion script is part of the PR.

kanpuriyanawab (Collaborator, Author) commented:
@abheesht17 here is the Colab version of the conversion script.

kanpuriyanawab (Collaborator, Author) commented:
@abheesht17 did you get a chance to inspect the delta in output?

mattdangerw (Member) left a comment

Thanks! Just some initial comments and questions.

misc_special_tokens -= {eos_token}

# Add misc special tokens
for i, token in enumerate(misc_special_tokens):

mattdangerw (Member) commented:

What are these used for? I don't see them used anywhere. A lot of tokenizers have reserved, unused tokens (e.g., for BERT, the first thousand, I think); we don't generally give them special treatment.

kanpuriyanawab (Collaborator, Author) commented:

I just followed the Llama 3 tokenizer!

pass-lin (Contributor) commented Mar 1, 2025

I think it's necessary to check in detail where the error is. As much as possible, we should ensure that the fp32 error is around 1e-5 under the torch backend, and that the maximum bf16 error does not exceed 1e-2.
I've implemented a Keras model with a similar error before, and that level of error caused a significant decrease in inference performance, as well as repetition.
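
Expressed as concrete checks (a sketch; the logits variables are hypothetical outputs of the two models at each precision):

import numpy as np

# fp32 should match to ~1e-5, bf16 to at most 1e-2, per the comment above.
np.testing.assert_allclose(keras_logits_fp32, hf_logits_fp32, atol=1e-5)
np.testing.assert_allclose(keras_logits_bf16, hf_logits_bf16, atol=1e-2)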

kanpuriyanawab (Collaborator, Author) commented Mar 8, 2025

@mattdangerw / @abheesht17 / @divyashreepathihalli How do you completely disable the MPS backend with Keras?

Please take a look at the latest conversion script: even though I am moving the model to the CPU using keras.device, and moving the inputs as well, the reversible embedding call step exits with the following.

Stack trace:

-> Keras 3 model and tokenizer loaded.
Traceback (most recent call last):
  File "/Users/anshuman/.pyenv/versions/3.11.10/lib/python3.11/runpy.py", line 198, in _run_module_as_main
    return _run_code(code, main_globals, None,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/anshuman/.pyenv/versions/3.11.10/lib/python3.11/runpy.py", line 88, in _run_code
    exec(code, run_globals)
  File "/Users/anshuman/.cursor/extensions/ms-python.debugpy-2024.6.0-darwin-arm64/bundled/libs/debugpy/adapter/../../debugpy/launcher/../../debugpy/__main__.py", line 39, in <module>
    cli.main()
  File "/Users/anshuman/.cursor/extensions/ms-python.debugpy-2024.6.0-darwin-arm64/bundled/libs/debugpy/adapter/../../debugpy/launcher/../../debugpy/../debugpy/server/cli.py", line 430, in main
    run()
  File "/Users/anshuman/.cursor/extensions/ms-python.debugpy-2024.6.0-darwin-arm64/bundled/libs/debugpy/adapter/../../debugpy/launcher/../../debugpy/../debugpy/server/cli.py", line 284, in run_file
    runpy.run_path(target, run_name="__main__")
  File "/Users/anshuman/.cursor/extensions/ms-python.debugpy-2024.6.0-darwin-arm64/bundled/libs/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_runpy.py", line 321, in run_path
    return _run_module_code(code, init_globals, run_name,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/anshuman/.cursor/extensions/ms-python.debugpy-2024.6.0-darwin-arm64/bundled/libs/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_runpy.py", line 135, in _run_module_code
    _run_code(code, mod_globals, init_globals,
  File "/Users/anshuman/.cursor/extensions/ms-python.debugpy-2024.6.0-darwin-arm64/bundled/libs/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_runpy.py", line 124, in _run_code
    exec(code, run_globals)
  File "/Users/anshuman/Desktop/Projects/keras-hub/tools/checkpoint_conversion/convert_qwen_checkpoints.py", line 353, in <module>
    app.run(main)
  File "/Users/anshuman/.pyenv/versions/3.11.10/envs/qwen/lib/python3.11/site-packages/absl/app.py", line 308, in run
    _run_main(main, args)
  File "/Users/anshuman/.pyenv/versions/3.11.10/envs/qwen/lib/python3.11/site-packages/absl/app.py", line 254, in _run_main
    sys.exit(main(argv))
             ^^^^^^^^^^
  File "/Users/anshuman/Desktop/Projects/keras-hub/tools/checkpoint_conversion/convert_qwen_checkpoints.py", line 336, in main
    test_model(keras_hub_model, keras_hub_tokenizer, hf_model, hf_tokenizer)
  File "/Users/anshuman/Desktop/Projects/keras-hub/tools/checkpoint_conversion/convert_qwen_checkpoints.py", line 228, in test_model
    keras_hub_output = keras_hub_model(keras_hub_inputs)
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/anshuman/.pyenv/versions/3.11.10/envs/qwen/lib/python3.11/site-packages/keras/src/utils/traceback_utils.py", line 122, in error_handler
    raise e.with_traceback(filtered_tb) from None
  File "/Users/anshuman/.pyenv/versions/3.11.10/envs/qwen/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/anshuman/.pyenv/versions/3.11.10/envs/qwen/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1750, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/anshuman/.pyenv/versions/3.11.10/envs/qwen/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/anshuman/.pyenv/versions/3.11.10/envs/qwen/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1750, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/anshuman/Desktop/Projects/keras-hub/keras_hub/src/layers/modeling/reversible_embedding.py", line 129, in call
    return super().call(inputs)
           ^^^^^^^^^^^^^^^^^^^^
  File "/Users/anshuman/.pyenv/versions/3.11.10/envs/qwen/lib/python3.11/site-packages/torch/nn/functional.py", line 2516, in embedding
    return handle_torch_function(
           ^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/anshuman/.pyenv/versions/3.11.10/envs/qwen/lib/python3.11/site-packages/torch/overrides.py", line 1720, in handle_torch_function
    result = mode.__torch_function__(public_api, types, args, kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/anshuman/.pyenv/versions/3.11.10/envs/qwen/lib/python3.11/site-packages/torch/utils/_device.py", line 104, in __torch_function__
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/Users/anshuman/.pyenv/versions/3.11.10/envs/qwen/lib/python3.11/site-packages/torch/nn/functional.py", line 2551, in embedding
    return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: Exception encountered when calling ReversibleEmbedding.call().

Placeholder storage has not been allocated on MPS device!

Arguments received by ReversibleEmbedding.call():
  • inputs=torch.Tensor(shape=torch.Size([1, 5]), dtype=int32)
  • reverse=False


The stack trace indicates that an allocation is still happening on MPS somewhere, even though I have already disabled it!
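
For what it's worth, one way to try forcing everything onto the CPU under the torch backend. This is a sketch, not a verified fix for this script: torch.set_default_device and keras.device are real APIs, but whether they catch the allocation in question here is untested:

import os

os.environ["KERAS_BACKEND"] = "torch"  # must be set before importing keras

import torch
import keras

# Route all new torch tensor allocations to the CPU so nothing lands on MPS.
torch.set_default_device("cpu")

# Scope the forward pass to the CPU device as well.
with keras.device("cpu"):
    keras_hub_output = keras_hub_model(keras_hub_inputs)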

mattdangerw (Member) left a comment

Awesome work! This is overall looking great. Still a few comments to work through.

mattdangerw (Member) left a comment

Thanks! LGTM. Will merge after the final nit on exporting the attention layer.

return config

@staticmethod
def get_layout_map(

mattdangerw (Member) commented:

Have we tested this at all? If not, maybe leave it as a follow-up.

kanpuriyanawab (Collaborator, Author) commented:

Nope, didn't test this.
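
As a possible follow-up, a sketch of how the layout map could be exercised, following the model-parallel pattern other keras-hub backbones document for the JAX backend. The class name, mesh shape, and preset string here are assumptions:

import keras
import keras_hub

# Build a 1x8 device mesh: no data parallelism, 8-way model parallelism.
devices = keras.distribution.list_devices()
device_mesh = keras.distribution.DeviceMesh(
    (1, 8), ["batch", "model"], devices=devices
)
layout_map = keras_hub.models.QwenBackbone.get_layout_map(device_mesh)
distribution = keras.distribution.ModelParallel(
    layout_map=layout_map, batch_dim_name="batch"
)
keras.distribution.set_distribution(distribution)
backbone = keras_hub.models.QwenBackbone.from_preset("<qwen_preset>")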


print("\n-> Huggingface model and tokenizer loaded")

# === Check that the models and tokenizers outputs match ===

mattdangerw (Member) commented:

You probably want to save the actual model from this (so we can upload the artifacts to Kaggle when we convert). You could also add some fancier print output; see:

print("✅ Output validated")
keras_model.save_to_preset(preset)
keras_tokenizer.save_to_preset(preset)
print(f"🏁 Preset saved to ./{preset}")

This can be a follow up though!

kanpuriyanawab (Collaborator, Author) commented:
@mattdangerw addressed all the comments!

@mattdangerw mattdangerw added the kokoro:force-run Runs Tests on GPU label Mar 17, 2025
mattdangerw (Member) commented:
Thank you! Once this is green I will pull.

@kokoro-team kokoro-team removed the kokoro:force-run Runs Tests on GPU label Mar 17, 2025
@mattdangerw mattdangerw enabled auto-merge (squash) March 17, 2025 20:41
@mattdangerw mattdangerw disabled auto-merge March 17, 2025 20:42
@mattdangerw mattdangerw merged commit 3061892 into keras-team:master Mar 17, 2025
10 checks passed