Always ignore freqs_cis #1338
Conversation
Summary: We should always ignore `freqs_cis` and other parameters in `excluded_parameters_for_model_only` to avoid confusion.
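For context, the underlying PyTorch mechanism for keeping a derived tensor like a RoPE table out of checkpoints is to register it as a non-persistent buffer. The module below is a hypothetical sketch (not torchtitan's actual code) showing that a buffer registered with `persistent=False` never appears in `state_dict()` and is simply recomputed on construction:

```python
import torch
import torch.nn as nn

# Hypothetical module illustrating the general mechanism: register a derived
# tensor with persistent=False so state_dict() never includes it.
class RotaryCache(nn.Module):
    def __init__(self, dim: int = 8, max_seq_len: int = 16):
        super().__init__()
        freqs = 1.0 / (10000.0 ** (torch.arange(0, dim, 2).float() / dim))
        t = torch.arange(max_seq_len).float()
        table = torch.outer(t, freqs)
        # Excluded from the checkpoint; recomputed on every construction.
        self.register_buffer(
            "freqs_cis",
            torch.polar(torch.ones_like(table), table),
            persistent=False,
        )

m = RotaryCache()
print("freqs_cis" in m.state_dict())  # → False: the buffer is never checkpointed
```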
What's the reason for breaking PP?
I'm not talking about the issue that there will be an exception. I'm wondering whether we still need `freqs_cis` in the seed checkpoint to ensure that it is consistent across PP stages.
> I'm wondering do we still need freqs_cis in the seed checkpoint to ensure that freqs_cis being consistent across PP stages?
I don't think we need it in the seed checkpoint. Every PP rank initializes the same `freqs_cis` before loading a checkpoint:
https://github.com/pytorch/torchtitan/blob/main/torchtitan/models/llama3/model/model.py#L372
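The reason a seed checkpoint is unnecessary is that the RoPE table is a pure function of its arguments. The sketch below shows a standard Llama-style precomputation (assumed signature, not copied from torchtitan); two independent calls, as on two PP ranks, produce bit-identical tensors:

```python
import torch

# Standard Llama-style RoPE precomputation (assumed signature). Because it is
# a pure function of (dim, end, theta), every PP rank computing it
# independently gets bit-identical values, so it need not be checkpointed.
def precompute_freqs_cis(dim: int, end: int, theta: float = 10000.0) -> torch.Tensor:
    freqs = 1.0 / (theta ** (torch.arange(0, dim, 2)[: dim // 2].float() / dim))
    t = torch.arange(end, device=freqs.device).float()
    freqs = torch.outer(t, freqs)
    return torch.polar(torch.ones_like(freqs), freqs)  # complex64 table

# Two "ranks" computing independently:
a = precompute_freqs_cis(128, 256)
b = precompute_freqs_cis(128, 256)
assert torch.equal(a, b)  # deterministic: identical on every rank
```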
The test generate script failed because it's not calling `init_weights`.
Okay, let's merge the PR since the seed checkpoint doesn't seem to be a concern.
We should always ignore `freqs_cis` and other parameters in `excluded_parameters_for_model_only` to avoid confusion.

TODO: Is this going to break PP with seed checkpoint?
Since #1338 the `freqs_cis` buffer is no longer persisted or read in any code path, the intention being that it is recomputed at model loading/initialization. However, this requires calling `init_weights` on the model, which `scripts/test_generate.py` currently is not doing. As of right now, running generation on the pretrained Llama 3 models results in garbled output.

Convert weights:

```
python ./scripts/convert_llama_to_dcp.py /home/emozilla/hf/Llama-3-8B/original /home/emozilla/dcp/Llama-3-8B
```

Run generation:

```
CONFIG_FILE=./torchtitan/models/llama3/train_configs/llama3_8b.toml CHECKPOINT_DIR=/home/emozilla/dcp/Llama-3-8B PROMPT="A long time ago in a galaxy far, far away" ./scripts/generate/run_llama_generate.sh
```

Output at HEAD:

```
<|begin_of_text|>A long time ago in a galaxy far, far away000 centershift Equity KelleyYe требаyrais& Romgraph1Kォ IDEA globalčil at390dagThe,inLikeBelow uptimeRoman_constsBothtz_RATE phủ
```

Output with fix:

```
<|begin_of_text|>A long time ago in a galaxy far, far away… Aspirations were bursting and Jedi were making a big imprint in the arts, in the government, and in our lives. That was 34 or
```
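The fix described above boils down to an ordering requirement: rebuild the excluded buffers before loading a checkpoint that deliberately omits them. Here is a minimal hypothetical sketch (illustrative names, not torchtitan's actual API) of that pattern:

```python
import torch
import torch.nn as nn

# Hypothetical model: init_weights() rebuilds the non-persistent freqs_cis
# buffer, so it must run BEFORE loading a checkpoint that omits the buffer.
class TinyModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.proj = nn.Linear(4, 4)
        self.register_buffer("freqs_cis", torch.empty(4, 4), persistent=False)

    def init_weights(self):
        nn.init.zeros_(self.proj.weight)
        self.freqs_cis = torch.ones(4, 4)  # recompute the excluded buffer

model = TinyModel()
model.init_weights()  # rebuild freqs_cis first
# The checkpoint intentionally contains no freqs_cis entry:
state = {"proj.weight": torch.randn(4, 4), "proj.bias": torch.zeros(4)}
model.load_state_dict(state)  # loads cleanly; buffer already initialized
assert torch.equal(model.freqs_cis, torch.ones(4, 4))
```

Because `freqs_cis` is non-persistent, `load_state_dict` does not expect it in the checkpoint, so the strict load succeeds.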