Skip to content

Always ignore freqs_cis #1338

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 5 commits into from
Jun 26, 2025
Merged

Always ignore freqs_cis #1338

merged 5 commits into from
Jun 26, 2025

Conversation

fegin
Copy link
Contributor

@fegin fegin commented Jun 25, 2025

We should always ignore freq_cis and other parameters in excluded_parameters_for_model_only to avoid confusion.

TODO: Is this going to break PP with seed checkpoint?

Summary:
We should always ignore freq_cis and other parameters in excluded_parameters_for_model_only to avoid confusion.
@fegin fegin requested review from tianyu-l and wwwjn as code owners June 25, 2025 07:03
@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Meta Open Source bot. label Jun 25, 2025
@tianyu-l
Copy link
Contributor

TODO: Is this going to break PP with seed checkpoint?

What's the reason for breaking PP?
Currently init_weights (including the buffer) is called, regardless of whether loading from a checkpoint (which doesn't sound optimal). So shouldn't cause issues?

@fegin
Copy link
Contributor Author

fegin commented Jun 26, 2025

What's the reason for breaking PP?
Currently init_weights (including the buffer) is called, regardless of whether loading from a checkpoint (which doesn't sound optimal). So shouldn't cause issues?

Not talking about the issue that there will be an exception. I'm wondering do we still need freqs_cis in the seed checkpoint to ensure that freqs_cis being consistent across PP stages?

@fegin fegin changed the title Always ignore freq_cis Always ignore freqs_cis Jun 26, 2025
Copy link
Contributor

@tianyu-l tianyu-l left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm wondering do we still need freqs_cis in the seed checkpoint to ensure that freqs_cis being consistent across PP stages?

I don't think we need it in the seed checkpoint.
Every PP rank would init the same freqs_cis before loading a checkpoint
https://github.com/pytorch/torchtitan/blob/main/torchtitan/models/llama3/model/model.py#L372

@tianyu-l
Copy link
Contributor

The test generate script failed because it's not using CheckpointManager, but using dcp.load directly. Shall we change it and also use CheckpointManager?

@fegin
Copy link
Contributor Author

fegin commented Jun 26, 2025

Okay, let's merge the PR since seed checkpoint doesn't seem to be a concern.

@fegin fegin merged commit aefe15a into main Jun 26, 2025
6 of 7 checks passed
@fegin fegin deleted the fegin/ignore_freq_cis branch June 26, 2025 16:18
wwwjn pushed a commit that referenced this pull request Jul 1, 2025
We should always ignore freq_cis and other parameters in
excluded_parameters_for_model_only to avoid confusion.

TODO: Is this going to break PP with seed checkpoint?
mori360 pushed a commit to mori360/torchtitan that referenced this pull request Jul 8, 2025
We should always ignore freq_cis and other parameters in
excluded_parameters_for_model_only to avoid confusion.

TODO: Is this going to break PP with seed checkpoint?
tianyu-l pushed a commit that referenced this pull request Jul 8, 2025
Since #1338 the `freqs_cis`
buffer is no longer persisted/read in any code path with the intention
being that it is re-calculated at the model loading/initialization.
However this requires calling `init_weights` on the model, which
`scripts/test_generate.py` currently is not doing.

As of right now running generation on the pretrained Llama 3 models will
result in garbled outputs

Convert weights: `python ./scripts/convert_llama_to_dcp.py
/home/emozilla/hf/Llama-3-8B/original /home/emozilla/dcp/Llama-3-8B`


Run generation:
`CONFIG_FILE=./torchtitan/models/llama3/train_configs/llama3_8b.toml
CHECKPOINT_DIR=/home/emozilla/dcp/Llama-3-8B PROMPT="A long time ago in
a galaxy far, far away" ./scripts/generate/run_llama_generate.sh`

HEAD
```
<|begin_of_text|>A long time ago in a galaxy far, far away000 centershift Equity KelleyYe требаyrais&
 Romgraph1Kォ IDEA globalčil at390dagThe,inLikeBelow uptimeRoman_constsBothtz_RATE phủ
 ```
 
 With fix
 ```
 <|begin_of_text|>A long time ago in a galaxy far, far away… Aspirations were bursting and Jedi were making a big imprint in the arts, in the government, and in our lives.  That was 34 or
 ```
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CLA Signed This label is managed by the Meta Open Source bot.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants