Always ignore freqs_cis #1338
Conversation
Summary: We should always ignore `freqs_cis` and other parameters in `excluded_parameters_for_model_only` to avoid confusion.
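For context, the underlying PyTorch mechanism for keeping a derived tensor like a RoPE table out of checkpoints is to register it as a non-persistent buffer. The module below is a hypothetical sketch (not torchtitan's actual code) showing that a buffer registered with `persistent=False` never appears in `state_dict()` and is simply recomputed on construction:

```python
import torch
import torch.nn as nn

# Hypothetical module illustrating the general mechanism: register a derived
# tensor with persistent=False so state_dict() never includes it.
class RotaryCache(nn.Module):
    def __init__(self, dim: int = 8, max_seq_len: int = 16):
        super().__init__()
        freqs = 1.0 / (10000.0 ** (torch.arange(0, dim, 2).float() / dim))
        t = torch.arange(max_seq_len).float()
        table = torch.outer(t, freqs)
        # Excluded from the checkpoint; recomputed on every construction.
        self.register_buffer(
            "freqs_cis",
            torch.polar(torch.ones_like(table), table),
            persistent=False,
        )

m = RotaryCache()
print("freqs_cis" in m.state_dict())  # → False: the buffer is never checkpointed
```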
What's the reason for breaking PP?
I'm not talking about the issue that there will be an exception. I'm wondering whether we still need `freqs_cis` in the seed checkpoint to ensure that it is consistent across PP stages.
> I'm wondering do we still need freqs_cis in the seed checkpoint to ensure that freqs_cis being consistent across PP stages?
I don't think we need it in the seed checkpoint. Every PP rank initializes the same `freqs_cis` before loading a checkpoint:
https://github.com/pytorch/torchtitan/blob/main/torchtitan/models/llama3/model/model.py#L372
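The reason a seed checkpoint is unnecessary is that the RoPE table is a pure function of its arguments. The sketch below shows a standard Llama-style precomputation (assumed signature, not copied from torchtitan); two independent calls, as on two PP ranks, produce bit-identical tensors:

```python
import torch

# Standard Llama-style RoPE precomputation (assumed signature). Because it is
# a pure function of (dim, end, theta), every PP rank computing it
# independently gets bit-identical values, so it need not be checkpointed.
def precompute_freqs_cis(dim: int, end: int, theta: float = 10000.0) -> torch.Tensor:
    freqs = 1.0 / (theta ** (torch.arange(0, dim, 2)[: dim // 2].float() / dim))
    t = torch.arange(end, device=freqs.device).float()
    freqs = torch.outer(t, freqs)
    return torch.polar(torch.ones_like(freqs), freqs)  # complex64 table

# Two "ranks" computing independently:
a = precompute_freqs_cis(128, 256)
b = precompute_freqs_cis(128, 256)
assert torch.equal(a, b)  # deterministic: identical on every rank
```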
The test generate script failed because it's not calling `init_weights`.
Okay, let's merge the PR since the seed checkpoint doesn't seem to be a concern.
We should always ignore `freqs_cis` and other parameters in `excluded_parameters_for_model_only` to avoid confusion.

TODO: Is this going to break PP with seed checkpoint?
Since #1338 the `freqs_cis` buffer is no longer persisted or read in any code path, the intention being that it is recomputed at model loading/initialization. However, this requires calling `init_weights` on the model, which `scripts/test_generate.py` currently is not doing. As of right now, running generation on the pretrained Llama 3 models results in garbled output.

Convert weights:

```
python ./scripts/convert_llama_to_dcp.py /home/emozilla/hf/Llama-3-8B/original /home/emozilla/dcp/Llama-3-8B
```

Run generation:

```
CONFIG_FILE=./torchtitan/models/llama3/train_configs/llama3_8b.toml CHECKPOINT_DIR=/home/emozilla/dcp/Llama-3-8B PROMPT="A long time ago in a galaxy far, far away" ./scripts/generate/run_llama_generate.sh
```

Output at HEAD:

```
<|begin_of_text|>A long time ago in a galaxy far, far away000 centershift Equity KelleyYe требаyrais& Romgraph1Kォ IDEA globalčil at390dagThe,inLikeBelow uptimeRoman_constsBothtz_RATE phủ
```

Output with fix:

```
<|begin_of_text|>A long time ago in a galaxy far, far away… Aspirations were bursting and Jedi were making a big imprint in the arts, in the government, and in our lives. That was 34 or
```
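The fix described above boils down to an ordering requirement: rebuild the excluded buffers before loading a checkpoint that deliberately omits them. Here is a minimal hypothetical sketch (illustrative names, not torchtitan's actual API) of that pattern:

```python
import torch
import torch.nn as nn

# Hypothetical model: init_weights() rebuilds the non-persistent freqs_cis
# buffer, so it must run BEFORE loading a checkpoint that omits the buffer.
class TinyModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.proj = nn.Linear(4, 4)
        self.register_buffer("freqs_cis", torch.empty(4, 4), persistent=False)

    def init_weights(self):
        nn.init.zeros_(self.proj.weight)
        self.freqs_cis = torch.ones(4, 4)  # recompute the excluded buffer

model = TinyModel()
model.init_weights()  # rebuild freqs_cis first
# The checkpoint intentionally contains no freqs_cis entry:
state = {"proj.weight": torch.randn(4, 4), "proj.bias": torch.zeros(4)}
model.load_state_dict(state)  # loads cleanly; buffer already initialized
assert torch.equal(model.freqs_cis, torch.ones(4, 4))
```

Because `freqs_cis` is non-persistent, `load_state_dict` does not expect it in the checkpoint, so the strict load succeeds.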