refactor ParallelDims and CheckpointManager #1384

Open
wants to merge 1 commit into
base: main

tianyu-l
Contributor

This PR does the following:

  1. move world_mesh into ParallelDims, since the two are closely related
  2. move enable_loss_parallel out of the ParallelDims constructor
  3. add a convenience property, seq_len_divisor, to ParallelDims
  4. make dataloader and ft_manager optional in CheckpointManager
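As a rough illustration of items 1 to 3, here is a minimal sketch of how such a class could be shaped. Everything in it is an assumption made for illustration: `ParallelDimsSketch`, its fields, the placeholder mesh, and in particular the divisor rule (tensor-parallel degree times twice the context-parallel degree) are hypothetical and are not taken from this PR's diff.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class ParallelDimsSketch:
    """Hypothetical stand-in for ParallelDims; not the real class."""

    tp: int = 1          # tensor-parallel degree (assumed field name)
    cp: int = 1          # context-parallel degree (assumed field name)
    world_size: int = 1
    # Item 2: no enable_loss_parallel argument in the constructor.
    # Item 1: the mesh is owned by the dims object and built on first access.
    _world_mesh: Optional[Dict[str, Any]] = field(default=None, init=False, repr=False)

    @property
    def world_mesh(self) -> Dict[str, Any]:
        # Build lazily, then reuse the same object on later accesses.
        if self._world_mesh is None:
            self._world_mesh = self._build_mesh()
        return self._world_mesh

    def _build_mesh(self) -> Dict[str, Any]:
        # Placeholder: a real implementation would create a device mesh here.
        return {"tp": self.tp, "cp": self.cp, "world_size": self.world_size}

    @property
    def seq_len_divisor(self) -> int:
        # Item 3. Assumed rule for illustration only: the sequence length
        # must be divisible by tp and by 2 * cp.
        return self.tp * (self.cp * 2)


dims = ParallelDimsSketch(tp=2, cp=2, world_size=4)
seq_len = 8192
# Callers can validate a sequence length against a single property.
is_valid = seq_len % dims.seq_len_divisor == 0
```

Keeping the mesh and the divisor on one object means callers only need to pass the dims object around, rather than a dims object plus a separately constructed mesh.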

facebook-github-bot added the CLA Signed label on Jul 12, 2025
@tianyu-l
Contributor Author

cc @ebsmothers

@@ -180,17 +180,19 @@ class CheckpointManager:

     def __init__(
         self,
-        dataloader: DataLoader,
+        dataloader: BaseDataLoader | None,
Contributor Author

@ebsmothers
I kept this parameter required but allowed its value to be None; the code still works. I didn't make it fully optional with a None default because that would require more if-else branches in this file.

I think it won't look too bad if I pass dataloader=None explicitly in forge_engine.py. Let me know if that's OK with you.
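The trade-off described above (a required parameter whose value may be None, rather than a parameter with a None default) can be sketched as follows. This is a minimal illustration under stated assumptions: `CheckpointManagerSketch`, the stub `BaseDataLoader`, the `model_state` argument, and `keys_to_save` are hypothetical names, not the real torchtitan code. `Optional[BaseDataLoader]` is used as the equivalent of `BaseDataLoader | None`.

```python
from typing import Any, Dict, List, Optional


class BaseDataLoader:
    """Hypothetical minimal dataloader with checkpointable state."""

    def __init__(self) -> None:
        self.position = 0

    def state_dict(self) -> Dict[str, Any]:
        return {"position": self.position}


class CheckpointManagerSketch:
    """Illustrative only; not the real CheckpointManager."""

    def __init__(
        self,
        dataloader: Optional[BaseDataLoader],  # required argument, value may be None
        model_state: Dict[str, Any],
    ) -> None:
        self.states: Dict[str, Any] = {"model": model_state}
        # A single None check at registration time; later code only
        # iterates over self.states and needs no further branching.
        if dataloader is not None:
            self.states["dataloader"] = dataloader

    def keys_to_save(self) -> List[str]:
        return sorted(self.states)


# A trainer that has a dataloader passes it as before.
with_loader = CheckpointManagerSketch(dataloader=BaseDataLoader(), model_state={})
# A caller without one must say so explicitly.
without_loader = CheckpointManagerSketch(dataloader=None, model_state={})
```

Because the parameter has no default, omitting it raises a TypeError, so every call site has to state explicitly whether it has a dataloader.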
