Adding torch accelerator and requirements file to FSDP2 example #1375
Conversation
✅ Deploy Preview for pytorch-examples-preview canceled.
distributed/FSDP2/example.py (Outdated)

```python
torch.distributed.init_process_group(backend="nccl", device_id=device)
if torch.accelerator.is_available():
    device_type = torch.accelerator.current_accelerator()
    device: torch.device = torch.device(f"{device_type}:{rank}")
```
Why do we need `device: torch.device =` instead of just `device =`?
It was just a flag for me, but I'll change it to use just `torch.device`.
done :)
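For context, the change just drops the redundant type annotation; both forms below behave identically at runtime, and the annotation only helps a type checker. A minimal sketch (the placeholder values for `device_type` and `rank` are assumptions for illustration; in the example they come from `torch.accelerator` and the per-process rank):

```python
import torch

device_type = "cpu"  # placeholder; the example derives this from torch.accelerator
rank = 0             # placeholder; the example uses the per-process rank

# With an explicit annotation, as originally written:
device: torch.device = torch.device(f"{device_type}:{rank}")

# Plain assignment; the type is inferred from torch.device(...),
# so the annotation adds nothing at runtime:
device = torch.device(f"{device_type}:{rank}")
```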
distributed/FSDP2/example.py (Outdated)

```python
backend = torch.distributed.get_default_backend_for_device(device)
torch.distributed.init_process_group(backend=backend, device_id=device)
```
I think these two lines should work for CPU as well. You can simplify the code:

```python
if torch.accelerator.is_available():
    ...
else:
    device = torch.device("cpu")
backend = torch.distributed.get_default_backend_for_device(device)
torch.distributed.init_process_group(backend=backend, device_id=device)
```
done
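Putting both suggestions together, a minimal sketch of what the resulting initialization might look like (reading `LOCAL_RANK` is an assumption about launching via torchrun, not something shown in the diff):

```python
import os

import torch
import torch.distributed as dist

rank = int(os.environ["LOCAL_RANK"])  # set per-process by torchrun

if torch.accelerator.is_available():
    # current_accelerator() returns a torch.device such as "cuda" or "xpu"
    device_type = torch.accelerator.current_accelerator()
    device = torch.device(f"{device_type}:{rank}")
else:
    device = torch.device("cpu")

# get_default_backend_for_device picks the appropriate backend for the
# device (e.g. "nccl" for CUDA, "gloo" for CPU), so one code path covers both.
backend = dist.get_default_backend_for_device(device)
dist.init_process_group(backend=backend, device_id=device)
```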
Signed-off-by: dggaytan <[email protected]>
Force-pushed from 1f0d7d3 to 5e960d8.
thank you!
Adding torch accelerator support to FSDP2 example and requirements file
Updates to FSDP2 example:
Script Renaming and Documentation Updates:

- Renamed `train.py` to `example.py` and updated references in `README.md` to reflect the new filename. Added instructions to install dependencies via `requirements.txt` before running the example.

GPU Verification and Device Initialization:

- Added a `verify_min_gpu_count` function to ensure at least two GPUs are available before running the example (see the sketch after this summary).
- Updated `main()` to dynamically detect and configure the device type using `torch.accelerator`. This improves compatibility with different hardware setups.

New supporting files:

- Dependency Management: added a `requirements.txt` file listing the required dependencies (`torch>=2.7` and `numpy`).
- Script for Running Examples: added `run_example.sh` to simplify launching the FSDP2 example.

Integration into Distributed Examples:

- Added `distributed_FSDP2` in `run_distributed_examples.sh` to include the FSDP2 example in the distributed testing workflow.

CC: @msaroufim @malfet @dvrogozh
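As a rough illustration of the GPU check described in the summary, a hypothetical `verify_min_gpu_count` might look like the sketch below. Only the function name and the two-GPU requirement come from the PR; the body is an assumption, not the PR's actual code:

```python
import sys

import torch


def verify_min_gpu_count(min_gpus: int = 2) -> bool:
    # torch.accelerator abstracts over CUDA, XPU, and other accelerator backends
    if not torch.accelerator.is_available():
        return False
    return torch.accelerator.device_count() >= min_gpus


if __name__ == "__main__":
    if not verify_min_gpu_count(min_gpus=2):
        print("This example requires at least 2 GPUs to run; exiting.")
        sys.exit(1)
```

The example itself would then typically be launched with something like `torchrun --nproc_per_node=2 example.py`, which is presumably what `run_example.sh` wraps.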