[DP] Create placement groups by ray_device_key #25026
Conversation
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs do not trigger a full CI run by default; only a subset of tests runs. You can ask your reviewers to trigger select CI tests. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. If you have any questions, please reach out to us on Slack at https://slack.vllm.ai. 🚀
Code Review
This pull request refactors the data parallel placement group creation logic to use a dynamic device key from current_platform.ray_device_key instead of a hardcoded "GPU" string. This is a good improvement for supporting various hardware backends like TPUs. My review identifies a potential KeyError if data parallelism is attempted on a platform where ray_device_key is empty. I've suggested adding assertions to provide a clear error message in such cases.
    "No nodes with resources found in Ray cluster.")
assert dp_master_ip_key in nodes[0], (
    "The DP master node (ip: %s) is missing or dead", dp_master_ip)
device_str = current_platform.ray_device_key
The ray_device_key can be an empty string for platforms that do not support Ray. If device_str is empty, it will cause a KeyError when used to access node_resources on line 345. It's better to add an assertion to ensure device_str is not empty, which provides an early and clear error message if data parallelism is attempted on an unsupported platform.
device_str = current_platform.ray_device_key
assert device_str, (
"current_platform.ray_device_key is empty, indicating that data "
"parallelism with Ray is not supported on this platform.")
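To illustrate the failure mode (with a hypothetical node-resource mapping for demonstration, not vLLM's actual data structures): an empty device_str produces a bare KeyError at the lookup, while the suggested assertion fails early with an actionable message.

```python
# Hypothetical Ray node-resource mapping of the kind reported per node;
# the keys and counts here are illustrative, not taken from vLLM.
node_resources = {"GPU": 8, "CPU": 64}

def devices_on_node(device_str: str) -> float:
    # Fail early with a clear message instead of a bare KeyError below.
    assert device_str, (
        "current_platform.ray_device_key is empty, indicating that data "
        "parallelism with Ray is not supported on this platform.")
    return node_resources[device_str]

print(devices_on_node("GPU"))  # prints 8
```

Calling devices_on_node("") raises an AssertionError naming the platform limitation, whereas without the assertion the same call would surface only as KeyError: ''.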
local_dp_ranks = []
num_pg_created = 0

device_str = current_platform.ray_device_key
Similar to the create_dp_placement_groups function, ray_device_key can be an empty string here. If device_str is empty, it will cause a KeyError when used to access available_resources and total_resources on lines 427 and 431. An assertion should be added to ensure device_str is not empty and fail early on unsupported platforms.
device_str = current_platform.ray_device_key
assert device_str, (
"current_platform.ray_device_key is empty, indicating that data "
"parallelism with Ray is not supported on this platform.")
LGTM.
Signed-off-by: Xinyu Chen <[email protected]> Co-authored-by: Kunshang Ji <[email protected]>
Signed-off-by: Xinyu Chen <[email protected]> Co-authored-by: Kunshang Ji <[email protected]> Signed-off-by: charlifu <[email protected]>
Signed-off-by: Xinyu Chen <[email protected]> Co-authored-by: Kunshang Ji <[email protected]> Signed-off-by: xuebwang-amd <[email protected]>
Support creating placement groups for DP with other kinds of Ray devices.