[core] add bucket padding to tpu_model_runner #14995

Chenyaaang · 2025-03-18T03:34:29Z

Add bucket padding to the TPU model runner. Instead of always padding num_tokens up to a power of 2: if num_tokens < bucket_padding_gap, pad to the next power of 2; if num_tokens > bucket_padding_gap, the padding sizes grow in steps of bucket_padding_gap.
For example, with bucket_padding_gap = 64 and max_num_batched_tokens = 512, the paddings are 16, 32, 64, 128, 192, 256, 320, 384, 448, 512. This reduces computation cost for large num_tokens: e.g. num_tokens = 300 is now padded to 320 instead of 512.
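A minimal Python sketch of the bucketing rule described above, assuming a minimum bucket of 16 (taken from the example) and a positive gap. The helpers `bucket_sizes` and `padded_len` are hypothetical names for illustration, not the PR's actual functions in tpu_model_runner.py; the lookup uses the standard-library `bisect` module to find the smallest bucket that fits.

```python
import bisect


def bucket_sizes(min_size: int, max_size: int, gap: int) -> list[int]:
    """Hypothetical helper: double from min_size up to `gap`, then step by `gap`.

    Sizes are powers of 2 in the first phase when min_size is one. The last
    bucket may exceed max_size when max_size is not aligned to `gap`.
    """
    if min_size <= 0 or gap <= 0:
        raise ValueError("min_size and gap must be positive")
    sizes = [min_size]
    # Exponential phase: keep doubling while the doubled size stays <= gap.
    while sizes[-1] < max_size and sizes[-1] * 2 <= gap:
        sizes.append(sizes[-1] * 2)
    # Linear phase: grow by `gap` until max_size is covered.
    while sizes[-1] < max_size:
        sizes.append(sizes[-1] + gap)
    return sizes


def padded_len(sizes: list[int], num_tokens: int) -> int:
    """Return the smallest bucket that is >= num_tokens."""
    idx = bisect.bisect_left(sizes, num_tokens)
    if idx == len(sizes):
        raise ValueError(
            f"num_tokens={num_tokens} exceeds largest bucket {sizes[-1]}")
    return sizes[idx]


if __name__ == "__main__":
    buckets = bucket_sizes(16, 512, 64)
    print(buckets)  # [16, 32, 64, 128, 192, 256, 320, 384, 448, 512]
    print(padded_len(buckets, 300))  # 320 (power-of-2 padding would give 512)
```

Power-of-2 padding over the same range needs only 6 buckets (16 to 512) versus 10 here, so the linear tail presumably trades a few more compiled shapes for less wasted compute on large batches.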

FIX #14581

Signed-off-by: Chenyaaang <[email protected]>

github-actions · 2025-03-18T03:34:40Z

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs would not trigger full CI run by default. Instead, it would only run fastcheck CI which starts running only a small and essential subset of CI tests to quickly catch errors. You can run other CI tests on top of those by going to your fastcheck build on Buildkite UI (linked in the PR checks section) and unblock them. If you do not have permission to unblock, ping simon-mo or khluu to add you in our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either: Add ready label to the PR or enable auto-merge.

🚀

vllm/config.py

vllm/v1/worker/tpu_model_runner.py

yaochengji · 2025-03-18T18:06:59Z

@Chenyaaang Thanks for your contribution, left some comments above!

@robertgshaw2-redhat There's another configuration option, cudagraph_capture_sizes, which is similar to bucket_padding_gap in this PR. Do you think we should merge them into one option with a default value for each platform?

mergify · 2025-03-18T23:04:21Z

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @Chenyaaang.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

alexm-redhat

Good idea!

DarkLight1337 · 2025-03-20T04:37:19Z

Please fix the merge conflict

NickLucche

looks good! Also in favor of unifying with cudagraph_capture_sizes

vllm/v1/worker/tpu_model_runner.py

mergify · 2025-03-20T14:04:31Z

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @Chenyaaang.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

robertgshaw2-redhat · 2025-03-21T14:24:15Z

This PR seems to have broken CUDA

alexm-redhat · 2025-03-21T14:25:48Z

@Chenyaaang thanks for reducing the padding gap, this is useful. Could you please address @robertgshaw2-redhat comment and fix the build (so we can merge it). Thanks!

Chenyaaang · 2025-03-21T20:27:43Z

> @Chenyaaang thanks for reducing the padding gap, this is useful. Could you please address @robertgshaw2-redhat comment and fix the build (so we can merge it). Thanks!

I've fixed the build and replied to @robertgshaw2-redhat's comment.

lsy323 · 2025-03-24T20:56:02Z

tests/tpu/test_compilation.py

-# Check we have 4 compiled codes
-assert len(compiled_codes) == 4
+# Check we have 3 compiled codes
+assert len(compiled_codes) == 3


@Chenyaaang Thanks for looking into this! Could we branch this for V0 and V1? In V0 there should be 4 compiled codes.

Chenyaaang · Mar 24, 2025

Done, thanks!
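A hedged sketch of how the V0/V1 branch requested above could look. The counts (3 for V1, 4 for V0) come from the review thread; selecting the engine through the `VLLM_USE_V1` environment variable is an assumption in this sketch, and the default value used is illustrative rather than vLLM's actual default. The helper names are hypothetical. Note that this test change was later reverted (see the commit list below), and a unit test was added to test_tpu_model_runner instead.

```python
import os


def expected_compiled_codes(use_v1: bool) -> int:
    """Compiled-graph count the TPU compilation test expects, per the review."""
    return 3 if use_v1 else 4


def use_v1_from_env(default: str = "0") -> bool:
    # Assumption: the engine version is selected via VLLM_USE_V1;
    # the default here is illustrative, not vLLM's actual default.
    return os.environ.get("VLLM_USE_V1", default) == "1"


# In the test, the hard-coded count would become something like:
# assert len(compiled_codes) == expected_compiled_codes(use_v1_from_env())
```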

add bucket padding to tpu

048e3b1

Signed-off-by: Chenyaaang <[email protected]>

Chenyaaang requested review from WoosukKwon, alexm-redhat, comaniac, njhill, robertgshaw2-redhat and ywang96 as code owners March 18, 2025 03:34

mergify bot added ci/build v1 labels Mar 18, 2025

revert test.txt

5c93dfd

Signed-off-by: Chenyaaang <[email protected]>

yaochengji reviewed Mar 18, 2025

vllm/config.py (outdated, resolved)

vllm/v1/worker/tpu_model_runner.py (outdated, resolved)

vllm/v1/worker/tpu_model_runner.py (outdated, resolved)

move bucket_padding to compilationConfig and add unit test

4f8f3d2

Signed-off-by: Chenyaaang <[email protected]>

mergify bot added the needs-rebase label Mar 18, 2025

Merge remote-tracking branch 'origin/main' into bucket_padding

da80880

Signed-off-by: Chenyaaang <[email protected]>

mergify bot removed the needs-rebase label Mar 18, 2025

fix bug

3869ec4

Signed-off-by: Chenyaaang <[email protected]>

alexm-redhat approved these changes Mar 19, 2025

alexm-redhat enabled auto-merge (squash) March 19, 2025 21:09

github-actions bot added the ready label Mar 19, 2025

NickLucche requested changes Mar 20, 2025

vllm/v1/worker/tpu_model_runner.py (outdated, resolved)

vllm/v1/worker/tpu_model_runner.py (outdated, resolved)

vllm/v1/worker/tpu_model_runner.py (outdated, resolved)

mergify bot added the needs-rebase label Mar 20, 2025

Chenyaaang added 2 commits March 20, 2025 16:46

fix comments

21ba037

Signed-off-by: Chenyaaang <[email protected]>

Merge remote-tracking branch 'origin/main' into bucket_padding

7e18e5a

Signed-off-by: Chenyaaang <[email protected]>

auto-merge was automatically disabled March 20, 2025 17:22
Head branch was pushed to by a user without write access

mergify bot removed the needs-rebase label Mar 20, 2025

initialize compilation config in vllmconfig

2bd994a

Signed-off-by: Chenyaaang <[email protected]>

Chenyaaang and others added 5 commits March 24, 2025 11:01

Merge branch 'vllm-project:main' into bucket_padding

d39a642

convert to environment variable

88c56b6

Signed-off-by: [email protected] <[email protected]>

nit

f34b27e

Signed-off-by: [email protected] <[email protected]>

revert

8d3a1e8

Signed-off-by: [email protected] <[email protected]>

update test

f7bdb02

Signed-off-by: Chenyaaang <[email protected]>

lsy323 reviewed Mar 24, 2025

update tpu/test_compilation to differentiate V0 and V1

7d92244

Signed-off-by: Chenyaaang <[email protected]>

Chenyaaang force-pushed the bucket_padding branch from a725d2b to 7d92244 Compare March 24, 2025 21:15

robertgshaw2-redhat and others added 3 commits March 24, 2025 21:17

updated

3f4f850

Signed-off-by: [email protected] <[email protected]>

Revert "update tpu/test_compilation to differentiate V0 and V1"

dcfd108

This reverts commit 7d92244. Signed-off-by: Chenyaaang <[email protected]>

Revert "update test"

7535bd9

This reverts commit f7bdb02. Signed-off-by: Chenyaaang <[email protected]>

Chenyaaang force-pushed the bucket_padding branch from 33a8bbc to 7535bd9 Compare March 24, 2025 22:15

Chenyaaang added 2 commits March 25, 2025 18:02

Merge remote-tracking branch 'origin/main' into bucket_padding

3e25f0b

Signed-off-by: Chenyaaang <[email protected]>

add unit test to the existing test_tpu_model_runner

ea00dca

Signed-off-by: Chenyaaang <[email protected]>

robertgshaw2-redhat enabled auto-merge (squash) March 25, 2025 21:25

robertgshaw2-redhat approved these changes Mar 25, 2025

robertgshaw2-redhat merged commit ac3cd6e into vllm-project:main Mar 25, 2025
33 checks passed

ckhordiasma mentioned this pull request Apr 17, 2025

[do not merge] pr test for nm changes into 2.20 red-hat-data-services/vllm#107

Closed

Chenyaaang mentioned this pull request Apr 23, 2025

Introduce PaddingConfig to combine GPU cudagraph_capture_sizes and TPU num_tokens_paddings #17081

Draft

8 participants