[TPU][V1] Fix exponential padding when max-num-batched-tokens
is not a power of 2
#16596
Currently, passing a max-num-batched-tokens value that is not a power of 2 triggers the assert at vllm/vllm/v1/worker/tpu_model_runner.py (line 1067 in 1dd2338).
This is because we are not padding up to the next larger power of 2.
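As a rough illustration of the idea, the sketch below builds exponentially growing padding buckets and keeps doubling until the last bucket covers the requested maximum, even when that maximum is not a power of 2. The helper names `next_power_of_2` and `padding_buckets` are hypothetical and chosen for this example only; they are not the functions in tpu_model_runner.py, and the actual change in this PR may differ.

```python
def next_power_of_2(n: int) -> int:
    """Return the smallest power of 2 that is >= n (hypothetical helper)."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    # (n - 1).bit_length() is the number of bits needed to represent n - 1,
    # so shifting 1 by that amount yields the next power of 2 at or above n.
    return 1 << (n - 1).bit_length()


def padding_buckets(min_size: int, max_size: int) -> list[int]:
    """Build doubling padding sizes whose last entry covers max_size.

    Assumption for this sketch: buckets start at the power of 2 at or above
    min_size and double until one is >= max_size, so any token count up to
    max_size has a bucket to pad into.
    """
    buckets = []
    size = next_power_of_2(min_size)
    while True:
        buckets.append(size)
        if size >= max_size:
            break
        size *= 2
    return buckets


# A non-power-of-2 maximum (1000) still ends on a bucket that covers it.
print(padding_buckets(16, 1000))  # [16, 32, 64, 128, 256, 512, 1024]
```

If the loop stopped at the largest power of 2 below the maximum instead (512 here), a batch of 1000 tokens would have no bucket to pad into, which is the kind of situation a final-size assert would catch.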
Eg.