
Conversation

CAROLZXYZXY
Contributor

@CAROLZXYZXY CAROLZXYZXY commented May 16, 2025

Make the TPU CI pipeline so that:

  1. Tests run sequentially, because the TPU can only serve one process at a time.
  2. If any test fails, the command exits with a non-zero code (see the sketch below).
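
A minimal sketch of that behavior, assuming the suite is a list of pytest files run one at a time (the file list below is illustrative; the actual change lives in the Buildkite pipeline config):

# run_tpu_tests.py -- illustrative sketch only; the test list is a placeholder.
import subprocess
import sys

TEST_FILES = [
    "tests/v1/tpu/worker/test_tpu_model_runner.py",
    # ... remaining TPU test files, one entry per file
]

exit_code = 0
for test_file in TEST_FILES:
    # Run each file in its own process so only one process touches the TPU.
    result = subprocess.run(["pytest", "-s", "-v", test_file])
    if result.returncode != 0:
        exit_code = result.returncode  # remember the failure but keep running

# Exit non-zero if any file failed, so the CI step is marked as failed.
sys.exit(exit_code)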


👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, covering a small, essential subset of CI tests to quickly catch errors. You can run additional CI tests on top of those by going to your fastcheck build in the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

🚀

@mergify mergify bot added the ci/build label May 16, 2025
@yaochengji yaochengji added the ready ONLY add when PR is ready to merge/full CI is needed label May 17, 2025
@CAROLZXYZXY CAROLZXYZXY force-pushed the cazheng/fix-tpu-ci branch from 9c2a526 to 8976aa4 Compare May 17, 2025 01:00
@yaochengji
Collaborator

I saw a lot of

RuntimeError: Bad StatusOr access: UNKNOWN: TPU initialization failed: open(/dev/vfio/0): Device or resource busy: Device or resource busy; Couldn't open iommu group /dev/vfio/0

These tests cannot run in parallel because two processes cannot use the TPU at the same time.
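
(For illustration only, not what this PR does: if strictly sequential invocation were not an option, the same constraint could be enforced with an exclusive file lock around each test process, so two pytest runs never open /dev/vfio/0 at the same time. The lock path below is a made-up example.)

# tpu_lock.py -- illustrative sketch, not part of this PR.
import fcntl
import subprocess
import sys

LOCK_PATH = "/tmp/tpu_ci.lock"  # hypothetical lock file

def run_with_tpu_lock(cmd: list[str]) -> int:
    # Hold an exclusive lock for the lifetime of the command so only one
    # process can initialize the TPU at a time.
    with open(LOCK_PATH, "w") as lock_file:
        fcntl.flock(lock_file, fcntl.LOCK_EX)  # blocks until the TPU is free
        try:
            return subprocess.run(cmd).returncode
        finally:
            fcntl.flock(lock_file, fcntl.LOCK_UN)

if __name__ == "__main__":
    sys.exit(run_with_tpu_lock(["pytest", "-s", "-v"] + sys.argv[1:]))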

@CAROLZXYZXY CAROLZXYZXY force-pushed the cazheng/fix-tpu-ci branch 2 times, most recently from 819dc10 to 1f42686 Compare May 17, 2025 21:57
@CAROLZXYZXY
Contributor Author

Done. Changed the pipeline to run the tests sequentially. With the current setup, I see code-level errors:

PASSED
WARNING 05-17 22:37:41 [parallel_state.py:1229] torch._C._host_emptyCache() only available in Pytorch >=2.5

=================================== FAILURES ===================================
________________________ test_update_states_new_request ________________________

model_runner = <vllm.v1.worker.tpu_model_runner.TPUModelRunner object at 0x7bd274e80070>

    def test_update_states_new_request(model_runner):
        req_id = "req_0"

        # new req
        scheduler_output = _schedule_new_request(req_id)

>       model_runner._update_states(scheduler_output)

tests/v1/tpu/worker/test_tpu_model_runner.py:131:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = <vllm.v1.worker.tpu_model_runner.TPUModelRunner object at 0x7bd274e80070>
scheduler_output = SchedulerOutput(scheduled_new_reqs=[NewRequestData(req_id=req_0,prompt_token_ids=[1, 2, 3],mm_inputs=[],mm_hashes=[],m...s=set(), free_encoder_input_ids=[], structured_output_request_ids={}, grammar_bitmask=None, kv_connector_metadata=None)

    def _update_states(self, scheduler_output: "SchedulerOutput") -> bool:
        """Update the cached states and the persistent batch with the scheduler
        output.

        The updated states are used by the `_prepare_inputs` function to create
        the input GPU tensors for the model.

        Returns:
            True if there is a new/resumed/paused/finished request.
            If False, we can skip copying SamplingMetadata to the GPU.
        """
        # Remove finished requests from the cached states.
        for req_id in scheduler_output.finished_req_ids:
            self.requests.pop(req_id, None)
            self.encoder_cache.pop(req_id, None)

        # Remove the finished requests from the persistent batch.
        # NOTE(woosuk): There could be an edge case where finished_req_ids and
        # scheduled_req_ids overlap. This happens when a request is aborted and
        # then resubmitted with the same ID. In this case, we treat them as two
        # distinct requests - clearing the cached states for the first request
        # and handling the second as a new request.
        removed_req_indices: list[int] = []
        for req_id in scheduler_output.finished_req_ids:
            req_index = self.input_batch.remove_request(req_id)
            if req_index is not None:
                removed_req_indices.append(req_index)

        # Free the cached encoder outputs.
        for req_id, input_id in scheduler_output.free_encoder_input_ids:
            encoder_outputs = self.encoder_cache.get(req_id)
            if encoder_outputs is not None:
                encoder_outputs.pop(input_id, None)
                if not encoder_outputs:
                    self.encoder_cache.pop(req_id, None)

        # Remove the unscheduled requests from the persistent batch.
        # NOTE(woosuk): The unscheduled requests are either preempted requests
        # or running requests that are not scheduled in this step. We remove
        # them from the persistent batch but keep their cached states since
        # they will be scheduled again sometime in the future.
        scheduled_req_ids = scheduler_output.num_scheduled_tokens.keys()
>       cached_req_ids = self.input_batch.req_id_to_index.keys()
E       AttributeError: 'TPUModelRunner' object has no attribute 'input_batch'

vllm/v1/worker/tpu_model_runner.py:327: AttributeError
_____________________ test_update_states_request_finished ______________________

model_runner = <vllm.v1.worker.tpu_model_runner.TPUModelRunner object at 0x7bd274d48520>

    def test_update_states_request_finished(model_runner):
        req_id = "req_0"

        # new req
        scheduler_output = _schedule_new_request(req_id)

>       model_runner._update_states(scheduler_output)

tests/v1/tpu/worker/test_tpu_model_runner.py:144:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = <vllm.v1.worker.tpu_model_runner.TPUModelRunner object at 0x7bd274d48520>
scheduler_output = SchedulerOutput(scheduled_new_reqs=[NewRequestData(req_id=req_0,prompt_token_ids=[1, 2, 3],mm_inputs=[],mm_hashes=[],m...s=set(), free_encoder_input_ids=[], structured_output_request_ids={}, grammar_bitmask=None, kv_connector_metadata=None)

    def _update_states(self, scheduler_output: "SchedulerOutput") -> bool:
        """Update the cached states and the persistent batch with the scheduler
        output.

        The updated states are used by the `_prepare_inputs` function to create
        the input GPU tensors for the model.

        Returns:
            True if there is a new/resumed/paused/finished request.
            If False, we can skip copying SamplingMetadata to the GPU.
        """
        # Remove finished requests from the cached states.
        for req_id in scheduler_output.finished_req_ids:
            self.requests.pop(req_id, None)
            self.encoder_cache.pop(req_id, None)

        # Remove the finished requests from the persistent batch.
        # NOTE(woosuk): There could be an edge case where finished_req_ids and
        # scheduled_req_ids overlap. This happens when a request is aborted and
        # then resubmitted with the same ID. In this case, we treat them as two
        # distinct requests - clearing the cached states for the first request
        # and handling the second as a new request.
        removed_req_indices: list[int] = []
        for req_id in scheduler_output.finished_req_ids:
            req_index = self.input_batch.remove_request(req_id)
            if req_index is not None:
                removed_req_indices.append(req_index)

        # Free the cached encoder outputs.
        for req_id, input_id in scheduler_output.free_encoder_input_ids:
            encoder_outputs = self.encoder_cache.get(req_id)
            if encoder_outputs is not None:
                encoder_outputs.pop(input_id, None)
                if not encoder_outputs:
                    self.encoder_cache.pop(req_id, None)

        # Remove the unscheduled requests from the persistent batch.
        # NOTE(woosuk): The unscheduled requests are either preempted requests
        # or running requests that are not scheduled in this step. We remove
        # them from the persistent batch but keep their cached states since
        # they will be scheduled again sometime in the future.
        scheduled_req_ids = scheduler_output.num_scheduled_tokens.keys()
>       cached_req_ids = self.input_batch.req_id_to_index.keys()
E       AttributeError: 'TPUModelRunner' object has no attribute 'input_batch'

vllm/v1/worker/tpu_model_runner.py:327: AttributeError
______________________ test_update_states_request_resumed ______________________

model_runner = <vllm.v1.worker.tpu_model_runner.TPUModelRunner object at 0x7bd274e92500>

    def test_update_states_request_resumed(model_runner):
        req_id = "req_0"

        # new req
        scheduler_output = _schedule_new_request(req_id)

>       model_runner._update_states(scheduler_output)

tests/v1/tpu/worker/test_tpu_model_runner.py:174:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = <vllm.v1.worker.tpu_model_runner.TPUModelRunner object at 0x7bd274e92500>
scheduler_output = SchedulerOutput(scheduled_new_reqs=[NewRequestData(req_id=req_0,prompt_token_ids=[1, 2, 3],mm_inputs=[],mm_hashes=[],m...s=set(), free_encoder_input_ids=[], structured_output_request_ids={}, grammar_bitmask=None, kv_connector_metadata=None)

    def _update_states(self, scheduler_output: "SchedulerOutput") -> bool:
        """Update the cached states and the persistent batch with the scheduler
        output.

        The updated states are used by the `_prepare_inputs` function to create
        the input GPU tensors for the model.

        Returns:
            True if there is a new/resumed/paused/finished request.
            If False, we can skip copying SamplingMetadata to the GPU.
        """
        # Remove finished requests from the cached states.
        for req_id in scheduler_output.finished_req_ids:
            self.requests.pop(req_id, None)
            self.encoder_cache.pop(req_id, None)

        # Remove the finished requests from the persistent batch.
        # NOTE(woosuk): There could be an edge case where finished_req_ids and
        # scheduled_req_ids overlap. This happens when a request is aborted and
        # then resubmitted with the same ID. In this case, we treat them as two
        # distinct requests - clearing the cached states for the first request
        # and handling the second as a new request.
        removed_req_indices: list[int] = []
        for req_id in scheduler_output.finished_req_ids:
            req_index = self.input_batch.remove_request(req_id)
            if req_index is not None:
                removed_req_indices.append(req_index)

        # Free the cached encoder outputs.
        for req_id, input_id in scheduler_output.free_encoder_input_ids:
            encoder_outputs = self.encoder_cache.get(req_id)
            if encoder_outputs is not None:
                encoder_outputs.pop(input_id, None)
                if not encoder_outputs:
                    self.encoder_cache.pop(req_id, None)

        # Remove the unscheduled requests from the persistent batch.
        # NOTE(woosuk): The unscheduled requests are either preempted requests
        # or running requests that are not scheduled in this step. We remove
        # them from the persistent batch but keep their cached states since
        # they will be scheduled again sometime in the future.
        scheduled_req_ids = scheduler_output.num_scheduled_tokens.keys()
>       cached_req_ids = self.input_batch.req_id_to_index.keys()
E       AttributeError: 'TPUModelRunner' object has no attribute 'input_batch'

vllm/v1/worker/tpu_model_runner.py:327: AttributeError

The code-level failures could be addressed in follow-up PRs.


@CAROLZXYZXY CAROLZXYZXY force-pushed the cazheng/fix-tpu-ci branch from 1f42686 to 6f8edc8 Compare May 18, 2025 18:19
@yaochengji
Collaborator

For the error

AttributeError: 'TPUModelRunner' object has no attribute 'input_batch'

The TPU model runner in v0 doesn't have input_batch. Did you happen to use v0 instead of v1?
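
(If that is the cause, one possible fix is to pin these tests to the V1 engine explicitly, e.g. with an autouse fixture in the tests' conftest.py. This is only a sketch and assumes VLLM_USE_V1 is the environment switch that selects the V1 engine.)

# conftest.py sketch: force the V1 engine for the TPU worker tests.
import pytest

@pytest.fixture(autouse=True)
def use_v1_engine(monkeypatch):
    # Ensure the V1 TPUModelRunner (which defines `input_batch`) is exercised.
    monkeypatch.setenv("VLLM_USE_V1", "1")
    yield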

@CAROLZXYZXY CAROLZXYZXY force-pushed the cazheng/fix-tpu-ci branch 5 times, most recently from afa4372 to 4dfd0a7 Compare May 23, 2025 00:45
@CAROLZXYZXY CAROLZXYZXY force-pushed the cazheng/fix-tpu-ci branch 3 times, most recently from a5ee5ff to d66369f Compare May 27, 2025 17:51
Signed-off-by: Carol Zheng <[email protected]>
@CAROLZXYZXY CAROLZXYZXY force-pushed the cazheng/fix-tpu-ci branch from ba6479d to bc9fc3c Compare May 27, 2025 18:13
@yaochengji
Copy link
Collaborator

There are 12 tests in total.

For the 11th test, it printed "# Received cancellation signal, interrupting", and the 12th test didn't run. Is that intended?

Collaborator

@yaochengji yaochengji left a comment


LGTM, thanks!

The 12th test didn't run because of the 3-hour timeout (BUILDKITE_TIMEOUT="180").
We can separate these tests in a follow-up PR.
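
(Until the tests are split, one possible stopgap, sketched below with an assumed 20-minute budget per file, is to give each test file its own timeout so a single slow file cannot consume the whole 180-minute job.)

# Illustrative per-file timeout wrapper; the 20-minute budget is an assumption.
import subprocess
import sys

PER_FILE_TIMEOUT_S = 20 * 60  # assumed budget per test file

def run_file(test_file: str) -> int:
    try:
        return subprocess.run(
            ["pytest", "-s", "-v", test_file],
            timeout=PER_FILE_TIMEOUT_S,
        ).returncode
    except subprocess.TimeoutExpired:
        print(f"TIMEOUT after {PER_FILE_TIMEOUT_S}s: {test_file}")
        return 1

if __name__ == "__main__":
    # Exit non-zero if any of the given test files failed or timed out.
    sys.exit(max((run_file(f) for f in sys.argv[1:]), default=0))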

@yaochengji yaochengji merged commit b48d5cc into vllm-project:main May 27, 2025
43 checks passed
amitm02 pushed a commit to amitm02/vllm that referenced this pull request Jun 1, 2025
