[CI/Build] [TPU] Fix TPU CI exit code #18282
I saw a lot of
These tests cannot run in parallel because two processes cannot use the TPU at the same time.
Done. Changed the tests to run sequentially. With the current setup, I see code-level errors.
The code-level failures can be addressed in follow-up PRs.
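As a rough illustration of the idea discussed above (run the tests one at a time, then fail the CI step if any of them failed), here is a minimal Python sketch. It is not the script changed in this PR; the function names `run_sequentially` and `overall_exit_code` and the sample commands are hypothetical.

```python
import subprocess
import sys


def run_sequentially(commands):
    """Run each command one at a time; return (command, exit_code) for failures."""
    failed = []
    for cmd in commands:
        # subprocess.run blocks until the command exits, so only one
        # process at a time would be using the device.
        result = subprocess.run(cmd)
        if result.returncode != 0:
            # Record the failure but keep going so later tests still run.
            failed.append((cmd, result.returncode))
    return failed


def overall_exit_code(failed):
    """Return 1 if any command failed, otherwise 0."""
    return 1 if failed else 0


# Hypothetical stand-ins for the real test commands.
sample_commands = [
    [sys.executable, "-c", "pass"],
    [sys.executable, "-c", "import sys; sys.exit(3)"],
    [sys.executable, "-c", "pass"],
]
```

A wrapper script could end with `sys.exit(overall_exit_code(run_sequentially(sample_commands)))`, so that a single failing test marks the whole step as failed while the remaining tests still get a chance to run.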
For the error
The TPU model runner v0 doesn't have
Signed-off-by: Carol Zheng <[email protected]>
There are 12 tests in total. For the 11th test, it printed
LGTM, thanks!
The 12th test didn't, because of the 3-hour timeout (BUILDKITE_TIMEOUT="180").
We can separate these tests in a follow-up PR.
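To make it easier to see which tests never started when a run hits an overall time limit like the one mentioned above, a runner could track a shared time budget. The sketch below is only an illustration under that assumption; `run_with_budget` and its return shape are hypothetical and not part of this PR or of Buildkite.

```python
import subprocess
import sys
import time


def run_with_budget(commands, total_budget_s):
    """Run commands one at a time within a shared time budget.

    Returns (results, skipped). results maps a command's index to its exit
    code, or to None if it was killed when the budget ran out. skipped lists
    the indices of commands that never started.
    """
    deadline = time.monotonic() + total_budget_s
    results = {}
    skipped = []
    for i, cmd in enumerate(commands):
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            # Budget already spent: report the test as skipped, not as passed.
            skipped.append(i)
            continue
        try:
            results[i] = subprocess.run(cmd, timeout=remaining).returncode
        except subprocess.TimeoutExpired:
            # subprocess.run kills the child before raising this exception.
            results[i] = None
    return results, skipped
```

With 12 tests and a 180-minute limit, a call such as `run_with_budget(commands, 180 * 60)` would report the last test as skipped or timed out instead of leaving it silently missing from the log.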
Signed-off-by: Carol Zheng <[email protected]>
Signed-off-by: amit <[email protected]>
Make TPU CI pipeline so that