
Conversation

Contributor

@R3hankhan123 R3hankhan123 commented Sep 1, 2025

Purpose

Fix outlines-core for s390x

Test Plan

Inference works as expected; tested locally.

Test Result

(APIServer pid=1) INFO 09-01 09:12:19 [launcher.py:44] Route: /v1/score, Methods: POST
(APIServer pid=1) INFO 09-01 09:12:19 [launcher.py:44] Route: /v1/audio/transcriptions, Methods: POST
(APIServer pid=1) INFO 09-01 09:12:19 [launcher.py:44] Route: /v1/audio/translations, Methods: POST
(APIServer pid=1) INFO 09-01 09:12:19 [launcher.py:44] Route: /rerank, Methods: POST
(APIServer pid=1) INFO 09-01 09:12:19 [launcher.py:44] Route: /v1/rerank, Methods: POST
(APIServer pid=1) INFO 09-01 09:12:19 [launcher.py:44] Route: /v2/rerank, Methods: POST
(APIServer pid=1) INFO 09-01 09:12:19 [launcher.py:44] Route: /scale_elastic_ep, Methods: POST
(APIServer pid=1) INFO 09-01 09:12:19 [launcher.py:44] Route: /is_scaling_elastic_ep, Methods: POST
(APIServer pid=1) INFO 09-01 09:12:19 [launcher.py:44] Route: /invocations, Methods: POST
(APIServer pid=1) INFO 09-01 09:12:19 [launcher.py:44] Route: /metrics, Methods: GET
(APIServer pid=1) INFO:     Started server process [1]
(APIServer pid=1) INFO:     Waiting for application startup.
(APIServer pid=1) INFO:     Application startup complete.
(EngineCore_0 pid=21) WARNING 09-01 09:14:07 [cudagraph_dispatcher.py:102] cudagraph dispatching keys are not initialized. No cudagraph will be used.
(APIServer pid=1) INFO 09-01 09:14:09 [loggers.py:123] Engine 000: Avg prompt throughput: 0.4 tokens/s, Avg generation throughput: 4.7 tokens/s, Running: 1 reqs, Waiting: 0 reqs, GPU KV cache usage: 0.1%, Prefix cache hit rate: 0.0%
(APIServer pid=1) INFO:     172.17.0.1:55114 - "POST /v1/completions HTTP/1.1" 200 OK
(APIServer pid=1) INFO 09-01 09:14:19 [loggers.py:123] Engine 000: Avg prompt throughput: 0.4 tokens/s, Avg generation throughput: 5.1 tokens/s, Running: 1 reqs, Waiting: 0 reqs, GPU KV cache usage: 0.1%, Prefix cache hit rate: 0.0%
(APIServer pid=1) INFO:     172.17.0.1:36300 - "POST /v1/completions HTTP/1.1" 200 OK
(APIServer pid=1) INFO 09-01 09:14:29 [loggers.py:123] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 0.2 tokens/s, Running: 0 reqs, Waiting: 0 reqs, GPU KV cache usage: 0.0%, Prefix cache hit rate: 0.0%
(APIServer pid=1) INFO 09-01 09:14:39 [loggers.py:123] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 0.0 tokens/s, Running: 0 reqs, Waiting: 0 reqs, GPU KV cache usage: 0.0%, Prefix cache hit rate: 0.0%
[root@b314lp81 ~]# curl http://localhost:8000/v1/completions   -H "Content-Type: application/json"   -d '{
    "model": "gpt2",
    "prompt": "Once upon a time",
    "max_tokens": 50
  }' | jq
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   741  100   660  100    81    242     29  0:00:02  0:00:02 --:--:--   272
{
  "id": "cmpl-ab9c7302a9054f4ba037cce9033b55bd",
  "object": "text_completion",
  "created": 1756718057,
  "model": "gpt2",
  "choices": [
    {
      "index": 0,
      "text": ", there being only a single ray of light in this world, there will be four sides, and the last one will be divided into three sets (standards) by Fire and Light. If any records of his blade enter the valley below the others",
      "logprobs": null,
      "finish_reason": "length",
      "stop_reason": null,
      "token_ids": null,
      "prompt_logprobs": null,
      "prompt_token_ids": null
    }
  ],
  "service_tier": null,
  "system_fingerprint": null,
  "usage": {
    "prompt_tokens": 4,
    "total_tokens": 54,
    "completion_tokens": 50,
    "prompt_tokens_details": null
  },
  "kv_transfer_params": null
}


@mergify mergify bot added the ci/build label Sep 1, 2025
Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request aims to fix outlines-core for the s390x architecture by building it from source with necessary patches. The overall approach is sound, but the implementation uses sed commands with hardcoded line numbers and string replacements to patch dependencies in the Dockerfile. This is a fragile practice that can easily break with upstream changes, making the build process difficult to maintain. My review includes suggestions to replace these sed commands with more robust patch files to improve the long-term stability of the build.

Comment on lines +219 to +221

Severity: high

Using sed with hardcoded line numbers to patch source files is highly fragile. If the upstream file aws-lc/crypto/pem/pem_lib.c is modified, these line numbers may become incorrect, which could cause the build to fail or apply the patch to the wrong lines. A more robust and maintainable approach is to use a patch file.

I recommend creating a patch file (e.g., aws-lc-s390x.patch) with the required changes and applying it using git apply or patch. This would make the build process more resilient to upstream changes.
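As a sketch of the reviewer's suggestion (file names and contents below are illustrative stand-ins, not taken from the actual PR or from aws-lc source): a unified-diff patch applied with `patch` carries context lines, so it fails loudly when the upstream file drifts instead of silently editing the wrong lines the way a line-numbered `sed` would.

```shell
set -eu
workdir=$(mktemp -d)
cd "$workdir"

# Stand-in for the upstream aws-lc/crypto/pem/pem_lib.c; contents are illustrative.
cat > pem_lib.c <<'EOF'
int pem_check(void) {
    return 0;
}
EOF

# The unified diff pins the change to its surrounding context, so `patch`
# refuses to apply it if the neighboring code has changed upstream.
cat > aws-lc-s390x.patch <<'EOF'
--- pem_lib.c
+++ pem_lib.c
@@ -1,3 +1,3 @@
 int pem_check(void) {
-    return 0;
+    return 1;
 }
EOF

patch -p0 < aws-lc-s390x.patch
grep -q 'return 1;' pem_lib.c && echo "patch applied"
```

In a Dockerfile this would replace the `sed` lines with a `COPY` of the patch file plus a single `RUN patch -p0 < aws-lc-s390x.patch` (or `git apply`) step.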

Comment on lines +237 to +239

Severity: high

Using sed and echo to modify Cargo.toml is fragile and can lead to build failures.

  • The sed command on line 237 is sensitive to the exact formatting of the version string.
  • The echo commands on lines 238-239 could create a duplicate [patch.crates-io] section if one already exists, resulting in an invalid TOML file.

A more robust solution is to use a patch file for these modifications. This ensures the changes are applied atomically and are not dependent on the exact formatting of the original file.
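To illustrate the duplicate-section failure mode (the crate name and path below are hypothetical placeholders, not the actual dependencies patched in this PR), a guarded append keeps the edit idempotent even if `[patch.crates-io]` already exists:

```shell
set -eu
workdir=$(mktemp -d)
cd "$workdir"

# Stand-in for outlines-core's Cargo.toml; contents are illustrative.
cat > Cargo.toml <<'EOF'
[package]
name = "outlines-core"
version = "0.1.0"
EOF

add_patch_section() {
    # Append [patch.crates-io] only if it is absent, so reruns stay
    # idempotent and an existing section is never duplicated
    # (a duplicate table is invalid TOML and breaks the build).
    if ! grep -q '^\[patch\.crates-io\]' Cargo.toml; then
        cat >> Cargo.toml <<'EOF'

[patch.crates-io]
some-crate = { path = "../some-crate-fork" }
EOF
    fi
}

add_patch_section
add_patch_section   # second call is a no-op

grep -c '^\[patch\.crates-io\]' Cargo.toml   # prints 1
```

A patch file, as the reviewer suggests, gives the same guarantee with less scripting; the guard above is the minimal fix if the inline-edit approach is kept.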

@R3hankhan123
Contributor Author

#22725 (comment)
@youkaichao @JaheimLee This is the PR to fix the outlines core issue


github-actions bot commented Sep 1, 2025

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, covering a small, essential subset of tests to catch errors quickly.

You can ask your reviewers to trigger select CI tests on top of fastcheck CI.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

If you have any questions, please reach out to us on Slack at https://slack.vllm.ai.

🚀

Contributor

@dilipgb dilipgb left a comment


LGTM

@JaheimLee

@youkaichao Hi, could you merge this PR?

@dilipgb
Contributor

dilipgb commented Sep 5, 2025

@JaheimLee @youkaichao Can you please help add the ready label so all CI jobs start running?

@JaheimLee

@JaheimLee @youkaichao Can you please help add the ready label so all CI jobs start running?

I don't have access rights. Maybe you need to add other reviewers who have bandwidth.

@simon-mo simon-mo merged commit e10fef0 into vllm-project:main Sep 8, 2025
12 checks passed
@R3hankhan123 R3hankhan123 deleted the s390x-fix branch September 9, 2025 05:05
eicherseiji pushed a commit to eicherseiji/vllm that referenced this pull request Sep 9, 2025
skyloevil pushed a commit to skyloevil/vllm that referenced this pull request Sep 13, 2025
FeiDaLI pushed a commit to FeiDaLI/vllm that referenced this pull request Sep 25, 2025
xuebwang-amd pushed a commit to xuebwang-amd/vllm that referenced this pull request Oct 10, 2025


4 participants