
inference error with vllm 0.8.5 #4063

Open

zhwk031554 opened this issue May 1, 2025 · 0 comments
Labels: bug (Something isn't working)


Describe the bug

When running inference with the vLLM V1 engine (vllm 0.8.5), it fails with the outputs_queue error shown below.

Environment variables (set before launch):

VLLM_USE_V1=1 VLLM_WORKER_MULTIPROC_METHOD=spawn

Reproduction script:

from swift.llm import InferArguments, InferRequest
from swift.llm.infer.infer import SwiftInfer
from swift.llm.infer.infer_engine import InferEngine

infer_args = InferArguments(
    ckpt_dir="<ckpt_path>",  # placeholder for the LoRA checkpoint directory
    model="Qwen/Qwen2-VL-7B-Instruct",
    infer_backend="vllm",
    merge_lora=True,
)

swift_infer = SwiftInfer(infer_args)

infer_requests = [
    InferRequest(
        messages=[
            {
                'role': 'user',
                'content': [
                    {
                        "type": "text",
                        "text": "Describe this image.",
                    },
                    {
                        "type": "image",
                        "image": "https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-VL/assets/demo.jpeg",
                    },
                ],
            }
        ]
    )
]

infer_kwargs = {**swift_infer.infer_kwargs}

infer_responses = InferEngine.infer(
    swift_infer.infer_engine,
    infer_requests=infer_requests,
    request_config=infer_args.get_request_config(),
    template=swift_infer.template,
    use_tqdm=False,
    **infer_kwargs,
)
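
Side note: the two environment variables shown on the command line above can also be set in-process. A minimal sketch (not part of the original report; assumes the variables are read when vLLM initializes, so they must be assigned before the first swift/vllm import):

import os

# Sketch only: equivalent to prefixing the command with
# VLLM_USE_V1=1 VLLM_WORKER_MULTIPROC_METHOD=spawn.
# Must run before the first swift/vllm import so vLLM sees the values.
os.environ["VLLM_USE_V1"] = "1"
os.environ["VLLM_WORKER_MULTIPROC_METHOD"] = "spawn"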

Error message:

Traceback (most recent call last):
  File ".../vllm/v1/engine/async_llm.py", line 306, in generate
    out = q.get_nowait() or await q.get()
                            ^^^^^^^^^^^^^
  File ".../vllm/v1/engine/output_processor.py", line 51, in get
    raise output
  File ".../vllm/v1/engine/async_llm.py", line 357, in output_handler
    outputs = await engine_core.get_output_async()
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".../vllm/v1/engine/core_client.py", line 713, in get_output_async
    assert self.outputs_queue is not None
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AssertionError

    infer_responses = InferEngine.infer(
                      ^^^^^^^^^^^^^^^^^^
  File ".../swift/llm/infer/infer_engine/infer_engine.py", line 172, in infer
    res = self._batch_infer_stream(tasks_samples, False, use_tqdm, metrics)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".../swift/llm/infer/infer_engine/vllm_engine.py", line 401, in _batch_infer_stream
    return super()._batch_infer_stream(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".../swift/llm/infer/infer_engine/infer_engine.py", line 115, in _batch_infer_stream
    return self.safe_asyncio_run(self.batch_run(new_tasks))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".../swift/llm/infer/infer_engine/infer_engine.py", line 276, in safe_asyncio_run
    return InferEngine.thread_run(asyncio.run, args=(coro, ))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".../swift/llm/infer/infer_engine/infer_engine.py", line 271, in thread_run
    raise result
  File ".../swift/llm/infer/infer_engine/infer_engine.py", line 261, in func
    queue.put(target(*args, **kwargs))
              ^^^^^^^^^^^^^^^^^^^^^^^
  File ".../lib/python3.11/asyncio/runners.py", line 190, in run
    return runner.run(main)
           ^^^^^^^^^^^^^^^^
  File ".../lib/python3.11/asyncio/runners.py", line 118, in run
    return self._loop.run_until_complete(task)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".../lib/python3.11/asyncio/base_events.py", line 654, in run_until_complete
    return future.result()
           ^^^^^^^^^^^^^^^
  File ".../swift/llm/infer/infer_engine/infer_engine.py", line 88, in batch_run
    return await asyncio.gather(*tasks)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".../swift/llm/infer/infer_engine/infer_engine.py", line 105, in _new_run
    res = await task
          ^^^^^^^^^^
  File ".../swift/llm/infer/infer_engine/vllm_engine.py", line 489, in infer_async
    return await self._infer_full_async(**kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".../swift/llm/infer/infer_engine/vllm_engine.py", line 389, in _infer_full_async
    async for result in result_generator:
  File ".../vllm/v1/engine/async_llm.py", line 338, in generate
    raise EngineGenerateError() from e
vllm.v1.engine.exceptions.EngineGenerateError
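
The assertion fires in get_output_async before the engine client's outputs_queue has been created. My guess at the failure shape, as a standalone sketch (my own construction, not vLLM code): an asyncio.Queue that is only created by a background handler is still None when a consumer coroutine runs on a fresh event loop, which is what swift's safe_asyncio_run does when it calls thread_run(asyncio.run, ...) on a worker thread (see the swift frames above).

import asyncio
import threading


class LazyClient:
    """A queue created lazily by a handler; mimics the engine client's outputs_queue."""

    def __init__(self):
        self.outputs_queue = None  # only set once the output handler runs

    async def start_output_handler(self):
        # In the real engine this would happen inside a long-lived background task.
        self.outputs_queue = asyncio.Queue()

    async def get_output_async(self):
        # Same shape as the failing line in vllm/v1/engine/core_client.py.
        assert self.outputs_queue is not None
        return await self.outputs_queue.get()


client = LazyClient()


def consume_in_fresh_loop():
    # Like swift's thread_run(asyncio.run, ...): a brand new event loop on a
    # worker thread, where start_output_handler was never awaited.
    try:
        asyncio.run(client.get_output_async())
    except AssertionError:
        print("AssertionError: outputs_queue is still None")


t = threading.Thread(target=consume_in_fresh_loop)
t.start()
t.join()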

Your hardware and system info

CUDA Version: 12.4 / NVIDIA L40S / torch 2.6.0 / vllm 0.8.5

Additional context

It may be related to a recent change in vllm:
https://github.com/modelscope/ms-swift/blob/v3.4.0/swift/llm/infer/infer_engine/vllm_engine.py#L398
vllm-project/vllm@2b05b8c
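
If that commit is the trigger, two workarounds seem plausible (untested assumptions on my side): pin vllm==0.8.4, or drive the engine's async API from a single event loop so the output handler and the consumer share one loop. A rough sketch of the latter; the call and parameter names are guessed from the traceback (infer_async in swift/llm/infer/infer_engine/vllm_engine.py) and are not swift's documented API:

import asyncio

# Hypothetical workaround sketch, not verified: call infer_async (seen in the
# traceback) from one asyncio.run so everything shares a single event loop.
async def main():
    results = []
    for request in infer_requests:
        res = await swift_infer.infer_engine.infer_async(
            request,                          # assumed positional argument
            infer_args.get_request_config(),  # assumed positional argument
            template=swift_infer.template,
        )
        results.append(res)
    return results

infer_responses = asyncio.run(main())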

Jintao-Huang added the bug label on May 2, 2025