inference error with vllm 0.8.5 #4063

Closed
@zhwk031554

Description

Describe the bug

When running inference with the vLLM V1 engine (vllm 0.8.5), it fails with the outputs_queue assertion error shown below. The following environment variables are set:

```shell
VLLM_USE_V1=1 VLLM_WORKER_MULTIPROC_METHOD=spawn
```

Reproduction script:

```python
from swift.llm import InferArguments, InferRequest
from swift.llm.infer.infer import SwiftInfer
from swift.llm.infer.infer_engine import InferEngine

infer_args = InferArguments(
    ckpt_dir="<ckpt_path>",  # placeholder for the trained checkpoint path
    model="Qwen/Qwen2-VL-7B-Instruct",
    infer_backend="vllm",
    merge_lora=True,
)

swift_infer = SwiftInfer(infer_args)

infer_requests = [
    InferRequest(
        messages=[
            {
                "role": "user",
                "content": [
                    {
                        "type": "text",
                        "text": "Describe this image.",
                    },
                    {
                        "type": "image",
                        "image": "https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-VL/assets/demo.jpeg",
                    },
                ],
            }
        ]
    )
]

infer_kwargs = {**swift_infer.infer_kwargs}

# Fails inside vllm's AsyncLLM with the traceback below
infer_responses = InferEngine.infer(
    swift_infer.infer_engine,
    infer_requests=infer_requests,
    request_config=infer_args.get_request_config(),
    template=swift_infer.template,
    use_tqdm=False,
    **infer_kwargs,
)
```

Error message:

```text
Traceback (most recent call last):
  File ".../vllm/v1/engine/async_llm.py", line 306, in generate
    out = q.get_nowait() or await q.get()
                            ^^^^^^^^^^^^^
  File ".../vllm/v1/engine/output_processor.py", line 51, in get
    raise output
  File ".../vllm/v1/engine/async_llm.py", line 357, in output_handler
    outputs = await engine_core.get_output_async()
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".../vllm/v1/engine/core_client.py", line 713, in get_output_async
    assert self.outputs_queue is not None
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AssertionError

    infer_responses = InferEngine.infer(
                      ^^^^^^^^^^^^^^^^^^
  File ".../swift/llm/infer/infer_engine/infer_engine.py", line 172, in infer
    res = self._batch_infer_stream(tasks_samples, False, use_tqdm, metrics)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".../swift/llm/infer/infer_engine/vllm_engine.py", line 401, in _batch_infer_stream
    return super()._batch_infer_stream(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".../swift/llm/infer/infer_engine/infer_engine.py", line 115, in _batch_infer_stream
    return self.safe_asyncio_run(self.batch_run(new_tasks))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".../swift/llm/infer/infer_engine/infer_engine.py", line 276, in safe_asyncio_run
    return InferEngine.thread_run(asyncio.run, args=(coro, ))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".../swift/llm/infer/infer_engine/infer_engine.py", line 271, in thread_run
    raise result
  File ".../swift/llm/infer/infer_engine/infer_engine.py", line 261, in func
    queue.put(target(*args, **kwargs))
              ^^^^^^^^^^^^^^^^^^^^^^^
  File ".../lib/python3.11/asyncio/runners.py", line 190, in run
    return runner.run(main)
           ^^^^^^^^^^^^^^^^
  File ".../lib/python3.11/asyncio/runners.py", line 118, in run
    return self._loop.run_until_complete(task)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".../lib/python3.11/asyncio/base_events.py", line 654, in run_until_complete
    return future.result()
           ^^^^^^^^^^^^^^^
  File ".../swift/llm/infer/infer_engine/infer_engine.py", line 88, in batch_run
    return await asyncio.gather(*tasks)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".../swift/llm/infer/infer_engine/infer_engine.py", line 105, in _new_run
    res = await task
          ^^^^^^^^^^
  File ".../swift/llm/infer/infer_engine/vllm_engine.py", line 489, in infer_async
    return await self._infer_full_async(**kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".../swift/llm/infer/infer_engine/vllm_engine.py", line 389, in _infer_full_async
    async for result in result_generator:
  File ".../vllm/v1/engine/async_llm.py", line 338, in generate
    raise EngineGenerateError() from e
vllm.v1.engine.exceptions.EngineGenerateError
```
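
For context, the failing call goes through swift's safe_asyncio_run, which (per the frames above) executes the coroutine with asyncio.run inside a helper thread. A minimal sketch of that pattern, with simplified internals, just to illustrate that the request coroutine runs on a brand-new event loop in a worker thread, not on whatever loop the vLLM AsyncLLM bound its background output handler and outputs_queue to:

```python
import asyncio
import queue
import threading

def thread_run(target, args=()):
    """Run target(*args) in a worker thread; return its result or re-raise its error."""
    result_queue = queue.Queue()

    def func():
        try:
            result_queue.put(target(*args))
        except BaseException as e:  # propagate errors back to the calling thread
            result_queue.put(e)

    thread = threading.Thread(target=func)
    thread.start()
    thread.join()
    result = result_queue.get()
    if isinstance(result, BaseException):
        raise result
    return result

def safe_asyncio_run(coro):
    # asyncio.run creates a *fresh* event loop inside the worker thread, so any
    # state vllm tied to a loop it saw earlier is absent here.
    return thread_run(asyncio.run, args=(coro,))
```

If vllm 0.8.5 only creates outputs_queue from a task on the original loop, then get_output_async awaited on this fresh loop would still see self.outputs_queue as None, which matches the assertion above.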

Your hardware and system info

CUDA Version: 12.4 / NVIDIA L40S / torch 2.6.0 / vllm 0.8.5

Additional context

It may be related to a recent change in vllm. The swift call site and the suspected vllm commit:
https://github.com/modelscope/ms-swift/blob/v3.4.0/swift/llm/infer/infer_engine/vllm_engine.py#L398
vllm-project/vllm@2b05b8c
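
A possible workaround until this is resolved: drive the engine's async API from a single event loop instead of going through InferEngine.infer and its helper thread. A minimal sketch, assuming the infer_async method visible in the traceback accepts the request, request_config, and template as keyword arguments (the exact signature may differ across ms-swift versions), and untested against 0.8.5:

```python
import asyncio

async def main():
    # One task per request, all scheduled on the same event loop,
    # so the engine never sees a second loop.
    tasks = [
        swift_infer.infer_engine.infer_async(
            infer_request=req,
            request_config=infer_args.get_request_config(),
            template=swift_infer.template,
        )
        for req in infer_requests
    ]
    return await asyncio.gather(*tasks)

infer_responses = asyncio.run(main())
```

Pinning vllm to the version used before vllm-project/vllm@2b05b8c may also sidestep this, if that commit is indeed the cause.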
