Describe the bug
When running inference with the vLLM V1 engine (vllm 0.8.5), it fails with the outputs_queue assertion error below. The script is launched with these environment variables set:
VLLM_USE_V1=1 VLLM_WORKER_MULTIPROC_METHOD=spawn
from swift.llm import InferArguments
from swift.llm import InferRequest
from swift.llm.infer.infer import SwiftInfer
from swift.llm.infer.infer_engine import InferEngine

infer_args = InferArguments(
    ckpt_dir="<ckpt_path>",
    model="Qwen/Qwen2-VL-7B-Instruct",
    infer_backend="vllm",
    merge_lora=True,
)
swift_infer = SwiftInfer(infer_args)

infer_requests = [
    InferRequest(
        messages=[
            {
                'role': 'user',
                'content': [
                    {
                        "type": "text",
                        "text": "Describe this image.",
                    },
                    {
                        "type": "image",
                        "image": "https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-VL/assets/demo.jpeg",
                    },
                ],
            }
        ]
    )
]

infer_kwargs = {**swift_infer.infer_kwargs}
infer_responses = InferEngine.infer(
    swift_infer.infer_engine,
    infer_requests=infer_requests,
    request_config=infer_args.get_request_config(),
    template=swift_infer.template,
    use_tqdm=False,
    **infer_kwargs,
)
Error message:
Traceback (most recent call last):
File ".../vllm/v1/engine/async_llm.py", line 306, in generate
out = q.get_nowait() or await q.get()
^^^^^^^^^^^^^
File ".../vllm/v1/engine/output_processor.py", line 51, in get
raise output
File ".../vllm/v1/engine/async_llm.py", line 357, in output_handler
outputs = await engine_core.get_output_async()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File ".../vllm/v1/engine/core_client.py", line 713, in get_output_async
assert self.outputs_queue is not None
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AssertionError
infer_responses = InferEngine.infer(
^^^^^^^^^^^^^^^^^^
File ".../swift/llm/infer/infer_engine/infer_engine.py", line 172, in infer
res = self._batch_infer_stream(tasks_samples, False, use_tqdm, metrics)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File ".../swift/llm/infer/infer_engine/vllm_engine.py", line 401, in _batch_infer_stream
return super()._batch_infer_stream(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File ".../swift/llm/infer/infer_engine/infer_engine.py", line 115, in _batch_infer_stream
return self.safe_asyncio_run(self.batch_run(new_tasks))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File ".../swift/llm/infer/infer_engine/infer_engine.py", line 276, in safe_asyncio_run
return InferEngine.thread_run(asyncio.run, args=(coro, ))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File ".../swift/llm/infer/infer_engine/infer_engine.py", line 271, in thread_run
raise result
File ".../swift/llm/infer/infer_engine/infer_engine.py", line 261, in func
queue.put(target(*args, **kwargs))
^^^^^^^^^^^^^^^^^^^^^^^
File ".../lib/python3.11/asyncio/runners.py", line 190, in run
return runner.run(main)
^^^^^^^^^^^^^^^^
File ".../lib/python3.11/asyncio/runners.py", line 118, in run
return self._loop.run_until_complete(task)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File ".../lib/python3.11/asyncio/base_events.py", line 654, in run_until_complete
return future.result()
^^^^^^^^^^^^^^^
File ".../swift/llm/infer/infer_engine/infer_engine.py", line 88, in batch_run
return await asyncio.gather(*tasks)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File ".../swift/llm/infer/infer_engine/infer_engine.py", line 105, in _new_run
res = await task
^^^^^^^^^^
File ".../swift/llm/infer/infer_engine/vllm_engine.py", line 489, in infer_async
return await self._infer_full_async(**kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File ".../swift/llm/infer/infer_engine/vllm_engine.py", line 389, in _infer_full_async
async for result in result_generator:
File ".../vllm/v1/engine/async_llm.py", line 338, in generate
raise EngineGenerateError() from e
vllm.v1.engine.exceptions.EngineGenerateError
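For context, the swift frames above (thread_run, safe_asyncio_run) show that each infer call is dispatched through asyncio.run inside a fresh worker thread, with exceptions handed back via a queue. A simplified sketch of that pattern (illustrative only, not swift's actual implementation):

```python
import asyncio
import queue
import threading


def thread_run(target, args=()):
    """Run target(*args) in a fresh thread, re-raising any exception
    in the caller, mirroring the pattern in the traceback above."""
    q = queue.Queue()

    def func():
        try:
            q.put(target(*args))
        except BaseException as e:
            q.put(e)  # hand the exception back to the calling thread

    t = threading.Thread(target=func)
    t.start()
    t.join()
    result = q.get()
    if isinstance(result, BaseException):
        raise result
    return result


def safe_asyncio_run(coro):
    # Each call gets a brand-new event loop in a brand-new thread,
    # separate from any loop the engine may already be attached to.
    return thread_run(asyncio.run, args=(coro,))
```

The relevant point is that every call runs on a new event loop, not the loop on which the vLLM async engine was set up.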
Your hardware and system info
CUDA Version: 12.4 / NVIDIA L40S / torch 2.6.0 / vllm 0.8.5
Additional context
It may be related to a recent change in vllm (vllm-project/vllm@2b05b8c), which this call site in ms-swift appears to hit:
https://github.com/modelscope/ms-swift/blob/v3.4.0/swift/llm/infer/infer_engine/vllm_engine.py#L398
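I have not confirmed the exact mechanism, but one plausible pattern (purely illustrative; only the attribute name is borrowed from the traceback) is that the client's outputs_queue is created by a startup coroutine on the engine's event loop, while safe_asyncio_run enters through a brand-new loop per call, so get_output_async can run before the queue exists:

```python
import asyncio


class LazyClient:
    """Hypothetical stand-in for an engine client whose queue is
    created by a startup coroutine rather than in __init__."""

    def __init__(self):
        self.outputs_queue = None

    async def start(self):
        # In the real engine this would run as a background task on
        # the loop that owns the engine.
        self.outputs_queue = asyncio.Queue()

    async def get_output_async(self):
        # Mirrors the failing assertion in core_client.py.
        assert self.outputs_queue is not None
        return await self.outputs_queue.get()


def demo():
    client = LazyClient()
    # A fresh event loop reaches get_output_async before start() has
    # ever run, reproducing the bare AssertionError seen above.
    try:
        asyncio.run(client.get_output_async())
        return "no error"
    except AssertionError:
        return "AssertionError"
```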