Describe the bug
When running inference with the vllm V1 engine (vllm 0.8.5, with VLLM_USE_V1=1 and VLLM_WORKER_MULTIPROC_METHOD=spawn), it runs into the outputs_queue assertion error below.

Error message:
Traceback (most recent call last):
  File ".../vllm/v1/engine/async_llm.py", line 306, in generate
    out = q.get_nowait() or await q.get()
          ^^^^^^^^^^^^^
  File ".../vllm/v1/engine/output_processor.py", line 51, in get
    raise output
  File ".../vllm/v1/engine/async_llm.py", line 357, in output_handler
    outputs = await engine_core.get_output_async()
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".../vllm/v1/engine/core_client.py", line 713, in get_output_async
    assert self.outputs_queue is not None
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AssertionError

    infer_responses = InferEngine.infer(
                      ^^^^^^^^^^^^^^^^^^
  File ".../swift/llm/infer/infer_engine/infer_engine.py", line 172, in infer
    res = self._batch_infer_stream(tasks_samples, False, use_tqdm, metrics)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".../swift/llm/infer/infer_engine/vllm_engine.py", line 401, in _batch_infer_stream
    return super()._batch_infer_stream(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".../swift/llm/infer/infer_engine/infer_engine.py", line 115, in _batch_infer_stream
    return self.safe_asyncio_run(self.batch_run(new_tasks))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".../swift/llm/infer/infer_engine/infer_engine.py", line 276, in safe_asyncio_run
    return InferEngine.thread_run(asyncio.run, args=(coro, ))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".../swift/llm/infer/infer_engine/infer_engine.py", line 271, in thread_run
    raise result
  File ".../swift/llm/infer/infer_engine/infer_engine.py", line 261, in func
    queue.put(target(*args, **kwargs))
              ^^^^^^^^^^^^^^^^^^^^^^^
  File ".../lib/python3.11/asyncio/runners.py", line 190, in run
    return runner.run(main)
           ^^^^^^^^^^^^^^^^
  File ".../lib/python3.11/asyncio/runners.py", line 118, in run
    return self._loop.run_until_complete(task)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".../lib/python3.11/asyncio/base_events.py", line 654, in run_until_complete
    return future.result()
           ^^^^^^^^^^^^^^^
  File ".../swift/llm/infer/infer_engine/infer_engine.py", line 88, in batch_run
    return await asyncio.gather(*tasks)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".../swift/llm/infer/infer_engine/infer_engine.py", line 105, in _new_run
    res = await task
          ^^^^^^^^^^
  File ".../swift/llm/infer/infer_engine/vllm_engine.py", line 489, in infer_async
    return await self._infer_full_async(**kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".../swift/llm/infer/infer_engine/vllm_engine.py", line 389, in _infer_full_async
    async for result in result_generator:
  File ".../vllm/v1/engine/async_llm.py", line 338, in generate
    raise EngineGenerateError() from e
vllm.v1.engine.exceptions.EngineGenerateError
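The traceback has two parts: the inner AssertionError from vllm's core_client.get_output_async (outputs_queue was never set up in the running event loop), wrapped by the EngineGenerateError that AsyncLLM.generate raises. The swift frames (thread_run → asyncio.run → batch_run) show that each infer call runs its coroutines on a fresh event loop in a worker thread. Below is a minimal, self-contained sketch of why that pattern can break a client holding loop-bound state; the names are illustrative, this is not vllm or swift code, and the sketch ends in a RuntimeError rather than vllm's exact AssertionError, but the hazard (state bound to an event loop that no longer exists) is the same:

```python
# Sketch only: mimics a client that lazily creates loop-bound state
# (a queue plus a background output-handler task) on first use, then is
# reused from a second event loop created by asyncio.run().
import asyncio

class Client:
    def __init__(self):
        self.outputs_queue = None   # created lazily, like vllm's core_client
        self._handler_started = False

    async def _output_handler(self):
        # Background task feeding outputs_queue; it lives only in the loop
        # that created it and is gone once that loop is closed.
        while True:
            await asyncio.sleep(0.01)
            self.outputs_queue.put_nowait('token')

    async def generate(self):
        if not self._handler_started:   # only the FIRST loop sets things up
            self.outputs_queue = asyncio.Queue()
            asyncio.get_running_loop().create_task(self._output_handler())
            self._handler_started = True
        assert self.outputs_queue is not None
        return await self.outputs_queue.get()

client = Client()
print(asyncio.run(client.generate()))   # loop #1: prints 'token'
print(asyncio.run(client.generate()))   # loop #2: on Python 3.10+ raises
                                        # RuntimeError ('... is bound to a
                                        # different event loop'); the handler
                                        # task no longer exists either
```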
Your hardware and system info
CUDA Version: 12.4 / NVIDIA L40S / torch 2.6.0 / vllm 0.8.5
Additional context
It may be related to a recent change in vllm.
Relevant ms-swift call site: https://github.com/modelscope/ms-swift/blob/v3.4.0/swift/llm/infer/infer_engine/vllm_engine.py#L398
Suspected vllm change: vllm-project/vllm@2b05b8c
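For anyone trying to reproduce: a minimal sketch, assuming ms-swift v3.4's public VllmEngine / InferRequest / RequestConfig API (the model ID is a placeholder and exact constructor arguments may differ):

```python
# Minimal reproduction sketch -- assumes ms-swift v3.4's public API;
# the model ID is a placeholder, any vllm-supported model should do.
import os
# Must be set before vllm is imported.
os.environ['VLLM_USE_V1'] = '1'
os.environ['VLLM_WORKER_MULTIPROC_METHOD'] = 'spawn'

from swift.llm import VllmEngine, InferRequest, RequestConfig

engine = VllmEngine('Qwen/Qwen2.5-7B-Instruct')  # placeholder model
requests = [InferRequest(messages=[{'role': 'user', 'content': 'hello'}])]
config = RequestConfig(max_tokens=64)

# InferEngine.infer() drives vllm's v1 AsyncLLM via asyncio.run() in a worker
# thread (see the swift frames in the traceback above); on vllm 0.8.5 this
# ends in the AssertionError / EngineGenerateError reported here.
responses = engine.infer(requests, config)
print(responses[0].choices[0].message.content)
```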