
Conversation


@petersalas petersalas commented Sep 2, 2025

Purpose

When Ultravox wraps a multi-modal model (e.g. Gemma), vLLM fails to load because UltravoxModel.text_config is the multi-modal model's full config rather than a text config. This change points UltravoxConfig.text_config at the wrapped model's inner text config instead. (However, we still instantiate the wrapped multi-modal model in its entirety when using init_vllm_registered_model.)

Additionally, support replacing the text model with a quantized variant by overriding text_model_id.
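The config change can be pictured with a simplified sketch. The class and field names below are hypothetical stand-ins for illustration, not the actual UltravoxConfig implementation:

```python
from dataclasses import dataclass, field

@dataclass
class FakeTextConfig:
    # Inner text model config of a wrapped multi-modal model.
    architectures: list = field(default_factory=lambda: ["Gemma3ForCausalLM"])

@dataclass
class FakeMultiModalConfig:
    # A wrapped multi-modal model (e.g. Gemma 3) carries its own text_config.
    architectures: list = field(
        default_factory=lambda: ["Gemma3ForConditionalGeneration"])
    text_config: FakeTextConfig = field(default_factory=FakeTextConfig)

def resolve_text_config(wrapped_config):
    """Before this PR: text_config was the whole multi-modal config.
    After: unwrap to the inner text config when one exists."""
    return getattr(wrapped_config, "text_config", wrapped_config)

wrapped = FakeMultiModalConfig()
print(resolve_text_config(wrapped).architectures)  # ['Gemma3ForCausalLM']
```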

Test Plan

Confirm that Llama/Gemma/Qwen Ultravox models can be loaded in vLLM, and that quantized variants can be loaded as well.

vllm serve fixie-ai/ultravox-v0_6-gemma-3-27b --trust-remote-code
vllm serve fixie-ai/ultravox-v0_5-llama-3_1-8b --trust-remote-code --hf-overrides.text_model_id=nvidia/Llama-3.1-8B-Instruct-FP8
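For illustration only, a dotted --hf-overrides.KEY=VALUE flag can be thought of as building a nested override dict that is merged into the HF config. This is a sketch of the idea, not vLLM's actual argument parser:

```python
def parse_dotted_override(flag: str) -> dict:
    # "--hf-overrides.text_model_id=foo" -> {"text_model_id": "foo"}
    # "--hf-overrides.a.b=c"             -> {"a": {"b": "c"}}
    prefix = "--hf-overrides."
    assert flag.startswith(prefix), "not an --hf-overrides flag"
    path, _, value = flag[len(prefix):].partition("=")
    keys = path.split(".")
    out: dict = {}
    cur = out
    for key in keys[:-1]:
        cur = cur.setdefault(key, {})  # descend, creating nested dicts
    cur[keys[-1]] = value
    return out

print(parse_dotted_override(
    "--hf-overrides.text_model_id=nvidia/Llama-3.1-8B-Instruct-FP8"))
# -> {'text_model_id': 'nvidia/Llama-3.1-8B-Instruct-FP8'}
```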

Test Result

The models load (confirmed).




@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request refactors the configuration handling for Ultravox models to correctly support wrapping multi-modal models like Gemma. The changes introduce a wrapped_model_config to store the full configuration of the wrapped model, while text_config now correctly points to the inner text model's configuration. This is a good clarification that should improve robustness. I've found one potential issue with hardcoded trust_remote_code that could prevent loading of certain custom models.


@DarkLight1337 DarkLight1337 left a comment


Thanks, can you fix the pre-commit errors?

@DarkLight1337 DarkLight1337 added the ready ONLY add when PR is ready to merge/full CI is needed label Sep 9, 2025
@DarkLight1337 DarkLight1337 enabled auto-merge (squash) September 9, 2025 03:32
@petersalas

@DarkLight1337 actually, since I'm going to merge main anyway I'm going to fold in an additional fix for supporting quantized models/overriding text_model_id via --hf-overrides -- so please hold off merging for now :)

@petersalas petersalas force-pushed the psalas/inner-text-config branch from 15caeaf to 7331c69 Compare September 10, 2025 19:12
@petersalas petersalas changed the title [Ultravox] Fix gemma instantiation [Ultravox] Fix Gemma instantiation, support quantization via --hf-overrides Sep 10, 2025
Signed-off-by: Peter Salas <[email protected]>
@DarkLight1337

Can I merge this now?

@petersalas

> Can I merge this now?

Fine with me!

@vllm-bot vllm-bot merged commit f17a6aa into vllm-project:main Sep 11, 2025
39 of 41 checks passed
@canercan7

(EngineCore_DP0 pid=115978) INFO 09-11 14:57:14 [gpu_model_runner.py:2338] Starting to load model fixie-ai/ultravox-v0_6-gemma-3-27b...
(EngineCore_DP0 pid=115978) INFO 09-11 14:57:14 [gpu_model_runner.py:2370] Loading model from scratch...
(EngineCore_DP0 pid=115978) ERROR 09-11 14:57:15 [core.py:718] EngineCore failed to start.
(EngineCore_DP0 pid=115978) ERROR 09-11 14:57:15 [core.py:718] Traceback (most recent call last):
(EngineCore_DP0 pid=115978) ERROR 09-11 14:57:15 [core.py:718] File "/home/vllm/vllm/v1/engine/core.py", line 709, in run_engine_core
(EngineCore_DP0 pid=115978) ERROR 09-11 14:57:15 [core.py:718] engine_core = EngineCoreProc(*args, **kwargs)
(EngineCore_DP0 pid=115978) ERROR 09-11 14:57:15 [core.py:718] File "/home/vllm/vllm/v1/engine/core.py", line 505, in __init__
(EngineCore_DP0 pid=115978) ERROR 09-11 14:57:15 [core.py:718] super().__init__(vllm_config, executor_class, log_stats,
(EngineCore_DP0 pid=115978) ERROR 09-11 14:57:15 [core.py:718] File "/home/vllm/vllm/v1/engine/core.py", line 82, in __init__
(EngineCore_DP0 pid=115978) ERROR 09-11 14:57:15 [core.py:718] self.model_executor = executor_class(vllm_config)
(EngineCore_DP0 pid=115978) ERROR 09-11 14:57:15 [core.py:718] File "/home/vllm/vllm/executor/executor_base.py", line 54, in __init__
(EngineCore_DP0 pid=115978) ERROR 09-11 14:57:15 [core.py:718] self._init_executor()
(EngineCore_DP0 pid=115978) ERROR 09-11 14:57:15 [core.py:718] File "/home/vllm/vllm/executor/uniproc_executor.py", line 49, in _init_executor
(EngineCore_DP0 pid=115978) ERROR 09-11 14:57:15 [core.py:718] self.collective_rpc("load_model")
(EngineCore_DP0 pid=115978) ERROR 09-11 14:57:15 [core.py:718] File "/home/vllm/vllm/executor/uniproc_executor.py", line 58, in collective_rpc
(EngineCore_DP0 pid=115978) ERROR 09-11 14:57:15 [core.py:718] answer = run_method(self.driver_worker, method, args, kwargs)
(EngineCore_DP0 pid=115978) ERROR 09-11 14:57:15 [core.py:718] File "/home/vllm/vllm/utils/__init__.py", line 3057, in run_method
(EngineCore_DP0 pid=115978) ERROR 09-11 14:57:15 [core.py:718] return func(*args, **kwargs)
(EngineCore_DP0 pid=115978) ERROR 09-11 14:57:15 [core.py:718] File "/home/vllm/vllm/v1/worker/gpu_worker.py", line 213, in load_model
(EngineCore_DP0 pid=115978) ERROR 09-11 14:57:15 [core.py:718] self.model_runner.load_model(eep_scale_up=eep_scale_up)
(EngineCore_DP0 pid=115978) ERROR 09-11 14:57:15 [core.py:718] File "/home/vllm/vllm/v1/worker/gpu_model_runner.py", line 2371, in load_model
(EngineCore_DP0 pid=115978) ERROR 09-11 14:57:15 [core.py:718] self.model = model_loader.load_model(
(EngineCore_DP0 pid=115978) ERROR 09-11 14:57:15 [core.py:718] File "/home/vllm/vllm/model_executor/model_loader/base_loader.py", line 45, in load_model
(EngineCore_DP0 pid=115978) ERROR 09-11 14:57:15 [core.py:718] model = initialize_model(vllm_config=vllm_config,
(EngineCore_DP0 pid=115978) ERROR 09-11 14:57:15 [core.py:718] File "/home/vllm/vllm/model_executor/model_loader/utils.py", line 63, in initialize_model
(EngineCore_DP0 pid=115978) ERROR 09-11 14:57:15 [core.py:718] return model_class(vllm_config=vllm_config, prefix=prefix)
(EngineCore_DP0 pid=115978) ERROR 09-11 14:57:15 [core.py:718] File "/home/vllm/vllm/model_executor/models/ultravox.py", line 439, in __init__
(EngineCore_DP0 pid=115978) ERROR 09-11 14:57:15 [core.py:718] self.language_model = init_vllm_registered_model(
(EngineCore_DP0 pid=115978) ERROR 09-11 14:57:15 [core.py:718] File "/home/vllm/vllm/model_executor/models/utils.py", line 316, in init_vllm_registered_model
(EngineCore_DP0 pid=115978) ERROR 09-11 14:57:15 [core.py:718] return initialize_model(vllm_config=vllm_config, prefix=prefix)
(EngineCore_DP0 pid=115978) ERROR 09-11 14:57:15 [core.py:718] File "/home/vllm/vllm/model_executor/model_loader/utils.py", line 51, in initialize_model
(EngineCore_DP0 pid=115978) ERROR 09-11 14:57:15 [core.py:718] model_class, _ = get_model_architecture(model_config)
(EngineCore_DP0 pid=115978) ERROR 09-11 14:57:15 [core.py:718] File "/home/vllm/vllm/model_executor/model_loader/utils.py", line 172, in get_model_architecture
(EngineCore_DP0 pid=115978) ERROR 09-11 14:57:15 [core.py:718] model_cls, arch = model_config.registry.resolve_model_cls(
(EngineCore_DP0 pid=115978) ERROR 09-11 14:57:15 [core.py:718] File "/home/vllm/vllm/model_executor/models/registry.py", line 687, in resolve_model_cls
(EngineCore_DP0 pid=115978) ERROR 09-11 14:57:15 [core.py:718] raise ValueError("No model architectures are specified")
(EngineCore_DP0 pid=115978) ERROR 09-11 14:57:15 [core.py:718] ValueError: No model architectures are specified
(EngineCore_DP0 pid=115978) Process EngineCore_DP0:
(EngineCore_DP0 pid=115978) Traceback (most recent call last):
(EngineCore_DP0 pid=115978) File "/root/miniconda3/envs/py3.10/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap
(EngineCore_DP0 pid=115978) self.run()
(EngineCore_DP0 pid=115978) File "/root/miniconda3/envs/py3.10/lib/python3.10/multiprocessing/process.py", line 108, in run
(EngineCore_DP0 pid=115978) self._target(*self._args, **self._kwargs)
(EngineCore_DP0 pid=115978) File "/home/vllm/vllm/v1/engine/core.py", line 722, in run_engine_core
(EngineCore_DP0 pid=115978) raise e
(EngineCore_DP0 pid=115978) File "/home/vllm/vllm/v1/engine/core.py", line 709, in run_engine_core
(EngineCore_DP0 pid=115978) engine_core = EngineCoreProc(*args, **kwargs)
(EngineCore_DP0 pid=115978) File "/home/vllm/vllm/v1/engine/core.py", line 505, in __init__
(EngineCore_DP0 pid=115978) super().__init__(vllm_config, executor_class, log_stats,
(EngineCore_DP0 pid=115978) File "/home/vllm/vllm/v1/engine/core.py", line 82, in __init__
(EngineCore_DP0 pid=115978) self.model_executor = executor_class(vllm_config)
(EngineCore_DP0 pid=115978) File "/home/vllm/vllm/executor/executor_base.py", line 54, in __init__
(EngineCore_DP0 pid=115978) self._init_executor()
(EngineCore_DP0 pid=115978) File "/home/vllm/vllm/executor/uniproc_executor.py", line 49, in _init_executor
(EngineCore_DP0 pid=115978) self.collective_rpc("load_model")
(EngineCore_DP0 pid=115978) File "/home/vllm/vllm/executor/uniproc_executor.py", line 58, in collective_rpc
(EngineCore_DP0 pid=115978) answer = run_method(self.driver_worker, method, args, kwargs)
(EngineCore_DP0 pid=115978) File "/home/vllm/vllm/utils/__init__.py", line 3057, in run_method
(EngineCore_DP0 pid=115978) return func(*args, **kwargs)
(EngineCore_DP0 pid=115978) File "/home/vllm/vllm/v1/worker/gpu_worker.py", line 213, in load_model
(EngineCore_DP0 pid=115978) self.model_runner.load_model(eep_scale_up=eep_scale_up)
(EngineCore_DP0 pid=115978) File "/home/vllm/vllm/v1/worker/gpu_model_runner.py", line 2371, in load_model
(EngineCore_DP0 pid=115978) self.model = model_loader.load_model(
(EngineCore_DP0 pid=115978) File "/home/vllm/vllm/model_executor/model_loader/base_loader.py", line 45, in load_model
(EngineCore_DP0 pid=115978) model = initialize_model(vllm_config=vllm_config,
(EngineCore_DP0 pid=115978) File "/home/vllm/vllm/model_executor/model_loader/utils.py", line 63, in initialize_model
(EngineCore_DP0 pid=115978) return model_class(vllm_config=vllm_config, prefix=prefix)
(EngineCore_DP0 pid=115978) File "/home/vllm/vllm/model_executor/models/ultravox.py", line 439, in __init__
(EngineCore_DP0 pid=115978) self.language_model = init_vllm_registered_model(
(EngineCore_DP0 pid=115978) File "/home/vllm/vllm/model_executor/models/utils.py", line 316, in init_vllm_registered_model
(EngineCore_DP0 pid=115978) return initialize_model(vllm_config=vllm_config, prefix=prefix)
(EngineCore_DP0 pid=115978) File "/home/vllm/vllm/model_executor/model_loader/utils.py", line 51, in initialize_model
(EngineCore_DP0 pid=115978) model_class, _ = get_model_architecture(model_config)
(EngineCore_DP0 pid=115978) File "/home/vllm/vllm/model_executor/model_loader/utils.py", line 172, in get_model_architecture
(EngineCore_DP0 pid=115978) model_cls, arch = model_config.registry.resolve_model_cls(
(EngineCore_DP0 pid=115978) File "/home/vllm/vllm/model_executor/models/registry.py", line 687, in resolve_model_cls
(EngineCore_DP0 pid=115978) raise ValueError("No model architectures are specified")
(EngineCore_DP0 pid=115978) ValueError: No model architectures are specified
[rank0]:[W911 14:57:16.196909771 ProcessGroupNCCL.cpp:1538] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator())
(APIServer pid=115702) Traceback (most recent call last):
(APIServer pid=115702) File "/root/miniconda3/envs/py3.10/bin/vllm", line 8, in <module>
(APIServer pid=115702) sys.exit(main())
(APIServer pid=115702) File "/home/vllm/vllm/entrypoints/cli/main.py", line 54, in main
(APIServer pid=115702) args.dispatch_function(args)
(APIServer pid=115702) File "/home/vllm/vllm/entrypoints/cli/serve.py", line 50, in cmd
(APIServer pid=115702) uvloop.run(run_server(args))
(APIServer pid=115702) File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/uvloop/__init__.py", line 82, in run
(APIServer pid=115702) return loop.run_until_complete(wrapper())
(APIServer pid=115702) File "uvloop/loop.pyx", line 1518, in uvloop.loop.Loop.run_until_complete
(APIServer pid=115702) File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/uvloop/__init__.py", line 61, in wrapper
(APIServer pid=115702) return await main
(APIServer pid=115702) File "/home/vllm/vllm/entrypoints/openai/api_server.py", line 1941, in run_server
(APIServer pid=115702) await run_server_worker(listen_address, sock, args, **uvicorn_kwargs)
(APIServer pid=115702) File "/home/vllm/vllm/entrypoints/openai/api_server.py", line 1961, in run_server_worker
(APIServer pid=115702) async with build_async_engine_client(
(APIServer pid=115702) File "/root/miniconda3/envs/py3.10/lib/python3.10/contextlib.py", line 199, in __aenter__
(APIServer pid=115702) return await anext(self.gen)
(APIServer pid=115702) File "/home/vllm/vllm/entrypoints/openai/api_server.py", line 179, in build_async_engine_client
(APIServer pid=115702) async with build_async_engine_client_from_engine_args(
(APIServer pid=115702) File "/root/miniconda3/envs/py3.10/lib/python3.10/contextlib.py", line 199, in __aenter__
(APIServer pid=115702) return await anext(self.gen)
(APIServer pid=115702) File "/home/vllm/vllm/entrypoints/openai/api_server.py", line 221, in build_async_engine_client_from_engine_args
(APIServer pid=115702) async_llm = AsyncLLM.from_vllm_config(
(APIServer pid=115702) File "/home/vllm/vllm/utils/__init__.py", line 1589, in inner
(APIServer pid=115702) return fn(*args, **kwargs)
(APIServer pid=115702) File "/home/vllm/vllm/v1/engine/async_llm.py", line 205, in from_vllm_config
(APIServer pid=115702) return cls(
(APIServer pid=115702) File "/home/vllm/vllm/v1/engine/async_llm.py", line 129, in __init__
(APIServer pid=115702) self.engine_core = EngineCoreClient.make_async_mp_client(
(APIServer pid=115702) File "/home/vllm/vllm/v1/engine/core_client.py", line 102, in make_async_mp_client
(APIServer pid=115702) return AsyncMPClient(*client_args)
(APIServer pid=115702) File "/home/vllm/vllm/v1/engine/core_client.py", line 769, in __init__
(APIServer pid=115702) super().__init__(
(APIServer pid=115702) File "/home/vllm/vllm/v1/engine/core_client.py", line 448, in __init__
(APIServer pid=115702) with launch_core_engines(vllm_config, executor_class,
(APIServer pid=115702) File "/root/miniconda3/envs/py3.10/lib/python3.10/contextlib.py", line 142, in __exit__
(APIServer pid=115702) next(self.gen)
(APIServer pid=115702) File "/home/vllm/vllm/v1/engine/utils.py", line 729, in launch_core_engines
(APIServer pid=115702) wait_for_engine_startup(
(APIServer pid=115702) File "/home/vllm/vllm/v1/engine/utils.py", line 782, in wait_for_engine_startup
(APIServer pid=115702) raise RuntimeError("Engine core initialization failed. "
(APIServer pid=115702) RuntimeError: Engine core initialization failed. See root cause above. Failed core proc(s): {}
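The root cause is visible in the innermost frame: the wrapped model's config reached resolve_model_cls with an empty architectures list. A toy version of that guard, illustrative only and not vLLM's actual registry code, shows why the load fails immediately:

```python
def resolve_model_cls(architectures, registry):
    # Mirrors the guard seen in the traceback: an empty or missing
    # architectures list is rejected before any lookup is attempted.
    if not architectures:
        raise ValueError("No model architectures are specified")
    for arch in architectures:
        if arch in registry:
            return registry[arch], arch
    raise ValueError(f"Unsupported architectures: {architectures}")

registry = {"Gemma3ForCausalLM": object}
try:
    resolve_model_cls([], registry)
except ValueError as e:
    print(e)  # No model architectures are specified
```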

@petersalas

> (EngineCore_DP0 pid=115978) ERROR 09-11 14:57:15 [core.py:718] ValueError: No model architectures are specified

Oops, I accidentally reverted part of the change when I pushed the quant changes, will put up a fix shortly.

skyloevil pushed a commit to skyloevil/vllm that referenced this pull request Sep 13, 2025
dsxsteven pushed a commit to dsxsteven/vllm_splitPR that referenced this pull request Sep 15, 2025
FeiDaLI pushed a commit to FeiDaLI/vllm that referenced this pull request Sep 25, 2025
xuebwang-amd pushed a commit to xuebwang-amd/vllm that referenced this pull request Oct 10, 2025