
Conversation


@petersalas petersalas commented Sep 2, 2025

Purpose

When Ultravox wraps a multi-modal model (e.g. Gemma), vLLM fails to load because UltravoxModel.text_config is the multi-modal model's full config rather than a text config. This change points UltravoxConfig.text_config at the wrapped model's inner text config instead. (However, we still instantiate the wrapped multi-modal model in its entirety when using init_vllm_registered_model.)

Additionally, support replacing the text model with a quantized variant by overriding text_model_id.
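The config change can be pictured with a simplified sketch. The class and field names below are hypothetical stand-ins for illustration, not the actual UltravoxConfig implementation:

```python
from dataclasses import dataclass, field

@dataclass
class FakeTextConfig:
    # Inner text model config of a wrapped multi-modal model.
    architectures: list = field(default_factory=lambda: ["Gemma3ForCausalLM"])

@dataclass
class FakeMultiModalConfig:
    # A wrapped multi-modal model (e.g. Gemma 3) carries its own text_config.
    architectures: list = field(
        default_factory=lambda: ["Gemma3ForConditionalGeneration"])
    text_config: FakeTextConfig = field(default_factory=FakeTextConfig)

def resolve_text_config(wrapped_config):
    """Before this PR: text_config was the whole multi-modal config.
    After: unwrap to the inner text config when one exists."""
    return getattr(wrapped_config, "text_config", wrapped_config)

wrapped = FakeMultiModalConfig()
print(resolve_text_config(wrapped).architectures)  # ['Gemma3ForCausalLM']
```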

Test Plan

Confirm that Llama/Gemma/Qwen Ultravox models can be loaded in vLLM, and that quantized variants can be loaded as well.

vllm serve fixie-ai/ultravox-v0_6-gemma-3-27b --trust-remote-code
vllm serve fixie-ai/ultravox-v0_5-llama-3_1-8b --trust-remote-code --hf-overrides.text_model_id=nvidia/Llama-3.1-8B-Instruct-FP8
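For illustration only, a dotted --hf-overrides.KEY=VALUE flag can be thought of as building a nested override dict that is merged into the HF config. This is a sketch of the idea, not vLLM's actual argument parser:

```python
def parse_dotted_override(flag: str) -> dict:
    # "--hf-overrides.text_model_id=foo" -> {"text_model_id": "foo"}
    # "--hf-overrides.a.b=c"             -> {"a": {"b": "c"}}
    prefix = "--hf-overrides."
    assert flag.startswith(prefix), "not an --hf-overrides flag"
    path, _, value = flag[len(prefix):].partition("=")
    keys = path.split(".")
    out: dict = {}
    cur = out
    for key in keys[:-1]:
        cur = cur.setdefault(key, {})  # descend, creating nested dicts
    cur[keys[-1]] = value
    return out

print(parse_dotted_override(
    "--hf-overrides.text_model_id=nvidia/Llama-3.1-8B-Instruct-FP8"))
# -> {'text_model_id': 'nvidia/Llama-3.1-8B-Instruct-FP8'}
```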

Test Result

The models load (confirmed).




@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request refactors the configuration handling for Ultravox models to correctly support wrapping multi-modal models like Gemma. The changes introduce a wrapped_model_config to store the full configuration of the wrapped model, while text_config now correctly points to the inner text model's configuration. This is a good clarification that should improve robustness. I've found one potential issue with hardcoded trust_remote_code that could prevent loading of certain custom models.


@DarkLight1337 DarkLight1337 left a comment


Thanks, can you fix the pre-commit errors?

@DarkLight1337 DarkLight1337 added the ready ONLY add when PR is ready to merge/full CI is needed label Sep 9, 2025
@DarkLight1337 DarkLight1337 enabled auto-merge (squash) September 9, 2025 03:32
@petersalas

@DarkLight1337 actually, since I'm going to merge main anyway I'm going to fold in an additional fix for supporting quantized models/overriding text_model_id via --hf-overrides -- so please hold off merging for now :)

@petersalas petersalas force-pushed the psalas/inner-text-config branch from 15caeaf to 7331c69 Compare September 10, 2025 19:12
@petersalas petersalas changed the title [Ultravox] Fix gemma instantiation [Ultravox] Fix Gemma instantiation, support quantization via --hf-overrides Sep 10, 2025
Signed-off-by: Peter Salas <[email protected]>
@DarkLight1337

Can I merge this now?

@petersalas

> Can I merge this now?

Fine with me!

@vllm-bot vllm-bot merged commit f17a6aa into vllm-project:main Sep 11, 2025
39 of 41 checks passed
@canercan7

(EngineCore_DP0 pid=115978) INFO 09-11 14:57:14 [gpu_model_runner.py:2338] Starting to load model fixie-ai/ultravox-v0_6-gemma-3-27b...
(EngineCore_DP0 pid=115978) INFO 09-11 14:57:14 [gpu_model_runner.py:2370] Loading model from scratch...
(EngineCore_DP0 pid=115978) ERROR 09-11 14:57:15 [core.py:718] EngineCore failed to start.
(EngineCore_DP0 pid=115978) ERROR 09-11 14:57:15 [core.py:718] Traceback (most recent call last):
(EngineCore_DP0 pid=115978) ERROR 09-11 14:57:15 [core.py:718] File "/home/vllm/vllm/v1/engine/core.py", line 709, in run_engine_core
(EngineCore_DP0 pid=115978) ERROR 09-11 14:57:15 [core.py:718] engine_core = EngineCoreProc(*args, **kwargs)
(EngineCore_DP0 pid=115978) ERROR 09-11 14:57:15 [core.py:718] File "/home/vllm/vllm/v1/engine/core.py", line 505, in __init__
(EngineCore_DP0 pid=115978) ERROR 09-11 14:57:15 [core.py:718] super().__init__(vllm_config, executor_class, log_stats,
(EngineCore_DP0 pid=115978) ERROR 09-11 14:57:15 [core.py:718] File "/home/vllm/vllm/v1/engine/core.py", line 82, in __init__
(EngineCore_DP0 pid=115978) ERROR 09-11 14:57:15 [core.py:718] self.model_executor = executor_class(vllm_config)
(EngineCore_DP0 pid=115978) ERROR 09-11 14:57:15 [core.py:718] File "/home/vllm/vllm/executor/executor_base.py", line 54, in __init__
(EngineCore_DP0 pid=115978) ERROR 09-11 14:57:15 [core.py:718] self._init_executor()
(EngineCore_DP0 pid=115978) ERROR 09-11 14:57:15 [core.py:718] File "/home/vllm/vllm/executor/uniproc_executor.py", line 49, in _init_executor
(EngineCore_DP0 pid=115978) ERROR 09-11 14:57:15 [core.py:718] self.collective_rpc("load_model")
(EngineCore_DP0 pid=115978) ERROR 09-11 14:57:15 [core.py:718] File "/home/vllm/vllm/executor/uniproc_executor.py", line 58, in collective_rpc
(EngineCore_DP0 pid=115978) ERROR 09-11 14:57:15 [core.py:718] answer = run_method(self.driver_worker, method, args, kwargs)
(EngineCore_DP0 pid=115978) ERROR 09-11 14:57:15 [core.py:718] File "/home/vllm/vllm/utils/__init__.py", line 3057, in run_method
(EngineCore_DP0 pid=115978) ERROR 09-11 14:57:15 [core.py:718] return func(*args, **kwargs)
(EngineCore_DP0 pid=115978) ERROR 09-11 14:57:15 [core.py:718] File "/home/vllm/vllm/v1/worker/gpu_worker.py", line 213, in load_model
(EngineCore_DP0 pid=115978) ERROR 09-11 14:57:15 [core.py:718] self.model_runner.load_model(eep_scale_up=eep_scale_up)
(EngineCore_DP0 pid=115978) ERROR 09-11 14:57:15 [core.py:718] File "/home/vllm/vllm/v1/worker/gpu_model_runner.py", line 2371, in load_model
(EngineCore_DP0 pid=115978) ERROR 09-11 14:57:15 [core.py:718] self.model = model_loader.load_model(
(EngineCore_DP0 pid=115978) ERROR 09-11 14:57:15 [core.py:718] File "/home/vllm/vllm/model_executor/model_loader/base_loader.py", line 45, in load_model
(EngineCore_DP0 pid=115978) ERROR 09-11 14:57:15 [core.py:718] model = initialize_model(vllm_config=vllm_config,
(EngineCore_DP0 pid=115978) ERROR 09-11 14:57:15 [core.py:718] File "/home/vllm/vllm/model_executor/model_loader/utils.py", line 63, in initialize_model
(EngineCore_DP0 pid=115978) ERROR 09-11 14:57:15 [core.py:718] return model_class(vllm_config=vllm_config, prefix=prefix)
(EngineCore_DP0 pid=115978) ERROR 09-11 14:57:15 [core.py:718] File "/home/vllm/vllm/model_executor/models/ultravox.py", line 439, in __init__
(EngineCore_DP0 pid=115978) ERROR 09-11 14:57:15 [core.py:718] self.language_model = init_vllm_registered_model(
(EngineCore_DP0 pid=115978) ERROR 09-11 14:57:15 [core.py:718] File "/home/vllm/vllm/model_executor/models/utils.py", line 316, in init_vllm_registered_model
(EngineCore_DP0 pid=115978) ERROR 09-11 14:57:15 [core.py:718] return initialize_model(vllm_config=vllm_config, prefix=prefix)
(EngineCore_DP0 pid=115978) ERROR 09-11 14:57:15 [core.py:718] File "/home/vllm/vllm/model_executor/model_loader/utils.py", line 51, in initialize_model
(EngineCore_DP0 pid=115978) ERROR 09-11 14:57:15 [core.py:718] model_class, _ = get_model_architecture(model_config)
(EngineCore_DP0 pid=115978) ERROR 09-11 14:57:15 [core.py:718] File "/home/vllm/vllm/model_executor/model_loader/utils.py", line 172, in get_model_architecture
(EngineCore_DP0 pid=115978) ERROR 09-11 14:57:15 [core.py:718] model_cls, arch = model_config.registry.resolve_model_cls(
(EngineCore_DP0 pid=115978) ERROR 09-11 14:57:15 [core.py:718] File "/home/vllm/vllm/model_executor/models/registry.py", line 687, in resolve_model_cls
(EngineCore_DP0 pid=115978) ERROR 09-11 14:57:15 [core.py:718] raise ValueError("No model architectures are specified")
(EngineCore_DP0 pid=115978) ERROR 09-11 14:57:15 [core.py:718] ValueError: No model architectures are specified
(EngineCore_DP0 pid=115978) Process EngineCore_DP0:
(EngineCore_DP0 pid=115978) Traceback (most recent call last):
(EngineCore_DP0 pid=115978) File "/root/miniconda3/envs/py3.10/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap
(EngineCore_DP0 pid=115978) self.run()
(EngineCore_DP0 pid=115978) File "/root/miniconda3/envs/py3.10/lib/python3.10/multiprocessing/process.py", line 108, in run
(EngineCore_DP0 pid=115978) self._target(*self._args, **self._kwargs)
(EngineCore_DP0 pid=115978) File "/home/vllm/vllm/v1/engine/core.py", line 722, in run_engine_core
(EngineCore_DP0 pid=115978) raise e
(EngineCore_DP0 pid=115978) File "/home/vllm/vllm/v1/engine/core.py", line 709, in run_engine_core
(EngineCore_DP0 pid=115978) engine_core = EngineCoreProc(*args, **kwargs)
(EngineCore_DP0 pid=115978) File "/home/vllm/vllm/v1/engine/core.py", line 505, in __init__
(EngineCore_DP0 pid=115978) super().__init__(vllm_config, executor_class, log_stats,
(EngineCore_DP0 pid=115978) File "/home/vllm/vllm/v1/engine/core.py", line 82, in __init__
(EngineCore_DP0 pid=115978) self.model_executor = executor_class(vllm_config)
(EngineCore_DP0 pid=115978) File "/home/vllm/vllm/executor/executor_base.py", line 54, in __init__
(EngineCore_DP0 pid=115978) self._init_executor()
(EngineCore_DP0 pid=115978) File "/home/vllm/vllm/executor/uniproc_executor.py", line 49, in _init_executor
(EngineCore_DP0 pid=115978) self.collective_rpc("load_model")
(EngineCore_DP0 pid=115978) File "/home/vllm/vllm/executor/uniproc_executor.py", line 58, in collective_rpc
(EngineCore_DP0 pid=115978) answer = run_method(self.driver_worker, method, args, kwargs)
(EngineCore_DP0 pid=115978) File "/home/vllm/vllm/utils/__init__.py", line 3057, in run_method
(EngineCore_DP0 pid=115978) return func(*args, **kwargs)
(EngineCore_DP0 pid=115978) File "/home/vllm/vllm/v1/worker/gpu_worker.py", line 213, in load_model
(EngineCore_DP0 pid=115978) self.model_runner.load_model(eep_scale_up=eep_scale_up)
(EngineCore_DP0 pid=115978) File "/home/vllm/vllm/v1/worker/gpu_model_runner.py", line 2371, in load_model
(EngineCore_DP0 pid=115978) self.model = model_loader.load_model(
(EngineCore_DP0 pid=115978) File "/home/vllm/vllm/model_executor/model_loader/base_loader.py", line 45, in load_model
(EngineCore_DP0 pid=115978) model = initialize_model(vllm_config=vllm_config,
(EngineCore_DP0 pid=115978) File "/home/vllm/vllm/model_executor/model_loader/utils.py", line 63, in initialize_model
(EngineCore_DP0 pid=115978) return model_class(vllm_config=vllm_config, prefix=prefix)
(EngineCore_DP0 pid=115978) File "/home/vllm/vllm/model_executor/models/ultravox.py", line 439, in __init__
(EngineCore_DP0 pid=115978) self.language_model = init_vllm_registered_model(
(EngineCore_DP0 pid=115978) File "/home/vllm/vllm/model_executor/models/utils.py", line 316, in init_vllm_registered_model
(EngineCore_DP0 pid=115978) return initialize_model(vllm_config=vllm_config, prefix=prefix)
(EngineCore_DP0 pid=115978) File "/home/vllm/vllm/model_executor/model_loader/utils.py", line 51, in initialize_model
(EngineCore_DP0 pid=115978) model_class, _ = get_model_architecture(model_config)
(EngineCore_DP0 pid=115978) File "/home/vllm/vllm/model_executor/model_loader/utils.py", line 172, in get_model_architecture
(EngineCore_DP0 pid=115978) model_cls, arch = model_config.registry.resolve_model_cls(
(EngineCore_DP0 pid=115978) File "/home/vllm/vllm/model_executor/models/registry.py", line 687, in resolve_model_cls
(EngineCore_DP0 pid=115978) raise ValueError("No model architectures are specified")
(EngineCore_DP0 pid=115978) ValueError: No model architectures are specified
[rank0]:[W911 14:57:16.196909771 ProcessGroupNCCL.cpp:1538] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator())
(APIServer pid=115702) Traceback (most recent call last):
(APIServer pid=115702) File "/root/miniconda3/envs/py3.10/bin/vllm", line 8, in <module>
(APIServer pid=115702) sys.exit(main())
(APIServer pid=115702) File "/home/vllm/vllm/entrypoints/cli/main.py", line 54, in main
(APIServer pid=115702) args.dispatch_function(args)
(APIServer pid=115702) File "/home/vllm/vllm/entrypoints/cli/serve.py", line 50, in cmd
(APIServer pid=115702) uvloop.run(run_server(args))
(APIServer pid=115702) File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/uvloop/__init__.py", line 82, in run
(APIServer pid=115702) return loop.run_until_complete(wrapper())
(APIServer pid=115702) File "uvloop/loop.pyx", line 1518, in uvloop.loop.Loop.run_until_complete
(APIServer pid=115702) File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/uvloop/__init__.py", line 61, in wrapper
(APIServer pid=115702) return await main
(APIServer pid=115702) File "/home/vllm/vllm/entrypoints/openai/api_server.py", line 1941, in run_server
(APIServer pid=115702) await run_server_worker(listen_address, sock, args, **uvicorn_kwargs)
(APIServer pid=115702) File "/home/vllm/vllm/entrypoints/openai/api_server.py", line 1961, in run_server_worker
(APIServer pid=115702) async with build_async_engine_client(
(APIServer pid=115702) File "/root/miniconda3/envs/py3.10/lib/python3.10/contextlib.py", line 199, in __aenter__
(APIServer pid=115702) return await anext(self.gen)
(APIServer pid=115702) File "/home/vllm/vllm/entrypoints/openai/api_server.py", line 179, in build_async_engine_client
(APIServer pid=115702) async with build_async_engine_client_from_engine_args(
(APIServer pid=115702) File "/root/miniconda3/envs/py3.10/lib/python3.10/contextlib.py", line 199, in __aenter__
(APIServer pid=115702) return await anext(self.gen)
(APIServer pid=115702) File "/home/vllm/vllm/entrypoints/openai/api_server.py", line 221, in build_async_engine_client_from_engine_args
(APIServer pid=115702) async_llm = AsyncLLM.from_vllm_config(
(APIServer pid=115702) File "/home/vllm/vllm/utils/__init__.py", line 1589, in inner
(APIServer pid=115702) return fn(*args, **kwargs)
(APIServer pid=115702) File "/home/vllm/vllm/v1/engine/async_llm.py", line 205, in from_vllm_config
(APIServer pid=115702) return cls(
(APIServer pid=115702) File "/home/vllm/vllm/v1/engine/async_llm.py", line 129, in __init__
(APIServer pid=115702) self.engine_core = EngineCoreClient.make_async_mp_client(
(APIServer pid=115702) File "/home/vllm/vllm/v1/engine/core_client.py", line 102, in make_async_mp_client
(APIServer pid=115702) return AsyncMPClient(*client_args)
(APIServer pid=115702) File "/home/vllm/vllm/v1/engine/core_client.py", line 769, in __init__
(APIServer pid=115702) super().__init__(
(APIServer pid=115702) File "/home/vllm/vllm/v1/engine/core_client.py", line 448, in __init__
(APIServer pid=115702) with launch_core_engines(vllm_config, executor_class,
(APIServer pid=115702) File "/root/miniconda3/envs/py3.10/lib/python3.10/contextlib.py", line 142, in __exit__
(APIServer pid=115702) next(self.gen)
(APIServer pid=115702) File "/home/vllm/vllm/v1/engine/utils.py", line 729, in launch_core_engines
(APIServer pid=115702) wait_for_engine_startup(
(APIServer pid=115702) File "/home/vllm/vllm/v1/engine/utils.py", line 782, in wait_for_engine_startup
(APIServer pid=115702) raise RuntimeError("Engine core initialization failed. "
(APIServer pid=115702) RuntimeError: Engine core initialization failed. See root cause above. Failed core proc(s): {}
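The root cause is visible in the innermost frame: the wrapped model's config reached resolve_model_cls with an empty architectures list. A toy version of that guard, illustrative only and not vLLM's actual registry code, shows why the load fails immediately:

```python
def resolve_model_cls(architectures, registry):
    # Mirrors the guard seen in the traceback: an empty or missing
    # architectures list is rejected before any lookup is attempted.
    if not architectures:
        raise ValueError("No model architectures are specified")
    for arch in architectures:
        if arch in registry:
            return registry[arch], arch
    raise ValueError(f"Unsupported architectures: {architectures}")

registry = {"Gemma3ForCausalLM": object}
try:
    resolve_model_cls([], registry)
except ValueError as e:
    print(e)  # No model architectures are specified
```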

@petersalas

> (EngineCore_DP0 pid=115978) ERROR 09-11 14:57:15 [core.py:718] ValueError: No model architectures are specified

Oops, I accidentally reverted part of the change when I pushed the quant changes, will put up a fix shortly.

skyloevil pushed a commit to skyloevil/vllm that referenced this pull request Sep 13, 2025
dsxsteven pushed a commit to dsxsteven/vllm_splitPR that referenced this pull request Sep 15, 2025
FeiDaLI pushed a commit to FeiDaLI/vllm that referenced this pull request Sep 25, 2025
xuebwang-amd pushed a commit to xuebwang-amd/vllm that referenced this pull request Oct 10, 2025