Conversation

hmellor (Member) commented Aug 27, 2025

The docstrings for config class attributes are only needed when we're generating --help text or documentation.

This saves 235 ms on my machine by completely eliminating the cost of extracting the docstrings at start-up.
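The gating idea is simple: decide once, up front, whether help text will be needed, and only run the docstring extraction in that case. As a minimal sketch (the detection logic below is an illustrative assumption, not the exact code in this PR):

import sys

# Illustrative sketch: detect help mode once at import time so that
# attribute-docstring extraction can be skipped on a normal start-up.
NEEDS_HELP = any(arg in ("--help", "-h") for arg in sys.argv)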


$ vllm serve --help
INFO 08-27 10:36:28 [__init__.py:241] Automatically detected platform cuda.
INFO 08-27 10:36:30 [arg_utils.py:165] Computed kwargs for FrontendArgs in 0.0038s (cumulative: 0.0038s)
INFO 08-27 10:36:30 [arg_utils.py:165] Computed kwargs for ModelConfig in 0.0383s (cumulative: 0.0421s)
INFO 08-27 10:36:31 [arg_utils.py:165] Computed kwargs for LoadConfig in 0.0263s (cumulative: 0.0684s)
INFO 08-27 10:36:31 [arg_utils.py:165] Computed kwargs for DecodingConfig in 0.0302s (cumulative: 0.0985s)
INFO 08-27 10:36:31 [arg_utils.py:165] Computed kwargs for ParallelConfig in 0.0061s (cumulative: 0.1046s)
INFO 08-27 10:36:31 [arg_utils.py:165] Computed kwargs for CacheConfig in 0.0032s (cumulative: 0.1078s)
INFO 08-27 10:36:31 [arg_utils.py:165] Computed kwargs for MultiModalConfig in 0.0284s (cumulative: 0.1362s)
INFO 08-27 10:36:31 [arg_utils.py:165] Computed kwargs for LoRAConfig in 0.0283s (cumulative: 0.1645s)
INFO 08-27 10:36:31 [arg_utils.py:165] Computed kwargs for ObservabilityConfig in 0.0302s (cumulative: 0.1947s)
INFO 08-27 10:36:31 [arg_utils.py:165] Computed kwargs for SchedulerConfig in 0.0034s (cumulative: 0.1980s)
INFO 08-27 10:36:31 [arg_utils.py:165] Computed kwargs for VllmConfig in 0.0370s (cumulative: 0.2350s)
...
$ vllm serve 
INFO 08-27 10:36:57 [__init__.py:241] Automatically detected platform cuda.
INFO 08-27 10:37:00 [arg_utils.py:165] Computed kwargs for FrontendArgs in 0.0000s (cumulative: 0.0000s)
INFO 08-27 10:37:00 [arg_utils.py:165] Computed kwargs for ModelConfig in 0.0000s (cumulative: 0.0000s)
INFO 08-27 10:37:00 [arg_utils.py:165] Computed kwargs for LoadConfig in 0.0000s (cumulative: 0.0000s)
INFO 08-27 10:37:00 [arg_utils.py:165] Computed kwargs for DecodingConfig in 0.0000s (cumulative: 0.0000s)
INFO 08-27 10:37:00 [arg_utils.py:165] Computed kwargs for ParallelConfig in 0.0000s (cumulative: 0.0000s)
INFO 08-27 10:37:00 [arg_utils.py:165] Computed kwargs for CacheConfig in 0.0000s (cumulative: 0.0000s)
INFO 08-27 10:37:00 [arg_utils.py:165] Computed kwargs for MultiModalConfig in 0.0000s (cumulative: 0.0000s)
INFO 08-27 10:37:00 [arg_utils.py:165] Computed kwargs for LoRAConfig in 0.0000s (cumulative: 0.0000s)
INFO 08-27 10:37:00 [arg_utils.py:165] Computed kwargs for ObservabilityConfig in 0.0000s (cumulative: 0.0000s)
INFO 08-27 10:37:00 [arg_utils.py:165] Computed kwargs for SchedulerConfig in 0.0000s (cumulative: 0.0000s)
INFO 08-27 10:37:00 [arg_utils.py:165] Computed kwargs for VllmConfig in 0.0000s (cumulative: 0.0000s)
...

gemini-code-assist bot (Contributor) left a comment


Code Review

This pull request aims to optimize startup time by avoiding the extraction of docstrings when not generating help text. While the performance improvement is valuable, the current implementation introduces a critical caching bug by making a cached function's behavior dependent on a global variable (sys.argv). My review provides a detailed explanation of this bug and a robust solution to fix it, ensuring both performance and correctness.
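To make the concern concrete, here is a minimal, self-contained illustration (not vLLM code) of why an lru_cache-wrapped function must not read mutable global state: the first call bakes the global's value into the cache, and later calls return the stale result. Passing the flag as an argument instead makes it part of the cache key:

import functools

NEEDS_HELP = False  # mutable global the cached function depends on

@functools.lru_cache
def broken() -> str:
    return "docs" if NEEDS_HELP else "no docs"

broken()            # caches "no docs"
NEEDS_HELP = True
print(broken())     # still prints "no docs" -- stale cached result

@functools.lru_cache
def fixed(needs_help: bool) -> str:
    return "docs" if needs_help else "no docs"

print(fixed(True))  # prints "docs"; the flag is part of the cache key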

Signed-off-by: Harry Mellor <[email protected]>
hmellor mentioned this pull request Aug 27, 2025
mgoin (Member) commented Aug 27, 2025

Thank you!

mgoin enabled auto-merge (squash) August 27, 2025 09:06
github-actions bot added the ready label Aug 27, 2025
ZJY0516 (Contributor) commented Aug 27, 2025

Hi @hmellor, how did you measure the time like this?

INFO 08-27 10:36:30 [arg_utils.py:165] Computed kwargs for FrontendArgs in 0.0038s (cumulative: 0.0038s)

hmellor (Member, Author) commented Aug 27, 2025

I added some temporary logs that I didn't commit; the code was:

import functools
import time

cumulative_time = 0.0

@functools.lru_cache(maxsize=30)
def _compute_kwargs(cls: ConfigType, needs_help: bool) -> dict[str, Any]:
    global cumulative_time
    start = time.perf_counter()
    # Save time by only getting attr docs if we're generating help text
    cls_docs = get_attr_docs(cls) if needs_help else {}
    duration = time.perf_counter() - start
    cumulative_time += duration
    logger.info(f"Computed kwargs for {cls.__name__} in {duration:.4f}s "
                f"(cumulative: {cumulative_time:.4f}s)")
    ...
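(needs_help is deliberately a parameter rather than a module-level read inside the function, so it forms part of the lru_cache key; reading a global from inside the cached function is exactly the staleness pitfall flagged in the review above.)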

hmellor disabled auto-merge August 27, 2025 09:39
hmellor (Member, Author) commented Aug 27, 2025

The RTD docs build currently isn't triggering get_attr_docs. Just fixing that, then I'll re-enable auto-merge.

hmellor (Member, Author) commented Aug 27, 2025

Docs look good again

hmellor enabled auto-merge (squash) August 27, 2025 12:29
ProExpertProg (Collaborator) commented

Perhaps a follow-up, but could we also extract this at build time? As it stands we still pay this cost during vllm --help, when we could avoid it altogether.

hmellor (Member, Author) commented Aug 27, 2025

Yeah, this could be a build-time constant. I'm not sure how it would work though (I've never tried it before, but I'm sure it can be done).
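One possible shape for that follow-up, purely as a sketch (the import paths and file name below are assumptions, nothing agreed in this thread): run a small script at package build time that serializes the extracted docs, then have the --help path load the JSON instead of re-parsing source at runtime.

# build_attr_docs.py -- hypothetical build-time step
import json

# Assumed import locations; get_attr_docs is the helper discussed above.
from vllm.config import ModelConfig, get_attr_docs

CONFIG_CLASSES = [ModelConfig]  # in practice, every config class

docs = {cls.__qualname__: get_attr_docs(cls) for cls in CONFIG_CLASSES}
with open("attr_docs.json", "w") as f:
    json.dump(docs, f)

At runtime, vllm --help would json.load that file, and a normal vllm serve start-up would never touch it.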

hmellor merged commit 513c1fe into vllm-project:main Aug 27, 2025
40 checks passed
hmellor deleted the start-up-speed branch August 27, 2025 13:55
epwalsh pushed a commit to epwalsh/vllm that referenced this pull request Aug 28, 2025
xiao-llm pushed a commit to xiao-llm/vllm that referenced this pull request Aug 28, 2025
zhewenl pushed a commit to zhewenl/vllm that referenced this pull request Aug 28, 2025
zhewenl pushed a commit to zhewenl/vllm that referenced this pull request Sep 3, 2025
FeiDaLI pushed a commit to FeiDaLI/vllm that referenced this pull request Sep 25, 2025