[KV Connector] More async support for `get_num_new_matched_tokens` #23620

ApostaC · 2025-08-26T05:32:41Z

Inspired by the Dynamo team, this PR adds support for "async remote lookup" to the connector API.

Purpose

Remote KV cache connectors may need RPC calls to look up the number of remote hit tokens. These calls can slow down the scheduler loop and degrade performance on prefill-heavy workloads.

This PR takes a first step toward addressing this problem: it introduces new "try again" semantics in the get_num_new_matched_tokens function.

When called, the function may return None as the number of matched tokens, indicating that the connector needs more time to look up the hit tokens for this request. In this case, the scheduler should schedule other requests first and retry this request later.
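To make the "try again" semantics concrete, here is a minimal sketch of how a scheduling pass might treat a None result. It does not use vLLM code: `ToyAsyncConnector`, `schedule_step`, and the `pending` parameter are hypothetical names made up for illustration, and requests are reduced to plain string IDs. The `(tokens, flag)` tuple shape follows the `return (None, False)` line in this PR's diff; the second element is simply passed through here.

```python
from collections import deque
from typing import Optional


class ToyAsyncConnector:
    """Hypothetical connector whose lookups may need several polls to resolve."""

    def __init__(self, hits: dict[str, int], pending: dict[str, int]) -> None:
        self._hits = hits              # request id -> number of remote hit tokens
        self._pending = dict(pending)  # request id -> polls that still return None

    def get_num_new_matched_tokens(
        self, request_id: str, num_computed_tokens: int
    ) -> tuple[Optional[int], bool]:
        remaining = self._pending.get(request_id, 0)
        if remaining > 0:
            # Lookup still in flight: ask the caller to try again later.
            self._pending[request_id] = remaining - 1
            return None, False
        return self._hits.get(request_id, 0), False


def schedule_step(
    waiting: deque[str], connector: ToyAsyncConnector
) -> list[tuple[str, int]]:
    """One pass over the waiting queue that skips requests with pending lookups."""
    scheduled: list[tuple[str, int]] = []
    skipped: deque[str] = deque()
    while waiting:
        request_id = waiting.popleft()
        num_tokens, _flag = connector.get_num_new_matched_tokens(request_id, 0)
        if num_tokens is None:
            # Not ready yet: set it aside and keep scheduling other requests.
            skipped.append(request_id)
            continue
        scheduled.append((request_id, num_tokens))
    # Skipped requests go back to the queue for the next pass.
    waiting.extend(skipped)
    return scheduled


waiting = deque(["a", "b"])
connector = ToyAsyncConnector(hits={"a": 128, "b": 16}, pending={"a": 1})
first_pass = schedule_step(waiting, connector)
second_pass = schedule_step(waiting, connector)
```

In this sketch, request "a" is skipped on the first pass while "b" is scheduled, and "a" is picked up on the second pass once its lookup resolves. The point is that the scheduler loop never blocks on the remote call.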

Test Plan

There is a PoC showing how much scheduling overhead this PR can reduce in #23622

Test Result

see #23622

Follow-ups

[KV Connector] Async lookup policy support for MultiConnector (#24059).

Signed-off-by: ApostaC <[email protected]>

gemini-code-assist

Code Review

This pull request introduces an important enhancement to the KV connector API by allowing get_num_new_matched_tokens to return None, indicating an asynchronous lookup is in progress. This change will help prevent the scheduler from being blocked by remote KV cache lookups. The implementation in the scheduler to handle this new None state appears correct. However, I've found a critical issue in MultiConnector where the change in the return type of get_num_new_matched_tokens was not correctly handled in its implementation, which will lead to a TypeError. Please see my detailed comment.

vllm/distributed/kv_transfer/kv_connector/v1/multi_connector.py

ApostaC · 2025-08-26T06:05:49Z

@njhill @robertgshaw2-redhat This is the one I mentioned today. Relatively simple change just to open up the new semantics.
Feel free to review and leave your comments, thanks!

njhill

Thanks @ApostaC, LGTM.

Would be good to get @ryanolson's thoughts too

njhill · 2025-08-26T22:33:49Z

vllm/distributed/kv_transfer/kv_connector/v1/multi_connector.py

+            # If there is a connector still looking up the matches,
+            # we return None to indicate that we are not done yet.
+            if toks is None:
+                return (None, False)


This is reasonable, but in theory you might prefer a different policy, such as preferring tokens from a connector that doesn't require a lookup delay.

Yeah, agree with it!
To keep this PR simple, I created a follow-up issue (#24059) for this and will open a follow-up PR once this one is merged. For now, the code here has no effect, since none of the existing connectors return None.
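The two policies discussed in this thread can be contrasted in a small sketch. This is not the MultiConnector implementation: `wait_for_all` and `prefer_ready` are hypothetical helper names, the per-connector results are passed in as a precomputed list rather than polled one by one, and the "first connector with a hit wins" rule is an assumption about the surrounding code that the quoted diff does not show. `prefer_ready` is only one possible reading of the reviewer's suggestion, not the design tracked in #24059.

```python
from typing import Optional

# (number of matched tokens or None if still pending, second tuple element)
Result = tuple[Optional[int], bool]


def wait_for_all(results: list[Result]) -> Result:
    """Policy in the quoted diff: one pending lookup makes the whole answer pending."""
    chosen: Result = (0, False)
    for toks, flag in results:
        if toks is None:
            return (None, False)
        # Assumption: the first connector reporting a hit is the one used.
        if chosen[0] == 0 and toks > 0:
            chosen = (toks, flag)
    return chosen


def prefer_ready(results: list[Result]) -> Result:
    """Hypothetical alternative: use a connector that already has a hit."""
    saw_pending = False
    for toks, flag in results:
        if toks is None:
            saw_pending = True
            continue
        if toks > 0:
            return (toks, flag)
    # No ready connector had a hit: stay pending only if a lookup is in flight.
    return (None, False) if saw_pending else (0, False)


results = [(None, False), (64, True)]
pending_answer = wait_for_all(results)  # (None, False)
ready_answer = prefer_ready(results)    # (64, True)
```

The trade-off is that `prefer_ready` avoids the lookup delay but may settle for fewer matched tokens than the slower connector would eventually have reported.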

vllm/v1/core/sched/scheduler.py

ApostaC · 2025-09-04T23:23:23Z

@njhill Hey Nick, do we want to merge this anytime soon, or do we still want to wait for anything?

njhill

LGTM, thanks @ApostaC. I've pinged NVIDIA folks again, we might want to hold off a bit longer in case they have comments.

ptarasiewiczNV · 2025-09-09T15:54:29Z

LGTM as well!

[Add] async support for get_num_new_matched_tokens

02a3a82

Signed-off-by: ApostaC <[email protected]>

ApostaC requested review from WoosukKwon, alexm-redhat, comaniac, njhill, robertgshaw2-redhat and ywang96 as code owners August 26, 2025 05:32

mergify bot added the v1 label Aug 26, 2025

gemini-code-assist bot reviewed Aug 26, 2025

ApostaC mentioned this pull request Aug 26, 2025

[KV Connector][Don't merge] PoC for the async connector lookup functionality #23622

Draft

5 tasks

[Add] fix for multi-connector semantics

6126633

Signed-off-by: ApostaC <[email protected]>

njhill reviewed Aug 26, 2025

add identation for new scheduler code

22beca6

Signed-off-by: ApostaC <[email protected]>

ApostaC mentioned this pull request Sep 1, 2025

[Feature][KV Connector]: Async lookup policy support for MultiConnector #24059

Open

1 task

Merge branch 'main' into local-dev/async-lookup

75570aa

njhill approved these changes Sep 5, 2025

njhill added the ready label Sep 5, 2025

YaoJiayi mentioned this pull request Sep 6, 2025

Async KV loading LMCache/LMCache#1513

Merged

4 tasks

ApostaC added 4 commits September 7, 2025 17:45

Merge branch 'main' into local-dev/async-lookup

f9ef534

Merge branch 'main' into local-dev/async-lookup

42ffd86

Merge branch 'main' into local-dev/async-lookup

1bf9f77

Merge branch 'main' into local-dev/async-lookup

091a1fa

simon-mo merged commit b4a01aa into vllm-project:main Sep 10, 2025
40 of 43 checks passed

skyloevil pushed a commit to skyloevil/vllm that referenced this pull request Sep 13, 2025

[KV Connector] More async support for get_num_new_matched_tokens (v…

983f717

…llm-project#23620) Signed-off-by: ApostaC <[email protected]>

FeiDaLI pushed a commit to FeiDaLI/vllm that referenced this pull request Sep 25, 2025

[KV Connector] More async support for get_num_new_matched_tokens (v…

e1db7e4

…llm-project#23620) Signed-off-by: ApostaC <[email protected]>

xuebwang-amd pushed a commit to xuebwang-amd/vllm that referenced this pull request Oct 10, 2025

[KV Connector] More async support for get_num_new_matched_tokens (v…

fb2fc18

…llm-project#23620) Signed-off-by: ApostaC <[email protected]> Signed-off-by: xuebwang-amd <[email protected]>
