Skip to content

Conversation

wseaton
Copy link
Contributor

@wseaton wseaton commented Oct 14, 2025

Purpose

In some situations an operator may not want to allow KV load failure recovery to result in a local prefill on a decode node at all costs. This provides plumbing to make KV load failures bubble up to the api server as a 502 Bad Gateway, that can be properly handled at the proxy layer in a P/D setup.

We introduce a new FINISHED_ERROR RequestStatus that the API server process can catch to throw the correct semantic error.

Signed-off-by: Will Eaton <[email protected]>
Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a configurable policy for handling KV cache load failures, allowing operators to choose between recomputing failed blocks or aborting the request. The implementation involves adding a new FinishReason.ERROR and RequestStatus.FINISHED_ERROR, updating the scheduler to handle the new policy, and propagating the error up to the OpenAI API layer to return an appropriate error to the client.

The changes are well-structured. However, I've found one critical issue where an internal data structure (FINISH_REASON_STRINGS) was not updated to reflect the new error state, which will lead to an IndexError and a server crash when an error needs to be reported through the API. Please see the detailed comment.

Copy link

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

@wseaton wseaton force-pushed the configurable-prefill-recovery branch from 7b72907 to 755e628 Compare October 14, 2025 17:58
Signed-off-by: Will Eaton <[email protected]>
return set()

# --- Take action based on policy ---
if abort:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Undecided on extracting this branch into a helper method

@wseaton
Copy link
Contributor Author

wseaton commented Oct 14, 2025

@njhill @NickLucche this is ready for review, also cc @sdavidbd since it interacts with the block level recovery mechanism

Signed-off-by: Will Eaton <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant