-
-
Notifications
You must be signed in to change notification settings - Fork 10.6k
[P/D] KV Load Failure Recovery/Abort Configuration #26813
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Signed-off-by: Will Eaton <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request introduces a configurable policy for handling KV cache load failures, allowing operators to choose between recomputing failed blocks or aborting the request. The implementation involves adding a new FinishReason.ERROR
and RequestStatus.FINISHED_ERROR
, updating the scheduler to handle the new policy, and propagating the error up to the OpenAI API layer to return an appropriate error to the client.
The changes are well-structured. However, I've found one critical issue where an internal data structure (FINISH_REASON_STRINGS
) was not updated to reflect the new error state, which will lead to an IndexError
and a server crash when an error needs to be reported through the API. Please see the detailed comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💡 Codex Review
Here are some automated review suggestions for this pull request.
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Signed-off-by: Will Eaton <[email protected]>
Signed-off-by: Will Eaton <[email protected]>
7b72907
to
755e628
Compare
Signed-off-by: Will Eaton <[email protected]>
return set() | ||
|
||
# --- Take action based on policy --- | ||
if abort: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Undecided on extracting this branch into a helper method
@njhill @NickLucche this is ready for review, also cc @sdavidbd since it interacts with the block level recovery mechanism |
Signed-off-by: Will Eaton <[email protected]>
Purpose
In some situations an operator may not want to allow KV load failure recovery to result in a local prefill on a decode node at all costs. This provides plumbing to make KV load failures bubble up to the api server as a 502 Bad Gateway, that can be properly handled at the proxy layer in a P/D setup.
We introduce a new
FINISHED_ERROR
RequestStatus
that the API server process can catch to throw the correct semantic error.