Use exponential_backoff_retry for completion call #8023
Merged
Some users reported that their programs got stuck on rate-limit errors. This PR configures exponential backoff retry for the LiteLLM completion call, because LiteLLM otherwise retries with a constant backoff even for rate limits (ref), which is ineffective when a provider keeps rejecting requests.
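To illustrate the difference, here is a minimal plain-Python sketch of exponential backoff. It is not LiteLLM's retry code. The wait before retry `i` is `base * 2**i`, capped, instead of a fixed interval, with optional "full jitter" so that many clients do not retry in lockstep. `RateLimitError`, `exponential_delays`, `retry_with_backoff`, and all parameter names and defaults are hypothetical, chosen for illustration only.

```python
import random
import time
from typing import Callable, List, TypeVar

T = TypeVar("T")


class RateLimitError(Exception):
    """Hypothetical stand-in for a provider's rate-limit (HTTP 429) exception."""


def exponential_delays(num_retries: int, base: float = 1.0, cap: float = 60.0) -> List[float]:
    # Delay before retry i is base * 2**i, capped so the wait stays bounded.
    return [min(cap, base * (2 ** i)) for i in range(num_retries)]


def retry_with_backoff(
    fn: Callable[[], T],
    num_retries: int = 5,
    base: float = 1.0,
    cap: float = 60.0,
    jitter: bool = True,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Hypothetical helper: call fn, retrying failures with exponentially growing waits."""
    delays = exponential_delays(num_retries, base, cap)
    for attempt in range(num_retries + 1):
        try:
            return fn()
        except Exception:
            if attempt == num_retries:
                raise  # retries exhausted: surface the last error
            delay = delays[attempt]
            if jitter:
                # Full jitter: wait a random fraction of the computed delay.
                delay = random.uniform(0, delay)
            sleep(delay)
    raise AssertionError("unreachable")
```

With `jitter=False`, a call that hits the rate limit twice before succeeding waits 1 s and then 2 s; a constant backoff would wait the same interval every time, regardless of how long the limit persists.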
One tradeoff is that, after this change, exponential backoff also applies to other types of exceptions (e.g. internal server errors). LiteLLM has smarter logic for async completion, switching to exponential backoff only for RateLimitError (ref), but no equivalent exists for sync completion. An alternative solution would therefore be to file a PR against LiteLLM that implements the same logic for sync completion, so that it uses exponential backoff only for RateLimitError.
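The alternative policy (exponential backoff only for rate limits, constant backoff for other retryable errors) could look roughly like the sketch below. This is not LiteLLM's async implementation; `RateLimitError`, `InternalServerError`, `backoff_delay`, `retry_by_error_type`, and the delay defaults are hypothetical names and values used only to show the idea.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimitError(Exception):
    """Hypothetical stand-in for a rate-limit (HTTP 429) exception."""


class InternalServerError(Exception):
    """Hypothetical stand-in for a transient server-side (HTTP 5xx) exception."""


def backoff_delay(
    error: Exception, attempt: int, base: float = 1.0, constant: float = 1.0, cap: float = 60.0
) -> float:
    # Rate limits: grow the wait exponentially to ease pressure on the quota.
    if isinstance(error, RateLimitError):
        return min(cap, base * (2 ** attempt))
    # Any other retryable error: keep a constant wait.
    return constant


def retry_by_error_type(
    fn: Callable[[], T],
    num_retries: int = 5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Hypothetical helper: retry only known transient errors, choosing backoff by type."""
    for attempt in range(num_retries + 1):
        try:
            return fn()
        except (RateLimitError, InternalServerError) as error:
            if attempt == num_retries:
                raise  # retries exhausted: surface the last error
            sleep(backoff_delay(error, attempt))
    raise AssertionError("unreachable")
```

Errors outside the two retryable types propagate immediately without any wait, and an internal server error never triggers the long exponential delays reserved for rate limits.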