[Serve] Prioritize stopping most recently scaled-up replicas during downscaling #52929

Open: ktyxx wants to merge 2 commits into master

@ktyxx commented on May 10, 2025

Why are these changes needed?

This PR improves the downscaling behavior in Ray Serve by modifying the logic in _get_replicas_to_stop() within DefaultDeploymentScheduler.

Previously, the scheduler selected replicas to stop by traversing the least loaded nodes in ascending order. This often resulted in stopping replicas that had been scheduled earlier and placed optimally using the _best_fit_node() strategy.

This led to several drawbacks:

  • Long-lived replicas, which were scheduled on best-fit nodes, were removed first, leading to inefficient reuse of resources.
  • Recently scaled-up replicas, which were placed on less utilized nodes, were kept longer despite being suboptimal.
  • Cold-start overhead increased, as warmed-up replicas were removed while newer replicas that had not yet fully warmed up were retained.

This PR reverses the node traversal order during downscaling so that more recently added replicas are prioritized for termination, in cases where other conditions (e.g., running state and number of replicas per node) are equal. These newer replicas are typically less optimal in placement and not yet fully warmed up.

Preserving long-lived replicas improves performance stability and reduces unnecessary resource fragmentation.
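
The tie-breaking described above can be sketched roughly as follows. This is a simplified illustration, not the actual Ray Serve code: `Replica`, `start_order`, `pick_replicas_to_stop`, and `prefer_newest` are hypothetical names, and the sketch sorts on an explicit recency field, whereas `_get_replicas_to_stop()` works by traversing nodes. It only shows the intended effect: when running state and replicas-per-node are equal, the most recently added replica is stopped first.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Replica:
    """Hypothetical stand-in for a Serve replica; not a Ray type."""

    replica_id: str
    node_id: str
    is_running: bool
    start_order: int  # assumption: a higher value means scaled up more recently


def pick_replicas_to_stop(
    replicas: List[Replica], max_to_stop: int, prefer_newest: bool = True
) -> List[str]:
    """Return the IDs of up to `max_to_stop` replicas, in stop order."""
    # Count how many replicas sit on each node.
    per_node: Dict[str, int] = {}
    for r in replicas:
        per_node[r.node_id] = per_node.get(r.node_id, 0) + 1

    def sort_key(r: Replica):
        # Recency is only the final tie-breaker.
        recency = -r.start_order if prefer_newest else r.start_order
        # Non-running replicas first, then nodes with the fewest replicas.
        return (r.is_running, per_node[r.node_id], recency)

    ordered = sorted(replicas, key=sort_key)
    return [r.replica_id for r in ordered[:max_to_stop]]


replicas = [
    Replica("old", "n1", True, 0),
    Replica("new", "n2", True, 1),
]
stop_new_first = pick_replicas_to_stop(replicas, 1)
stop_old_first = pick_replicas_to_stop(replicas, 1, prefer_newest=False)
```

With both replicas running and one replica per node, `stop_new_first` is `["new"]` (the behavior this PR aims for) and `stop_old_first` is `["old"]` (the ordering it replaces). A replica that is not yet running would still be chosen ahead of both, since running state is compared before recency.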

Related issue number

N/A

Checks

  • I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
    • I've added any new APIs to the API Reference. For example, if I added a
      method in Tune, I've added it in doc/source/tune/api/ under the
      corresponding .rst file.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

@hainesmichaelc added the community-contribution (Contributed by the community) label on May 12, 2025
@masoudcharkhabi added the serve (Ray Serve Related Issue) and stability labels on May 12, 2025
@ktyxx force-pushed the fix-replica-scale-down-order branch from b081d11 to 02a57df on May 13, 2025 09:15
3 participants