[Serve] Prioritize stopping most recently scaled-up replicas during downscaling #52929

ktyxx · 2025-05-10T06:51:17Z

Why are these changes needed?

This PR improves the downscaling behavior in Ray Serve by modifying the logic in _get_replicas_to_stop() within DefaultDeploymentScheduler.

Previously, the scheduler selected replicas to stop by traversing nodes in ascending order of load (least loaded first). This often meant stopping the replicas that had been scheduled earliest and placed optimally by the _best_fit_node() strategy.

This led to several drawbacks:

- Long-lived replicas, which had been scheduled on best-fit nodes, were removed first, leading to inefficient reuse of resources.
- Recently scaled-up replicas, which had been placed on less utilized nodes, were kept longer despite their suboptimal placement.
- Cold-start overhead increased, because fully warmed-up replicas were removed while newer replicas that had not yet warmed up were kept.

This PR reverses the node traversal order during downscaling so that more recently added replicas are prioritized for termination, in cases where other conditions (e.g., running state and number of replicas per node) are equal. These newer replicas are typically less optimal in placement and not yet fully warmed up.
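To make the intended ordering concrete, here is a minimal Python sketch of the selection policy described above. It is a simplified model, not the code in this PR: `ReplicaInfo`, `start_seq`, `replicas_to_stop`, and the `prefer_newest` flag are hypothetical names, and the sketch breaks ties per replica by start order rather than by reversing the node traversal order as the PR does.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class ReplicaInfo:
    # Hypothetical, simplified view of a replica (not a Ray Serve type).
    replica_id: str
    node_id: str
    running: bool   # False for pending/launching/recovering replicas
    start_seq: int  # monotonically increasing; larger = scaled up later


def replicas_to_stop(
    replicas: List[ReplicaInfo], max_num_to_stop: int, prefer_newest: bool = True
) -> List[str]:
    """Return up to max_num_to_stop replica IDs, in stop-priority order.

    1. Non-running replicas come first.
    2. Running replicas on nodes hosting fewer replicas of this deployment
       come next, so those nodes can be released sooner.
    3. Remaining ties: newest first if prefer_newest, else oldest first.
    """
    # Count running replicas of this deployment per node.
    per_node: Dict[str, int] = {}
    for r in replicas:
        if r.running:
            per_node[r.node_id] = per_node.get(r.node_id, 0) + 1

    def priority(r: ReplicaInfo) -> Tuple[int, int, int]:
        # Negating start_seq makes larger (newer) sequence numbers sort first.
        age_key = -r.start_seq if prefer_newest else r.start_seq
        if not r.running:
            return (0, 0, age_key)
        return (1, per_node[r.node_id], age_key)

    ordered = sorted(replicas, key=priority)
    return [r.replica_id for r in ordered[:max_num_to_stop]]


# Example: node-2 and node-3 each host one replica, node-1 hosts two.
example = [
    ReplicaInfo("a", "node-1", True, 0),
    ReplicaInfo("b", "node-2", True, 1),
    ReplicaInfo("c", "node-3", True, 2),
    ReplicaInfo("d", "node-1", True, 3),
]
print(replicas_to_stop(example, 1))                       # ['c']
print(replicas_to_stop(example, 1, prefer_newest=False))  # ['b']
```

Both b and c sit on nodes with a single replica, so the tie-break alone decides which one is stopped: the newer c under the proposed ordering, the older b with `prefer_newest=False`, which loosely approximates the previous oldest-first behavior.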

Preserving long-lived replicas improves performance stability and reduces unnecessary resource fragmentation.

Related issue number

N/A

Checks

I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
I've run scripts/format.sh to lint the changes in this PR.
I've included any doc changes needed for https://docs.ray.io/en/master/.
- I've added any new APIs to the API Reference. For example, if I added a
  method in Tune, I've added it in doc/source/tune/api/ under the
  corresponding .rst file.
I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
Testing Strategy
- Unit tests
- Release tests
- This PR is not tested :(

Signed-off-by: kitae <[email protected]>

fix: prefer stopping most recently scaled-up replicas

6f50cbf

Signed-off-by: kitae <[email protected]>

hainesmichaelc added the community-contribution label May 12, 2025

masoudcharkhabi added the serve and stability labels May 12, 2025

fix: update test expectations for replica downscale order change

02a57df

Signed-off-by: kitae <[email protected]>

ktyxx force-pushed the fix-replica-scale-down-order branch from b081d11 to 02a57df May 13, 2025 09:15
