
Fix shard size of initializing restored shard #126783


Conversation

DaveCTurner
Contributor

For shards being restored from a snapshot we use `SnapshotShardSizeInfo`
to track their sizes while they're unassigned, and then use
`ShardRouting#expectedShardSize` when they start to recover. However we
were incorrectly ignoring the `ShardRouting#expectedShardSize` value
when accounting for the movements of shards in the
`ClusterInfoSimulator`, which would sometimes cause us to assign more
shards to a node than its disk space should have allowed.

Closes #105331
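The fallback behaviour at the heart of this bug can be sketched as follows. This is a simplified, hypothetical model (class and method names invented for illustration), not the actual Elasticsearch code: the point is that when no size is known for a shard, the simulator uses whatever fallback it is handed, so passing the "unavailable" sentinel instead of the routing entry's expected size makes a restoring shard look free to place.

```java
import java.util.Map;

public class SimulatorSketch {
    // Sentinel meaning "no size information", as in the real code's constant.
    static final long UNAVAILABLE_EXPECTED_SHARD_SIZE = -1L;

    // Simplified stand-in for getExpectedShardSize: use the size already known
    // to the simulator if present, otherwise fall back to the supplied default.
    static long expectedShardSize(String shardId, long fallback, Map<String, Long> knownSizes) {
        Long known = knownSizes.get(shardId);
        return known != null ? known : fallback;
    }

    public static void main(String[] args) {
        // A shard being restored from a snapshot: nothing in ClusterInfo yet.
        Map<String, Long> knownSizes = Map.of();
        // The size recorded on the ShardRouting when the restore was allocated.
        long routingExpectedSize = 5_000_000_000L;

        // Before the fix: the fallback was the sentinel, so the simulator
        // reserved no disk space for the incoming restore.
        long before = expectedShardSize("[idx][0]", UNAVAILABLE_EXPECTED_SHARD_SIZE, knownSizes);

        // After the fix: the ShardRouting's expected size is the fallback, so
        // the simulated disk usage accounts for the restore.
        long after = expectedShardSize("[idx][0]", routingExpectedSize, knownSizes);

        System.out.println(before); // -1
        System.out.println(after);  // 5000000000
    }
}
```

With the sentinel fallback, every concurrently-initializing restored shard appears to occupy no space, which is how the allocator could pack more shards onto a node than its watermarks should allow.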

@DaveCTurner DaveCTurner added >bug :Distributed Coordination/Allocation All issues relating to the decision making around placing a shard (both master logic & on the nodes) auto-backport Automatically create backport pull requests when merged v8.18.1 v8.19.0 v9.0.1 v9.1.0 labels Apr 14, 2025
@elasticsearchmachine elasticsearchmachine added the Team:Distributed Coordination Meta label for Distributed Coordination team label Apr 14, 2025
@elasticsearchmachine
Collaborator

Pinging @elastic/es-distributed-coordination (Team:Distributed Coordination)

@elasticsearchmachine
Collaborator

Hi @DaveCTurner, I've created a changelog YAML for you.

@@ -230,6 +224,110 @@ public void testRestoreSnapshotAllocationDoesNotExceedWatermarkWithMultipleShard
assertBusyWithDiskUsageRefresh(dataNode0Id, indexName, contains(in(shardSizes.getShardIdsWithSizeSmallerOrEqual(usableSpace))));
}

public void testRestoreSnapshotAllocationDoesNotExceedWatermarkWithMultipleRestores() {
Contributor Author


testRestoreSnapshotAllocationDoesNotExceedWatermarkWithMultipleShards was flaky because of this bug, but only failing once every few hundred iterations. This test fails more reliably for the same reason, although still not all that reliably (after around 20-30 iterations on my laptop). I could make it exercise the exact path that hits the bug every time, but it'd be very specific to this one bug and I'd rather have something a little more general to look out for related bugs too.

Contributor


So testRestoreSnapshotAllocationDoesNotExceedWatermarkWithMultipleShards does a little more disk usage checking as well as the shard assignment. This new test checks that shards are assigned correctly, not so much disk usage.

They feel a little redundant: might we add a couple more touches to the new one and delete the old?

Contributor Author


Yeah I was sort of inclined to keep them both but you're right, we're not really testing anything different in the old test.

Contributor


Great, thanks for updating

@@ -92,7 +92,7 @@ public void simulateShardStarted(ShardRouting shard) {
         var project = allocation.metadata().projectFor(shard.index());
         var size = getExpectedShardSize(
             shard,
-            UNAVAILABLE_EXPECTED_SHARD_SIZE,
+            shard.getExpectedShardSize(),
Contributor Author


One-line fix \o/

Contributor

@JeremyDahlgren JeremyDahlgren left a comment


LGTM

DaveCTurner and others added 2 commits April 14, 2025 22:30
…routing/allocation/decider/DiskThresholdDeciderIT.java

Co-authored-by: Jeremy Dahlgren <[email protected]>
…tRestoreSnapshotAllocationDoesNotExceedWatermarkWithMultipleShards
Contributor

@DiannaHohensee DiannaHohensee left a comment


I think the changes look good, I'm good with shipping it as is. Though I do wonder if we could delete the old test in favor of the new -- per comment.

// set up a listener that explicitly forbids more than one shard to be assigned to the tiny node
final String dataNodeId = internalCluster().getInstance(NodeEnvironment.class, dataNodeName).nodeId();
final var allShardsActiveListener = ClusterServiceUtils.addTemporaryStateListener(cs -> {
assertThat(cs.getRoutingNodes().toString(), cs.getRoutingNodes().node(dataNodeId).size(), lessThanOrEqualTo(1));
Contributor


Oooph, RoutingNode#size() is an unhelpful method name 😓

Contributor Author


Naming things is hard, but yeah this is not good. At least it has Javadocs :)

clusterAdmin().prepareRestoreSnapshot(TEST_REQUEST_TIMEOUT, "repo", "snap")
.setWaitForCompletion(true)
.setRenamePattern(indexName)
.setRenameReplacement(indexName + "-copy")
Contributor


Huh, is this purely a test-only feature? Doesn't look like it's used anyplace else.

(not actionable, I'm just surprised)

Contributor Author


We use the feature in production, see here:

We just don't use the RestoreSnapshotRequestBuilder to build a RestoreSnapshotRequest anywhere, instead building the request directly since it's all mutable anyway. Not a pattern I like, but one that is going to take a long time to completely eliminate.

Contributor


Ohh 🤔 Got it 👍


…tRestoreSnapshotAllocationDoesNotExceedWatermarkWithMultipleShards
@DaveCTurner DaveCTurner added the auto-merge-without-approval Automatically merge pull request when CI checks pass (NB doesn't wait for reviews!) label Apr 17, 2025
…tRestoreSnapshotAllocationDoesNotExceedWatermarkWithMultipleShards
…tRestoreSnapshotAllocationDoesNotExceedWatermarkWithMultipleShards
@elasticsearchmachine elasticsearchmachine merged commit a5f935a into elastic:main Apr 22, 2025
17 checks passed
@DaveCTurner DaveCTurner deleted the 2025/04/11/105331-DiskThresholdDeciderIT-testRestoreSnapshotAllocationDoesNotExceedWatermarkWithMultipleShards branch April 22, 2025 17:08
DaveCTurner added a commit to DaveCTurner/elasticsearch that referenced this pull request Apr 22, 2025
DaveCTurner added a commit to DaveCTurner/elasticsearch that referenced this pull request Apr 22, 2025
DaveCTurner added a commit to DaveCTurner/elasticsearch that referenced this pull request Apr 22, 2025
@elasticsearchmachine
Collaborator

💚 Backport successful

Branches: 8.18, 8.x, 9.0 (all successful)

elasticsearchmachine pushed a commit that referenced this pull request Apr 23, 2025
* Fix shard size of initializing restored shard (#126783)


* Backport utils from 4009599
elasticsearchmachine pushed a commit that referenced this pull request Apr 23, 2025
* Fix shard size of initializing restored shard (#126783)


* Missing throws
elasticsearchmachine pushed a commit that referenced this pull request Apr 23, 2025
* Fix shard size of initializing restored shard (#126783)


* Backport utils from 4009599

* Missing throws

Successfully merging this pull request may close these issues.

[CI] DiskThresholdDeciderIT testRestoreSnapshotAllocationDoesNotExceedWatermarkWithMultipleShards failing
4 participants