Fix shard size of initializing restored shard #126783
Conversation
For shards being restored from a snapshot we use `SnapshotShardSizeInfo` to track their sizes while they're unassigned, and then use `ShardRouting#expectedShardSize` when they start to recover. However, we were incorrectly ignoring the `ShardRouting#expectedShardSize` value when accounting for the movements of shards in the `ClusterInfoSimulator`, which would sometimes cause us to assign more shards to a node than its disk space should have allowed.

Closes elastic#105331
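To illustrate the fix in isolation, here is a minimal, self-contained sketch of the fallback logic, using hypothetical stand-ins rather than the real Elasticsearch classes: when the simulator has no observed size for an initializing shard, it should fall back to the size recorded on the routing entry itself instead of `UNAVAILABLE_EXPECTED_SHARD_SIZE`.

```java
// A simplified, self-contained sketch (hypothetical stand-ins, not the real
// Elasticsearch classes) of the fallback behaviour the one-line fix restores.
public class ClusterInfoSimulatorSketch {

    static final long UNAVAILABLE_EXPECTED_SHARD_SIZE = -1L;

    /** Stand-in for ShardRouting: carries the size set while the shard was unassigned. */
    record ShardRouting(String shardId, long expectedShardSize) {}

    /** Stand-in for the ClusterInfo lookup: an initializing restored shard has no observed size yet. */
    static Long observedSize(ShardRouting shard) {
        return null;
    }

    /** Mirrors the shape of getExpectedShardSize(shard, fallback, ...): observed size wins, else fallback. */
    static long getExpectedShardSize(ShardRouting shard, long fallback) {
        Long observed = observedSize(shard);
        return observed != null ? observed : fallback;
    }

    static long simulateShardStarted(ShardRouting shard) {
        // The buggy version passed UNAVAILABLE_EXPECTED_SHARD_SIZE here, discarding the
        // snapshot-derived size and letting the simulator treat the shard as free.
        return getExpectedShardSize(shard, shard.expectedShardSize());
    }

    public static void main(String[] args) {
        var restored = new ShardRouting("[idx][0]", 50_000_000L); // size known from the snapshot
        System.out.println(simulateShardStarted(restored)); // 50000000, not -1
    }
}
```

With the fallback in place, the disk-threshold simulation accounts for the restored shard's full expected size, so it can no longer over-assign shards to a nearly-full node.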
Pinging @elastic/es-distributed-coordination (Team:Distributed Coordination)

Hi @DaveCTurner, I've created a changelog YAML for you.
@@ -230,6 +224,110 @@ public void testRestoreSnapshotAllocationDoesNotExceedWatermarkWithMultipleShard
         assertBusyWithDiskUsageRefresh(dataNode0Id, indexName, contains(in(shardSizes.getShardIdsWithSizeSmallerOrEqual(usableSpace))));
     }

+    public void testRestoreSnapshotAllocationDoesNotExceedWatermarkWithMultipleRestores() {
`testRestoreSnapshotAllocationDoesNotExceedWatermarkWithMultipleShards` was flaky because of this bug, but only failing once every few hundred iterations. This test fails more reliably for the same reason, although still not all that reliably (after around 20-30 iterations on my laptop). I could make it exercise the exact path that hits the bug every time, but it'd be very specific to this one bug and I'd rather have something a little more general to look out for related bugs too.
So `testRestoreSnapshotAllocationDoesNotExceedWatermarkWithMultipleShards` does a little more disk usage checking as well as the shard assignment. This new test checks that shards are assigned correctly, not so much disk usage.

They feel a little redundant: might we add a couple more touches to the new one and delete the old?
Yeah I was sort of inclined to keep them both but you're right, we're not really testing anything different in the old test.
Great, thanks for updating
@@ -92,7 +92,7 @@ public void simulateShardStarted(ShardRouting shard) {
         var project = allocation.metadata().projectFor(shard.index());
         var size = getExpectedShardSize(
             shard,
-            UNAVAILABLE_EXPECTED_SHARD_SIZE,
+            shard.getExpectedShardSize(),
One-line fix \o/
…tRestoreSnapshotAllocationDoesNotExceedWatermarkWithMultipleShards
LGTM
…routing/allocation/decider/DiskThresholdDeciderIT.java Co-authored-by: Jeremy Dahlgren <[email protected]>
…tRestoreSnapshotAllocationDoesNotExceedWatermarkWithMultipleShards
I think the changes look good; I'm good with shipping it as is. Though I do wonder if we could delete the old test in favor of the new, per the comment above.
// set up a listener that explicitly forbids more than one shard to be assigned to the tiny node
final String dataNodeId = internalCluster().getInstance(NodeEnvironment.class, dataNodeName).nodeId();
final var allShardsActiveListener = ClusterServiceUtils.addTemporaryStateListener(cs -> {
    assertThat(cs.getRoutingNodes().toString(), cs.getRoutingNodes().node(dataNodeId).size(), lessThanOrEqualTo(1));
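The snippet above is truncated mid-lambda by the review view; the key idea is that the listener's predicate runs against every published cluster state, so even a transient over-assignment trips the assertion. A minimal sketch of that pattern, with hypothetical stand-in types rather than the real `ClusterServiceUtils` API:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

class TemporaryStateListenerSketch {
    /** Toy cluster state: just the two facts the test above cares about. */
    record ClusterState(int shardsOnTinyNode, boolean allShardsActive) {}

    private final List<Predicate<ClusterState>> listeners = new ArrayList<>();

    /** Re-checked on every state update; removed once the predicate returns true. */
    void addTemporaryStateListener(Predicate<ClusterState> until) {
        listeners.add(until);
    }

    void publish(ClusterState state) {
        listeners.removeIf(listener -> listener.test(state));
    }

    public static void main(String[] args) {
        var cluster = new TemporaryStateListenerSketch();
        cluster.addTemporaryStateListener(cs -> {
            // the check fires on *every* state, so a transient over-assignment fails the test
            if (cs.shardsOnTinyNode() > 1) throw new AssertionError("too many shards on tiny node");
            return cs.allShardsActive(); // the listener completes once all shards are active
        });
        cluster.publish(new ClusterState(1, false)); // invariant holds, keep listening
        cluster.publish(new ClusterState(1, true));  // condition met, listener removed
    }
}
```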
Oooph, `RoutingNode#size()` is an unhelpful method name 😓
Naming things is hard, but yeah this is not good. At least it has Javadocs :)
clusterAdmin().prepareRestoreSnapshot(TEST_REQUEST_TIMEOUT, "repo", "snap")
    .setWaitForCompletion(true)
    .setRenamePattern(indexName)
    .setRenameReplacement(indexName + "-copy")
Huh, is this purely a test-only feature? Doesn't look like it's used anyplace else.
(not actionable, I'm just surprised)
We use the feature in production, see here:

    Line 564 in 77ef1d4: renameReplacement((String) entry.getValue());

We just don't use the `RestoreSnapshotRequestBuilder` to build a `RestoreSnapshotRequest` anywhere, instead building the request directly since it's all mutable anyway. Not a pattern I like, but one that is going to take a long time to completely eliminate.
Ohh 🤔 Got it 👍
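For reference, a hedged sketch of the two styles discussed in this thread. The builder calls come from the test snippet above, and the `renamePattern`/`renameReplacement` setters appear in the quoted production line; the exact `RestoreSnapshotRequest` constructor shape is an assumption and varies across versions, and the fragment relies on Elasticsearch test scaffolding rather than standing alone:

```java
// Builder style, as the test above does:
clusterAdmin().prepareRestoreSnapshot(TEST_REQUEST_TIMEOUT, "repo", "snap")
    .setWaitForCompletion(true)
    .setRenamePattern("logs-(.+)")          // regex matched against each restored index name
    .setRenameReplacement("restored-$1")    // restored indices get fresh names, avoiding clashes
    .get();

// Direct-mutation style, as production code does (constructor arguments are an assumption):
RestoreSnapshotRequest request = new RestoreSnapshotRequest("repo", "snap");
request.renamePattern("logs-(.+)");
request.renameReplacement("restored-$1");
request.waitForCompletion(true);
```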
Backport commits:
* Fix shard size of initializing restored shard (#126783)
* Backport utils from 4009599
* Missing throws