Setting for estimated shard heap allocation decider #128722

Merged: 13 commits, Jun 17, 2025

Conversation

ywangd (Member) commented Jun 2, 2025

This PR adds a new setting to toggle the collection of shard heap usage, and wires ShardHeapUsage into ClusterInfoSimulator.

Relates: #128723
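For illustration only, here is a minimal, self-contained sketch of what such an on/off toggle for a stats collector can look like. All names are hypothetical, not the PR's actual code; the only behavior taken from the diff is the fallback to an empty per-node map when collection is disabled.

```java
import java.util.Map;

// Hypothetical sketch (not the PR's actual code): a collector guarded by an
// enabled flag. When collection is off, callers receive an empty per-node map,
// mirroring the shardHeapUsagePerNode = Map.of() fallback in the diff.
public class ShardHeapUsageCollectorSketch {
    private final boolean enabled;

    public ShardHeapUsageCollectorSketch(boolean enabled) {
        this.enabled = enabled;
    }

    // In the real code this would poll node stats; here we fake a single node.
    public Map<String, Long> collect() {
        if (!enabled) {
            // Disabled: notify listeners with an empty shard heap usage map.
            return Map.of();
        }
        return Map.of("node-1", 512L * 1024 * 1024); // pretend 512 MiB of shard heap
    }
}
```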

@elasticsearchmachine elasticsearchmachine added v9.1.0 serverless-linked Added by automation, don't add manually labels Jun 2, 2025
@ywangd ywangd added the :Distributed Coordination/Allocation All issues relating to the decision making around placing a shard (both master logic & on the nodes) label Jun 2, 2025
@ywangd ywangd requested a review from nicktindall June 12, 2025 09:59
@ywangd ywangd changed the title Stub a heap usage field on ClusterInfo Setting for estimated shard heap allocation decider Jun 12, 2025
@ywangd ywangd marked this pull request as ready for review June 12, 2025 10:02
@ywangd ywangd requested a review from a team as a code owner June 12, 2025 10:02
@elasticsearchmachine elasticsearchmachine added the Team:Distributed Coordination Meta label for Distributed Coordination team label Jun 12, 2025
elasticsearchmachine (Collaborator) commented:

Pinging @elastic/es-distributed-coordination (Team:Distributed Coordination)

}
} else {
logger.trace("skipping collecting shard heap usage from cluster, notifying listeners with empty shard heap usage");
shardHeapUsagePerNode = Map.of();
Contributor:

This complexity looks a bit nasty, but I can't think of a way to make it nicer.

I wonder if it would look better if we made the decisions up front e.g.

var fetchIndicesStats = diskThresholdEnabled;
var fetchNodeStats = diskThresholdEnabled || shardThresholdEnabled;
var fetchShardHeapUsage = shardThresholdEnabled;

probably wouldn't make much difference.

What about if we had something like...

enum Requirement {
   NodeStats,
   IndicesStats,
   ShardHeapUsage
}

record Feature(Set<Requirement> requirements) {
}

Feature DISK_THRESHOLD = new Feature(Set.of(NodeStats, IndicesStats));
Feature SHARD_HEAP = new Feature(Set.of(NodeStats, ShardHeapUsage));

// then

    // ...
    Set<Requirement> requirements = enabledFeatures.stream().flatMap(feat -> feat.requirements().stream()).collect(Collectors.toSet());
    // ...
    if (requirements.contains(NodeStats)) {
       try (var ignored = threadPool.getThreadContext().clearTraceContext()) {
             fetchNodeStats();
       }
    }

You could argue it's overkill now, but if we keep doing this stuff it'll make things easier to read.
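For what it's worth, the idea sketched above can be made concrete in a self-contained form. Names like StatsPlanner and the enum constants are illustrative, not from the PR: each feature declares its requirements, and the collector fetches the union of requirements across the enabled features.

```java
import java.util.EnumSet;
import java.util.Set;
import java.util.stream.Collectors;

// Illustrative stand-alone version of the Feature/Requirement pattern
// discussed in the review; names are hypothetical.
public class StatsPlanner {
    enum Requirement { NODE_STATS, INDICES_STATS, SHARD_HEAP_USAGE }

    record Feature(Set<Requirement> requirements) {}

    static final Feature DISK_THRESHOLD =
        new Feature(Set.of(Requirement.NODE_STATS, Requirement.INDICES_STATS));
    static final Feature SHARD_HEAP =
        new Feature(Set.of(Requirement.NODE_STATS, Requirement.SHARD_HEAP_USAGE));

    // Union of requirements over all enabled features.
    static Set<Requirement> requiredStats(Set<Feature> enabledFeatures) {
        return enabledFeatures.stream()
            .flatMap(f -> f.requirements().stream())
            .collect(Collectors.toCollection(() -> EnumSet.noneOf(Requirement.class)));
    }
}
```

The fetch logic then reduces to a few `requirements.contains(...)` checks instead of compound boolean conditions.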

ywangd (Member, Author):

I am not sure about the 2nd suggestion. At this point it seems to add to the complexity, e.g. it still needs the three if/else statements at the end. Maybe it could be better hidden if Feature somehow handled the fetches internally, but that seems a bit too much for what we need here. So I pushed d3dd95a to extract each if/else branch into its own method, which is somewhat aligned with your first suggestion. Though it does not reduce the complexity, I feel the readability is better since it is clearer when particular stats need to be fetched. Please let me know if that works for you. Thanks!
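As a rough illustration of that refactor (hypothetical names, not the actual contents of commit d3dd95a), each fetch can sit behind its own method that checks its own precondition, so it is obvious at a glance when each stat is collected:

```java
// Hypothetical sketch of the extract-method refactor described above;
// the log field only exists here so the behavior is observable.
public class InfoCollectorSketch {
    private final boolean diskThresholdEnabled;
    private final boolean shardHeapEnabled;
    final StringBuilder log = new StringBuilder(); // records what was fetched

    InfoCollectorSketch(boolean diskThresholdEnabled, boolean shardHeapEnabled) {
        this.diskThresholdEnabled = diskThresholdEnabled;
        this.shardHeapEnabled = shardHeapEnabled;
    }

    void refresh() {
        maybeFetchIndicesStats();
        maybeFetchNodeStats();
        maybeFetchShardHeapUsage();
    }

    private void maybeFetchIndicesStats() {
        if (diskThresholdEnabled) log.append("indices;");
    }

    private void maybeFetchNodeStats() {
        if (diskThresholdEnabled || shardHeapEnabled) log.append("nodes;");
    }

    private void maybeFetchShardHeapUsage() {
        if (shardHeapEnabled) log.append("heap;");
        // else: fall back to an empty map, as in the diff (Map.of())
    }
}
```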

Contributor:

Yeah that improves the readability, thanks.

nicktindall (Contributor) left a review:

LGTM with some minor nitpicking

@ywangd ywangd added the auto-merge-without-approval Automatically merge pull request when CI checks pass (NB doesn't wait for reviews!) label Jun 17, 2025
@elasticsearchmachine elasticsearchmachine merged commit adf4d10 into elastic:main Jun 17, 2025
24 checks passed
@ywangd ywangd deleted the ES-11445-cluster-info-stub branch June 17, 2025 03:28
Labels: auto-merge-without-approval, :Distributed Coordination/Allocation, >non-issue, serverless-linked, Team:Distributed Coordination, v9.1.0
3 participants