-
Notifications
You must be signed in to change notification settings - Fork 25.2k
ES-10063 Add multi-project support for more stats APIs #127650
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
ES-10063 Add multi-project support for more stats APIs #127650
Conversation
This affects the following APIs: - `GET _nodes/stats`: - For `indices`, it now prefixes the index name with the project ID (for non-default projects). Previously, it didn't tell you which project an index was in, and it failed if two projects had the same index name. - For `ingest`, it now gets the pipeline and processor stats for all projects, and prefixes the pipeline ID with the project ID. Previously, it only got them for the default project. - `GET /_cluster/stats`: - For `ingest`, it now aggregates the pipeline and processor stats for all projects. Previously, it only got them for the default project. - `GET /_info`: - For `ingest`, same as for `GET /_nodes/stats`. This is done by making `IndicesService.stats()` and `IngestService.stats()` include project IDs in the `NodeIndicesStats` and `IngestStats` objects they return, and making those stats objects incorporate the project IDs when converting to XContent. The transitive callers of these two methods are rather extensive (including all callers to `NodeService.stats()`, all callers of `TransportNodesStatsAction`, and so on). To ensure the change is safe, the callers were all checked out, and they fall into the following cases: - The behaviour change is one of the desired enhancements described above. - There is no behaviour change because it was getting node stats but neither `indices` nor `ingest` stats were requested. - There is no behaviour change because it was getting `indices` and/or `ingest` stats but only using aggregate values. - In `MachineLearningUsageTransportAction` and `TransportGetTrainedModelsStatsAction`, the `IngestStats` returned will return stats from all projects instead of just the default with this change, but they have been changed to filter the non-default project stats out, so this change is a noop there. (These actions are not MP-ready yet.) - `MonitoringService` will be affected, but this is the legacy monitoring module which is not in use anywhere that MP is going to be enabled. (If anything, the behaviour is probably improved by this change, as it will now include project IDs, rather than producing ambiguous unqualified results and failing in the case of duplicates.)
Pinging @elastic/es-data-management (Team:Data Management) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks a lot for working on this, Pete! I did a first round of review by just looking at the production files. I'll have a look at the tests later on. This feels like the right direction; my comments are mostly small and/or thinking out loud.
server/src/main/java/org/elasticsearch/action/admin/cluster/stats/ClusterStatsNodes.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/action/admin/cluster/stats/ClusterStatsNodes.java
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/indices/NodeIndicesStats.java
Outdated
Show resolved
Hide resolved
if (Objects.equals(projectId, Metadata.DEFAULT_PROJECT_ID)) { | ||
return index.getName(); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm a little bit on the fence about this approach. Thinking out loud: on the one hand, this maintains BwC with little effort. On the other hand, if we ever have indices in the default project this'll be a bit off. It's been mentioned a handful of times that we might want to consider (so, super hypothetically) having a shared cluster-level GeoIP index to avoid downloading the same DB over again. We shouldn't make this decision purely based on the GeoIP index, but the idea of the faint possibility of cluster-level indices is something to think about. At the same time, this approach doesn't "break" anything in that case either, it just makes the output inconsistent. So, we either try to account for that situation proactively now, or we decide to go with this approach and we solve that issue if/when it comes. Any thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess the options are:
- Include the
default/
prefix on every index in single-project mode. - Live with the fact that people will need to know that the absence of a prefix indicates the default project in multi-project mode.
- Somehow pass in information about whether we're in single- or multi-project mode. I guess that we could go back to making the map nullable and use the null value to indicate that we're in single-project mode and don't need the prefixes.
What I've currently implement is option 2. I think this is certainly better than option 1. But if we're seriously considering having indices in the default project when in MP mode (which I hadn't realized was a possibility) and if we think that people might not realize that no prefix means default (which I'm not sure about) then we could switch to option 3 easily enough. What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On reflection, I think I'm leaning towards option 3.
This would also mean that, when deserializing an instance from an older node, we could make the map be null and so not show project prefixes: I think this would be valid, because I assume that we will never have to support BWC with nodes older than the current main
in MP clusters (there's no way that could possibly work!).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think, ideally, I'd want to go with option 3 too (not showing the project ID in single-project mode and showing the project ID for all indices in multi-project mode), for consistency. But let's park this discussion for a bit until we resolve #127650 (comment) (or choose not to resolve it and live with it).
.../plugin/core/src/test/java/org/elasticsearch/xpack/core/datatiers/DataTierUsageFixtures.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some YAML tests from the ingest-common
module are muted in this file because they depend on stats APIs:
'^test/ingest/15_info_ingest/*', // uses cluster info API
'^test/ingest/70_bulk/Test bulk request with default pipeline', // uses stats API
'^test/ingest/70_bulk/Test bulk request without default pipeline', // uses stats API
However, because we decided to prefix the project ID, we won't be able to unmute those tests... I don't immediately have a good suggestion, but I thought I'd mention it so we're aware. If we decide that that is OK, we should move these lines to the bottom so they're next to the comment explaining why some stats API tests are muted in MP.
server/src/main/java/org/elasticsearch/indices/IndicesService.java
Outdated
Show resolved
Hide resolved
return indices.values() | ||
.stream() | ||
.map(AbstractIndexComponent::index) | ||
.collect(Collectors.toMap(index -> index, index -> clusterService.state().metadata().projectFor(index).id())); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thinking out loud: by using Metadata#projectFor
, we throw an exception if the index is not found. I'm not familiar enough with the IndicesService
to know whether there is a possibility of the service being "behind" the state in ClusterService
. For instance, when an index is deleted from the cluster state, is the indices service updated atomically? Or if an index is added, is it first added to the cluster state or the IndicesService
? If you don't already have an answer to those questions (which is completely fine), I think it's worth having a look to be sure what kind of behavior we could see here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ugh. That's a good question.
Looking at the creation flow, I believe that creating the IndexService
and adding it to the indices
map in IndicesService
happens as part of an applyClusterState
method in a ClusterStateApplier
, and looking at ClusterApplierService
it seems that this will be invoked before the state
is updated. So it seems plausible that we could call IndicesService.stats
on another thread and observe it in a state where this fails for an index which is in the process of being created on this node.
At any rate, I think that good defensive coding principles dictate that we should make things work as well as possible in these circumstances.
I think the best option might be to switch to using lookupProject(Index)
. That means that we'll just omit the index from the map in that case. Then we have to figure out how to handle that. Right now, it would use null
for the project ID, which is probably confusing. We could make it use something like <unknown>
, which is sort of horrible but at least it's explicit. (We already have the possibility that the map is empty because it came from an older node, although at least that situation is only temporary.)
I don't like that, but I'm struggling to think of any better alternatives right now.
It would be simple if the IndexService
knew its project ID. I think you said it was a deliberate design decision not to do that... Or did was the deliberate decision just that Index
and/or IndexMetadata
wouldn't know their project ID? I've had a glance, and it would be a moderately messy change: there are dozens of code paths we'd have to wire the project ID through to get back to somewhere it's available (not counting tests).
The refactoring to ensure that things happen in a different order which guarantees it would be okay is waaaaay to large and risky to justify.
So I'm going with the <unknown>
option for now. But I'd be very interested to hear your thoughts!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(We already have the possibility that the map is empty because it came from an older node, although at least that situation is only temporary.)
N.B. If we go with the thing I called option 3 in the other thread, this comment will not be true. The only way that I think this the index could be missing from the map is if we hit this rare timing issue. So I think the question becomes: Do we want to do the plumbing to make an IndexService
know its project ID in order to avoid this? Or is there a better solution that I'm not seeing?
(There is, incidentally, a risk that the stats call could fail if there are two indices with the same name in different projects both of which hit this rare timing issue at the same time, causing a key collision. I'm inclined not to worry about that, as it would be transient, although it could be avoided if we just add a counter to the prefix.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM overall! I haven't gone over the tests yet, will run another pass for the tests tomorrow.
public NodeIndicesStats( | ||
CommonStats oldStats, | ||
Map<Index, CommonStats> statsByIndex, | ||
Map<Index, List<IndexShardStats>> statsByShard, | ||
Map<Index, ProjectId> projectsByIndex, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice, I like this approach of passing in the index to project map
Change suggested by Niels. Co-authored-by: Niels Bauman <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. I just pushed a commit which responds to most of the comments here. There's some discussion below.
server/src/main/java/org/elasticsearch/action/admin/cluster/stats/ClusterStatsNodes.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/action/admin/cluster/stats/ClusterStatsNodes.java
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/indices/IndicesService.java
Outdated
Show resolved
Hide resolved
return indices.values() | ||
.stream() | ||
.map(AbstractIndexComponent::index) | ||
.collect(Collectors.toMap(index -> index, index -> clusterService.state().metadata().projectFor(index).id())); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ugh. That's a good question.
Looking at the creation flow, I believe that creating the IndexService
and adding it to the indices
map in IndicesService
happens as part of an applyClusterState
method in a ClusterStateApplier
, and looking at ClusterApplierService
it seems that this will be invoked before the state
is updated. So it seems plausible that we could call IndicesService.stats
on another thread and observe it in a state where this fails for an index which is in the process of being created on this node.
At any rate, I think that good defensive coding principles dictate that we should make things work as well as possible in these circumstances.
I think the best option might be to switch to using lookupProject(Index)
. That means that we'll just omit the index from the map in that case. Then we have to figure out how to handle that. Right now, it would use null
for the project ID, which is probably confusing. We could make it use something like <unknown>
, which is sort of horrible but at least it's explicit. (We already have the possibility that the map is empty because it came from an older node, although at least that situation is only temporary.)
I don't like that, but I'm struggling to think of any better alternatives right now.
It would be simple if the IndexService
knew its project ID. I think you said it was a deliberate design decision not to do that... Or did was the deliberate decision just that Index
and/or IndexMetadata
wouldn't know their project ID? I've had a glance, and it would be a moderately messy change: there are dozens of code paths we'd have to wire the project ID through to get back to somewhere it's available (not counting tests).
The refactoring to ensure that things happen in a different order which guarantees it would be okay is waaaaay to large and risky to justify.
So I'm going with the <unknown>
option for now. But I'd be very interested to hear your thoughts!
server/src/main/java/org/elasticsearch/indices/NodeIndicesStats.java
Outdated
Show resolved
Hide resolved
if (Objects.equals(projectId, Metadata.DEFAULT_PROJECT_ID)) { | ||
return index.getName(); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess the options are:
- Include the
default/
prefix on every index in single-project mode. - Live with the fact that people will need to know that the absence of a prefix indicates the default project in multi-project mode.
- Somehow pass in information about whether we're in single- or multi-project mode. I guess that we could go back to making the map nullable and use the null value to indicate that we're in single-project mode and don't need the prefixes.
What I've currently implement is option 2. I think this is certainly better than option 1. But if we're seriously considering having indices in the default project when in MP mode (which I hadn't realized was a possibility) and if we think that people might not realize that no prefix means default (which I'm not sure about) then we could switch to option 3 easily enough. What do you think?
.../plugin/core/src/test/java/org/elasticsearch/xpack/core/datatiers/DataTierUsageFixtures.java
Outdated
Show resolved
Hide resolved
@@ -564,6 +571,15 @@ IndexShardStats indexShardStats(final IndicesService indicesService, final Index | |||
); | |||
} | |||
|
|||
private Map<Index, ProjectId> projectsByIndex() { | |||
Map<Index, ProjectId> map = new HashMap<>(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit:
Map<Index, ProjectId> map = new HashMap<>(); | |
Map<Index, ProjectId> map = new HashMap<>(indices.size()); |
@@ -66,6 +70,7 @@ public class NodeIndicesStats implements Writeable, ChunkedToXContent { | |||
private final CommonStats stats; | |||
private final Map<Index, List<IndexShardStats>> statsByShard; | |||
private final Map<Index, CommonStats> statsByIndex; | |||
private final Map<Index, ProjectId> projectsByIndex; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a shame we have to pass in this map ourselves (and (de)serialize it), as it's pretty much just the Metadata.ProjectLookup
... It looks like we only need it for the XContent serialization, but I can't think of a way to pass the ProjectLookup
to that serialization method. I do think it's worth thinking about that some more (and maybe asking others for suggestions). This is going to result in a lot more serialization over the wire.
COMMON_STATS, | ||
Map.of(), | ||
indexStats, | ||
/* projectsByIndex is only used for rendering as XContent, which not needed here */ Map.of(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Teensiest nit ever:
/* projectsByIndex is only used for rendering as XContent, which not needed here */ Map.of(), | |
// projectsByIndex is only used for rendering as XContent, which not needed here | |
Map.of(), |
The syntax you used is not very common here (AFAIK) and it took me longer than I'm willing to admit to realize the parameter came after the comment.
This affects the following APIs:
GET _nodes/stats
:indices
, it now prefixes the index name with the project ID (for non-default projects). Previously, it didn't tell you which project an index was in, and it failed if two projects had the same index name.ingest
, it now gets the pipeline and processor stats for all projects, and prefixes the pipeline ID with the project ID. Previously, it only got them for the default project.GET /_cluster/stats
:ingest
, it now aggregates the pipeline and processor stats for all projects. Previously, it only got them for the default project.GET /_info
:ingest
, same as forGET /_nodes/stats
.It does not add multi-project support for the
repositories
section on_nodes/stats
, since repositories aren't MP-ready yet AFAIK.This is done by making
IndicesService.stats()
andIngestService.stats()
include project IDs in theNodeIndicesStats
andIngestStats
objects they return, and making those stats objects incorporate the project IDs when converting to XContent.The transitive callers of these two methods are rather extensive (including all callers to
NodeService.stats()
, all callers ofTransportNodesStatsAction
, and so on). To ensure the change is safe, the callers were all checked out, and they fall into the following cases:indices
noringest
stats were requested.indices
and/oringest
stats but only using aggregate values.MachineLearningUsageTransportAction
andTransportGetTrainedModelsStatsAction
, theIngestStats
returned will return stats from all projects instead of just the default with this change, but they have been changed to filter the non-default project stats out, so this change is a noop there. (These actions are not MP-ready yet.)MonitoringService
will be affected, but this is the legacy monitoring module which is not in use anywhere that MP is going to be enabled. (If anything, the behaviour is probably improved by this change, as it will now include project IDs, rather than producing ambiguous unqualified results and failing in the case of duplicates.)