DRAFT: Consume fieldcaps node response while deserializing it #120010


Draft · original-brownbear wants to merge 7 commits into main

Conversation

@original-brownbear (Member) commented Jan 11, 2025

Just to illustrate the point of where we can save memory here: the concrete API could be made nicer, but functionally this thing seems fine and comes with a measurable performance boost in some quick-and-dirty benchmarking.

The API implemented here could definitely use some extra love, but the idea seems to work quite well according to some quick testing, and this approach can potentially save a massive amount of memory, not just in field caps.

Instead of deserializing a full list of per-index responses before getting to processing them, we should consume each response right after deserializing it, without any intermediary list. This should be safe to do since, today, we deserialize responses on the same thread that consumes the full response.
Not only does merging responses directly save a lot of heap, it also might save quite a bit of runtime from better cache locality. Also, having a stateful deserializer allows for a couple of helpful tricks around deduplication across multiple responses for fan-out-style APIs as well.
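To make the idea concrete, here is a minimal sketch of the pattern (all names are hypothetical stand-ins, not the actual Elasticsearch StreamInput API): instead of materializing a length-prefixed list and merging it afterwards, each element is handed to a stateful consumer the moment it is deserialized.

```java
import java.io.DataInput;
import java.io.IOException;
import java.util.function.Consumer;

final class StreamingListReader {

    /** Reads one element from the wire format. */
    interface ElementReader<T> {
        T read(DataInput in) throws IOException;
    }

    /**
     * Reads a length-prefixed sequence and merges each element into the
     * consumer's state right away. Each element becomes collectible as
     * soon as it has been merged, instead of being pinned in a List
     * until the whole response has been deserialized.
     */
    static <T> void readAndConsume(DataInput in,
                                   ElementReader<T> reader,
                                   Consumer<T> merge) throws IOException {
        int count = in.readInt(); // length prefix (a plain int in this sketch)
        for (int i = 0; i < count; i++) {
            merge.accept(reader.read(in));
        }
    }
}
```

With a merge step that folds each per-index response into the final merged field-caps structure, peak heap scales with a single element rather than with the whole list.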

original-brownbear added a commit to original-brownbear/elasticsearch that referenced this pull request Jan 11, 2025
This field is only used (by security) for requests, having it in responses is redundant.
Also, we have a couple of responses that are singletons/quasi-enums where setting the value
needlessly might introduce some strange contention even though it's a plain store.

This isn't just a cosmetic change. It makes it clear at compile time that each response instance
is exclusively defined by the bytes that it is read from. This makes it easier to reason about the
validity of suggested optimizations like elastic#120010
original-brownbear added a commit that referenced this pull request Mar 16, 2025
omricohenn pushed a commit to omricohenn/elasticsearch that referenced this pull request Mar 28, 2025
@DaveCTurner (Contributor) left a comment
Kinda cute but also a bit of a hack :) Especially since the exception handling in TransportResponse#read is now a little tricky because of having to consume the whole response.

Makes you wonder tho: does TransportResponseHandler really need separate read and handleResponse methods if we just call them in sequence?

@original-brownbear (Member, Author) commented

Makes you wonder tho: does TransportResponseHandler really need separate read and handleResponse methods if we just call them in sequence?

That's why I started with

Just to illustrate the point of where we can save memory

:P

It feels like the answer is clearly no, these should not be different methods. In fact I'd love to go even further and make it so the read method receives a BytesReference that need not cover the full message length and returns an int for how many bytes could be consumed so far. If you look at this example, I think it's quite easy to see how repeatedly calling a reader like int read(ReleasableBytesReference buffer) until the buffer is fully consumed gives you real streaming reads, and it could be done today in one or two steps.
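A rough sketch of what that incremental handler shape could look like (hypothetical names, with ByteBuffer standing in for ReleasableBytesReference to keep the sketch self-contained):

```java
import java.nio.ByteBuffer;

/**
 * Hypothetical incremental handler: the transport layer calls read()
 * repeatedly with whatever bytes have arrived so far, and the handler
 * reports how many of them it managed to consume. Bytes it could not
 * consume (e.g. a half-received sub-object) are re-offered together
 * with the next network chunk.
 */
interface IncrementalResponseHandler {

    /**
     * Consume as many complete sub-objects as the buffer currently holds.
     * @return the number of bytes consumed from the buffer
     */
    int read(ByteBuffer buffer);

    /** Invoked once the declared message length has been fully consumed. */
    void onCompleted();
}
```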

I just figured I'd start with this to illustrate the point :)

Also, note that the difference between serialized and deserialized heap use for this kind of message, and for similar stuff like aggregations, is somewhere between 2x and more than an order of magnitude. Streaming bytes is nice (and simplifies reasoning about heap consumption and stability), but practically this kind of "pretend streaming" is where most of the win is for messages that can be incrementally reduced on the receiver.

Personally, I'd still go for the hack in step 0 and then just do the next step of moving to an actual byte-streaming API by selectively short-circuiting the InboundAggregator based on the action found in the header. That should be a pretty short PR actually if you think about it. You could easily look up the request or response handler from the transport service once you have the Header available and selectively bypass the aggregator based on some boolean canReadBytes() on those handlers, couldn't you? :)
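For illustration, a minimal sketch of that short-circuit (all names are hypothetical stand-ins, not the actual InboundAggregator API): once the header yields the action name, the handler lookup decides whether to buffer the full message or stream chunks straight through.

```java
import java.nio.ByteBuffer;
import java.util.Map;

final class InboundDispatchSketch {

    interface Handler {
        /** Opt-in flag: true if this handler can consume partial buffers. */
        default boolean canReadBytes() { return false; }
        /** Streaming path; only called when canReadBytes() returns true. */
        default void read(ByteBuffer chunk) { throw new UnsupportedOperationException(); }
        /** Buffered path: receives the fully aggregated message. */
        void handleAggregated(ByteBuffer fullMessage);
    }

    /** Buffers chunks until the full message length has arrived. */
    interface ByteAggregator {
        void append(ByteBuffer chunk);
        ByteBuffer finish();
    }

    private final Map<String, Handler> handlersByAction;
    private final ByteAggregator aggregator;

    InboundDispatchSketch(Map<String, Handler> handlersByAction, ByteAggregator aggregator) {
        this.handlersByAction = handlersByAction;
        this.aggregator = aggregator;
    }

    void onChunk(String actionName, ByteBuffer chunk, boolean lastChunk) {
        Handler handler = handlersByAction.get(actionName);
        if (handler.canReadBytes()) {
            handler.read(chunk); // streaming path: bypass aggregation entirely
        } else {
            aggregator.append(chunk); // legacy path: buffer the full message
            if (lastChunk) {
                handler.handleAggregated(aggregator.finish());
            }
        }
    }
}
```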

Labels: :Search Foundations/Search (Catch all for Search Foundations), v9.1.0, WIP
3 participants