DRAFT: Consume fieldcaps node response while deserializing it #120010


Draft · original-brownbear wants to merge 7 commits into main

Conversation

@original-brownbear (Member) commented Jan 11, 2025

Just to illustrate the point of where we can save memory here: the concrete API could be made nicer, but functionally this thing seems fine and comes with a measurable performance boost in some quick-and-dirty benchmarking.

The API implemented here could definitely use some extra love, but the idea seems to work quite well according to some quick testing, and this approach can potentially save a massive amount of memory, not just in field caps.

Instead of deserializing a full list of per-index responses before getting to processing them, we should consume each response right after deserializing it, without any intermediary list. This should be safe to do since, today, we deserialize responses on the same thread that consumes the full response.
Not only does merging responses directly save a lot of heap, it also might save quite a bit of runtime from better cache locality. Also, having a stateful deserializer allows for a couple of helpful tricks around deduplication across multiple responses for fan-out-style APIs as well.
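To make the idea concrete, here is a minimal sketch of the pattern (all names are hypothetical stand-ins, not the actual Elasticsearch StreamInput API): instead of materializing a length-prefixed list and merging it afterwards, each element is handed to a stateful consumer the moment it is deserialized.

```java
import java.io.DataInput;
import java.io.IOException;
import java.util.function.Consumer;

final class StreamingListReader {

    /** Reads one element from the wire format. */
    interface ElementReader<T> {
        T read(DataInput in) throws IOException;
    }

    /**
     * Reads a length-prefixed sequence and merges each element into the
     * consumer's state right away. Each element becomes collectible as
     * soon as it has been merged, instead of being pinned in a List
     * until the whole response has been deserialized.
     */
    static <T> void readAndConsume(DataInput in,
                                   ElementReader<T> reader,
                                   Consumer<T> merge) throws IOException {
        int count = in.readInt(); // length prefix (a plain int in this sketch)
        for (int i = 0; i < count; i++) {
            merge.accept(reader.read(in));
        }
    }
}
```

With a merge step that folds each per-index response into the final merged field-caps structure, peak heap scales with a single element rather than with the whole list.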

original-brownbear added a commit to original-brownbear/elasticsearch that referenced this pull request Jan 11, 2025
This field is only used (by security) for requests, having it in responses is redundant.
Also, we have a couple of responses that are singletons/quasi-enums where setting the value
needlessly might introduce some strange contention even though it's a plain store.

This isn't just a cosmetic change. It makes it clear at compile time that each response instance
is exclusively defined by the bytes that it is read from. This makes it easier to reason about the
validity of suggested optimizations like elastic#120010
original-brownbear added a commit that referenced this pull request Mar 16, 2025
omricohenn pushed a commit to omricohenn/elasticsearch that referenced this pull request Mar 28, 2025
@DaveCTurner (Contributor) left a comment
Kinda cute but also a bit of a hack :) Especially since the exception handling in TransportResponse#read is now a little tricky because of having to consume the whole response.

Makes you wonder tho: does TransportResponseHandler really need separate read and handleResponse methods if we just call them in sequence?

@original-brownbear (Member, Author) commented

Makes you wonder tho: does TransportResponseHandler really need separate read and handleResponse methods if we just call them in sequence?

That's why I started with

Just to illustrate the point of where we can save memory

:P

It feels like the answer is clearly no, these should not be different methods. In fact I'd love to go even further and make it so the read method receives a BytesReference that need not cover the full message length and returns an int for how many bytes could be consumed so far. If you look at this example, I think it's quite easy to see how repeatedly calling a reader like int read(ReleasableBytesReference buffer) until the buffer is fully consumed gives you real streaming reads, and it could be done today in one or two steps.
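A rough sketch of what that incremental handler shape could look like (hypothetical names, with ByteBuffer standing in for ReleasableBytesReference to keep the sketch self-contained):

```java
import java.nio.ByteBuffer;

/**
 * Hypothetical incremental handler: the transport layer calls read()
 * repeatedly with whatever bytes have arrived so far, and the handler
 * reports how many of them it managed to consume. Bytes it could not
 * consume (e.g. a half-received sub-object) are re-offered together
 * with the next network chunk.
 */
interface IncrementalResponseHandler {

    /**
     * Consume as many complete sub-objects as the buffer currently holds.
     * @return the number of bytes consumed from the buffer
     */
    int read(ByteBuffer buffer);

    /** Invoked once the declared message length has been fully consumed. */
    void onCompleted();
}
```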

I just figured I'd start with this to illustrate the point :)

Also, note that the difference between serialized and deserialized heap use for this kind of message, and for similar stuff like aggregations, is somewhere between 2x and more than an order of magnitude. Streaming bytes is nice (and simplifies reasoning about heap consumption and stability), but practically this kind of "pretend streaming" is where most of the win is for messages that can be incrementally reduced on the receiver.

Personally, I'd still go for the hack in step 0 and then just do the next step of moving to an actual byte-streaming API by selectively short-circuiting the InboundAggregator based on the action found in the header. That should be a pretty short PR actually if you think about it. You could easily look up the request or response handler from the transport service once you have the Header available and selectively bypass the aggregator based on some boolean canReadBytes() on those handlers, couldn't you? :)
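For illustration, a minimal sketch of that short-circuit (all names are hypothetical stand-ins, not the actual InboundAggregator API): once the header yields the action name, the handler lookup decides whether to buffer the full message or stream chunks straight through.

```java
import java.nio.ByteBuffer;
import java.util.Map;

final class InboundDispatchSketch {

    interface Handler {
        /** Opt-in flag: true if this handler can consume partial buffers. */
        default boolean canReadBytes() { return false; }
        /** Streaming path; only called when canReadBytes() returns true. */
        default void read(ByteBuffer chunk) { throw new UnsupportedOperationException(); }
        /** Buffered path: receives the fully aggregated message. */
        void handleAggregated(ByteBuffer fullMessage);
    }

    /** Buffers chunks until the full message length has arrived. */
    interface ByteAggregator {
        void append(ByteBuffer chunk);
        ByteBuffer finish();
    }

    private final Map<String, Handler> handlersByAction;
    private final ByteAggregator aggregator;

    InboundDispatchSketch(Map<String, Handler> handlersByAction, ByteAggregator aggregator) {
        this.handlersByAction = handlersByAction;
        this.aggregator = aggregator;
    }

    void onChunk(String actionName, ByteBuffer chunk, boolean lastChunk) {
        Handler handler = handlersByAction.get(actionName);
        if (handler.canReadBytes()) {
            handler.read(chunk); // streaming path: bypass aggregation entirely
        } else {
            aggregator.append(chunk); // legacy path: buffer the full message
            if (lastChunk) {
                handler.handleAggregated(aggregator.finish());
            }
        }
    }
}
```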

Labels: :Search Foundations/Search (Catch all for Search Foundations), v9.1.0, WIP
3 participants