-
Notifications
You must be signed in to change notification settings - Fork 25.2k
Force all FieldCaps response handling onto a single thread per request #120863
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Force all FieldCaps response handling onto a single thread per request #120863
Conversation
The current implementation uses a task per response model. This is unnecessarily costly when all results are merged into a single synchronzied map. The concurrency across requests allows for concurrent deserialization of responses but becomes incredibly costly when responses collide because of the hot lock acquisitions. Also, deserializing more than a single response at a time comes with higher than necessary heap overheads (especially for the remote cluster use-case) because we hold multiple responses on-heap and deserialized but merging is sequential in the end anyways.
Pinging @elastic/es-search-foundations (Team:Search Foundations) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
This change will queue handling of the responses, and process them one at a time (rather than concurrently). I guess this could slow things a little in some cases, but that is the desired outcome to avoid unnecessary concurrency and resource utilisation in many cases.
Thanks Chris! In my benchmarking via the many_shards run I couldn't really make out any slowdown. That said my hope/assumption is that doing something like #120010 we could actually get a serious speedup out of single-threadedness and a smaller working set :) 🤞 |
Thanks Chris! |
Same reasoning as for field_caps in elastic#120863, no need to have multiple threads contending the same mutex(s) when the heavy lifting step in handling the results is sequential anyway.
The current implementation uses a task per response model. This is unnecessarily costly when all results are merged into a single synchronized map. The concurrency across requests allows for concurrent deserialization of responses (this advantage of running on multiple threads would effectively disappear with #120010) but becomes incredibly costly when responses collide because of the hot lock acquisitions.
Also, deserializing more than a single response at a time comes with higher than necessary heap overheads (especially for the remote cluster use-case) because we hold multiple responses on-heap and deserialized but merging is sequential in the end anyways.
Also, removing all synchronization from the hot loops just outright reduces their cost even in the uncontended case.