Fix request processing scheduling #127464
base: main
Conversation
Pinging @elastic/es-analytical-engine (Team:Analytics)
@@ -265,9 +265,8 @@ void onAfter(DriverCompletionInfo info) {
concurrentRequests.release();
}

if (pendingRetries.isEmpty() == false && remainingUnavailableShardResolutionAttempts.decrementAndGet() >= 0) {
if (sendingLock.isHeldByCurrentThread()) {
I think isHeldByCurrentThread should be used for assertions or debugging purposes, not in production code.
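For reference, the JDK documents ReentrantLock#isHeldByCurrentThread as being intended mainly for debugging and testing, e.g. asserting that a caller already holds the lock rather than driving control flow. A minimal sketch (class and method names are invented for illustration, not taken from this PR):

import java.util.concurrent.locks.ReentrantLock;

class GuardedState {
    private final ReentrantLock sendingLock = new ReentrantLock();

    // Callers are expected to already hold sendingLock; the check below is a
    // sanity assertion, not a branch that production logic depends on.
    void mutateGuardedState() {
        assert sendingLock.isHeldByCurrentThread() : "caller must hold sendingLock";
        // ... mutate structures guarded by sendingLock ...
    }
}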
Agree. Let me find a way to replace it.
pendingRetries.add(shardId);
if (pendingRetries == null && remainingUnavailableShardResolutionAttempts.decrementAndGet() >= 0) {
pendingRetries = new HashSet<>();
sendingLock.lock();
I am concerned that we are scattering sendingLock#lock and sendingLock#unlock in two different places. Can we keep them close?
They are in the same inner listener, guarding the same structure, at the moment.
They were previously in the same method, but that was not enough and caused a bug.
I suspect we could do this by creating a releasable inner wrapper class on top of pendingRetries = new HashSet<>(), but that sounds like overkill and does not really help with readability.
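For what it is worth, the wrapper idea could look roughly like the sketch below. The class name and fields are invented for illustration; a real version in Elasticsearch would presumably implement Releasable rather than plain AutoCloseable.

import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical wrapper: creating the retry set and acquiring the lock happen
// together, and releasing the lock is tied to closing the wrapper.
final class PendingRetries<T> implements AutoCloseable {
    private final ReentrantLock sendingLock;
    final Set<T> shardIds = new HashSet<>();

    PendingRetries(ReentrantLock sendingLock) {
        this.sendingLock = sendingLock;
        sendingLock.lock();
    }

    @Override
    public void close() {
        sendingLock.unlock();
    }
}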
@idegtiarenko Sorry, I should have provided more detail. My concern is that we acquire the sending lock in maybeScheduleRetry and release it in onAfter, which is linked to the status of pendingRetries. While the implementation is technically correct, I think we should stick to the simplest lock pattern unless there is a strong reason to do otherwise:
lock(); // or tryLock()
try {
    // ...
} finally {
    unlock();
}
I think we can follow this lock pattern in the DataNodeRequestSender class.
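A minimal sketch of that pattern applied to conditional retry scheduling; sendingLock, pendingRetries, and scheduleRetries here are stand-ins for illustration, not the actual DataNodeRequestSender members:

import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.locks.ReentrantLock;

class RetryScheduler {
    private final ReentrantLock sendingLock = new ReentrantLock();
    private final Queue<String> pendingRetries = new ConcurrentLinkedQueue<>();

    // Lock only when there is something to retry, and keep lock()/unlock()
    // in a single try/finally so they can never get out of sync.
    void maybeProcessRetries() {
        if (pendingRetries.isEmpty() == false && sendingLock.tryLock()) {
            try {
                List<String> retries = new ArrayList<>();
                String shardId;
                while ((shardId = pendingRetries.poll()) != null) {
                    retries.add(shardId);
                }
                scheduleRetries(retries); // hypothetical helper
            } finally {
                sendingLock.unlock();
            }
        }
    }

    private void scheduleRetries(List<String> retries) {
        // resolve new shard locations and resend the requests (elided)
    }
}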
This sounds a lot like the original implementation (within onAfter), which was not correct.
Do you see a way to implement this suggestion while still locking conditionally (only if shard movement is detected) and blocking concurrent requests while new shard location resolution is required?
As discussed yesterday, I moved retry scheduling to trySendingRequestsForPendingShards in 8a0dcc6.
LGTM. Thanks @idegtiarenko
This change fixes concurrency around handling moved shards.
The test was failing because the shard failures were visible before the retry was processed. To fix this, the error handling is updated to:
Closes: #127168