-
Notifications
You must be signed in to change notification settings - Fork 14.5k
KAFKA-19457: Make share group init retry interval configurable. #20104
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR, LGTM!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR. Just one comment to address.
@@ -484,6 +493,9 @@ public GroupCoordinatorConfig(AbstractConfig config) { | |||
require(shareGroupSessionTimeoutMs <= shareGroupMaxSessionTimeoutMs, | |||
String.format("%s must be less than or equal to %s", | |||
SHARE_GROUP_SESSION_TIMEOUT_MS_CONFIG, SHARE_GROUP_MAX_SESSION_TIMEOUT_MS_CONFIG)); | |||
require(shareGroupInitializeRetryIntervalMs >= offsetCommitTimeoutMs, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this needs to be changed a little. The problem is that you've added a new internal (and thus undocumented) configuration to allow configuration of something which should in principle not need tweaking. That's fine. However, the validation of the configs can now break if someone changes a documented configuration, and they will not know about the internal configuration. It would be better to ensure in the code that the shareGroupInitializeRetryIntervalMs
is not smaller than the offsetCommitTimeoutMs
just using assignment to an appropriate value.
@AndrewJSchofield Thanks for the comments, incorporated. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the patch, LGTM!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for this patch, leave a comment.
@@ -437,6 +445,8 @@ public GroupCoordinatorConfig(AbstractConfig config) { | |||
this.shareGroupMaxHeartbeatIntervalMs = config.getInt(GroupCoordinatorConfig.SHARE_GROUP_MAX_HEARTBEAT_INTERVAL_MS_CONFIG); | |||
this.shareGroupMaxSize = config.getInt(GroupCoordinatorConfig.SHARE_GROUP_MAX_SIZE_CONFIG); | |||
this.shareGroupAssignors = shareGroupAssignors(config); | |||
int initializeRetryMs = config.getInt(GroupCoordinatorConfig.SHARE_GROUP_INITIALIZE_RETRY_INTERVAL_MS_CONFIG); | |||
this.shareGroupInitializeRetryIntervalMs = Math.max(initializeRetryMs, this.offsetCommitTimeoutMs); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this comparison add to SHARE_GROUP_INITIALIZE_RETRY_INTERVAL_MS_DOC
for clarification ?
@smjn The code change looks good to me. Please could you triage the failed tests and make sure they are appropriately tracked. Thanks. |
Unrelated Flaky/failed tests: FAILED ❌ LogRecoveryTest > testHWCheckpointWithFailuresMultipleLogSegments() - already tracked in https://issues.apache.org/jira/browse/KAFKA-19452 FLAKY |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks for this patch.
GroupMetadataManager.shareGroupHeartbeat
, we check for topics ininitializing
state and if they are a certain amount of time old, weissue retry requests for the same.
offsetsCommitTimeoutMs
and was not configurable.supply the value. The default is
30_000
which is aheuristic based on the fact that the share coordinator
persister
retries request with exponential backoff, with upper cap of
30_000
seconds.
Reviewers: Apoorv Mittal [email protected], Lan Ding
[email protected], TaiJuWu [email protected], Andrew Schofield
[email protected]