
KAFKA-19476: Improve state transition handling in SharePartition #20124

Open: wants to merge 10 commits into trunk
Conversation

@adixitconfluent (Contributor) commented Jul 8, 2025

About

The way state transitions work in SharePartition has a few
problems:

  1. When a batch/offset reaches a state that should be treated as the
     final state of that batch/offset (for example, LSO movement causes
     the offset/batch to be ARCHIVED permanently), the result of pending
     write state RPCs for that offset/batch should no longer matter.
  2. There is no locking for state transitions, which can lead to
     inconsistency when a request to archive a record and a rollback of a
     state transition arrive at the same time.
  3. If an acquisition lock timeout occurs while an offset/batch is
     undergoing a transition, followed by a write state RPC failure, we can
     land in a scenario where the offset stays in ACQUIRED state with no
     acquisition lock timeout task.
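The terminal-state rule in point 1 can be sketched roughly as follows. This is a hedged, minimal sketch: `InFlightState`, `archivePermanently`, and the `terminal` flag are simplified stand-ins for illustration, not the actual broker classes (only `completeStateTransition` appears in the PR diff itself).

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical record states, mirroring the states named in this PR.
enum RecordState { AVAILABLE, ACQUIRED, ACKNOWLEDGED, ARCHIVED }

// Sketch of an in-flight record state: once a terminal state is reached
// (e.g. ARCHIVED via LSO movement), later RPC results must not roll it back.
class InFlightState {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private RecordState state = RecordState.ACQUIRED;
    private boolean terminal = false;

    // Called when LSO movement archives the record permanently.
    void archivePermanently() {
        lock.writeLock().lock();
        try {
            state = RecordState.ARCHIVED;
            terminal = true;
        } finally {
            lock.writeLock().unlock();
        }
    }

    // Called when a pending write state RPC completes; a failed RPC rolls
    // the transition back unless the record already hit a terminal state.
    void completeStateTransition(boolean rpcSucceeded, RecordState rollbackState) {
        lock.writeLock().lock();
        try {
            if (terminal) {
                return; // terminal state wins over any pending RPC result
            }
            if (!rpcSucceeded) {
                state = rollbackState;
            }
        } finally {
            lock.writeLock().unlock();
        }
    }

    RecordState state() {
        lock.readLock().lock();
        try {
            return state;
        } finally {
            lock.readLock().unlock();
        }
    }
}
```

Taking the write lock for both the archive path and the RPC-completion path also addresses the race described in point 2.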

Testing

The code has been tested with new and existing unit tests and existing
integration tests.

Reviewers: Apoorv Mittal [email protected], Andrew Schofield
[email protected]

@github-actions bot added the triage (PRs from the community), core (Kafka Broker), and KIP-932 (Queues for Kafka) labels Jul 8, 2025
@adixitconfluent adixitconfluent marked this pull request as ready for review July 9, 2025 08:17
@apoorvmittal10 (Contributor) left a comment:

Thanks for the PR, some comments.

@AndrewJSchofield AndrewJSchofield self-requested a review July 9, 2025 14:46
@AndrewJSchofield removed the triage (PRs from the community) label Jul 9, 2025
@AndrewJSchofield (Member) left a comment:

Thanks for the PR. A few comments for your consideration.

@apoorvmittal10 (Contributor) left a comment:

Thanks for the changes, good progress, but I have one doubt.

```java
// hasn't reached a terminal state. If acquisition lock has expired by that time, the record can
// be stuck in ACQUIRED state unless we run the acquisition lock task again.
if (!state.isTerminalState() && state.acquisitionLockTimeoutTask.hasExpired()) {
    state.acquisitionLockTimeoutTask.run();
```
Contributor:

This will record the timeout metric again. The previous run of the timeout task must have already issued a call to the persister in the background; is that not a concern?

Contributor (Author):

There is a call to the persister only if stateBatches is non-empty. Since the record cannot be in ACQUIRED state in this situation during the first acquisition lock timeout (it will be in AVAILABLE/ACKNOWLEDGED/ARCHIVED state because of an ongoing transition), the state batches cannot have an entry for this record. Thus, there won't be any persister call for this record.

Regarding "This will record the metric of timeout again": I have added a code change, as described below, so that the timeout metric is not recorded twice.

```java
if (!hasExpired) {
    sharePartitionMetrics.recordAcquisitionLockTimeoutPerSec(lastOffset - firstOffset + 1);
}
```

@AndrewJSchofield (Member) left a comment:

Thanks for the updates. The terminal state is easier to follow, I think.

```java
}

long expirationMs() {
    return expirationMs;
}

boolean hasExpired() {
    return hasExpired;
```
Member:

This seems to me to be accessed on multiple threads. It is checked underneath the share-partition write lock to determine whether to run the task following a write error, but it can also change if the timer task runs normally.

Contributor (Author):

Agreed, I think it makes sense to make this thread-safe. I have added synchronized to the hasExpired() and run() methods.
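The synchronized approach described here can be sketched roughly as below. This is a hedged sketch, not the actual broker class: the class name, constructor, and the `timeoutsRecorded` counter (standing in for `SharePartitionMetrics.recordAcquisitionLockTimeoutPerSec`) are simplified assumptions for illustration.

```java
// Sketch of an acquisition lock timeout task whose run() and hasExpired()
// are synchronized: the write-error path (which checks hasExpired() under
// the share-partition write lock) and the timer thread see a consistent
// flag, and the timeout metric is recorded at most once even if run()
// executes a second time after a write state RPC failure.
class AcquisitionLockTimeoutTask {
    private final long firstOffset;
    private final long lastOffset;
    private boolean hasExpired = false;
    // Stand-in counter so the sketch is self-contained without the real
    // SharePartitionMetrics class.
    int timeoutsRecorded = 0;

    AcquisitionLockTimeoutTask(long firstOffset, long lastOffset) {
        this.firstOffset = firstOffset;
        this.lastOffset = lastOffset;
    }

    synchronized boolean hasExpired() {
        return hasExpired;
    }

    synchronized void run() {
        // Record the metric only on the first expiry, not on a deliberate
        // re-run triggered by a write state RPC failure.
        if (!hasExpired) {
            timeoutsRecorded += (int) (lastOffset - firstOffset + 1);
        }
        hasExpired = true;
        // ... release the affected records here ...
    }
}
```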

@apoorvmittal10 (Contributor) left a comment:

LGTM! Some minor comments.

```java
assertTrue(sharePartition.cachedState().get(2L).batchHasOngoingStateTransition());
assertTrue(sharePartition.cachedState().get(7L).batchHasOngoingStateTransition());

// LSO is at 9.
```
Contributor:

Suggested change:
```diff
- // LSO is at 9.
+ // Move LSO to 9, so some records/offsets can be marked archived.
```

Comment on lines 7531 to 7535:
```java
assertEquals(RecordState.ARCHIVED, sharePartition.cachedState().get(7L).offsetState().get(7L).state());
assertEquals(RecordState.ARCHIVED, sharePartition.cachedState().get(7L).offsetState().get(8L).state());
assertEquals(RecordState.AVAILABLE, sharePartition.cachedState().get(7L).offsetState().get(9L).state());
assertEquals(RecordState.AVAILABLE, sharePartition.cachedState().get(7L).offsetState().get(10L).state());
assertEquals(RecordState.AVAILABLE, sharePartition.cachedState().get(7L).offsetState().get(11L).state());
```
Contributor:

These are not being tested, as future2 is never completed. Is that intended?

Contributor (Author):

This is intended, because we are testing that records marked ARCHIVED due to LSO movement remain ARCHIVED even when the write state RPC fails for those records. Perhaps I'll just remove the last 2 asserts to avoid any confusion.

Contributor:

I think you might want to fail future2 here as well, to see that offsets past 9 are back in ACQUIRED due to rollback. That would fully test that some of the offsets are archived while others are rolled back. Or if that test already exists, can you please point me to it?

Contributor (Author):

Hi @apoorvmittal10, while writing the test you mentioned, I encountered another problem in the code: during write state call processing, a batch can be split into offsets. This can happen due to LSO movement etc. We should deal with that issue in a separate JIRA, https://issues.apache.org/jira/browse/KAFKA-19502, and change the test case then.
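The batch-into-offsets split discussed here could look roughly like the sketch below. This is a hedged illustration only: `InFlightBatch`, `maybeArchiveBelowLso`, and the `offsetState` map are simplified hypothetical names, not the actual SharePartition code.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical record states, mirroring the states named in this PR.
enum RecordState { AVAILABLE, ACQUIRED, ACKNOWLEDGED, ARCHIVED }

// Sketch: when the LSO lands inside a batch, the batch-level state is
// fanned out to per-offset states so that offsets below the LSO can be
// archived while the remaining offsets keep their original state.
class InFlightBatch {
    final long firstOffset;
    final long lastOffset;
    RecordState batchState = RecordState.ACQUIRED;
    Map<Long, RecordState> offsetState; // null until the batch is split

    InFlightBatch(long firstOffset, long lastOffset) {
        this.firstOffset = firstOffset;
        this.lastOffset = lastOffset;
    }

    void maybeArchiveBelowLso(long newLso) {
        if (newLso <= firstOffset) {
            return; // LSO at or below the batch start: nothing to archive
        }
        if (offsetState == null) {
            // First per-offset update: split the batch state into offsets.
            offsetState = new LinkedHashMap<>();
            for (long o = firstOffset; o <= lastOffset; o++) {
                offsetState.put(o, batchState);
            }
        }
        // Archive every offset strictly below the new LSO.
        for (long o = firstOffset; o <= lastOffset && o < newLso; o++) {
            offsetState.put(o, RecordState.ARCHIVED);
        }
    }
}
```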

Comment on lines +7554 to +7558:
```java
},
() -> {
    inFlightState.completeStateTransition(false);
    return null;
}
```
Contributor:

We should add a test case for when the commit succeeds. It will be good to have so that future refactoring does not introduce new issues.

Contributor (Author):

done

```java
@@ -7467,6 +7468,164 @@ public void testNextFetchOffsetWhenOffsetsHaveOngoingTransition() {
    assertEquals(20, sharePartition.nextFetchOffset());
}

@Test
public void testLsoMovementWithWriteStateRPCFailuresInAck() {
```
Contributor:

Suggested change:
```diff
- public void testLsoMovementWithWriteStateRPCFailuresInAck() {
+ public void testLsoMovementWithWriteStateRPCFailuresInAcknowledgement() {
```

@apoorvmittal10 (Contributor) left a comment:

LGTM, I'll take up the other issue.

Labels: ci-approved, core (Kafka Broker), KIP-932 (Queues for Kafka)
Projects: None yet
4 participants