KAFKA-14915: allow reading remote storage for multiple partitions in one FetchRequest #20045

Merged: 9 commits into apache:trunk, Jul 14, 2025

Conversation

@showuon showuon commented Jun 26, 2025

This PR enables reading remote storage for multiple partitions in one
fetchRequest. The main changes are:

  1. In DelayedRemoteFetch, we now accept multiple remoteFetchTasks and
    their related metadata.
  2. In DelayedRemoteFetch, we wait until all remote fetches are done,
    whether they succeeded or failed.
  3. In ReplicaManager#fetchMessage, we create one DelayedRemoteFetch,
    pass the metadata for all the remote fetches to it, and watch all of
    them.
  4. Add tests.

A simplified sketch of the wait-for-all logic in change 2 follows.

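A minimal sketch of the completion condition described in change 2, assuming one pending CompletableFuture per remote partition (the class and method names here are illustrative, not the actual Kafka code):

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;

// Illustrative sketch only: one delayed operation tracks a pending remote
// read per partition and completes once ALL of them are done, whether they
// succeeded or failed (a future completed exceptionally still counts as done).
class DelayedRemoteFetchSketch {
    // topic-partition name -> pending remote read result (types simplified)
    private final Map<String, CompletableFuture<byte[]>> remoteFetchResults;

    DelayedRemoteFetchSketch(Map<String, CompletableFuture<byte[]>> remoteFetchResults) {
        this.remoteFetchResults = remoteFetchResults;
    }

    // Analogous to a DelayedOperation#tryComplete check in the broker: only
    // force completion when every remote fetch has finished.
    boolean tryComplete() {
        boolean allDone = remoteFetchResults.values().stream()
                .allMatch(CompletableFuture::isDone);
        return allDone && forceComplete();
    }

    private boolean forceComplete() {
        // The real code would hand each partition's result (or error) back
        // to the fetch response callback here.
        return true;
    }
}
```

Checking isDone rather than a successful result is what lets a failed remote read complete the request instead of stalling it until the timeout.
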
@github-actions github-actions bot added core Kafka Broker storage Pull requests that target the storage module tiered-storage Related to the Tiered Storage feature labels Jun 26, 2025

showuon commented Jun 26, 2025

@satishd @kamalcph , please take a look. Thanks.

@showuon showuon force-pushed the KAFKA-14915 branch 3 times, most recently from 30894b4 to 50a2d77 on June 27, 2025 03:38

@kamalcph kamalcph left a comment

Thanks for the patch! Overall LGTM.

A few points:

  1. The FETCH request now has to wait for responses from all the remote partitions, which can add tail latency. This tradeoff is fine, considering the consumer might have to read the same data in upcoming FETCH requests anyway. This can be documented in the PR.
  2. When minOneMessage is set to false (from the second partition in the request onward) and there are multiple remote reads in the given FETCH request, we may read data from remote storage and then drop it when the message size does not fit the partition limit. This was fine for local reads, but a remote read might fetch a whole chunk of data, and multiple responses might be dropped. Shall we add some DEBUG logs when the remote-fetched data gets dropped? (See the sketch after this list.)
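
A minimal sketch of the suggested DEBUG log, assuming a hypothetical size check (RemoteFetchDropLog and maybeDrop are illustrative names, not the actual broker code):

```java
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

// Hypothetical illustration of the suggested DEBUG log: when minOneMessage is
// false and the remote-fetched data no longer fits the remaining partition
// limit, the data is dropped, and we record how many bytes were read for
// nothing.
class RemoteFetchDropLog {
    private static final Logger LOG = LoggerFactory.getLogger(RemoteFetchDropLog.class);

    static byte[] maybeDrop(String topicPartition, byte[] remoteData,
                            int remainingLimitBytes, boolean minOneMessage) {
        if (!minOneMessage && remoteData.length > remainingLimitBytes) {
            LOG.debug("Dropping {} bytes read from remote storage for {} because they "
                    + "exceed the remaining fetch limit of {} bytes",
                    remoteData.length, topicPartition, remainingLimitBytes);
            return new byte[0]; // drop the remote-fetched data
        }
        return remoteData;
    }
}
```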


showuon commented Jul 4, 2025

  1. The FETCH request now has to wait for responses from all the remote partitions, which can add tail latency. This tradeoff is fine, considering the consumer might have to read the same data in upcoming FETCH requests anyway. This can be documented in the PR.

Good point. I've updated the doc for remote.fetch.max.wait to mention the wait-for-all-fetches behavior, and reverted the change about that limitation in this PR.
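
For illustration, a sketch of the setting referred to above, assuming the full property name is remote.fetch.max.wait.ms and using an example value:

```java
import java.util.Properties;

// Illustrative broker configuration snippet (property name and value are
// assumptions for this example): with this PR, a single FETCH request waits
// up to this long for ALL of its remote reads to finish, not just the first.
class RemoteFetchWaitConfig {
    static Properties brokerProps() {
        Properties props = new Properties();
        props.setProperty("remote.fetch.max.wait.ms", "500"); // example value
        return props;
    }
}
```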

  2. When minOneMessage is set to false (from the second partition in the request onward) and there are multiple remote reads in the given FETCH request, we may read data from remote storage and then drop it when the message size does not fit the partition limit. This was fine for local reads, but a remote read might fetch a whole chunk of data, and multiple responses might be dropped. Shall we add some DEBUG logs when the remote-fetched data gets dropped?

I agree with you. Added a debug log.

Thanks for the review!


satishd commented Jul 4, 2025

Thanks @showuon for the PR. I will review it by Monday.

@satishd satishd self-requested a review July 4, 2025 07:12

@kamalcph kamalcph left a comment

LGTM! Thanks for the patch!


@fvaleri fvaleri left a comment

LGTM. Thanks.

I tried it on my local machine and can see remote partitions being read in parallel, with increased throughput.


@satishd satishd left a comment

Thanks @showuon for the PR, overall LGTM.

Left a couple of comments on minor typos in the test.


@satishd satishd left a comment

Thanks @showuon for addressing the comments. LGTM.

@satishd satishd merged commit e1ff387 into apache:trunk Jul 14, 2025
21 of 22 checks passed