Solve one-produce-request-at-time problem #176

Draft: wants to merge 1 commit into main
Conversation

ivanyu
Member

@ivanyu ivanyu commented Feb 10, 2025

The Kafka network subsystem is organized so that only one request per connection is handled at a time. When a request is received, the connection is muted (meaning no further requests are selected by `Selector`). Only when the request is processed and the response is sent to the client is the channel unmuted to let the next request in. This is fine for normal Kafka, where produce requests are typically quick. However, in Inkless we normally have up to 250+ ms of delay while the active file is filled, uploaded, and committed. This is the major bottleneck for single-producer throughput. More details can be found [here](https://aiven-io.slack.com/archives/C06G4TBQ6AW/p1733939393895899).

This commit is a proof-of-concept solution to this problem. The core idea is "upgrading" the connection. When the first Inkless Produce request is handled, the connection is "upgraded". From this moment on, the connection is not expected to receive any other request type, not even non-Inkless produce. Upgraded Inkless connections are handled differently in several key ways:

  1. They aren't muted when a request is received. This allows accepting requests in a pipelined fashion, handing them over to the `InklessAppendInterceptor` before the previous ones are handled and responded to. The Inkless produce machinery already has some queueing and support for parallel upload.
  2. There's a response queue, `InklessSendQueue`. It serializes responses by correlation ID (otherwise, the client would fail) and sends them downstream to the connection only when the connection is ready to forward them to the client.
  3. If an unexpected request comes, the connection fails.
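
To make the response-ordering idea concrete, here is a minimal sketch of what such a queue could look like. The class and method names below are assumptions for illustration, not the actual `InklessSendQueue` code: requests are registered in arrival order, responses may complete out of order, and a response is released to the client only once every earlier response is also ready.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;

// Hypothetical per-connection response queue in the spirit of InklessSendQueue
// (names and API are illustrative assumptions, not the PR's code).
final class SendQueueSketch {
    private final Queue<Integer> arrivalOrder = new ArrayDeque<>(); // correlation IDs in request order
    private final Map<Integer, String> completed = new HashMap<>(); // responses ready but not yet sendable

    /** Record a request's correlation ID the moment the request arrives. */
    synchronized void registerRequest(int correlationId) {
        arrivalOrder.add(correlationId);
    }

    /**
     * Called when the produce machinery finishes a request. Returns every
     * response that is now ready to go to the client, in request order
     * (possibly empty if an earlier request is still in flight).
     */
    synchronized List<String> complete(int correlationId, String response) {
        completed.put(correlationId, response);
        List<String> sendable = new ArrayList<>();
        while (!arrivalOrder.isEmpty() && completed.containsKey(arrivalOrder.peek())) {
            sendable.add(completed.remove(arrivalOrder.poll()));
        }
        return sendable;
    }
}
```

Responses that complete out of order are buffered until their predecessors are ready, so the client always observes responses in the order it sent the requests.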

TODO

  1. The assumption that if a connection carries produce requests, it will carry only produce requests in the future seems correct for any sensibly implemented client. However, there are two exceptions to this: periodic metadata updates and telemetry sending. It seems the Java client sends these over the same connection it uses for produce requests. A way to handle this must be found. Potentially, another inelegant hack will be needed. For example, these requests may be handled in parallel to Inkless produce.
  2. Connection muting is a back pressure mechanism. By disabling it, we open the broker to all sorts of overload and QoS perils. A correct end-to-end back pressure mechanism must be implemented for Inkless produce.
  3. Connection muting is also used for client quotas. This must also be taken into account.
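
One simple shape such a back pressure mechanism could take (a hypothetical sketch, not part of this PR) is to bound the number of in-flight produce requests per connection and fall back to muting only when the bound is reached:

```java
import java.util.concurrent.Semaphore;

// Hypothetical per-connection in-flight limiter (names are assumptions):
// admits up to maxInFlight pipelined produce requests; when the cap is hit,
// the caller would mute the channel just like classic Kafka does, and unmute
// once a permit is released by a completed response.
final class InFlightLimiter {
    private final Semaphore permits;

    InFlightLimiter(int maxInFlight) {
        this.permits = new Semaphore(maxInFlight);
    }

    /** Returns true if the request may proceed; false means the caller should mute. */
    boolean tryAdmit() {
        return permits.tryAcquire();
    }

    /** Called after a response has been sent; the caller can unmute if it had muted. */
    void release() {
        permits.release();
    }
}
```

This keeps the existing mute/unmute machinery as the enforcement point, which may also help with the quota concern in item 3, since quotas already hook into muting.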

@ivanyu ivanyu force-pushed the ivanyu/research/pipelined-produce branch 8 times, most recently from 881f9f6 to ca25e2b Compare February 10, 2025 18:16
@ivanyu ivanyu force-pushed the ivanyu/research/pipelined-produce branch from ca25e2b to d5873fa Compare February 10, 2025 18:32
Contributor

@gharris1727 gharris1727 left a comment


I think this relies on the structure of the `correlationId` too much. In the Java client it wraps around, skips numbers, etc. Using the robustness principle, we should accept any possible `correlationId` scheme and just treat them as blind numbers, and not compare them with one another. For example, a client may generate a random number for each request.

One other concern: once a connection is upgraded by producing to an Inkless topic, that same connection could produce in parallel to a non-Inkless topic without being downgraded. The check for `ApiKeys.PRODUCE` doesn't preclude it currently, and we shouldn't change the behavior of classic produce requests until we're sure the pipelining isn't harmful.

Also, would it make sense for a non-Inkless-produce request to downgrade the connection, rather than killing it with a `RuntimeException`? What is the technical challenge to implementing downgrading?

@ivanyu
Member Author

ivanyu commented Mar 11, 2025

> I think this relies on the structure of the `correlationId` too much. In the Java client it wraps around, skips numbers, etc. Using the robustness principle, we should accept any possible `correlationId` scheme and just treat them as blind numbers, and not compare them with one another. For example, a client may generate a random number for each request.

Yes, you're right.

> One other concern: once a connection is upgraded by producing to an Inkless topic, that same connection could produce in parallel to a non-Inkless topic without being downgraded. The check for `ApiKeys.PRODUCE` doesn't preclude it currently, and we shouldn't change the behavior of classic produce requests until we're sure the pipelining isn't harmful.

Yep, this check is missing.

> Also, would it make sense for a non-Inkless-produce request to downgrade the connection, rather than killing it with a `RuntimeException`? What is the technical challenge to implementing downgrading?

It seems possible. The most challenging part is that we need to let the current pipeline drain before letting in the non-Inkless request.

Generally, the biggest challenge is that essential information (e.g. whether it's an Inkless produce, a classic produce, or a mix) becomes available later than it is needed to make the decision. Not that we can't work around this, but upstreaming this will require significant refactoring.

@ivanyu
Member Author

ivanyu commented Mar 11, 2025

I think this proposal suffers from a possible violation of request order. We allow multiple Inkless produce requests from the same connection to pass into the same `RequestChannel`, concretely into the `requestQueue` inside it. While the requests are in this queue, their order is preserved. However, there are `num.io.threads` threads reading from this queue, so if two requests from the same connection are in the queue, their handling order is arbitrary due to thread-scheduling non-determinism.

We probably need an ordering mechanism for requests similar to the one proposed here for responses.
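
One way to get such request-side ordering (a hypothetical sketch, not the actual `RequestChannel` code) is a per-connection serial executor: requests from one connection run strictly in submission order on a shared pool such as the `num.io.threads` handler threads, while requests from different connections still run in parallel.

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.concurrent.Executor;

// Hypothetical per-connection serializer layered over a shared thread pool.
// Only one task from this connection runs at a time; the next queued task is
// resubmitted to the pool when the current one finishes, preserving
// submission order regardless of how many pool threads exist.
final class SerialExecutor implements Executor {
    private final Queue<Runnable> tasks = new ArrayDeque<>();
    private final Executor pool; // shared pool, e.g. the request handler threads
    private Runnable active;

    SerialExecutor(Executor pool) {
        this.pool = pool;
    }

    @Override
    public synchronized void execute(Runnable task) {
        tasks.add(() -> {
            try {
                task.run();
            } finally {
                next(); // hand off to the next task from this connection
            }
        });
        if (active == null) {
            next();
        }
    }

    private synchronized void next() {
        if ((active = tasks.poll()) != null) {
            pool.execute(active);
        }
    }
}
```

This is the standard serial-executor composition pattern; a real fix would likely need to live inside the request channel itself rather than be layered on top of it.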
