KAFKA-19254: Add generic feature level metrics #20021
base: trunk
Conversation
@kevin-wu24 thanks for this patch. I left a couple of comments. PTAL.
private MetricName getFeatureNameTagMetricName(String name, String group, String featureName) {
    LinkedHashMap<String, String> featureNameTag = new LinkedHashMap<>();
    featureNameTag.put(FEATURE_NAME_TAG, featureName.replace(".", "-"));
It contains only one element, so it should be fine to use Map.of.
private static MetricName getFeatureNameTagMetricName(String type, String name, String featureName) {
    LinkedHashMap<String, String> featureNameTag = new LinkedHashMap<>();
    featureNameTag.put(FEATURE_NAME_TAG, sanitizeFeatureName(featureName));
ditto
The KafkaYammerMetrics.getMetricName method below expects a LinkedHashMap parameter, not a Map.
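For readers following along, a minimal standalone sketch of why the Map.of suggestion does not type-check against a LinkedHashMap-typed parameter (the tag name and value here are illustrative, not the exact constants from the patch):

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class TagMapSketch {
    public static void main(String[] args) {
        // Map.of builds an immutable map whose concrete class is not LinkedHashMap,
        // so it cannot be passed where a LinkedHashMap parameter is declared.
        Map<String, String> immutable = Map.of("feature-name", "metadata-version");
        System.out.println(immutable instanceof LinkedHashMap); // false

        // The single-entry LinkedHashMap the patch builds instead:
        LinkedHashMap<String, String> featureNameTag = new LinkedHashMap<>();
        featureNameTag.put("feature-name", "metadata-version");
        System.out.println(featureNameTag.size()); // 1
    }
}
```

This is why the Map.of simplification was declined even though the map only ever holds one entry.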
    finalizedFeatureLevels.put(featureName, featureLevel);
}

public short finalizedFeatureLevel(String featureName) {
This could be a private method.
Kept this public and added checks in the test mentioned here: #20021 (comment)
    return KafkaYammerMetrics.getMetricName("kafka.server", type, name, featureNameTag);
}

private static String sanitizeFeatureName(String featureName) {
Please consider adding comments to remind readers that the naming style is different from NodeMetrics, and this is expected.
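As context for the naming-style note, a standalone sketch of the dot-to-dash sanitization visible in the earlier revision of this patch (featureName.replace(".", "-")); the real sanitizeFeatureName may differ:

```java
public class SanitizeSketch {
    // Simplified stand-in for the sanitizeFeatureName helper under review:
    // feature names like "metadata.version" become tag-friendly "metadata-version".
    static String sanitizeFeatureName(String featureName) {
        return featureName.replace(".", "-");
    }

    public static void main(String[] args) {
        System.out.println(sanitizeFeatureName("metadata.version")); // metadata-version
        System.out.println(sanitizeFeatureName("kraft.version"));    // kraft-version
    }
}
```

The dashed form matches the node-metrics naming convention, which is why it intentionally differs from the CamelCase names used elsewhere.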
@kevin-wu24: Thanks for the patch!
private static MetricName metricName(String type, String name) {
    String mBeanName = String.format("kafka.server:type=%s,name=%s", type, name);
    return new MetricName("kafka.server", type, name, null, mBeanName);
}

private static MetricName metricName(String type, String name, String scope) {
    String mBeanName = String.format("kafka.server:type=%s,name=%s,%s", type, name, scope);
    return new MetricName("kafka.server", type, name, scope, mBeanName);
}
Could we combine these two similar methods?
For example:
private static MetricName metricName(String type, String... nameParts) {
if (nameParts.length == 0) {
throw new IllegalArgumentException("At least one name part is required");
}
String name = nameParts[0];
String scope = nameParts.length > 1 ? String.join(",", Arrays.copyOfRange(nameParts, 1, nameParts.length)) : null;
String mBeanName = scope == null
? String.format("kafka.server:type=%s,name=%s", type, name)
: String.format("kafka.server:type=%s,name=%s,%s", type, name, scope);
return new MetricName("kafka.server", type, name, scope, mBeanName);
}
I would like to leave the methods as is. I think combining them both into one variadic method is a bit harder to read.
Thanks for the patch, left a comment.
builder.append(word.substring(0, 1).toUpperCase(Locale.ROOT))
    .append(word.substring(1).toLowerCase(Locale.ROOT));
Could we use charAt(0) to directly get the first character? This avoids creating temporary strings via substring.
Suggested change:
- builder.append(word.substring(0, 1).toUpperCase(Locale.ROOT))
-     .append(word.substring(1).toLowerCase(Locale.ROOT));
+ builder.append(Character.toUpperCase(word.charAt(0)))
+     .append(word.substring(1).toLowerCase(Locale.ROOT));
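To make the suggestion concrete, a standalone sketch of the capitalization loop using the charAt(0) form; splitting the feature name on '.' is an assumption here, since the surrounding loop is not shown in the hunk:

```java
import java.util.Locale;

public class CapitalizeSketch {
    // Sketch of the word-capitalization loop under review, using the suggested
    // Character.toUpperCase(word.charAt(0)) instead of substring(0, 1).toUpperCase().
    static String toCamelCase(String featureName) {
        StringBuilder builder = new StringBuilder();
        for (String word : featureName.split("\\.")) {
            builder.append(Character.toUpperCase(word.charAt(0)))
                   .append(word.substring(1).toLowerCase(Locale.ROOT));
        }
        return builder.toString();
    }

    public static void main(String[] args) {
        System.out.println(toCamelCase("metadata.version")); // MetadataVersion
    }
}
```

Both forms produce the same string; the charAt variant just avoids the one-character temporary String per word.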
Thanks for the PR. Please run ./gradlew checkstyleMain checkstyleTest spotlessCheck to fix lint issues.
);

try (NodeMetrics ignored = new NodeMetrics(metrics, true)) {
Could we create a case to test new NodeMetrics(metrics, false) as well?
@@ -42,10 +47,13 @@ public final class MetadataLoaderMetrics implements AutoCloseable {
        "MetadataLoader", "HandleLoadSnapshotCount");
    public static final MetricName CURRENT_CONTROLLER_ID = getMetricName(
        "MetadataLoader", "CurrentControllerId");
    public static final String FINALIZED_LEVEL_METRIC_NAME = "FinalizedLevel";
This variable can be private. There is no usage outside this class.
MetadataVersion metadataVersion = image.features().metadataVersionOrThrow();
metrics.setCurrentMetadataVersion(metadataVersion);
metrics.setFinalizedFeatureLevel(
    MetadataVersion.FEATURE_NAME,
    metadataVersion.featureLevel()
);
for (var finalizedFeatureEntry : image.features().finalizedVersions().entrySet()) {
    metrics.setFinalizedFeatureLevel(
        finalizedFeatureEntry.getKey(),
        finalizedFeatureEntry.getValue()
    );
}
Could you add a related test to MetadataLoaderTest as well? For example, there is an assertion about currentMetadataVersion:
kafka/metadata/src/test/java/org/apache/kafka/image/loader/MetadataLoaderTest.java, lines 356 to 357 in 1ca8779:

assertEquals(MINIMUM_VERSION,
    loader.metrics().currentMetadataVersion());
Yeah, for the checks of the current metadata version metric value, I will add a check for the finalized feature level metric too.
for (var featureName : Feature.PRODUCTION_FEATURE_NAMES) {
    addSupportedLevelMetric(MAXIMUM_SUPPORTED_LEVEL_NAME, featureName);
    addSupportedLevelMetric(MINIMUM_SUPPORTED_LEVEL_NAME, featureName);
}
addSupportedLevelMetric(
    MAXIMUM_SUPPORTED_LEVEL_NAME,
    MetadataVersion.FEATURE_NAME
);
addSupportedLevelMetric(
    MINIMUM_SUPPORTED_LEVEL_NAME,
    MetadataVersion.FEATURE_NAME
);
Should we use supportedFeatureRanges instead of setting metrics separately? I noticed that the close method uses supportedFeatureRanges.keySet() to remove metrics.
Suggested change:
- for (var featureName : Feature.PRODUCTION_FEATURE_NAMES) {
-     addSupportedLevelMetric(MAXIMUM_SUPPORTED_LEVEL_NAME, featureName);
-     addSupportedLevelMetric(MINIMUM_SUPPORTED_LEVEL_NAME, featureName);
- }
- addSupportedLevelMetric(
-     MAXIMUM_SUPPORTED_LEVEL_NAME,
-     MetadataVersion.FEATURE_NAME
- );
- addSupportedLevelMetric(
-     MINIMUM_SUPPORTED_LEVEL_NAME,
-     MetadataVersion.FEATURE_NAME
- );
+ supportedFeatureRanges.forEach((featureName, versionRange) -> {
+     addSupportedLevelMetric(MAXIMUM_SUPPORTED_LEVEL_NAME, featureName);
+     addSupportedLevelMetric(MINIMUM_SUPPORTED_LEVEL_NAME, featureName);
+ });
Metrics metrics = new Metrics();
String expectedGroup = "node-metrics";

// Metric description is not use for metric name equality
The typo is the same as in BrokerServerMetricsTest.java (looks like you followed it).
Suggested change:
- // Metric description is not use for metric name equality
+ // Metric description is not used for metric name equality
LGTM.
Thanks all for the review. Pushed commits to address the comments.
        METRIC_GROUP_NAME,
        featureName
    ),
    (config, now) -> {
This will use the type Measurable, and the value type is double rather than integer, right?
I remember bringing this issue up when working on the KafkaRaftMetrics a while back: #18304 (comment). Yes, it will upcast to a double. I guess the actual values we want are short, right? Since feature levels themselves are shorts. I'll update the KIP with that so it's consistent.
    });
}

private void addSupportedLevelMetric(String metricName, String featureName) {
supportedFeatureRanges is immutable, so perhaps we can pass the value to this method?
supportedFeatureRanges.forEach((featureName, versionRange) -> {
addSupportedLevelMetric(MAXIMUM_SUPPORTED_LEVEL_NAME, featureName, versionRange.max());
addSupportedLevelMetric(MINIMUM_SUPPORTED_LEVEL_NAME, featureName, versionRange.min());
});
private void addSupportedLevelMetric(String metricName, String featureName, short value) {
metrics.addMetric(
getFeatureNameTagMetricName(
metricName,
METRIC_GROUP_NAME,
featureName
),
(Gauge<Short>) (config, now) -> value
);
}
@kevin-wu24 any feedback for the above comment?
Yeah I'm good with this change. Will update the PR.
Hi @kevin-wu24, @chia7712 and all, I discovered a subtle issue with the existing metadata version metrics that I think we should fix in your new metrics. I left a comment in the discussion thread if you want to move the discussion there. https://lists.apache.org/thread/tjrzqb2hmmymshln8r816m9l3d79f605
That makes sense to me. @kevin-wu24 WDYT?
Yeah, I agree with the approach as well. I'll push a commit to get the semantics we want.
 * @param featureLevel The finalized level for the feature
 */
public void recordFinalizedFeatureLevel(String featureName, short featureLevel) {
    if (finalizedFeatureLevels.putIfAbsent(featureName, featureLevel) == null) {
Excuse me, why use putIfAbsent? I assume the code could be simplified:

var metricCreated = finalizedFeatureLevels.put(featureName, featureLevel) != null;
if (!metricCreated) addFinalizedFeatureLevelMetric(featureName);
I think the double ! is a bit confusing, but yeah, let me fix this to use put instead of putIfAbsent. How about:

final var metricNotRegistered = finalizedFeatureLevels.put(featureName, featureLevel) == null;
if (metricNotRegistered) addFinalizedFeatureLevelMetric(featureName);
Sounds great! 😃
Thanks for the implementation @kevin-wu24. Partial review.
 * @param featureLevel The finalized level for the feature
 */
public void recordFinalizedFeatureLevel(String featureName, short featureLevel) {
    final var metricNotRegistered = finalizedFeatureLevels.putIfAbsent(featureName, featureLevel) == null;
This code should always override the value with the latest value, no? What you only want to do once is register the metric since that should only be done the first time the value gets set.
This code should always override the value with the latest value, no?

Yes, the value in the map (and thus the value of the metric) is always updated with the latest value. I was using putIfAbsent before to check whether it is the first time the value is being set, to know if we should register the metric, but changed it to put as per @chia7712's comment.
I get a bit confused. That should use put instead of putIfAbsent, shouldn't it?
@kevin-wu24 please take a look at the latest code. The latest code uses putIfAbsent and not put to update the concurrent map.
Sorry, it does say that. Let me fix it.
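To summarize the idiom the thread converged on, a standalone sketch (the map field and registration counter are illustrative stand-ins, not the PR's exact code): put always stores the latest level, and its null return value signals the first write for a feature, which is exactly when the metric should be registered.

```java
import java.util.concurrent.ConcurrentHashMap;

public class PutReturnSketch {
    static final ConcurrentHashMap<String, Short> finalizedFeatureLevels = new ConcurrentHashMap<>();
    static int registrations = 0; // stand-in for addFinalizedFeatureLevelMetric calls

    static void recordFinalizedFeatureLevel(String featureName, short featureLevel) {
        // put (unlike putIfAbsent) overwrites the value every time, and returns
        // null only on the first write, so the metric is registered exactly once.
        final var metricNotRegistered = finalizedFeatureLevels.put(featureName, featureLevel) == null;
        if (metricNotRegistered) registrations++;
    }

    public static void main(String[] args) {
        recordFinalizedFeatureLevel("transaction.version", (short) 1); // registers
        recordFinalizedFeatureLevel("transaction.version", (short) 2); // updates only
        System.out.println(registrations);                                      // 1
        System.out.println(finalizedFeatureLevels.get("transaction.version"));  // 2
    }
}
```

With putIfAbsent, the second call would have left the stored level at 1, which is the bug the reviewers caught.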
);

// Set all production feature levels from the image, defaulting to their minimum production values
for (var feature : Feature.PRODUCTION_FEATURES) {
Hmm. Is the argument here that broker and controller registration would fail if there is a finalized feature version that this node doesn't know about?
Meaning it is not possible for the cluster metadata partition to have a finalized feature version that this node doesn't know about.
Is the argument here that broker and controller registration would fail if there is a finalized feature version that this node doesn't know about?

Yes, the registration fails if there is a finalized feature version the registering node does not support. ClusterControlManager#processRegistrationFeature has this code.
Perhaps we should ensure the exposed metrics are always based on the current image. The benefit is that the exposed finalized versions will be consistent with the current image, and we won't need to use minimumProduction, which puts us in a weird position.
Yeah. It would be nice to not make this assumption. The fewer assumptions you make, the more resilient the code is to future changes. How about iterating through all of the features in the delta and only updating those values?
I'm a bit confused. Doesn't this mean we won't have metrics for features that do not have a finalized level? From the discussion thread, it said:

Any missing finalized feature version can be configured to its minimum value.

which is why I made the change to iterate over all features instead of the ones in the image.

EDIT: If the idea is to add a separate maybeRegisterMissingFeatures method, I don't see how that ends up any different from this code, since we'll have to loop over the PRODUCTION_FEATURES in that method instead.
Perhaps we should ensure the exposed metrics are always based on the current image. The benefit is that the exposed finalized versions will be consistent with the current image, and we won't need to use minimumProduction, which puts us in a weird position.

IIUC, we should only expose the finalizedLevel metrics for features that actually have a finalized level? If a feature does not have a finalized level, it does not have a finalizedLevel metric? I'm okay with this approach, but I think we'll just need to update the KIP, since the specific language is:

The FinalizedLevel metric will report the finalized feature level for each production feature. If the feature level is not set, the metric will return a value of 0, since that means the feature is not enabled.
If the feature does not have a finalized level, it does not have a finalizedLevel metric?
Yes, that is what I meant
IIUC, we should only expose the finalizedLevel metrics for features that actually have a finalized level? If the feature does not have a finalized level, it does not have a finalizedLevel metric?
Yes. Let's implement this definition. Sorry for the confusion and back and forth. Please feel free to update the KIP. This also means that if the finalized feature version is "removed" (set to 0), the code needs to remove the associated metric.
This also means that if the finalized feature version is "removed" (set to 0), the code needs to remove the associated metric.
I guess this applies to all features besides metadata.version (whose minimum level is 7), and kraft.version (whose minimum is 0, but 0 does not mean that KRaft is "disabled" like other features). Since these two features are never part of the features image, I think their associated metrics should never be removed. Other features are removed from the image when their level is set to 0, so I think it makes sense to remove their metrics too. I'm going to update the KIP to document the exceptions for metadata + kraft version.
for (var feature : Feature.PRODUCTION_FEATURES) {
// Set all production feature levels from the image
for (var featureEntry : image.features().finalizedVersions().entrySet()) {
metrics.maybeRemoveFinalizedFeatureLevelMetrics(image.features().finalizedVersions());
This line should be outside the loop, right?
A quick note: I will also add unit tests to cover these cases.
@chia7712 Are you able to take another look? If there is anything else, let me know. Thanks for the reviews.
@kevin-wu24 thanks for this patch. Overall LGTM.
 * @param newFinalizedLevels The new finalized feature levels from the features image
 */
public void maybeRemoveFinalizedFeatureLevelMetrics(Map<String, Short> newFinalizedLevels) {
    finalizedFeatureLevels.keySet().stream().filter(
Perhaps we could leverage an iterator to avoid iterating through all items twice:
var iter = finalizedFeatureLevels.keySet().iterator();
while (iter.hasNext()) {
var featureName = iter.next();
if (newFinalizedLevels.containsKey(featureName) ||
featureName.equals(MetadataVersion.FEATURE_NAME) ||
featureName.equals(KRaftVersion.FEATURE_NAME)) {
continue;
}
removeFinalizedFeatureLevelMetric(featureName);
iter.remove();
}
Thanks for the PR. Overall, LGTM. Two comments left.
 */
@Test
public void testKRaftVersionFinalizedLevelMetric() throws Exception {
    MockFaultHandler faultHandler = new MockFaultHandler("testLoadEmptyBatch");
Suggested change:
- MockFaultHandler faultHandler = new MockFaultHandler("testLoadEmptyBatch");
+ MockFaultHandler faultHandler = new MockFaultHandler("testKRaftVersionFinalizedLevelMetric");
 * @throws Exception
 */
@Test
public void testKRaftVersionFinalizedLevelMetric() throws Exception {
It would be great to also cover handleLoadSnapshot (before calling handleCommit) and test whether KRaftVersion is set.
This PR adds the following metrics for each of the supported production features (metadata.version, kraft.version, transaction.version, etc.):

kafka.server:type=MetadataLoader,name=FinalizedLevel,featureName=X
kafka.server:type=node-metrics,name=maximum-supported-level,feature-name=X
kafka.server:type=node-metrics,name=minimum-supported-level,feature-name=X
Reviewers: PoAn Yang [email protected], Jhen-Yung Hsu [email protected], TengYao Chi [email protected], Ken Huang [email protected], Lan Ding [email protected], Chia-Ping Tsai [email protected]