
Improve Performance of MakeLabelPairs #1734


Open
kgeckhart wants to merge 5 commits into main from kgeckhart/improve-MakeLabelPairs-performance

Conversation

@kgeckhart commented on Feb 12, 2025:

This PR refactors the internals of Desc to create an optimized version of MakeLabelPairs, replacing constLabels with a pre-sorted labelPairs slice containing both constant and variable labels. The Desc also carries an extra map so the labelValues can be inserted into the proper sorted index on output. Benchmark results:

                             │ baseline.txt │               new.txt                │
                             │    sec/op    │    sec/op     vs base                │
_MakeLabelPairs/1_label-11      94.25n ± 1%   47.06n ±  1%  -50.07% (p=0.000 n=10)
_MakeLabelPairs/3_labels-11     262.9n ± 1%   119.8n ±  0%  -54.44% (p=0.000 n=10)
_MakeLabelPairs/10_labels-11   1005.0n ± 0%   372.9n ± 10%  -62.89% (p=0.000 n=10)
geomean                         292.0n        128.1n        -56.13%

                             │ baseline.txt │              new.txt               │
                             │     B/op     │    B/op     vs base                │
_MakeLabelPairs/1_label-11      136.00 ± 0%   96.00 ± 0%  -29.41% (p=0.000 n=10)
_MakeLabelPairs/3_labels-11      360.0 ± 0%   288.0 ± 0%  -20.00% (p=0.000 n=10)
_MakeLabelPairs/10_labels-11    1048.0 ± 0%   880.0 ± 0%  -16.03% (p=0.000 n=10)
geomean                          371.6        289.8       -22.02%

                             │ baseline.txt │              new.txt               │
                             │  allocs/op   │ allocs/op   vs base                │
_MakeLabelPairs/1_label-11       5.000 ± 0%   3.000 ± 0%  -40.00% (p=0.000 n=10)
_MakeLabelPairs/3_labels-11     11.000 ± 0%   7.000 ± 0%  -36.36% (p=0.000 n=10)
_MakeLabelPairs/10_labels-11     29.00 ± 0%   19.00 ± 0%  -34.48% (p=0.000 n=10)
geomean                          11.68        7.362       -36.99%

No longer sorting the immutable set of label names on every call drops CPU usage dramatically and saves some memory. There's a further memory benefit from calling proto.String only once for the variable label names.

The tradeoff is a little bit more overhead in the Desc to store the full labelPairs and an index lookup map. I think this is a fair tradeoff to make because a single Desc instance is likely to be used for numerous calls to MakeLabelPairs.
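
For readers following along, here is a minimal sketch of the idea (field and function names are illustrative only, not necessarily what this PR uses):

```go
package main

import (
	"fmt"

	dto "github.com/prometheus/client_model/go"
	"google.golang.org/protobuf/proto"
)

// desc mimics the relevant parts of the refactored Desc: all label pairs
// (constant and variable) are built and sorted once, and a small index
// records where each variable label's value has to go on output.
type desc struct {
	// Pre-sorted pairs; constant pairs already carry their values,
	// variable pairs carry only their names.
	sortedLabelPairs []*dto.LabelPair
	// variableLabelIndex[i] is the position in sortedLabelPairs of the
	// i-th variable label (in the order passed to NewDesc).
	variableLabelIndex map[int]int
}

// makeLabelPairs fills in the variable label values without re-sorting.
func (d *desc) makeLabelPairs(labelValues []string) []*dto.LabelPair {
	pairs := make([]*dto.LabelPair, len(d.sortedLabelPairs))
	copy(pairs, d.sortedLabelPairs)
	for i, v := range labelValues {
		idx := d.variableLabelIndex[i]
		// A fresh pair per call; the pre-built one is shared by all callers.
		pairs[idx] = &dto.LabelPair{
			Name:  d.sortedLabelPairs[idx].Name,
			Value: proto.String(v),
		}
	}
	return pairs
}

func main() {
	d := &desc{
		sortedLabelPairs: []*dto.LabelPair{
			{Name: proto.String("code")},                            // variable
			{Name: proto.String("job"), Value: proto.String("api")}, // constant
			{Name: proto.String("method")},                          // variable
		},
		variableLabelIndex: map[int]int{0: 0, 1: 2}, // code -> 0, method -> 2
	}
	for _, p := range d.makeLabelPairs([]string{"200", "GET"}) {
		fmt.Println(p.GetName(), p.GetValue())
	}
}
```

The point is that sorting happens once when the Desc is built; MakeLabelPairs only copies the pre-sorted slice and drops the values into known positions.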

Related to: #1702

@kgeckhart force-pushed the kgeckhart/improve-MakeLabelPairs-performance branch from 829ec00 to d049c34 on February 12, 2025 21:18
@kgeckhart marked this pull request as ready for review on February 12, 2025 21:24
@bwplotka (Member) left a comment:

This is brilliant, thank you! Elegant and clean.

I proposed using a slice instead of a map for the ordering, for extra efficiency (see the sketch at the end of this comment). We could also keep sorting in MakeLabelPairs if this is too expensive for NewDesc or memory. Speaking of NewDesc efficiency...

> The tradeoff is a little bit more overhead in the Desc to store the full labelPairs and an index lookup map. I think this is a fair tradeoff to make because a single Desc instance is likely to be used for numerous calls to MakeLabelPairs.

Generally it's fair, I agree. However, we have many cases of users calling NewConst*Metric with a dynamic (😱) NewDesc because the underlying metric labels they report change. It's a niche use case and we could provide some optimizations for it later on (separate NewDesc, reusable Desc), but it would be epic if we could measure NewDesc efficiency, at least to be aware of the overhead. I proposed a quick way for us to expand your benchmark for it.

To sum up, LGTM with nits: the slice suggestion, a new benchmark, and a request to redo benchmarks for both MakeLabelPairs and NewDesc. Should be quick to do -- thank you so much for helping! 💪🏽
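
For illustration, the slice-based ordering could be built once in NewDesc along these lines (hypothetical names, not code from this PR):

```go
package main

import (
	"fmt"
	"sort"
)

// buildVariableLabelOrder returns, for each variable label in its original
// order, its index in the fully sorted (constant + variable) label-name list.
// On the hot path a plain slice read then replaces a map lookup.
func buildVariableLabelOrder(constNames, variableNames []string) []int {
	all := append(append([]string{}, constNames...), variableNames...)
	sort.Strings(all)

	pos := make(map[string]int, len(all))
	for i, n := range all {
		pos[n] = i
	}

	order := make([]int, len(variableNames))
	for i, n := range variableNames {
		order[i] = pos[n]
	}
	return order
}

func main() {
	// Constant label "job", variable labels "code" and "method":
	// sorted order is [code job method], so the output is [0 2].
	fmt.Println(buildVariableLabelOrder([]string{"job"}, []string{"code", "method"}))
}
```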

@kgeckhart (Author) commented:

> Generally it's fair, I agree. However, we have many cases of users calling NewConst*Metric with a dynamic (😱) NewDesc because the underlying metric labels they report change. It's a niche use case and we could provide some optimizations for it later on (separate NewDesc, reusable Desc), but it would be epic if we could measure NewDesc efficiency, at least to be aware of the overhead. I proposed a quick way for us to expand your benchmark for it.

Great point. Funny enough, yet-another-cloudwatch-exporter was one such use case, and refactoring it to a reusable Desc approach is what led me here. Thanks for all the feedback; I'll clean it up and add a benchmark for NewDesc as well.

…nd add a benchmark for NewDesc

Signed-off-by: Kyle Eckhart <[email protected]>
@kgeckhart force-pushed the kgeckhart/improve-MakeLabelPairs-performance branch from 4dc5c04 to 6da4db6 on February 19, 2025 22:24
@kgeckhart (Author) commented:

After doing some more benchmarking, I think I'll try to find some further optimizations in NewDesc. The benchmark showed increases across the board, as expected:

goos: darwin
goarch: arm64
pkg: github.com/prometheus/client_golang/prometheus
cpu: Apple M3 Pro
                     │ baseline.txt │              new.txt               │
                     │    sec/op    │   sec/op     vs base               │
NewDesc/labels=1-11    362.1n ±  4%   412.7n ± 6%  +13.94% (p=0.002 n=6)
NewDesc/labels=3-11    778.1n ±  1%   930.1n ± 8%  +19.53% (p=0.002 n=6)
NewDesc/labels=10-11   3.077µ ± 21%   3.834µ ± 1%  +24.60% (p=0.002 n=6)
geomean                953.6n         1.137µ       +19.28%

                     │ baseline.txt │               new.txt               │
                     │     B/op     │     B/op      vs base               │
NewDesc/labels=1-11      376.0 ± 0%     504.0 ± 0%  +34.04% (p=0.002 n=6)
NewDesc/labels=3-11      744.0 ± 0%    1064.0 ± 0%  +43.01% (p=0.002 n=6)
NewDesc/labels=10-11   3.520Ki ± 0%   4.488Ki ± 0%  +27.52% (p=0.002 n=6)
geomean                 1002.7        1.319Ki       +34.71%

                     │ baseline.txt │              new.txt              │
                     │  allocs/op   │ allocs/op   vs base               │
NewDesc/labels=1-11      12.00 ± 0%   15.00 ± 0%  +25.00% (p=0.002 n=6)
NewDesc/labels=3-11      20.00 ± 0%   27.00 ± 0%  +35.00% (p=0.002 n=6)
NewDesc/labels=10-11     51.00 ± 0%   72.00 ± 0%  +41.18% (p=0.002 n=6)
geomean                  23.05        30.78       +33.56%

The results for MakeLabelPairs didn't change much at all. It was really hard to quantify whether the increases in NewDesc were offset by the savings in MakeLabelPairs, so I added another benchmark to simulate a more real-world scenario of NewDesc -> NewConstMetric. I reorganized the results so each row is labeled with the number of metrics created followed by the number of labels:

goos: darwin
goarch: arm64
pkg: github.com/prometheus/client_golang/prometheus
cpu: Apple M3 Pro
        │ const-baseline.txt │            const-new.txt            │
        │       sec/op       │    sec/op     vs base               │
1 1              528.3n ± 1%   523.3n ±  1%        ~ (p=0.058 n=6)
1 3              1.100µ ± 1%   1.128µ ±  1%   +2.55% (p=0.002 n=6)
1 10             4.053µ ± 4%   4.356µ ±  1%   +7.48% (p=0.002 n=6)
2 1              676.7n ± 1%   653.2n ±  3%   -3.47% (p=0.002 n=6)
2 3              1.449µ ± 0%   1.325µ ±  2%   -8.59% (p=0.002 n=6)
2 10             5.062µ ± 0%   4.850µ ±  1%   -4.18% (p=0.002 n=6)
3 1              854.4n ± 4%   767.6n ±  2%  -10.15% (p=0.002 n=6)
3 3              1.767µ ± 1%   1.536µ ± 14%  -13.10% (p=0.002 n=6)
3 10             6.066µ ± 0%   5.336µ ±  3%  -12.03% (p=0.002 n=6)
5 1              1.176µ ± 1%   1.002µ ±  2%  -14.81% (p=0.002 n=6)
5 3              2.433µ ± 1%   1.983µ ±  2%  -18.50% (p=0.002 n=6)
5 10             8.022µ ± 4%   6.240µ ±  0%  -22.22% (p=0.002 n=6)
geomean          1.916µ        1.753µ         -8.54%

        │ const-baseline.txt │            const-new.txt            │
        │        B/op        │     B/op      vs base               │
1 1               696.0 ± 0%     784.0 ± 0%  +12.64% (p=0.002 n=6)
1 3             1.258Ki ± 0%   1.500Ki ± 0%  +19.25% (p=0.002 n=6)
1 10            4.816Ki ± 0%   5.605Ki ± 0%  +16.38% (p=0.002 n=6)
2 1              1016.0 ± 0%    1064.0 ± 0%   +4.72% (p=0.002 n=6)
2 3             1.789Ki ± 0%   1.961Ki ± 0%   +9.61% (p=0.002 n=6)
2 10            6.113Ki ± 0%   6.723Ki ± 0%   +9.97% (p=0.002 n=6)
3 1             1.305Ki ± 0%   1.312Ki ± 0%   +0.60% (p=0.002 n=6)
3 3             2.320Ki ± 0%   2.422Ki ± 0%   +4.38% (p=0.002 n=6)
3 10            7.410Ki ± 0%   7.840Ki ± 0%   +5.80% (p=0.002 n=6)
5 1             1.930Ki ± 0%   1.859Ki ± 0%   -3.64% (p=0.002 n=6)
5 3             3.383Ki ± 0%   3.344Ki ± 0%   -1.15% (p=0.002 n=6)
5 10            10.00Ki ± 0%   10.07Ki ± 0%   +0.70% (p=0.002 n=6)
geomean         2.520Ki        2.681Ki        +6.39%

        │ const-baseline.txt │           const-new.txt           │
        │     allocs/op      │ allocs/op   vs base               │
1 1               21.00 ± 0%   22.00 ± 0%   +4.76% (p=0.002 n=6)
1 3               35.00 ± 0%   38.00 ± 0%   +8.57% (p=0.002 n=6)
1 10              87.00 ± 0%   97.00 ± 0%  +11.49% (p=0.002 n=6)
2 1               30.00 ± 0%   29.00 ± 0%   -3.33% (p=0.002 n=6)
2 3               50.00 ± 0%   49.00 ± 0%   -2.00% (p=0.002 n=6)
2 10              123.0 ± 0%   122.0 ± 0%   -0.81% (p=0.002 n=6)
3 1               39.00 ± 0%   36.00 ± 0%   -7.69% (p=0.002 n=6)
3 3               65.00 ± 0%   60.00 ± 0%   -7.69% (p=0.002 n=6)
3 10              159.0 ± 0%   147.0 ± 0%   -7.55% (p=0.002 n=6)
5 1               57.00 ± 0%   50.00 ± 0%  -12.28% (p=0.002 n=6)
5 3               95.00 ± 0%   82.00 ± 0%  -13.68% (p=0.002 n=6)
5 10              231.0 ± 0%   197.0 ± 0%  -14.72% (p=0.002 n=6)
geomean           65.24        62.58        -4.09%

CPU and allocs/op increase if you only use the Desc to create a single metric, but beyond that you get savings. B/op only starts to show savings when you get to 5 metrics created 😬. I'll take a deeper look tomorrow, but my first inclination is that it's from this block https://github.com/kgeckhart/client_golang/blob/6da4db6dc2f8348a455b3bca5f657b47a3c02b4c/prometheus/desc.go#L174-L178 because I can't re-use the pre-created dto.LabelPair and have to create a new one in MakeLabelPairs (see the illustration below).
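
A tiny standalone illustration of why the shared pre-created pair cannot simply be mutated per call (assuming it lives on the Desc and is shared by every caller):

```go
package main

import (
	"fmt"

	dto "github.com/prometheus/client_model/go"
	"google.golang.org/protobuf/proto"
)

func main() {
	// One pre-created pair shared by every MakeLabelPairs call.
	shared := &dto.LabelPair{Name: proto.String("code")}

	first := []*dto.LabelPair{shared}
	second := []*dto.LabelPair{shared}

	// Writing a value for the second call silently changes the first
	// call's result too (and would race under concurrency), so a new
	// dto.LabelPair has to be allocated per call instead.
	first[0].Value = proto.String("200")
	second[0].Value = proto.String("500")

	fmt.Println(first[0].GetValue(), second[0].GetValue()) // 500 500
}
```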

@bwplotka (Member) commented:

Thanks!

Naive math on how much worse NewDesc gets vs how much better MakeLabelPairs gets easily pays off on CPU, but it is more interesting on the allocation side, and that is what we see in your constMetricFlow bench.

NewDesc/labels=1-11    362.1n ±  4%   412.7n ± 6%  +13.94% (p=0.002 n=6)

vs

_MakeLabelPairs/1_label-11      94.25n ± 1%   47.06n ±  1%  -50.07% (p=0.000 n=10)

and for allocations:

NewDesc/labels=1-11      376.0 ± 0%     504.0 ± 0%  +34.04% (p=0.002 n=6)

vs

_MakeLabelPairs/1_label-11      136.00 ± 0%   96.00 ± 0%  -29.41% (p=0.000 n=10)

So indeed I would focus on the allocations in NewDesc...

Review comment from a Member on func BenchmarkConstMetricFlow(b *testing.B):

This is important for exporters, but if you go this route I would also add a benchmark to show the benefit of "normal" use (e.g. NewCounter and WithLabelValues, which is 80% of use); see the sketch below. This will perhaps motivate us to accept a certain penalty for these niche cases.

For exporters we really need to redesign something around constant metrics. They are too expensive to be created on every scrape, especially since the majority of users expect predictable memory and CPU from k8s containers. I tried in the past to build some dto-based cache layer (google/cadvisor#2974) but it needs more work.

What I'm saying is that we could make this a bit slower and work with exporters on a long-term plan. We could also give users the option (faster NewDesc vs a slower one with more work later). Let's check what more we can squeeze out.

Would some smart "pre-alloc pool" help? (For predictable memory I believe yes.)
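
As a rough sketch of what that "normal use" benchmark could look like (illustrative only, not the benchmark that ended up in this PR):

```go
package prometheus_test

import (
	"strconv"
	"testing"

	"github.com/prometheus/client_golang/prometheus"
)

// A "normal use" benchmark: one CounterVec created up front,
// WithLabelValues called repeatedly on the hot path.
func BenchmarkCounterWithLabelValues(b *testing.B) {
	for _, labels := range []int{1, 3, 10} {
		b.Run("labels="+strconv.Itoa(labels), func(b *testing.B) {
			names := make([]string, labels)
			values := make([]string, labels)
			for i := range names {
				names[i] = "label_" + strconv.Itoa(i)
				values[i] = "value_" + strconv.Itoa(i)
			}
			c := prometheus.NewCounterVec(prometheus.CounterOpts{
				Name: "benchmark_counter_total",
				Help: "A counter used only for benchmarking.",
			}, names)

			b.ReportAllocs()
			b.ResetTimer()
			for i := 0; i < b.N; i++ {
				c.WithLabelValues(values...).Inc()
			}
		})
	}
}
```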

@bwplotka (Member) commented:

OK, I spent some time to see whether getting rid of the dto.LabelPair pointers is beneficial. Feel free to cherry-pick the PR commits or ignore them (#1746). In the description I added benchmark results. Not ideal, but heading in a good direction if we continue moving dto.Metric/LabelPair use out of client_golang. But this is of course a bigger piece of work.

@kgeckhart (Author) commented:

@bwplotka had some stuff pop up that distracted me for a bit. I took another stab with a different solution for producing the sorted labels, and I'm pretty happy with the results:

                     │ baseline.txt │              new.txt              │
                     │    sec/op    │   sec/op     vs base              │
NewDesc/labels=1-11    362.1n ±  4%   330.4n ± 0%  -8.78% (p=0.002 n=6)
NewDesc/labels=3-11    778.1n ±  1%   721.1n ± 1%  -7.33% (p=0.002 n=6)
NewDesc/labels=10-11   3.077µ ± 21%   3.065µ ± 1%       ~ (p=1.000 n=6)
geomean                953.6n         900.4n       -5.57%

                     │ baseline.txt │               new.txt               │
                     │     B/op     │     B/op      vs base               │
NewDesc/labels=1-11      376.0 ± 0%     432.0 ± 0%  +14.89% (p=0.002 n=6)
NewDesc/labels=3-11      744.0 ± 0%     896.0 ± 0%  +20.43% (p=0.002 n=6)
NewDesc/labels=10-11   3.520Ki ± 0%   3.996Ki ± 0%  +13.54% (p=0.002 n=6)
geomean                 1002.7        1.138Ki       +16.25%

                     │ baseline.txt │              new.txt               │
                     │  allocs/op   │ allocs/op   vs base                │
NewDesc/labels=1-11      12.00 ± 0%   12.00 ± 0%       ~ (p=1.000 n=6) ¹
NewDesc/labels=3-11      20.00 ± 0%   20.00 ± 0%       ~ (p=1.000 n=6) ¹
NewDesc/labels=10-11     51.00 ± 0%   51.00 ± 0%       ~ (p=1.000 n=6) ¹
geomean                  23.05        23.05       +0.00%
¹ all samples are equal

                            │ baseline.txt │              new.txt               │
                            │    sec/op    │   sec/op     vs base               │
MakeLabelPairs/labels=1-11     90.21n ± 8%   44.87n ± 4%  -50.26% (p=0.002 n=6)
MakeLabelPairs/labels=3-11     251.2n ± 1%   114.5n ± 1%  -54.42% (p=0.002 n=6)
MakeLabelPairs/labels=10-11    856.6n ± 2%   341.4n ± 0%  -60.15% (p=0.002 n=6)
geomean                        268.8n        120.6n       -55.13%

                            │ baseline.txt │              new.txt              │
                            │     B/op     │    B/op     vs base               │
MakeLabelPairs/labels=1-11     136.00 ± 0%   96.00 ± 0%  -29.41% (p=0.002 n=6)
MakeLabelPairs/labels=3-11      360.0 ± 0%   288.0 ± 0%  -20.00% (p=0.002 n=6)
MakeLabelPairs/labels=10-11    1144.0 ± 0%   960.0 ± 0%  -16.08% (p=0.002 n=6)
geomean                         382.6        298.3       -22.04%

                            │ baseline.txt │              new.txt              │
                            │  allocs/op   │ allocs/op   vs base               │
MakeLabelPairs/labels=1-11      5.000 ± 0%   3.000 ± 0%  -40.00% (p=0.002 n=6)
MakeLabelPairs/labels=3-11     11.000 ± 0%   7.000 ± 0%  -36.36% (p=0.002 n=6)
MakeLabelPairs/labels=10-11     32.00 ± 0%   21.00 ± 0%  -34.38% (p=0.002 n=6)
geomean                         12.07        7.612       -36.96%

The flow test shows drops in CPU and allocs/op across the board, and B/op starts to show improvements much earlier:

        │ const-baseline.txt │            const-new.txt            │
        │       sec/op       │    sec/op     vs base               │
1 1              528.3n ± 1%   451.0n ±  0%  -14.65% (p=0.002 n=6)
1 3             1099.5n ± 1%   935.1n ±  1%  -14.95% (p=0.002 n=6)
1 10             4.053µ ± 4%   3.612µ ±  0%  -10.87% (p=0.002 n=6)
2 1              676.7n ± 1%   571.8n ±  1%  -15.50% (p=0.002 n=6)
2 3              1.449µ ± 0%   1.135µ ±  0%  -21.64% (p=0.002 n=6)
2 10             5.062µ ± 0%   4.105µ ±  0%  -18.90% (p=0.002 n=6)
3 1              854.4n ± 4%   693.8n ±  7%  -18.79% (p=0.002 n=6)
3 3              1.767µ ± 1%   1.340µ ±  1%  -24.19% (p=0.002 n=6)
3 10             6.066µ ± 0%   4.588µ ±  3%  -24.37% (p=0.002 n=6)
5 1             1176.0n ± 1%   933.1n ± 12%  -20.65% (p=0.002 n=6)
5 3              2.433µ ± 1%   1.739µ ±  3%  -28.51% (p=0.002 n=6)
5 10             8.022µ ± 4%   5.568µ ±  4%  -30.59% (p=0.002 n=6)
geomean          1.916µ        1.524µ        -20.50%

        │ const-baseline.txt │           const-new.txt            │
        │        B/op        │     B/op      vs base              │
1 1               696.0 ± 0%     712.0 ± 0%  +2.30% (p=0.002 n=6)
1 3             1.258Ki ± 0%   1.336Ki ± 0%  +6.21% (p=0.002 n=6)
1 10            4.816Ki ± 0%   5.113Ki ± 0%  +6.16% (p=0.002 n=6)
2 1              1016.0 ± 0%     992.0 ± 0%  -2.36% (p=0.002 n=6)
2 3             1.789Ki ± 0%   1.797Ki ± 0%  +0.44% (p=0.002 n=6)
2 10            6.113Ki ± 0%   6.230Ki ± 0%  +1.92% (p=0.002 n=6)
3 1             1.305Ki ± 0%   1.242Ki ± 0%  -4.79% (p=0.002 n=6)
3 3             2.320Ki ± 0%   2.258Ki ± 0%  -2.69% (p=0.002 n=6)
3 10            7.410Ki ± 0%   7.348Ki ± 0%  -0.84% (p=0.002 n=6)
5 1             1.930Ki ± 0%   1.789Ki ± 0%  -7.29% (p=0.002 n=6)
5 3             3.383Ki ± 0%   3.180Ki ± 0%  -6.00% (p=0.002 n=6)
5 10           10.004Ki ± 0%   9.582Ki ± 0%  -4.22% (p=0.002 n=6)
geomean         2.520Ki        2.494Ki       -1.02%

        │ const-baseline.txt │           const-new.txt           │
        │     allocs/op      │ allocs/op   vs base               │
1 1               21.00 ± 0%   19.00 ± 0%   -9.52% (p=0.002 n=6)
1 3               35.00 ± 0%   31.00 ± 0%  -11.43% (p=0.002 n=6)
1 10              87.00 ± 0%   76.00 ± 0%  -12.64% (p=0.002 n=6)
2 1               30.00 ± 0%   26.00 ± 0%  -13.33% (p=0.002 n=6)
2 3               50.00 ± 0%   42.00 ± 0%  -16.00% (p=0.002 n=6)
2 10              123.0 ± 0%   101.0 ± 0%  -17.89% (p=0.002 n=6)
3 1               39.00 ± 0%   33.00 ± 0%  -15.38% (p=0.002 n=6)
3 3               65.00 ± 0%   53.00 ± 0%  -18.46% (p=0.002 n=6)
3 10              159.0 ± 0%   126.0 ± 0%  -20.75% (p=0.002 n=6)
5 1               57.00 ± 0%   47.00 ± 0%  -17.54% (p=0.002 n=6)
5 3               95.00 ± 0%   75.00 ± 0%  -21.05% (p=0.002 n=6)
5 10              231.0 ± 0%   176.0 ± 0%  -23.81% (p=0.002 n=6)
geomean           65.24        54.42       -16.59%

I think the fastdto solution will provide even more benefits if we go the route of moving over to using it throughout client_golang. I was having a hard time including it here since it sadly adds more allocations in MakeLabelPairs to do the conversion.

@@ -133,7 +149,7 @@ func (v2) NewDesc(fqName, help string, variableLabels ConstrainableLabels, const
d.err = fmt.Errorf("%q is not a valid label name for metric %q", label, fqName)
return d
}
labelNames = append(labelNames, "$"+label)
labelNames = append(labelNames, label)
@kgeckhart (Author) commented on Mar 6, 2025:

This change is what enables me to drop an extra sort. I can't see the reason variable labels need to be prefixed with a $ here but there might be something I'm missing.

A Member replied:

Maybe to mark it as non-constant? I don't remember why it was done. We might want to git blame to track down why it was added, although if the tests are passing... likely fine (:

@vesari (Contributor) commented on Apr 14, 2025:

@kgeckhart, thank you very much for this! We’re still evaluating whether to introduce this change and if so, whether we should do that behind a feature flag first, possibly to gather some data on how it would impact users in “real life” production scenarios. In the meantime, could you tell us more about your production setup? Thanks in advance.

@bwplotka (Member) commented:

Benchmarks are looking good!

I see some allocs on NewDesc as the only "downside":

                     │ baseline.txt │               new.txt               │
                     │     B/op     │     B/op      vs base               │
NewDesc/labels=1-11      376.0 ± 0%     432.0 ± 0%  +14.89% (p=0.002 n=6)
NewDesc/labels=3-11      744.0 ± 0%     896.0 ± 0%  +20.43% (p=0.002 n=6)
NewDesc/labels=10-11   3.520Ki ± 0%   3.996Ki ± 0%  +13.54% (p=0.002 n=6)
geomean                 1002.7        1.138Ki       +16.25%

It looks like we trade that for less work and memory in MakeLabelPairs, which is generally called much more often than NewDesc (exceptions like exporters might be the opposite):

                            │ baseline.txt │              new.txt              │
                            │     B/op     │    B/op     vs base               │
MakeLabelPairs/labels=1-11     136.00 ± 0%   96.00 ± 0%  -29.41% (p=0.002 n=6)
MakeLabelPairs/labels=3-11      360.0 ± 0%   288.0 ± 0%  -20.00% (p=0.002 n=6)
MakeLabelPairs/labels=10-11    1144.0 ± 0%   960.0 ± 0%  -16.08% (p=0.002 n=6)
geomean                         382.6        298.3       -22.04%

So the main question is: is this acceptable, and how do we find that answer? (:

I am slowly preparing for paternity leave, so unfortunately I can't promise to review this in detail and try to answer that question. Leaving this to the other maintainers @vesari @ArthurSens @kakkoyun.

As @vesari mentioned, perhaps trying to put this choice under some feature flag would make it easier for the community to find this answer. I only wonder if feature flags are feasible at such a low level. Even adding various data structures for alternative implementations will add overhead to the hot path we are touching.

Another way to try this is to get some big exporter running in prod to adopt this change and macro-benchmark it (e.g. cadvisor/node exporter).

@kgeckhart (Author) commented:

> In the meantime, could you tell us more about your production setup?

I think #1702 does a great job of breaking down the overhead incurred by MakeLabelPairs for the windows exporter. It creates quite a few metric descriptions once and incurs the MakeLabelPairs overhead many times as it creates metric instances for collection. I work on YACE (https://github.com/prometheus-community/yet-another-cloudwatch-exporter/) and in our production workloads MakeLabelPairs is the top function driving allocations:

[allocation profile screenshot]

> As @vesari mentioned, perhaps trying to put this choice under some feature flag would make it easier for the community to find this answer. I only wonder if feature flags are feasible at such a low level. Even adding various data structures for alternative implementations will add overhead to the hot path we are touching.

I'll take a pass tomorrow to see if there might be an option here. I have similar concerns over being able to flag something like this.

> Another way to try this is to get some big exporter running in prod to adopt this change and macro-benchmark it (e.g. cadvisor/node exporter).

I'm happy to update YACE and test it out against our production workloads to see the potential benefit. The stackdriver exporter is a prime example of an exporter that could be negatively impacted because it creates a description and only uses it to create a single metric. I work on this exporter and we happen to run it in our production environment as well. @jkroepke might also be willing to test this out in the windows exporter as the original issue reporter.

@jkroepke (Member) commented:

> @jkroepke might also be willing to test this out in the windows exporter as the original issue reporter.

@kgeckhart Sure. Do you have a replace directive for the go.mod that points to your fork? If not, it's fine.

@jkroepke (Member) commented:

Okay, I tested https://github.com/prometheus/client_golang/tree/v1.22.0 vs kgeckhart@e87a0ad with prometheus-community/windows_exporter@85455bd


https://github.com/prometheus/client_golang/tree/v1.22.0:

[profile screenshot]
profile.pb.gz (attached)

kgeckhart@e87a0ad:

[profile screenshot]
profile.pb.gz (attached)

@kgeckhart (Author) commented:

> Okay, I tested https://github.com/prometheus/client_golang/tree/v1.22.0 vs kgeckhart@e87a0ad with prometheus-community/windows_exporter@85455bd

Thanks for testing! The results look really promising IMO. I'm still planning to test it with YACE + stackdriver exporter but it might take me some more time with the coming holiday.
