-
Notifications
You must be signed in to change notification settings - Fork 1.2k
Improve Performance of MakeLabelPairs #1734
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Improve Performance of MakeLabelPairs #1734
Conversation
Signed-off-by: Kyle Eckhart <[email protected]>
829ec00
to
d049c34
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is brilliant, thank you! Elegant and clean.
I proposed using slice vs map for ordering to extra efficiency. We could also keep sorting on MakeLabelPairs if that is too expensive on NewDesc or memory. Speaking of NewDesc efficiency...
The tradeoff is a little bit more overhead in the Desc to store the full labelPairs and an index lookup map. I think this is a fair tradeoff to make because a single Desc instance is likely to be used for numerous calls to MakeLabelPairs.
Generally it's fair I agree. However, we have many use cases of users using NewConst*Metric
with dynamic (😱) NewDesc
because the underlying metrics labels they report change. It's a niche and we could provide some optimizations for them later on (separate NewDesc, reusable Desc) but it would epic if we could measure NewDesc
efficiency, at least to be aware of the overhead. I proposed a quick way for us to expand your benchmark for it.
To sum up, LGTM with nits, slice suggestion, new benchmark and request to redo benchmarks for both MakeLabelPairs and NewDesc. Should be quick to do -- thank you so much for helping! 💪🏽
Great point, funny enough yet-another-cloudwatch-exporter was one such use case and refactoring to a reusable Desc approach led me here next. Thanks for all the feedback I'll clean it up and add a bench for Desc as well. |
…nd add a benchmark for NewDesc Signed-off-by: Kyle Eckhart <[email protected]>
4dc5c04
to
6da4db6
Compare
After doing some more benchmarking I think I'll try to find some further optimizations in
The results from
CPU and allocs/op increase if you only use the desc to create a single metric but beyond that you get savings. B/op only starts to show savings when you get to 5 metrics created 😬 . I'll take a deeper look tomorrow but my first inclination is that it's from this block https://github.com/kgeckhart/client_golang/blob/6da4db6dc2f8348a455b3bca5f657b47a3c02b4c/prometheus/desc.go#L174-L178 because I can't re-use the pre-created |
Thanks! Naive math on how much NewDesc is worse vs MakeLabelPairs is better, easily pays of on CPU, but it is interesting on the alloc side and this is what we see on your constMetricFlow bench.
So indeed I would focus on allocs if NewDesc... |
} | ||
} | ||
|
||
func BenchmarkConstMetricFlow(b *testing.B) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is important for exporters, but if you go this route I would also add benchmark to show the benefit of "normal" use (e.g. NewCounter and WithLabelValues, which is 80% of use). This will perhaps motivate us to accept certain penalty for this niche cases.
For exporters we really need to redesign something around constant metrics. They are too expensive to be created on every scrape, especially if users (majority) expects predictable memory and CPU from k8s containers. I tried in the past to build some dto based cache layer google/cadvisor#2974 but it needs more work.
What I'm saying is that we could make this a bit slower and work with exporters on a long term plan. We could also give option to users (faster NewDesc vs slower one and more work later). Let's check what we could squeeze more.
Would some smart "pre alloc pool" help? (for predictive memory I believe yes)
OK I spend some time to see if trying to get rid of |
…for ordering Signed-off-by: Kyle Eckhart <[email protected]>
Signed-off-by: Kyle Eckhart <[email protected]>
@bwplotka had some stuff pop up that distracted me for a bit. I took another stab with a different solution to producing the sorted labels and I'm pretty happy with the results,
The flow test has drops in CPU + alloc/op across the board and B/op starts to show improvements much earlier
I think the fastdto solution will provide even more benefits if we go the route of moving over to using it through out |
@@ -133,7 +149,7 @@ func (v2) NewDesc(fqName, help string, variableLabels ConstrainableLabels, const | |||
d.err = fmt.Errorf("%q is not a valid label name for metric %q", label, fqName) | |||
return d | |||
} | |||
labelNames = append(labelNames, "$"+label) | |||
labelNames = append(labelNames, label) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change is what enables me to drop an extra sort. I can't see the reason variable labels need to be prefixed with a $
here but there might be something I'm missing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe to mark is a non-constant? But don't remember why it was done - we might want to blame/track why it was added, although if tests are passing... likely fine (:
@kgeckhart, thank you very much for this! We’re still evaluating whether to introduce this change and if so, whether we should do that behind a feature flag first, possibly to gather some data on how it would impact users in “real life” production scenarios. In the meantime, could you tell us more about your production setup? Thanks in advance. |
Benchmarks are looking good! I see some allocs on
It looks like we trade it with less work/memory on benefits on
So the main question is.. is this acceptable and how to find that answer. (: I am slowly preparing for paternity leave, so unfortunately I can't promise to review in detail and try to answer this question. Leaving this to other maintainers @vesari @ArthurSens @kakkoyun As @vesari mentioned, perhaps trying to put this choice under some feature flag would make it easier for community to find this answer. I only wonder if feature flags are feasible on such a low-level. Even adding various data-structures for alternative implementations will add overhead in this hot path we are touching. Another way to try this is to get some big exporter running on prod adopt this change and macro-benchmark (e.g. cadvisor/node exporter). |
I think #1702 does a great job of breaking down the overhead incurred by
I'll take a pass tomorrow to see if there might be an option here. I have similar concerns over being able to flag something like this.
I'm happy to update YACE and test it out against our production workloads to see the potential benefit. The stackdriver exporter is a prime example of an exporter that could be negatively impacted because it creates a description and only uses it to create a single metric. I work on this exporter and we happen to run it in our production environment as well. @jkroepke might also be willing to test this out in the windows exporter as the original issue reporter. |
@kgeckhart Sure. Do you have an replace directive for the go.mod that leads to your fork? If not, It's fine. |
Thanks for testing! The results look really promising IMO. I'm still planning to test it with YACE + stackdriver exporter but it might take me some more time with the coming holiday. |
This PR refactors the internals of the
Desc
to create an optimized version ofMakeLabelPairs
by replacing theconstLabels
with a sortedlabelPairs
containing constant labels and variable labels. The Desc also carries an extra map so it is possible to insert the labelValues in to the proper sorted index on output. Benchmark results,No longer sorting the immutable set of label names drops CPU usage dramatically and some memory. There's also a further benefit to memory by calling
proto.String
only once for the variable label names.The tradeoff is a little bit more overhead in the
Desc
to store the full labelPairs and an index lookup map. I think this is a fair tradeoff to make because a singleDesc
instance is likely to be used for numerous calls toMakeLabelPairs
.Related to: #1702