Description
As part of implementing fluent/fluent-bit#10651 I discovered that if you define a histogram metric with whole-number bucket values of 1,000,000 or more, the bucket labels lose precision. The buckets are defined as double values, and the formatting used when encoding them keeps only six significant digits.
e.g. with this as setup:
struct cmt_histogram_buckets *input_record_buckets = \
    cmt_histogram_buckets_create_size((double[]){ 100, 1024, 2048, 4096,
                                                  100 * 1024, 1024 * 1024, 4 * 1024 * 1024,
                                                  10 * 1024 * 1024}, 8);
This is what comes out in the Prometheus scrape:
# HELP fluentbit_input_record_sizes Histogram of the size of input records
# TYPE fluentbit_input_record_sizes histogram
fluentbit_input_record_sizes_bucket{le="0.0",name="tail.0"} 0
fluentbit_input_record_sizes_bucket{le="100.0",name="tail.0"} 0
fluentbit_input_record_sizes_bucket{le="1024.0",name="tail.0"} 1
fluentbit_input_record_sizes_bucket{le="2048.0",name="tail.0"} 2
fluentbit_input_record_sizes_bucket{le="4096.0",name="tail.0"} 3
fluentbit_input_record_sizes_bucket{le="102400.0",name="tail.0"} 5
fluentbit_input_record_sizes_bucket{le="1.04858e+06",name="tail.0"} 5
fluentbit_input_record_sizes_bucket{le="4.1943e+06",name="tail.0"} 5
fluentbit_input_record_sizes_bucket{le="+Inf",name="tail.0"} 0
fluentbit_input_record_sizes_sum{name="tail.0"} 48412
fluentbit_input_record_sizes_count{name="tail.0"} 5
As best I can tell, this stems from the following line, since the %g printf specifier defaults to a precision of six significant digits:
cmetrics/src/cmt_encode_prometheus.c, line 311 in ab80dd0:
len = snprintf(str, 64, "%g", val);
Extra info
In the Prometheus text format docs/spec, as best as I can see, there's no specific stipulation for type or formatting of the le labels: https://prometheus.io/docs/instrumenting/exposition_formats/#histograms-and-summaries . The only restrictions are the general ones placed on label values:
label_value can be any sequence of UTF-8 characters, but the backslash (\), double-quote ("), and line feed (\n) characters have to be escaped as \\, \", and \n, respectively
In general, from what I've personally seen so far at least, metric tools don't really give you any numerical or mathematical means of reasoning about these bucket values, given that the majority of metric querying etc. seems to work with string-based searching/filtering anyway - but I could be wrong on this.
As another C library reference: the DigitalOcean Prometheus C library also uses sprintf with %g, so presumably it would suffer from the same issue:
https://github.com/digitalocean/prometheus-client-c/blob/c57034d196582d99267d027abb52a05a55dc07f6/prom/src/prom_metric_sample_histogram.c#L502-L509
In the OpenTelemetry project, the buckets are similarly defined as double values:
https://github.com/open-telemetry/opentelemetry-proto/blob/8672494217bfc858e2a82a4e8c623d4a5530473a/opentelemetry/proto/metrics/v1/metrics.proto#L554-L568
There is/was an IntegerHistogram type, but this was for integer observation values. Ironically, it seems it has/had double bucket boundaries anyway, and they decided to deprecate it (see open-telemetry/opentelemetry-proto#257, open-telemetry/opentelemetry-proto#270).
Info on %g specifier: