fix: order Prometheus histogram metrics correctly #5399

Open
wants to merge 6 commits into
base: main

@robin-ede robin-ede commented Jul 1, 2025

What does this PR address?

Prometheus histograms must list every _bucket before their _count and _sum.
Out-of-order output breaks parsers like fluent-bit.
This patch guarantees the correct order for all BentoML histogram metrics.

Changes

  • PrometheusClient
    • Add _fix_histogram_ordering()
      • Sorts buckets by le, then appends _count, _sum.
      • Transparent in single- and multi-process modes.
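
The ordering rule described above can be sketched as follows. This is a minimal illustration, not the PR's actual `_fix_histogram_ordering()` implementation: the function name `order_histogram_samples`, the `Sample` tuple shape, and the handling of unrecognized samples are all assumptions made for the example. It also assumes the input holds the samples of a single histogram with a single label set; a real implementation would likely group by metric name and labels first.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical sample shape for this sketch: (sample name, labels, value).
Sample = Tuple[str, Dict[str, str], float]


def order_histogram_samples(samples: List[Sample]) -> List[Sample]:
    """Return one histogram's samples as: buckets (ascending le), _count, _sum."""

    def le_value(sample: Sample) -> float:
        # float() accepts "+Inf", so the overflow bucket sorts last.
        try:
            return float(sample[1].get("le", "+Inf"))
        except (ValueError, TypeError):
            # Unparsable le values are pushed to the end (an assumption).
            return math.inf

    buckets = sorted(
        (s for s in samples if s[0].endswith("_bucket")), key=le_value
    )
    counts = [s for s in samples if s[0].endswith("_count")]
    sums = [s for s in samples if s[0].endswith("_sum")]
    # Anything else is kept after the histogram samples instead of being dropped.
    others = [
        s for s in samples
        if not s[0].endswith(("_bucket", "_count", "_sum"))
    ]
    return buckets + counts + sums + others
```

Because `sorted` is stable, buckets with equal `le` values keep their original relative order.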

Checklist

  • Conventional commit title.
  • pre-commit run -a passes.
  • No doc updates needed.
  • Tests run.

Fixes #5386

Ensure _bucket metrics are emitted before _count and _sum, per the
Prometheus text-format spec, so scrapers like fluent-bit can parse
/metrics without errors.

• add _fix_histogram_ordering() in PrometheusClient
• sort buckets by ascending le, then append count → sum
• preserves non-histogram metrics; works in multi-/single-process modes
• add unit tests for ordering and regression

Fixes bentoml#5386
@robin-ede robin-ede requested a review from a team as a code owner July 1, 2025 14:21
@robin-ede robin-ede requested review from jianshen92 and removed request for a team July 1, 2025 14:21
pre-commit-ci bot and others added 5 commits July 1, 2025 14:21
Fixed exception handling in the extract_le_value function by catching the specific ValueError and TypeError exceptions instead of using a bare except.
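
For illustration, a helper of this kind might look like the sketch below. Only the name `extract_le_value` and the two exception types come from the commit message; the signature, the regular expression, and the `+Inf` fallback are assumptions, not the PR's code.

```python
import math
import re

# Match an le label that directly follows "{" or ",", so a label such as
# handle="..." is not mistaken for le="...".
_LE_PATTERN = re.compile(r'[{,]\s*le="([^"]*)"')


def extract_le_value(line: str) -> float:
    """Parse the le label of a text-format sample line; fall back to +Inf."""
    match = _LE_PATTERN.search(line)
    raw = match.group(1) if match else None
    try:
        return float(raw)
    except (ValueError, TypeError):
        # ValueError: le is present but not numeric.
        # TypeError: no le label was found, so raw is None.
        return math.inf
```

Catching only these two exceptions lets unrelated failures, such as `KeyboardInterrupt`, propagate instead of being silently swallowed by a bare `except`.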
@frostming
Collaborator

What caused us to have to use this dirty method to fix it? How does it produce output in the wrong order?

Development

Successfully merging this pull request may close these issues.

bug: The prometheus format output is not standard
3 participants