
[core][metri-agg/2] recommended for all (mostly) data type #52947


Draft
wants to merge 1 commit into
base: can-coreobs01
from


can-anyscale
Collaborator

@can-anyscale can-anyscale commented May 12, 2025

This series of PRs addresses #47289. The context is that metrics such as ray_tasks and ray_actors produce a high volume of time series in Prometheus. Our inspection indicates that this is caused by the high cardinality of the WorkerId field in these metrics on large-scale clusters. See https://docs.google.com/document/d/1AZVZQGGroSbV1w4KG4Vncc0L_tVdpfkM3ueGo_fD9-M/edit?tab=t.0 for the proposed solution.

This PR:

  • Takes a small step toward completing the RECOMMENDED aggregation level by extending support to all metric types except distributed data. Aggregating distributed data from the worker level to the node level is not well defined (or would be very complicated to define well), and the existing distributed-data metrics are already low cardinality, so I have not spent much time supporting this type of data here. This can be adjusted.
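
For illustration only, the idea of aggregating from worker level to node level can be sketched as below. This is a minimal sketch and not the code in this PR: the function name `drop_label_and_sum`, the sample tuple layout, and the `NodeId` and `State` labels are hypothetical, and summing is assumed to be an acceptable aggregation for the counter-like and gauge-like values shown.

```python
from collections import defaultdict


def drop_label_and_sum(samples, label="WorkerId"):
    """Drop one label from every sample and sum values that now share a label set.

    `samples` is an iterable of (metric_name, labels_dict, value) tuples.
    This layout is a hypothetical stand-in, not Ray's internal representation.
    """
    totals = defaultdict(float)
    for name, labels, value in samples:
        # Keep every label except the high-cardinality one; sort for a stable key.
        kept = tuple(sorted((k, v) for k, v in labels.items() if k != label))
        totals[(name, kept)] += value
    return dict(totals)


# Three worker-level series collapse into two node-level series.
samples = [
    ("ray_tasks", {"NodeId": "n1", "WorkerId": "w1", "State": "RUNNING"}, 2.0),
    ("ray_tasks", {"NodeId": "n1", "WorkerId": "w2", "State": "RUNNING"}, 3.0),
    ("ray_tasks", {"NodeId": "n2", "WorkerId": "w3", "State": "RUNNING"}, 1.0),
]
aggregated = drop_label_and_sum(samples)
```

With the label removed, the number of series scales with the number of nodes rather than the number of workers, which is the cardinality reduction this series is after.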

Test:

  • CI

@can-anyscale can-anyscale force-pushed the can-coreobs02 branch 3 times, most recently from b1926ae to ae9770d Compare May 12, 2025 22:36
@can-anyscale can-anyscale changed the title [core][obs/2] core_metric_cardinality_level for all metric type excep… [core][obs/2] drop worker_id labels for all metrics except for distributed data May 12, 2025
@can-anyscale can-anyscale changed the title [core][obs/2] drop worker_id labels for all metrics except for distributed data [core][obs/2] drop worker_id labels except for distributed data May 12, 2025
@can-anyscale can-anyscale force-pushed the can-coreobs01 branch 10 times, most recently from 73338a9 to 6360547 Compare May 13, 2025 17:46
@can-anyscale can-anyscale force-pushed the can-coreobs02 branch 3 times, most recently from bf49f72 to e7cefa7 Compare May 13, 2025 18:34
@can-anyscale can-anyscale changed the title [core][obs/2] drop worker_id labels except for distributed data [core][metri-agg/2] recommended for all (mostly) data type May 13, 2025