Skip to content

Add avg_over_time #126572

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Apr 11, 2025
Merged

Add avg_over_time #126572

merged 3 commits into from
Apr 11, 2025

Conversation

dnhatn
Copy link
Member

@dnhatn dnhatn commented Apr 9, 2025

This change adds the avg_over_time aggregation for time series indices. Similar to other time series aggregations, we need to translate avg_over_time into regular aggregations. There are two options for this translation:

  1. Translate avg_over_time to EVAL div(sum_over_time, count_over_time), then translate sum_over_time and count_over_time to sum and count.
  2. Translate avg_over_time directly to avg, and then to div(sum, count).

This PR chooses the latter approach. Below is an example:

TS k8s 
| STATS sum(avg_over_time(memory_usage)) BY host, bucket(@timestamp, 1minute)

translates to:

TS k8s
| STATS avg_memory_usage = avg(memory_usage), host_values=VALUES(host) BY _tsid, time_bucket=bucket(@timestamp, 1minute)
| STATS sum(avg_memory_usage) BY host_values, time_bucket

and then:

TS k8s
| STATS sum_memory_usage = sum(memory_usage), count_memory_usage = count(memory_usage), host_values=VALUES(host) BY _tsid, time_bucket=bucket(@timestamp, 1minute)
| EVAL avg_memory_usage = sum_memory_usage / count_memory_usage
| STATS sum(avg_memory_usage) BY host_values, time_bucket

Since we need to substitute AVG with SUM and COUNT after translation, we need to call SubstituteSurrogates twice in LogicalPlanOptimizer. If there is a performance impact, we can move this rule to TranslateTimeSeriesAggregate.

@dnhatn dnhatn force-pushed the avg_over_time branch 2 times, most recently from ca9a217 to 38c95b3 Compare April 10, 2025 01:41
@dnhatn dnhatn marked this pull request as ready for review April 10, 2025 04:00
@elasticsearchmachine elasticsearchmachine added Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) Team:StorageEngine labels Apr 10, 2025
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-analytical-engine (Team:Analytics)

@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-storage-engine (Team:StorageEngine)

MAX_OVER_TIME(Build.current().isSnapshot()),

/**
* Support avg_over_time aggregation
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: add ", that gets evaluated per time-series".

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

++ see 24ee2d3

Copy link
Contributor

@kkrik-es kkrik-es left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So simple, well done.

Copy link
Member

@martijnvg martijnvg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also played around with it locally. LGTM

@dnhatn dnhatn merged commit 1739049 into elastic:main Apr 11, 2025
17 checks passed
@dnhatn dnhatn deleted the avg_over_time branch April 11, 2025 03:38
@dnhatn
Copy link
Member Author

dnhatn commented Apr 11, 2025

@kkrik-es @martijnvg Thanks for the reviews.

@dnhatn dnhatn mentioned this pull request Apr 28, 2025
13 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Analytics/ES|QL AKA ESQL >non-issue :StorageEngine/TSDB You know, for Metrics Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) Team:StorageEngine v9.1.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants