Skip to content

Add last_over_time #126650

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 6 commits into from
Apr 17, 2025
Merged

Add last_over_time #126650

merged 6 commits into from
Apr 17, 2025

Conversation

dnhatn
Copy link
Member

@dnhatn dnhatn commented Apr 11, 2025

This PR introduces a time-series aggregation function that collects the last value of each time series within each grouping.

For example:

TS index 
| STATS sum(last_over_time(memory_usage)) BY cluster, bucket(@timestamp, 1minute)

This aggregate function should support all data types, but currently, it only supports int, long, float, and double. Support for additional data types will be added later. Another issue is the lack of unit test infra for time-series and time-series aggregators. I plan to address them soon.

@dnhatn
Copy link
Member Author

dnhatn commented Apr 11, 2025

I opened this PR for discussion. Perhaps we should implement last(field, @timestamp), where @timestamp is optional for regular aggregation, then delegate last_over_time to it, similar to how max_over_time delegates to max.

long timestamp = timestamps.getLong(timestamps.getFirstValueIndex(otherPosition));
int firstIndex = values.getFirstValueIndex(otherPosition);
for (int i = 0; i < valueCount; i++) {
current.maybeCollect(groupId, timestamp, values.get$Type$(firstIndex + i));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lemme see if I understand this. Should this be returning a boolean to indicate if we found the last value? Assuming we're scanning in reverse chrono order, the first match should stop iteration over the values in the bucket.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't have a mechanism to signal this. I renamed maybeCollect to collectValue to make it less confusing.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's add a comment about this, seems like an important optimization - e.g. for aggs with TBUCKET(1 day), we may scan 1 value vs 1m values, iiuc?

@kkrik-es
Copy link
Contributor

I opened this PR for discussion. Perhaps we should implement last(field, @timestamp), where @timestamp is optional for regular aggregation, then delegate last_over_time to it, similar to how max_over_time delegates to max.

Right, I wonder if the following query makes sense:

TS metrics | STATS last(last_over_time(foo)) BY TBUCKET(1hour), hostname 

which should be equivalent to

TS metrics | STATS last(foo) BY TBUCKET(1hour), hostname 

I can't think of other uses of last vs last_over_time, rate and the other {agg}_over_time functions return a value for the entire interval. But then, last_over_time would have to return the timestamp of the last value, to be used in the wrapping last function.

@dnhatn
Copy link
Member Author

dnhatn commented Apr 11, 2025

PromQL: last_over_time(range-vector): the most recent point value in the specified interval.

@kkrik-es Thank you for the feedback. I was considering implementing last_over_time to be equivalent to the last_over_time function in PromQL. I am not sure about the output of your proposal last(last_over_time).

agg_over_time

@dnhatn dnhatn requested a review from kkrik-es April 11, 2025 14:41
@kkrik-es
Copy link
Contributor

Right the last would calculate the last value of the per-tsid last values, e.g. per host. I don't see it in the listed functions, not sure if it's supported in PromQL.. @felixbarny too.

@kkrik-es
Copy link
Contributor

Hey Nhat, is this blocked? Shall we just move fwd with what you have, maybe adding comments to revisit the logic and terminate early, if possible?

@dnhatn
Copy link
Member Author

dnhatn commented Apr 17, 2025

Hey Nhat, is this blocked? Shall we just move fwd with what you have, maybe adding comments to revisit the logic and terminate early, if possible?

@kkrik-es I added a comment in 9e57f6c. Can you take a look? Thanks.

@dnhatn dnhatn marked this pull request as ready for review April 17, 2025 06:53
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-storage-engine (Team:StorageEngine)

Copy link
Member

@martijnvg martijnvg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 👍

@dnhatn
Copy link
Member Author

dnhatn commented Apr 17, 2025

@kkrik-es @martijnvg Thanks for reviews.

@dnhatn dnhatn merged commit 9d5df19 into elastic:main Apr 17, 2025
16 of 17 checks passed
@dnhatn dnhatn deleted the last_over_time branch April 17, 2025 23:14
@dnhatn dnhatn mentioned this pull request Apr 28, 2025
13 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants