Optimize time-series source operator #127095

Merged 4 commits on Apr 22, 2025

dnhatn (Member) commented Apr 19, 2025

This query against the TSDB track took 50 seconds; with this change it takes 19 seconds.

TS tsdb 
| STATS sum(rate(kubernetes.container.memory.pagefaults)) by bucket(@timestamp, 5minute)

This change introduces several optimizations to improve the performance of the time-series source operator:

  • Split the leaf queue into two: one for _tsid and another for @timestamp. This avoids repeatedly comparing large _tsid values while iterating over a single _tsid.
  • Track the number of emitted documents per segment and use this data to build forward and backward document maps, reducing the need for expensive sorts.
  • Use ordinal blocks to avoid duplicating the same _tsid multiple times.
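
The first and third points can be illustrated with a small, self-contained sketch. This is not the code from this PR: the class `TwoQueueMergeSketch`, its `Leaf` cursor, and the `Result` holder are hypothetical names, and the sketch assumes each segment is sorted by `_tsid` ascending and `@timestamp` descending. The outer queue compares `_tsid` values only when a leaf moves to a new series; while a series is being drained, the inner queue compares timestamps only, and a per-segment integer ordinal detects the end of the series. The output stores each `_tsid` once and refers to it by ordinal.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Illustrative sketch only; all names here are hypothetical, not the PR's classes.
class TwoQueueMergeSketch {

    /** Cursor over one segment; rows are assumed sorted by (_tsid asc, @timestamp desc). */
    static final class Leaf {
        final String[] tsids;    // per-segment dictionary of distinct _tsid values
        final int[] ords;        // per-row ordinal into tsids
        final long[] timestamps; // per-row @timestamp
        int pos = 0;

        Leaf(String[] tsids, int[] ords, long[] timestamps) {
            this.tsids = tsids;
            this.ords = ords;
            this.timestamps = timestamps;
        }

        boolean exhausted() { return pos >= ords.length; }
        String tsid() { return tsids[ords[pos]]; }
        int ord() { return ords[pos]; }
        long timestamp() { return timestamps[pos]; }
    }

    /** Merged output: each distinct _tsid is stored once and referenced by ordinal. */
    static final class Result {
        final List<String> dictionary = new ArrayList<>();
        final List<Integer> ordinals = new ArrayList<>();
        final List<Long> timestamps = new ArrayList<>();
    }

    static Result merge(List<Leaf> leaves) {
        // Outer queue: orders leaves by their current (potentially large) _tsid.
        PriorityQueue<Leaf> byTsid =
            new PriorityQueue<>(Comparator.comparing((Leaf l) -> l.tsid()));
        // Inner queue: orders leaves sharing the current _tsid by timestamp, newest first.
        PriorityQueue<Leaf> byTimestamp =
            new PriorityQueue<>(Comparator.comparingLong((Leaf l) -> l.timestamp()).reversed());
        for (Leaf leaf : leaves) {
            if (!leaf.exhausted()) {
                byTsid.add(leaf);
            }
        }
        Result result = new Result();
        while (!byTsid.isEmpty()) {
            String tsid = byTsid.peek().tsid();
            int outputOrd = result.dictionary.size();
            result.dictionary.add(tsid); // the _tsid value is written exactly once
            // Move every leaf positioned on this _tsid into the timestamp queue.
            while (!byTsid.isEmpty() && byTsid.peek().tsid().equals(tsid)) {
                byTimestamp.add(byTsid.poll());
            }
            // Drain the series: only timestamps and int ordinals are compared here.
            while (!byTimestamp.isEmpty()) {
                Leaf leaf = byTimestamp.poll();
                int segmentOrd = leaf.ord();
                result.ordinals.add(outputOrd);
                result.timestamps.add(leaf.timestamp());
                leaf.pos++;
                if (leaf.exhausted()) {
                    continue;
                }
                if (leaf.ord() == segmentOrd) {
                    byTimestamp.add(leaf); // still inside the same series
                } else {
                    byTsid.add(leaf);      // moved on to its next series
                }
            }
        }
        return result;
    }
}
```

With a single combined queue, every heap operation would compare the `_tsid` first and the timestamp second; in the sketch, the number of `_tsid` comparisons scales with the number of series transitions per leaf rather than with the number of documents.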

@dnhatn dnhatn force-pushed the time-series-source branch from d4f7e9b to 57ab327 April 20, 2025 01:03
@dnhatn dnhatn requested review from kkrik-es and martijnvg April 20, 2025 01:03
@dnhatn dnhatn marked this pull request as ready for review April 20, 2025 01:03
@elasticsearchmachine elasticsearchmachine added the Team:Analytics and Team:StorageEngine labels Apr 20, 2025
@elasticsearchmachine
Collaborator

Pinging @elastic/es-analytical-engine (Team:Analytics)

@elasticsearchmachine
Collaborator

Pinging @elastic/es-storage-engine (Team:StorageEngine)

@dnhatn dnhatn requested a review from kkrik-es April 21, 2025 16:55
Contributor

@kkrik-es kkrik-es left a comment

Nice, a few nits and questions about further improvements. Let's also have Martijn double-check the Lucene part.

Member

@martijnvg martijnvg left a comment

LGTM, great job Nhat 👍

@dnhatn
Member Author

dnhatn commented Apr 22, 2025

@kkrik-es @martijnvg Thanks for reviewing.

@dnhatn dnhatn merged commit 4f506d4 into elastic:main Apr 22, 2025
17 checks passed
@dnhatn dnhatn deleted the time-series-source branch April 22, 2025 21:32
@dnhatn dnhatn mentioned this pull request Apr 27, 2025
15 tasks
Labels
:Analytics/ES|QL (AKA ESQL), >non-issue, :StorageEngine/TSDB (You know, for Metrics), Team:Analytics (Meta label for analytical engine team (ESQL/Aggs/Geo)), Team:StorageEngine, v9.1.0
4 participants