
chore(dataobj): Add columnar reading APIs to logs and streams sections #17976

Merged
merged 8 commits into from
Jun 10, 2025

rfratto
Member

@rfratto rfratto commented Jun 4, 2025

NOTE TO REVIEWER: I've split this one across multiple commits to make it easier to review.

The last three commits for the logs section mirror the changes done to the streams section, and can be ignored.

This PR introduces columnar reading APIs to both the logs and streams sections. As part of this change, both sections now expose the columns stored in the section, which are then used to define predicates on the readers.

The columnar reading APIs emit a sequence of arrow.Records, where each record contains no more rows than the batch size passed to the Reader.Read method.

Unlike the original row-based readers, the columnar readers:

  • Allow reading a subset of columns
  • Have column-based predicates that map directly to the new query engine
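
The two differences listed above can be illustrated with a small stdlib-only model. Everything in it is hypothetical: the `table` and `equalPredicate` types, the `read` function, and the column names are made up for this sketch and do not correspond to the section APIs in this PR. It shows a predicate evaluated against one column while only a requested subset of columns is materialized.

```go
package main

import "fmt"

// table is a hypothetical in-memory columnar layout: column name -> values.
// Every column is assumed to hold the same number of rows.
type table map[string][]string

// equalPredicate is a hypothetical column-based predicate: keep the rows
// where the named column equals value.
type equalPredicate struct {
	column string
	value  string
}

// read returns only the requested columns, restricted to rows that match p.
func read(t table, columns []string, p equalPredicate) (table, error) {
	filterCol, ok := t[p.column]
	if !ok {
		return nil, fmt.Errorf("unknown predicate column %q", p.column)
	}

	// Evaluate the predicate against its own column to select row indices.
	var keep []int
	for i, v := range filterCol {
		if v == p.value {
			keep = append(keep, i)
		}
	}

	// Materialize only the projected columns for the selected rows.
	out := make(table, len(columns))
	for _, name := range columns {
		col, ok := t[name]
		if !ok {
			return nil, fmt.Errorf("unknown column %q", name)
		}
		vals := make([]string, 0, len(keep))
		for _, i := range keep {
			vals = append(vals, col[i])
		}
		out[name] = vals
	}
	return out, nil
}

func main() {
	t := table{
		"stream_id": {"1", "2", "1"},
		"timestamp": {"10", "20", "30"},
		"message":   {"a", "b", "c"},
	}
	out, err := read(t, []string{"message"}, equalPredicate{column: "stream_id", value: "1"})
	if err != nil {
		panic(err)
	}
	fmt.Println(out["message"]) // prints [a c]
}
```

Note that the predicate column (`stream_id` here) does not have to be part of the projection, which is one reason a columnar reader can skip work that a row-based reader cannot.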

To be able to represent a column's value for use in predicates, we import github.com/apache/arrow-go/v18/arrow/scalar. Section Reader implementations perform mapping to and from the internal dataset.Value.

The implementation of logs.Reader is an almost identical copy of the streams.Reader implementation. I've opted to duplicate the implementation in the short term, since there's no obvious way to deduplicate it that keeps the package structure easy to understand. We can change this in the future if a clean and simple approach presents itself.

This PR does not yet update DataObjScan to make use of these new APIs; that is out of scope here and left for a follow-up PR.

rfratto added 8 commits June 4, 2025 13:47
Exposing the columns in a streams section will be used to construct the
columnar reading API.

This updates the dataset.Dataset implementation for a section so that
the higher-level API type Column is used.

This will make it easier for the vectorized reader to maintain a mapping
from the original Column value to the dataset.Column value when creating
mapped predicates.

This adds a columnar reading API to the streams section. Each read call
returns an Arrow record.

Exposing the columns in a logs section will be used to construct the
columnar reading API.

This updates the dataset.Dataset implementation for a section so that
the higher-level API type Column is used.

This will make it easier for the vectorized reader to maintain a mapping
from the original Column value to the dataset.Column value when creating
mapped predicates.

This adds a columnar reading API to the logs section. Each read call
returns an Arrow record.

Contributor

@chaudum chaudum left a comment

lgtm!

@chaudum chaudum merged commit 06cf527 into grafana:main Jun 10, 2025
66 checks passed
2 participants