chore(dataobj): Add columnar reading APIs to logs and streams sections #17976

rfratto · 2025-06-04T18:01:44Z

NOTE TO REVIEWER: I've split this one across multiple commits to make it easier to review.

The last three commits for the logs section mirror the changes done to the streams section, and can be ignored.

This PR introduces columnar reading APIs to both the logs and streams sections. As part of this change, both sections now expose the columns stored in the section, which are then used to define predicates on the readers.

The columnar reading APIs emit a sequence of arrow.Records, where each record contains at most as many rows as the batch size passed to the Reader.Read method.
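To illustrate the batching contract, here is a minimal sketch using only the standard library. The `batchReader` type, its `Read` signature, and the use of `io.EOF` to signal the end of the data are assumptions made for illustration. They are not the actual `streams.Reader` or `logs.Reader` API, and plain string slices stand in for Arrow records.

```go
package main

import (
	"errors"
	"fmt"
	"io"
)

// batchReader is a hypothetical stand-in for a columnar section reader.
// It only models the "at most batchSize rows per read" contract.
type batchReader struct {
	rows []string // remaining rows, standing in for column data
}

// Read returns up to batchSize rows and io.EOF once no rows remain.
// (Signalling the end with io.EOF is an assumption of this sketch.)
func (r *batchReader) Read(batchSize int) ([]string, error) {
	if batchSize <= 0 {
		return nil, errors.New("batch size must be positive")
	}
	if len(r.rows) == 0 {
		return nil, io.EOF
	}
	n := batchSize
	if n > len(r.rows) {
		n = len(r.rows)
	}
	batch := r.rows[:n]
	r.rows = r.rows[n:]
	return batch, nil
}

// readAll drains the reader and records the size of every emitted batch.
func readAll(r *batchReader, batchSize int) ([]int, error) {
	var sizes []int
	for {
		batch, err := r.Read(batchSize)
		if errors.Is(err, io.EOF) {
			return sizes, nil
		}
		if err != nil {
			return sizes, err
		}
		sizes = append(sizes, len(batch))
	}
}

func main() {
	r := &batchReader{rows: []string{"a", "b", "c", "d", "e"}}
	sizes, err := readAll(r, 2)
	fmt.Println(sizes, err) // prints "[2 2 1] <nil>"
}
```

The last batch is smaller than the requested size, which is why callers should rely on the length of each returned batch rather than on the batch size they passed in.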

Unlike the original row-based readers, the columnar readers:

Allow reading a subset of columns
Have column-based predicates that map literally to the new query engine

To be able to represent a column's value for use in predicates, we import github.com/apache/arrow-go/v18/arrow/scalar. Section Reader implementations perform mapping to and from the internal dataset.Value.
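A rough sketch of what such a two-way mapping can look like is shown below. To stay self-contained it does not import the Arrow scalar package or `dataset.Value`; the `Scalar`, `Int64Scalar`, `StringScalar`, and `Value` types are simplified stand-ins invented for this example, as are the `toValue` and `fromValue` helpers.

```go
package main

import "fmt"

// Scalar is a stand-in for a typed scalar used in predicates.
type Scalar interface{ isScalar() }

// Int64Scalar and StringScalar are hypothetical concrete scalar types.
type Int64Scalar struct{ V int64 }
type StringScalar struct{ V string }

func (Int64Scalar) isScalar()  {}
func (StringScalar) isScalar() {}

type valueKind int

const (
	kindInt64 valueKind = iota
	kindString
)

// Value is a stand-in for the reader's internal value representation.
type Value struct {
	kind valueKind
	i    int64
	s    string
}

// toValue converts a predicate scalar into the internal representation.
func toValue(s Scalar) (Value, error) {
	switch s := s.(type) {
	case Int64Scalar:
		return Value{kind: kindInt64, i: s.V}, nil
	case StringScalar:
		return Value{kind: kindString, s: s.V}, nil
	default:
		return Value{}, fmt.Errorf("unsupported scalar type %T", s)
	}
}

// fromValue converts an internal value back into a scalar.
func fromValue(v Value) (Scalar, error) {
	switch v.kind {
	case kindInt64:
		return Int64Scalar{V: v.i}, nil
	case kindString:
		return StringScalar{V: v.s}, nil
	default:
		return nil, fmt.Errorf("unsupported value kind %d", v.kind)
	}
}

func main() {
	v, err := toValue(Int64Scalar{V: 42})
	if err != nil {
		panic(err)
	}
	back, err := fromValue(v)
	if err != nil {
		panic(err)
	}
	fmt.Println(back == Scalar(Int64Scalar{V: 42})) // prints "true"
}
```

Keeping the conversion inside the reader means callers only ever see the public scalar type, while the internal representation stays free to change.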

The implementation of logs.Reader is almost an identical copy of the streams.Reader implementation. I've opted for duplicating the implementation in the short term, since there's no obvious way to deduplicate it that keeps the package structure easy to understand. We can change this in the future if a clean and simple approach presents itself.

This PR does not yet update DataObjScan to make use of these new APIs; that's being left out of scope for another PR to handle.

Exposing the columns in a streams section will be used to construct the columnar reading API.

This updates the dataset.Dataset implementation for a section so that the higher-level API type Column is used. This will make it easier for the vectorized reader to maintain a mapping from the original Column value to the dataset.Column value when creating mapped predicates.
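The mapping described here can be pictured as a lookup table from public column handles to their low-level counterparts. The sketch below is an assumption about the general shape, not the code in this PR: `Column`, `internalColumn`, `apiPredicate`, `internalPredicate`, and `mapPredicate` are all hypothetical names.

```go
package main

import (
	"errors"
	"fmt"
)

// Column is a hypothetical public, higher-level column handle.
type Column struct{ Name string }

// internalColumn stands in for the low-level dataset column.
type internalColumn struct{ index int }

// apiPredicate refers to a public column.
type apiPredicate struct {
	Column *Column
	Value  string
}

// internalPredicate refers to the matching low-level column.
type internalPredicate struct {
	Column *internalColumn
	Value  string
}

// mapPredicate rewrites a public predicate into an internal one by looking
// up the low-level column that backs the public handle.
func mapPredicate(p apiPredicate, lookup map[*Column]*internalColumn) (internalPredicate, error) {
	if p.Column == nil {
		return internalPredicate{}, errors.New("predicate has no column")
	}
	ic, ok := lookup[p.Column]
	if !ok {
		return internalPredicate{}, fmt.Errorf("column %q does not belong to this section", p.Column.Name)
	}
	return internalPredicate{Column: ic, Value: p.Value}, nil
}

func main() {
	level := &Column{Name: "level"}
	lookup := map[*Column]*internalColumn{level: {index: 3}}

	mapped, err := mapPredicate(apiPredicate{Column: level, Value: "error"}, lookup)
	fmt.Println(mapped.Column.index, mapped.Value, err) // prints "3 error <nil>"

	// A column from a different section is rejected instead of silently ignored.
	foreign := &Column{Name: "level"}
	_, err = mapPredicate(apiPredicate{Column: foreign, Value: "error"}, lookup)
	fmt.Println(err != nil) // prints "true"
}
```

Keying the lookup by handle identity rather than by name is a design choice of this sketch: two sections may both have a column called "level", and a predicate built for one should not be applied to the other by accident.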

This adds a columnar reading API to the streams section. Each read call returns an Arrow record.

Exposing the columns in a logs section will be used to construct the columnar reading API.

This updates the dataset.Dataset implementation for a section so that the higher-level API type Column is used. This will make it easier for the vectorized reader to maintain a mapping from the original Column value to the dataset.Column value when creating mapped predicates.

This adds a columnar reading API to the logs section. Each read call returns an Arrow record.

chaudum

lgtm!

rfratto added 8 commits June 4, 2025 13:47

chore: vendor github.com/apache/arrow-go/v18/arrow/scalar

191b7de

chore(dataobj): add utilities for testing arrow records and tables

880cc5f

chore(dataobj/streams): expose columns in a streams section

ff12abe

Exposing the columns in a streams section will be used to construct the columnar reading API.

chore(dataobj/streams): add columnar reading API

1b34899

This adds a columnar reading API to the streams section. Each read call returns an Arrow record.

chore(dataobj/logs): expose columns in a logs section

3499a54

Exposing the columns in a logs section will be used to construct the columnar reading API.

chore(dataobj/logs): add columnar reading API

ff64fd4

This adds a columnar reading API to the logs section. Each read call returns an Arrow record.

rfratto requested a review from a team as a code owner June 4, 2025 18:01

pull-request-size bot added the size/XXL label Jun 4, 2025

ashwanthgoli mentioned this pull request Jun 6, 2025

chore(engine): implement range aggregation operator #17997

Merged

6 tasks

chaudum approved these changes Jun 10, 2025


chaudum merged commit 06cf527 into grafana:main Jun 10, 2025
66 checks passed
