
Support index pattern selector syntax in SQL #120845

Merged · 16 commits · Apr 15, 2025

jbaiera
Member

@jbaiera jbaiera commented Jan 24, 2025

Selector syntax (#118614) was introduced as part of the work to implement failure stores in data streams. It is a new feature of index patterns that lets a user specify which indices within a data stream an action should use. A selector is written by placing the :: separator between a data stream name and the component of the data stream that the operation should target.
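To make the :: convention concrete, here is a minimal sketch of splitting an index expression into a name and an optional selector. The record `SelectorExpression` and its `parse` method are hypothetical illustrations, not the Elasticsearch implementation; the sketch assumes at most one :: per expression and leaves checking whether the selector is valid to a later step.

```java
import java.util.Optional;

/** Hypothetical sketch (not Elasticsearch code): splits "name::selector" into its parts. */
record SelectorExpression(String name, Optional<String> selector) {

    private static final String SEPARATOR = "::";

    /** Parses an expression, assuming it contains at most one "::" separator. */
    static SelectorExpression parse(String expression) {
        int idx = expression.indexOf(SEPARATOR);
        if (idx < 0) {
            // No separator: the whole expression is the data stream or index name.
            return new SelectorExpression(expression, Optional.empty());
        }
        String name = expression.substring(0, idx);
        String selector = expression.substring(idx + SEPARATOR.length());
        return new SelectorExpression(name, Optional.of(selector));
    }
}
```

For example, `parse("my-data-stream::failures")` yields the name `my-data-stream` and the selector `failures`, while `parse("my-data-stream")` yields an empty selector.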

To search a data stream's backing indices, the ::data selector is used:

SELECT * FROM "my-data-stream::data"

To search a data stream's failure indices, the ::failures selector is used:

SELECT * FROM "my-data-stream::failures"

By default, when a data stream is referenced without a selector, the ::data selector is implied so that current search functionality stays backwards compatible. The ::data selector mainly exists as a way to select the backing indices explicitly; it is not required for normal usage.
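The defaulting rule can be illustrated with a toy model of a data stream that holds two lists of indices. Everything here (`ToyDataStream`, its `resolve` method, and any index names passed to it) is hypothetical and only mirrors the behavior described above: no selector behaves like ::data, and ::failures switches to the failure indices.

```java
import java.util.List;

/** Hypothetical model (not Elasticsearch code) of a data stream with two index sets. */
record ToyDataStream(String name, List<String> backingIndices, List<String> failureIndices) {

    /** Resolves "name", "name::data" or "name::failures" to concrete index names. */
    List<String> resolve(String expression) {
        int idx = expression.indexOf("::");
        String target = idx < 0 ? expression : expression.substring(0, idx);
        // When no selector is given, ::data is implied for backwards compatibility.
        String selector = idx < 0 ? "data" : expression.substring(idx + 2);
        if (!target.equals(name)) {
            throw new IllegalArgumentException("no such data stream [" + target + "]");
        }
        return switch (selector) {
            case "data" -> backingIndices;
            case "failures" -> failureIndices;
            default -> throw new IllegalArgumentException("[" + selector + "] is not a recognized selector");
        };
    }
}
```

In this model, `"my-data-stream"` and `"my-data-stream::data"` resolve to the same backing indices, which is the backwards-compatible behavior the description relies on.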

This PR updates the SQL grammar to include the selector portion of an index pattern. The qualifiedIndex() method has been updated to include selectors in the resulting expression. Underlying search operations should already support this functionality, so this change is mostly a matter of wiring it up where needed.
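As a rough sketch of what including selectors in the resulting expression means, the hypothetical `ParsedIndex` record below keeps the selector as a separate field and re-attaches it when the qualified expression is built. The names `ParsedIndex`, `index()` and `qualifiedIndex()` are illustrative assumptions modeled on the wording above, not the actual SQL parser classes.

```java
/** Hypothetical parse result for "index[::selector]" (not the real SQL parser type). */
record ParsedIndex(String index, String selector) {

    /** Rebuilds the full expression, re-attaching the selector when one was given. */
    String qualifiedIndex() {
        return selector == null ? index : index + "::" + selector;
    }
}
```

A call site that reads only the `index()` accessor sees `person` for the input `person::failures` and loses the selector, whereas `qualifiedIndex()` returns `person::failures`.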

@elasticsearchmachine
Collaborator

Pinging @elastic/es-analytical-engine (Team:Analytics)

@elasticsearchmachine elasticsearchmachine added the Team:Data Management and Team:Analytics labels Jan 24, 2025
@elasticsearchmachine
Collaborator

Pinging @elastic/es-data-management (Team:Data Management)

*/
@Override
public void visitErrorNode(ErrorNode node) {}
/**
Member Author

This entire file was regenerated with inconsistent formatting. Is this something that should be checked in as part of this PR? Does this get regenerated as needed during the build?

Contributor

It's generated by ANTLR; I typically run spotlessApply to fix it.

Contributor

@luigidellaquila luigidellaquila left a comment

Thanks @jbaiera, I had a look and left a first round of comments.
I think this needs stricter validation at parse time, to make sure that we only accept valid selectors.

I also tried to run queries with normal indices (non-datastreams), and the behavior is not what I expected:

  • with select * from person::data everything works fine
  • with select * from person::failures I still get the data. I expected no such index [person::failures] as in _search, or at least no data
  • with select * from person::foo I still get the data. I expected a validation exception like in _search: Invalid index name [person::foo], invalid usage of :: separator, [foo] is not a recognized selector
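The parse-time validation requested here can be sketched as a check against a fixed set of known selectors that reproduces the _search error message quoted above. `SelectorValidator` is a hypothetical helper, and the sketch assumes that data and failures are the only recognized selectors, as in this PR's description.

```java
import java.util.Set;

/** Hypothetical parse-time check (not Elasticsearch code) that rejects unknown selectors. */
final class SelectorValidator {

    // Assumption: these are the only selectors the grammar should accept.
    private static final Set<String> KNOWN_SELECTORS = Set.of("data", "failures");

    private SelectorValidator() {}

    /** Throws if the expression carries a selector other than data or failures. */
    static void validate(String expression) {
        int idx = expression.indexOf("::");
        if (idx < 0) {
            return; // no selector: ::data is implied
        }
        String selector = expression.substring(idx + 2);
        if (!KNOWN_SELECTORS.contains(selector)) {
            throw new IllegalArgumentException(
                "Invalid index name [" + expression + "], invalid usage of :: separator, ["
                    + selector + "] is not a recognized selector");
        }
    }
}
```

A syntax check like this rejects person::foo, but it cannot reject person::failures on a plain index; that case has to be caught when the pattern is resolved against the cluster.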

@jbaiera jbaiera requested a review from luigidellaquila April 9, 2025 06:12
@jbaiera
Member Author

jbaiera commented Apr 9, 2025

> I also tried to run queries with normal indices (non-datastreams), and the behavior is not what I expected:

@luigidellaquila I went through the code and found that some places used the qualifiedIndex method to surface the index pattern, while others used only index. The places using index dropped the selector, so those queries returned results from the data component only. I've added validation to the IdentifierBuilder so that invalid combinations are rejected. Also, since cluster info is currently incompatible with selectors, I've appended any provided selector to the index field of the TableIdentifier.
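The two changes described in this reply can be sketched together: reject a selector combined with a cluster prefix, and otherwise fold the selector into the index string so that code reading only the index field still sees it. `HypotheticalTableIdentifier` and its `build` method are stand-ins for the real TableIdentifier and IdentifierBuilder, whose signatures are not shown in this thread; treating cluster plus selector as an error, and the error message itself, are assumptions based on the remark that cluster info is incompatible with selectors.

```java
/** Hypothetical stand-in (not the real TableIdentifier): optional cluster plus index pattern. */
record HypotheticalTableIdentifier(String cluster, String index) {

    /**
     * Builds an identifier from parsed parts. A selector is appended to the index
     * field (e.g. "person::failures") because it cannot be combined with a cluster.
     */
    static HypotheticalTableIdentifier build(String cluster, String index, String selector) {
        if (selector == null) {
            return new HypotheticalTableIdentifier(cluster, index);
        }
        if (cluster != null) {
            // Assumption: a cluster prefix together with a selector is rejected outright.
            throw new IllegalArgumentException(
                "selectors are not supported with a cluster prefix: [" + cluster + ":" + index + "::" + selector + "]");
        }
        return new HypotheticalTableIdentifier(null, index + "::" + selector);
    }
}
```

Folding the selector into the index string means that even a call site that reads only `index()` passes `person::failures` downstream, which avoids the dropped-selector behavior described above.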

Contributor

@luigidellaquila luigidellaquila left a comment

Thanks for iterating on this @jbaiera, LGTM

@jbaiera jbaiera merged commit 299bf44 into elastic:main Apr 15, 2025
17 checks passed
@jbaiera
Member Author

jbaiera commented Apr 15, 2025

💚 All backports created successfully

Backport branch: 8.x

@jbaiera jbaiera deleted the failure-store-sql-support branch April 15, 2025 17:05
jbaiera added a commit to jbaiera/elasticsearch that referenced this pull request Apr 15, 2025
Updates the SQL grammar to include the selector portion of an index pattern. The
index() method has been updated to include selectors in the resulting expression.

(cherry picked from commit 299bf44)
afoucret pushed a commit to afoucret/elasticsearch that referenced this pull request Apr 16, 2025
Updates the SQL grammar to include the selector portion of an index pattern. The
index() method has been updated to include selectors in the resulting expression.
elasticsearchmachine pushed a commit that referenced this pull request Apr 17, 2025
Updates the SQL grammar to include the selector portion of an index pattern. The
index() method has been updated to include selectors in the resulting expression.

(cherry picked from commit 299bf44)

Co-authored-by: Elastic Machine <[email protected]>
Labels
:Analytics/SQL, backport pending, :Data Management/Data streams, >non-issue, Team:Analytics, Team:Data Management, v8.19.0, v9.1.0
3 participants