-
Notifications
You must be signed in to change notification settings - Fork 25.3k
ES|QL: Check if cluster aliases and index patterns are valid before executing query #122497
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ES|QL: Check if cluster aliases and index patterns are valid before executing query #122497
Conversation
Hi @pawankartik-elastic, I've created a changelog YAML for you. |
Things unclear to me at the moment:
|
I only very briefly looked at the solution, but the approach appears correct to me. Maybe it's not so nice to validate in multiple places for CCQ-purposes, as CCQ needs to take into consideration more than just lexikographic correctness. However, I believe that any validation that is possible based on the parser, only, should happen in the IdentifierBuilder (or otherwise close to the parsing step), like in this PR. This allows early rejections without risks, that e.g. we start big fat field_caps calls to try and validate queries that the parser could rightfully reject. And it means that whenever we have a index pattern inside ESQL, we know that it's at least sane from a lexikographic point of view. The possibility of index patterns containing : slipping through is not ideal at the moment. These are clearly invalid and shouldn't make it into the rest of the system.
Update: Please ignore, there's a much simpler explanation, see below.
Update: It's much simpler than that. For local
Quoting is required due to index patterns that can contain funny characters. Like Allowed quoting:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Heya, thanks @pawankartik-elastic !
- This approach is unfortunately insufficient, but not by much.
This PR improves the situation for the buggy queries that initially triggered the issue. But it doesn't solve the problem that the index pattern may be otherwise invalid, like in FROM valid_remote:"invalid<index_name"
- which still just returns
<no-fields>
---------------
The reason is that in IdentifierBuilder.visitIndexPattern
, we just never call validateIndexPattern
in case that clusterString
is not null. Additionally, inside validateIndexPattern
, we explicitly skip the validation in case that the pattern contains a :
.
I believe the correct solution should try to split the indexPattern
at the first :
in case that there is no clusterString
and this is not a date math index pattern. And then, the remaining, actual index pattern should be sent to validateIndexPattern
, which needs additionally to consider the :
character as invalid rather than just skipping validation.
- I think there is a separate problem where cluster names can be invalid, which we don't reject at parse time, but in this case we just throw
no_such_remote_cluster_exception
.
This may be cheap enough to not be a problem - although we could improve validateIndexPattern
too check for correct remote name syntax, too.
- On a separate note: here https://github.com/elastic/elasticsearch/pull/120355/files#diff-ef7f022d55c8068ca9c7c26cc5bac2c4cc3cf5689a9bfc20e931cdf17541556eL83-L84 Ievgen added randomly generated cluster names to our parser tests.
I think it'd be nice to go and check if they use all possible characters that can be used in remote names? If there are more characters than that, we should go and make the identifier generator use that, so this is included in our tests. For instance, what about unicode? (I don't know :) )
x-pack/plugin/esql/src/test/java/org/elasticsearch/xpack/esql/parser/StatementParserTests.java
Outdated
Show resolved
Hide resolved
x-pack/plugin/esql/src/test/java/org/elasticsearch/xpack/esql/parser/StatementParserTests.java
Show resolved
Hide resolved
x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/parser/IdentifierBuilder.java
Outdated
Show resolved
Hide resolved
x-pack/plugin/esql/src/test/java/org/elasticsearch/xpack/esql/parser/StatementParserTests.java
Show resolved
Hide resolved
Thank you @alex-spies. I agree that this fix doesn't fully address the issue. I was simply trying to:
I'll now work on ensuring that we fully address the issue. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Apart from what @alex-spies awesomely mentioned already, please add many many more tests. For one, there is no test with date math expressions (something you explicitly consider in your code).
x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/parser/IdentifierBuilder.java
Outdated
Show resolved
Hide resolved
x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/parser/IdentifierBuilder.java
Outdated
Show resolved
Hide resolved
x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/parser/IdentifierBuilder.java
Outdated
Show resolved
Hide resolved
x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/parser/IdentifierBuilder.java
Outdated
Show resolved
Hide resolved
Of course! I'd definitely do that. I did the bare minimums to kick start some discussions. Over the next few commits, I'll address all your concerns. |
Hi @pawankartik-elastic, I've updated the changelog YAML for you. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've left some comments.
This is a very sensitive and tricky area we are changing now and tests are essential to our confidence in pushing forward this change. Apart from the random tests, I would like to see many more concrete tests:
FROM "*: ,*,"
FROM ",,, "
FROM ",,,"
FROM ", ,,"
FROM ",,,",*
FROM "",*
FROM ""
FROM "<logstash-{now/d{yyyy.MM.dd|+12:00}}>"
FROM "<a|b>"
FROM "*:<logstash-{now/M{yyyy.MM}}>"
assertStringAsIndexPattern("*:index|pattern", command + " \"*:index|pattern\""); | ||
// Entire index pattern is quoted. So it's not a parse error but a semantic error where the index name | ||
// is invalid. | ||
expectError(command + " \"*:index|pattern\"", "Invalid index name [index|pattern], must not contain the following characters"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is a big disconnect here between the mechanism we use for validating these index patterns and where we actually use these index patterns. Meaning, the mechanism is the Elasticsearch one where, at indexing (index creation time), it's validating the name provided.
Let me give an example:
from "test<"
gives Invalid index name [test<], must not contain the following characters [' ','\"','*',',','/','<','>','?','\\','|']
which is acceptable. The error message is not appropriate for ES|QL, it may be correct for ES. We do allow *
in index names, so I can do from "test*"
and ES|QL doesn't complain about *
. We need to revisit this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@pawankartik-elastic thank you for adding more tests.
What is our plan for this error message?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This error message comes from MetadataCreateIndexService.validateIndexOrAliasName()
. One idea is to trap it specifically and re-throw with a different message. Wdyt?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the paradox: that validateIndexOrAliasName
is made to check indices names at creation time, while we use for checks at runtime. And these two are different :-). Even _search
allows *
at search time. We need to provide a different error message indeed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I understood that. In that case, should I push a fix to display a different error string by omitting *
from the error string since that's the only misleading char (or did I miss anything else)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, please take out the wildcard from there. Thanks.
x-pack/plugin/esql/src/test/java/org/elasticsearch/xpack/esql/parser/StatementParserTests.java
Show resolved
Hide resolved
x-pack/plugin/esql/src/test/java/org/elasticsearch/xpack/esql/parser/StatementParserTests.java
Show resolved
Hide resolved
x-pack/plugin/esql/src/test/java/org/elasticsearch/xpack/esql/parser/StatementParserTests.java
Show resolved
Hide resolved
x-pack/plugin/esql/src/test/java/org/elasticsearch/xpack/esql/parser/StatementParserTests.java
Outdated
Show resolved
Hide resolved
…parser/StatementParserTests.java Co-authored-by: Andrei Stefan <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I only reviewed the productive changes as Andrei already made suggestions to the tests.
The production code looks right to me; it's also much simpler than the iteration before the grammar change and looks quite clean to me. Thank you!
I spotted minor issues with trailing commas and white spaces, but nothing dramatic. Please consider this PR unblocked and ready to merge once @astefan approves it.
x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/parser/IdentifierBuilder.java
Outdated
Show resolved
Hide resolved
* @param ctx Index Pattern Context for generating error messages with offsets. | ||
* @param hasSeenStar If we've seen an asterisk so far. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice javadoc, thank you!
x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/parser/IdentifierBuilder.java
Outdated
Show resolved
Hide resolved
x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/parser/IdentifierBuilder.java
Outdated
Show resolved
Hide resolved
x-pack/plugin/esql/src/test/java/org/elasticsearch/xpack/esql/parser/StatementParserTests.java
Show resolved
Hide resolved
x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/parser/IdentifierBuilder.java
Outdated
Show resolved
Hide resolved
/* | ||
* Presence of multiple patterns requires a comma and comma requires quoting. If quoting is present, | ||
* cluster string and selector string cannot be present; they need to be attached within the quoting. | ||
* So we attempt to extract them later. | ||
*/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: inconsistent comment style, other comments use //
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This currently permits white spaces between separators and index name, like FROM "remote : idx"
and FROM idx :: failures
; it fails with less helpful error messages later down the line.
I think this is related to #129768, but not exactly the same. To avoid scope creep, we can tackle this in a follow-up or make it part of the other issue. I added a comment #129768 (comment) so we don't forget.
Co-authored-by: Alexander Spies <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Apart from the leftover here, LGTM. Thank you!
💔 Backport failed
You can use sqren/backport to manually backport by running |
💚 All backports created successfully
Questions ?Please refer to the Backport tool documentation |
…xecuting query (elastic#122497) ES|QL index patterns validation: Ensure that the patterns in the query are syntactically and semantically valid --------- Co-authored-by: elasticsearchmachine <[email protected]> Co-authored-by: Andrei Stefan <[email protected]> Co-authored-by: Alexander Spies <[email protected]> (cherry picked from commit c94c021) # Conflicts: # x-pack/plugin/esql/src/test/java/org/elasticsearch/xpack/esql/parser/StatementParserTests.java
…efore executing query (#122497) (#130021) * ES|QL: Check if cluster aliases and index patterns are valid before executing query (#122497) ES|QL index patterns validation: Ensure that the patterns in the query are syntactically and semantically valid --------- Co-authored-by: elasticsearchmachine <[email protected]> Co-authored-by: Andrei Stefan <[email protected]> Co-authored-by: Alexander Spies <[email protected]> (cherry picked from commit c94c021) # Conflicts: # x-pack/plugin/esql/src/test/java/org/elasticsearch/xpack/esql/parser/StatementParserTests.java * Refactor dropping asterisk from invalid chars * Assert error message and drop invalid chars due to randomisation issue * Fix missing sort in `Strings#INVALID_FILENAME_CHARS` and assert invalid chars in error string
Consider the queries with more than 1 index seperator (not valid as per our current spec):
FROM remote:"foo:bar"
, and,FROM "remote:foo:bar:baz"
As of now, both the queries return empty results when
POST
ing to/_query?format=txt
. This happens because they're syntactically valid (but not semantically) and the subsequent validation makes no attempt to discard them or error out. This PR fixes this behaviour by failing such queries at the parsing stage. There was an initial concern surrounding the licensing and security aspects but since we're failing fast at the parsing stage and aren't allowing the queries "through", we should be fine. Once malformed queries are discarded, subsequent queries should go through the validation steps fine.Addresses #121901.
Considerations:
Should theChanges have been pulled into my branch and accommodated.::
operator be accounted for (because the respective PR is yet to be merged)?This is a bug fix. How far should it be backported (keeping in mind theLooking good to go back as far as 8.19.::
operator)?What's the right team to tag? SF/CCS or Analytics/ES|QL?