-
Notifications
You must be signed in to change notification settings - Fork 25.2k
ESQL: Quotations in index patterns allow for inconsistent pattern composition #122651
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
Pinging @elastic/es-analytical-engine (Team:Analytics) |
AKA "jank combinations that make Nik cry" |
++, esp. since multiple remotes are possible via I wonder if there's a good reason to interpret commas inside quotes as separators - but otherwise, I feel like this should be a bug. |
There is related ongoing work for CCQ index pattern validation here: #122497 |
A suggestion for how to handle this going forward: It would be beneficial if we allowed quoted index patterns to be used with commas in two mutually exclusive ways. When searching multiple index patterns:
As mentioned, these cases should be mutually exclusive to one another. Unfortunately, I don't think we'll be able to make this approach hold universally in ESQL without getting approval for a breaking change. Today we tolerate index patterns like As such, I propose that we continue to support the previous GA usages where they occur, but throw a validation exception when that previous use is combined with cluster names and selectors outside of the quoted pattern:
This would allow later operations with index selectors to mimic this behavior:
Once CCS supports selectors, the patterns would be unified:
|
I think we should either support "one quote to rule them all" (option 2) or "quote between commas" (option 1) but not other things, and not commas inside quotes. E.g. things like |
It was also brought to my attention by @alex-spies that we should probably disallow As for disallowing commas in quotation marks at all, I'm more on the fence about it. The documentation I can find lists quotation marks as the way to escape characters that are not accepted here by the ESQL parser, but there are plenty of special characters that shouldn't be escaped from index pattern parsing (like |
Maybe I went too far there - I don't necessarily mean banning commas in all cases, just not allowing it as semantically valid separator. I.e.
Agreed. |
I updated my suggestion table to include the examples that are unclear if we should allow or not |
@smalyshev , I think we should make a definitive decision on whether we should go forward with @jbaiera 's suggestion (I think we should), and to what extent. @pawankartik-elastic 's PR #122497 seems to follow a different validation strategy, which treats I think now would be a good time to settle on one strategy, so that #122497 can already follow it and we minimize potentially risky changes to the validation code of |
I've checked the current version of the table above and my main concern there I still can't figure out the general rule by which it works. The rule for "individual quotes or quote everything" makes sense, but the examples in the table don't seem to follow it. Can we formulate a generic rule that would allow me to know how it's parsed without looking at the table? I think we need to have this rule recorded somewhere before we start coding it, otherwise we get "our code is our docs" which is not a good situation. The examples individually make sense to me, and I do think |
Here's my attempt at formulating a rule that fits the table:
If I'm correct, the grammar can be simplified and that would automatically take care of most of the validation. We can restrict There's still the issue that https://www.elastic.co/docs/reference/elasticsearch/rest-apis/api-conventions#api-date-math-index-names mentions the comma having special meaning inside date math patterns, but I don't know specific examples that demonstrate this. |
I'm not sure what is the story there. Comma does not have AFAIK any meaning for datemath syntax itself, and does not have any special meaning in Java date formatting. Some national date formats can include commas, but if you use such a format in datemath expression, you'd get invalid index name since comma is not valid in the index name? So I can't think of any legit way of actually using a comma in this context. |
I'd say here "remote index expression, e.g. Other than the above, I think it makes sense. |
I have one more concern here. With all this quoting going on, on which stage the quotes are going to be eliminated? The reason I ask is that the planning stage expects a single |
Quoting is eliminated within the |
So, here's my suggestion for going forward:
What do we think? |
@alex-spies Your suggestion looks right to me. If we can agree with the rules, I believe we could go for it. |
ES|QL allows for using quotations in index pattern names to escape characters that might otherwise break the syntax. An example of this might be something like
Additionally, we allow users to specify their entire index pattern in a string literal:
I've found recently that you can currently mix and match this syntax with cluster id syntax (and eventually index selector logic #120660) to create index patterns that probably should not be allowed.
All of the following patterns are valid in ES|QL, and parse to their unquoted equivalents when executing:
When combining this existing logic with #120660 we end up with valid patterns like the following:
This seems like invalid pattern parsing to me. In my mind
cluster:"index1,index2"
implies that the comma is escaped, or that both indices should be resolved from the remote cluster provided. Since this incompatibility is really only possible under CCS, I would make a case to disallow commas in the quoted portions of ESQL that include an unquoted cluster id prefix (and in the future, an index selector suffix).The text was updated successfully, but these errors were encountered: