-
Notifications
You must be signed in to change notification settings - Fork 25.2k
[DO NOT MERGE] ESQL: PoC for name qualifiers #126531
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
basic | ||
required_capability: name_qualifiers | ||
|
||
FROM sample_data main |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We may want to require the AS
keyword, at least in the beginning - if we allow qualifiers in the FROM
clause at all:
FROM sample_data AS main
required_capability: name_qualifiers | ||
|
||
ROW main x = 2 | ||
| EVAL main y = main x * 3, other x = main y - 5 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should users be allowed to assign qualifiers in EVAL, RENAME or ROW? If so, should attributes with qualifiers still shadow existing attributes on name conflict, or should the query just fail?
| SORT event_duration | ||
| LIMIT 1 | ||
| RENAME message AS main message | ||
| KEEP main message, event_duration |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should referring to qualified names use a whitespace as separator, like here?
I.e. should it be KEEP main message, ...
or rather KEEP main->message
with some kind of qualifier separator like ->
?
We already know that using .
as separator is highly ambiguous, even though KEEP main.message
looks the cleanest due to similarity to SQL. I don't think using .
is realistic due to a high amount of ambiguous edge cases.
// TODO: Should ON without predicate even be allowed with qualifiers? | ||
| LOOKUP JOIN languages_lookup right ON language_code |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should the LOOKUP JOIN ... ON some_attribute
syntax be allowed together with qualifiers? It seems like it can be, but we have to be clear if the right hand side join key gets included into the output or not. In this PoC, it's not included.
FROM employees main | ||
| KEEP main emp_no, main languages | ||
| WHERE main emp_no == 10001 | ||
| LOOKUP JOIN languages_lookup right ON main languages == right language_code |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Enabling this syntax is one of the aims of this PR.
|
||
FROM employees main | ||
| KEEP main emp_no, main languages | ||
| WHERE main emp_no == 10001 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should it be allowed to omit the qualifier when it's not ambiguous?
This should be possible in principle; we need to carefully examine and decide how to deal in case of ambiguities, e.g. when there is both an attribute called field
(no qualifier) and another one called qualified field
(qualifier present).
FROM employees | ||
| KEEP emp_no, languages | ||
| WHERE emp_no == 10001 | ||
| LOOKUP JOIN languages_lookup right ON languages == right language_code |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This could be a common situation: fields from the main index (-pattern) have no qualifier, but we consider this fine.
FROM employees main | ||
| KEEP main emp_no, main languages | ||
| WHERE main emp_no == 10001 | ||
| LOOKUP JOIN languages_lookup ON main languages == language_code |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should the ON ... == ...
syntax be allowed when we don't assign a qualifier to the lookup fields?
indexPatternAndMetadataFields: | ||
indexPattern (COMMA indexPattern)* metadata? | ||
indexPattern (COMMA indexPattern)* qualifier=indexString? metadata? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The main grammar changes: index patterns can have qualifiers (assigned to the attributes from the index); attribute names can have qualifiers when referring to them or defining them in EVAL.
shadowing of previous attributes is not allowed: | ||
- do: | ||
catch: /qualified attribute \[main data\] already exists/ | ||
esql.query: | ||
body: | ||
query: 'from test main | eval main data = 10' | ||
--- | ||
internal shadowing is not allowed: | ||
- do: | ||
catch: /qualified attribute \[qualified y\] already exists/ | ||
esql.query: | ||
body: | ||
query: 'row x = 0 | eval qualified y = 1, qualified y = 2 * qualified y' | ||
--- | ||
shadowing in rename is not allowed: | ||
- do: | ||
catch: /qualified attribute \[qualified y\] already exists/ | ||
esql.query: | ||
body: | ||
query: 'row x = 0, qualified y | rename x as qualified y' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not implemented yet, but this could be the behavior if we disallow shadowing of attributes when they have qualifiers.
To get this to production-readiness, we'd need to split this up into smaller portions. Then there's a bit more work to do:
|
This PoC aims to demonstrate how qualifiers could work in ES|QL. In particular it enables the following syntax and makes it work in a csv test:
The added csv tests are probably the most important thing to look at; otherwise, this PR is quite noisy due to the grammar change + required plumbing.
Except for 3 ignored tests, the newly added csv tests pass in single node scenarios; multi-node scenarios fail, but only because I didn't update some serialization code, yet. There are 3 yaml tests that demonstrate how name conflicts with qualifiers would make queries invalid, but the validation is not yet in place.
I'm adding a couple of remarks by highlighting the relevant parts of the added tests.