Skip to content

[DO NOT MERGE] ESQL: PoC for name qualifiers #126531

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 32 commits into
base: main
Choose a base branch
from

Conversation

alex-spies
Copy link
Contributor

@alex-spies alex-spies commented Apr 9, 2025

This PoC aims to demonstrate how qualifiers could work in ES|QL. In particular it enables the following syntax and makes it work in a csv test:

| LOOKUP JOIN languages_lookup right ON main languages == right language_code

The added csv tests are probably the most important thing to look at; otherwise, this PR is quite noisy due to the grammar change + required plumbing.

Except for 3 ignored tests, the newly added csv tests pass in single node scenarios; multi-node scenarios fail, but only because I didn't update some serialization code, yet. There are 3 yaml tests that demonstrate how name conflicts with qualifiers would make queries invalid, but the validation is not yet in place.

I'm adding a couple of remarks by highlighting the relevant parts of the added tests.

basic
required_capability: name_qualifiers

FROM sample_data main
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We may want to require the AS keyword, at least in the beginning - if we allow qualifiers in the FROM clause at all:

FROM sample_data AS main

required_capability: name_qualifiers

ROW main x = 2
| EVAL main y = main x * 3, other x = main y - 5
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should users be allowed to assign qualifiers in EVAL, RENAME or ROW? If so, should attributes with qualifiers still shadow existing attributes on name conflict, or should the query just fail?

| SORT event_duration
| LIMIT 1
| RENAME message AS main message
| KEEP main message, event_duration
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should referring to qualified names use a whitespace as separator, like here?
I.e. should it be KEEP main message, ... or rather KEEP main->message with some kind of qualifier separator like ->?

We already know that using . as separator is highly ambiguous, even though KEEP main.message looks the cleanest due to similarity to SQL. I don't think using . is realistic due to a high amount of ambiguous edge cases.

Comment on lines +72 to +73
// TODO: Should ON without predicate even be allowed with qualifiers?
| LOOKUP JOIN languages_lookup right ON language_code
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should the LOOKUP JOIN ... ON some_attribute syntax be allowed together with qualifiers? It seems like it can be, but we have to be clear if the right hand side join key gets included into the output or not. In this PoC, it's not included.

FROM employees main
| KEEP main emp_no, main languages
| WHERE main emp_no == 10001
| LOOKUP JOIN languages_lookup right ON main languages == right language_code
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Enabling this syntax is one of the aims of this PR.


FROM employees main
| KEEP main emp_no, main languages
| WHERE main emp_no == 10001
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should it be allowed to omit the qualifier when it's not ambiguous?

This should be possible in principle; we need to carefully examine and decide how to deal in case of ambiguities, e.g. when there is both an attribute called field (no qualifier) and another one called qualified field (qualifier present).

FROM employees
| KEEP emp_no, languages
| WHERE emp_no == 10001
| LOOKUP JOIN languages_lookup right ON languages == right language_code
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could be a common situation: fields from the main index (-pattern) have no qualifier, but we consider this fine.

FROM employees main
| KEEP main emp_no, main languages
| WHERE main emp_no == 10001
| LOOKUP JOIN languages_lookup ON main languages == language_code
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should the ON ... == ... syntax be allowed when we don't assign a qualifier to the lookup fields?

indexPatternAndMetadataFields:
indexPattern (COMMA indexPattern)* metadata?
indexPattern (COMMA indexPattern)* qualifier=indexString? metadata?
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The main grammar changes: index patterns can have qualifiers (assigned to the attributes from the index); attribute names can have qualifiers when referring to them or defining them in EVAL.

Comment on lines 117 to 136
shadowing of previous attributes is not allowed:
- do:
catch: /qualified attribute \[main data\] already exists/
esql.query:
body:
query: 'from test main | eval main data = 10'
---
internal shadowing is not allowed:
- do:
catch: /qualified attribute \[qualified y\] already exists/
esql.query:
body:
query: 'row x = 0 | eval qualified y = 1, qualified y = 2 * qualified y'
---
shadowing in rename is not allowed:
- do:
catch: /qualified attribute \[qualified y\] already exists/
esql.query:
body:
query: 'row x = 0, qualified y | rename x as qualified y'
Copy link
Contributor Author

@alex-spies alex-spies Apr 9, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not implemented yet, but this could be the behavior if we disallow shadowing of attributes when they have qualifiers.

@alex-spies
Copy link
Contributor Author

To get this to production-readiness, we'd need to split this up into smaller portions. Then there's a bit more work to do:

  • Ensuring that EsqlSession.fieldNames still provides correct output when qualifiers are present. This is a major piece of work, esp. since making changes to this method is currently scary. In the PR's state, fieldNames simply ignores any and all qualifiers.
    But this means that there are very, very, very likely some scenarios where the field caps request, which is based on fieldNames' output, doesn't ask for some required index field - because fieldNames thinks it's being shadowed by some command that defines an attribute of the same name, even though the qualified names are different.
  • TransportVersion bump, required at least for LOOKUP JOIN with == condition
  • update the toString methods for all NamedExpression inheritors.
  • Make sure the error message suggesting columns that may have been meant includes the qualifier: "Unknown column [...] did you mean [...]?" + add tests
  • Make AS required for assigning aliases, most likely.

@alex-spies alex-spies added the :Analytics/ES|QL AKA ESQL label Apr 10, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants