ES|QL: fix join masking eval #126614

luigidellaquila · 2025-04-10T15:12:22Z

Here is what happened. In a query like this:

from languag* 
| eval type = null 
| rename language_name as message 
| lookup join message_types_lookup on message 
| rename type as message 
| lookup join message_types_lookup on message 
| keep `language.name`

- type is initially set to null
- the first LOOKUP overwrites it with a keyword
- the second LOOKUP uses it as a join key

At pre-analysis time, we try to minimize the number of attributes we request from field_caps, and we infer these attributes from the query itself.
From this list of attributes, we remove the aliases created by EVAL (type in this case, assuming it's a constant null value).
What we did not take into account is that a JOIN could overwrite that attribute (potentially with a different type), so we resolved the lookup index asking only for message.
Without a KEEP at the end, the query worked fine, but only because a shortcut disables this optimization when all the attributes are needed.
With a KEEP at the end that removes type from the result, we got a mapping for the lookup index with only one attribute (message) and completely missed the fact that type had to be overwritten.

The fix is to not remove aliases from the attribute list when there is a subsequent JOIN, since we don't yet know whether it will overwrite them.
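
The pruning rule described above can be illustrated with a toy model. This is only a minimal sketch, not the code in EsqlSession: `Cmd`, `Kind`, `exampleQuery` and `fieldNamesToRequest` are hypothetical names, the query is modelled as a flat list of commands rather than a LogicalPlan tree, and only EVAL aliases are pruned. The `fixed` flag switches between the old behaviour and the behaviour described in this PR.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

final class FieldNamePruningSketch {
    enum Kind { EVAL, RENAME, LOOKUP_JOIN, KEEP }

    /** One command: the names it reads and the names (aliases) it defines. Hypothetical model. */
    record Cmd(Kind kind, Set<String> reads, Set<String> defines) {}

    /** The query from the PR description, reduced to reads/defines per command. */
    static List<Cmd> exampleQuery() {
        return List.of(
            new Cmd(Kind.EVAL, Set.of(), Set.of("type")),
            new Cmd(Kind.RENAME, Set.of("language_name"), Set.of("message")),
            new Cmd(Kind.LOOKUP_JOIN, Set.of("message"), Set.of()),
            new Cmd(Kind.RENAME, Set.of("type"), Set.of("message")),
            new Cmd(Kind.LOOKUP_JOIN, Set.of("message"), Set.of()),
            new Cmd(Kind.KEEP, Set.of("language.name"), Set.of())
        );
    }

    /** Walks the query from the last command to the first and collects the names to request. */
    static Set<String> fieldNamesToRequest(List<Cmd> query, boolean fixed) {
        Set<String> names = new LinkedHashSet<>();
        boolean canRemoveAliases = true;
        for (int i = query.size() - 1; i >= 0; i--) {
            Cmd cmd = query.get(i);
            if (fixed && cmd.kind() == Kind.LOOKUP_JOIN) {
                // A JOIN later in the query may overwrite aliases defined before it,
                // so aliases from commands that precede it must stay in the list.
                canRemoveAliases = false;
            }
            if (cmd.kind() == Kind.EVAL && canRemoveAliases) {
                names.removeAll(cmd.defines());
            }
            // Reads are added after the removal, so "eval x = f(x)" still requests x.
            names.addAll(cmd.reads());
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println("before fix: " + fieldNamesToRequest(exampleQuery(), false));
        System.out.println("after fix:  " + fieldNamesToRequest(exampleQuery(), true));
    }
}
```

In this model, type is dropped from the requested names when the flag is off and kept when it is on, which mirrors the symptom and the fix described above.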

elasticsearchmachine · 2025-04-10T15:12:48Z

Pinging @elastic/es-analytical-engine (Team:Analytics)

elasticsearchmachine · 2025-04-10T15:12:48Z

Hi @luigidellaquila, I've created a changelog YAML for you.

x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/session/EsqlSession.java

alex-spies

I didn't have time to think deeply about this, but at first glance, the fix looks like it should work.


astefan · 2025-04-11T06:13:13Z

I'll have a look as well.

astefan · 2025-04-11T15:40:03Z

...in/esql/src/test/java/org/elasticsearch/xpack/esql/session/IndexResolverFieldNamesTests.java

+                | where x > 1
+                | keep emp_no, language_name
+                | limit 1""",
+            Set.of("emp_no", "emp_no.*", "languages", "languages.*", "language_name", "language_name.*", "x", "y", "x.*", "y.*")


And x appears here as well because x could come from enrich languages_policy on y?

Exactly. We don't know the structure of the enrich index yet: after the ENRICH, x could become something else (e.g. a keyword), and the WHERE could be invalid.

astefan · 2025-04-11T15:42:14Z

x-pack/plugin/esql/qa/testFixtures/src/main/resources/lookup-join.csv-spec

+joinMaskingEval
+required_capability: join_lookup_v12
+required_capability: fix_join_masking_eval
+from languag* 


Please, add this test and others (be creative) to IndexResolverFieldNamesTests.

astefan · 2025-04-11T15:53:45Z

x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/session/EsqlSession.java

+     * eg. EVAL, GROK, DISSECT can override an alias, but we know it in advance, ie. we don't need to resolve indices to know.
+     */
+    private static boolean couldOverrideAliases(LogicalPlan p) {
+        return (p instanceof Aggregate


Why list all the commands that cannot define an additional "hidden" Attribute instead of checking for those that can?

So that it doesn't break when we add new commands that behave like JOIN.
As a follow-up, IMHO we should define an interface for commands that allow this optimization (similar to SortAgnostic) and replace this long list with a single instanceof.
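
The follow-up idea could look roughly like the sketch below. Everything here is an assumption made for illustration: the marker interface `StaticallyKnownOutput` and the simplified `Plan` records are hypothetical and do not exist in the codebase under these names. The point is only that a command opts in to the optimization, and anything that does not opt in is treated like JOIN.

```java
import java.util.List;

final class AliasOverrideSketch {
    /** Simplified stand-in for a logical plan node. */
    interface Plan {}

    /** Hypothetical marker: the output columns of this command are fully known from the query text. */
    interface StaticallyKnownOutput {}

    record Eval(String alias) implements Plan, StaticallyKnownOutput {}

    record Keep(List<String> names) implements Plan, StaticallyKnownOutput {}

    // JOIN-like commands do not carry the marker: their output depends on another index.
    record LookupJoin(String index, String key) implements Plan {}

    record Enrich(String policy, String on) implements Plan {}

    /** A single instanceof replaces the long list of command types. */
    static boolean couldOverrideAliases(Plan p) {
        return (p instanceof StaticallyKnownOutput) == false;
    }

    public static void main(String[] args) {
        System.out.println(couldOverrideAliases(new Eval("type")));
        System.out.println(couldOverrideAliases(new LookupJoin("message_types_lookup", "message")));
    }
}
```

With this shape, a newly added command that forgets the marker is handled conservatively, so the worst case is requesting more field names than strictly needed.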

If we add a command that doesn't behave like JOIN, then it could break the other way around. Imo, it makes much more sense to list those commands that are "special".

> If we add a command that doesn't behave like JOIN, then it could break the other way around

This is interesting. I'm not sure the implications of this aspect are completely clear to me.

If I got it right, we use these field names to limit the scope of the field_caps; if we add a field that does not exist in any of the involved indices, will field_caps fail?

My understanding is that it will work fine, just ignoring the additional fields.

If that weren't the case, we would be in a catch-22: we couldn't know which fields to send before validating the query, and we couldn't validate the query before sending the fields to field_caps.

The fact that everything works fine makes me think that it's safe, but maybe I'm missing something.

How about the completion command (I see it implements GeneratingPlan)? Is it a command that can add a "hidden" attribute that can be overridden?

I think Completion is safe (the target attribute is defined at parsing time), but I have to double-check.

> If I got it right, we use these field names to limit the scope of the field_caps; if we add a field that does not exist in any of the involved indices, will field_caps fail?
> My understanding is that it will work fine, just ignoring the additional fields.

Yep, I see your point. Adding more fields is safer; adding fewer fields is problematic. Better more fields than fewer.
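
The asymmetry discussed above can be made concrete with a toy stand-in for field resolution. This is a sketch that encodes the understanding stated in this thread (a requested name that exists in no index is simply ignored); `resolve` and the map-based mapping are hypothetical and are not the field_caps API.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

final class FieldCapsSketch {
    /** Returns the name-to-type entries of the index mapping for the requested names only. */
    static Map<String, String> resolve(Map<String, String> indexMapping, Set<String> requested) {
        Map<String, String> result = new TreeMap<>();
        for (String name : requested) {
            String type = indexMapping.get(name);
            if (type != null) {
                // Unknown names are skipped silently (assumption taken from the discussion).
                result.put(name, type);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> lookupIndex = Map.of("message", "keyword", "type", "keyword");
        // Requesting too much: the extra name is ignored and nothing is lost.
        System.out.println(resolve(lookupIndex, Set.of("message", "type", "does_not_exist")));
        // Requesting too little: "type" is missing from the resolved mapping.
        System.out.println(resolve(lookupIndex, Set.of("message")));
    }
}
```

Under that assumption, over-requesting costs nothing in correctness, while under-requesting hides a column that a later command may need.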

astefan · 2025-04-11T15:57:20Z

x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/session/EsqlSession.java

        parsed.forEachDown(p -> {// go over each plan top-down
+            if (couldOverrideAliases(p)) {


I would do this check closer to the logic that actually uses it; I find the code easier to reason about that way. There is a lot of code between this check and the place where its result is actually needed, code that doesn't use the result of canRemoveAliases.


astefan

Left some minor comments, LGTM otherwise.

astefan · 2025-04-15T13:27:16Z

...in/esql/src/test/java/org/elasticsearch/xpack/esql/session/IndexResolverFieldNamesTests.java

+    public void testEnrichMaskingEvalOn() {
+        assertFieldNames("""
+            from employees
+            | eval langague_name = null


Did you really mean to use an alias name here that is never referenced later in the query? langague_name

Yes, the intention was to let the first ENRICH overwrite it, but there is indeed a typo in the query. The EVAL was supposed to be

| eval languages = length(languages)

(I need a number rather than a string).
Let me fix it.

astefan · 2025-04-15T13:28:04Z

...in/esql/src/test/java/org/elasticsearch/xpack/esql/session/IndexResolverFieldNamesTests.java

+            | eval langague_name = null
+            | enrich languages_policy on languages
+            | rename language_name as languages
+            | eval languages = length(language_name)


Here length(language_name) is not actually valid because language_name has been renamed. Was this intentional?

Exactly this one, see my comment above

astefan · 2025-04-15T13:28:29Z

...in/esql/src/test/java/org/elasticsearch/xpack/esql/session/IndexResolverFieldNamesTests.java

+
+    public void testEnrichAndJoinMaskingEvalWh() {
+        assertFieldNames("""
+            from employees


Same strange "typos" here as well.

astefan · 2025-04-15T13:30:32Z

x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/session/EsqlSession.java

+     * eg. EVAL, GROK, DISSECT can override an alias, but we know it in advance, ie. we don't need to resolve indices to know.
+     */
+    private static boolean couldOverrideAliases(LogicalPlan p) {
+        return (p instanceof Aggregate


> If I got it right, we use these field names to limit the scope of the field_caps; if we add a field that does not exist in any of the involved indices, will field_caps fail?
> My understanding is that it will work fine, just ignoring the additional fields.

Yep, I see your point. Adding more fields is safer; adding fewer fields is problematic. Better more fields than fewer.

astefan · 2025-04-15T13:42:53Z

x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/session/EsqlSession.java

-                // If there are joins/enriches in the middle, these could override some of these fields.
-                // We don't know at this stage, so we have to keep all of them.
-                if (canRemoveAliases[0]) {
+            // If there are joins, enriches etc. in the middle, these could override some of these fields.


May I suggest a slightly different wording here? This code in EsqlSession is tricky enough in what it removes and what it keeps, and comments DO help a lot:

"
If the current node in the tree is of type JOIN (lookup join, inlinestats) or ENRICH or other type of command that we may add in the future which can override already defined Aliases with EVAL (for example "from test | eval ip = 123 | enrich ips_policy ON hostname | rename ip AS my_ip" and ips_policy enriches the results with the same name ip field), these aliases should be kept in the list of fields.
"

👍 It looks clearer, I'm changing it. Thanks!

astefan · 2025-04-15T13:44:54Z

x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/action/EsqlCapabilities.java

+        LOADING_NON_INDEXED_IP_FIELDS,
+
+        /**
+         * During resolution (pre-analysis) we have to consider that joins can override EVALuated values


Suggested change

- * During resolution (pre-analysis) we have to consider that joins can override EVALuated values
+ * During resolution (pre-analysis) we have to consider that joins or enriches can override EVALuated values

astefan · 2025-04-15T13:45:14Z

x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/session/EsqlSession.java

+                AttributeSet planRefs = p.references();
+                Set<String> fieldNames = planRefs.names();
+                p.forEachExpressionDown(Alias.class, alias -> {
+                    // do not remove the UnresolvedAttribute that has the same name as its alias, ie "rename id = id"


Suggested change

- // do not remove the UnresolvedAttribute that has the same name as its alias, ie "rename id = id"
+ // do not remove the UnresolvedAttribute that has the same name as its alias, ie "rename id AS id"

elasticsearchmachine · 2025-04-15T16:10:49Z

💔 Backport failed

Status	Branch	Result
❌	8.18	Commit could not be cherrypicked due to conflicts
❌	8.x	Commit could not be cherrypicked due to conflicts
❌	9.0	Commit could not be cherrypicked due to conflicts

You can use sqren/backport to manually backport by running backport --upstream elastic/elasticsearch --pr 126614

luigidellaquila added 6 commits April 10, 2025 13:33

ES|QL: fix pre-analysis of JOIN masking EVAL

27865d7

Add capability

0d35e21

Fix test

08803dc

Merge branch 'main' into esql/fix_join_masking_eval

b820057

enable relevant paths in generative tests

fc44c26

make tests deterministic

132c587

luigidellaquila added >bug auto-backport Automatically create backport pull requests when merged :Analytics/ES|QL AKA ESQL v8.18.1 v8.19.0 v9.0.1 labels Apr 10, 2025

luigidellaquila requested review from astefan and alex-spies April 10, 2025 15:12

elasticsearchmachine added Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) v9.1.0 labels Apr 10, 2025

Update docs/changelog/126614.yaml

dcc9e18

luigidellaquila commented Apr 10, 2025

x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/session/EsqlSession.java

alex-spies reviewed Apr 10, 2025

luigidellaquila added 2 commits April 10, 2025 17:55

Fix test

fb205bd

Merge remote-tracking branch 'luigidellaquila/esql/fix_join_masking_e…

3c1ccea

…val' into esql/fix_join_masking_eval

luigidellaquila added 5 commits April 11, 2025 10:43

Merge branch 'main' into esql/fix_join_masking_eval

7d5d67b

Refactoring

812cbe8

also Aggregate

5f3b001

Merge branch 'main' into esql/fix_join_masking_eval

f2f0a3d

Merge branch 'main' into esql/fix_join_masking_eval

ae267ac

astefan reviewed Apr 11, 2025

luigidellaquila added 3 commits April 15, 2025 10:09

Merge branch 'main' into esql/fix_join_masking_eval

6720f9b

Refactor and add tests

1a23c32

Merge remote-tracking branch 'luigidellaquila/esql/fix_join_masking_e…

ab77736

…val' into esql/fix_join_masking_eval

astefan approved these changes Apr 15, 2025

luigidellaquila added 5 commits April 15, 2025 16:14

Implement review suggestions

cf4719d

Fix typo

ecb72e9

Comments

d6fdb22

Merge branch 'main' into esql/fix_join_masking_eval

e6f51bb

Merge branch 'main' into esql/fix_join_masking_eval

8744884

luigidellaquila enabled auto-merge (squash) April 15, 2025 15:42

luigidellaquila merged commit de42ba3 into elastic:main Apr 15, 2025
17 checks passed

elasticsearchmachine added the backport pending label Apr 15, 2025

luigidellaquila added a commit to luigidellaquila/elasticsearch that referenced this pull request Apr 15, 2025

ES|QL: fix join masking eval (elastic#126614)

6cc922c

luigidellaquila mentioned this pull request Apr 15, 2025

[8.x] ES|QL: fix join masking eval (#126614) #126860

Merged

luigidellaquila added a commit to luigidellaquila/elasticsearch that referenced this pull request Apr 15, 2025

ES|QL: fix join masking eval (elastic#126614)

fcb87b5

luigidellaquila mentioned this pull request Apr 15, 2025

ES|QL: fix join masking eval (#126614) #126861

Merged

elasticsearchmachine pushed a commit that referenced this pull request Apr 15, 2025

ES|QL: fix join masking eval (#126614) (#126861)

cc2f101

afoucret pushed a commit to afoucret/elasticsearch that referenced this pull request Apr 16, 2025

ES|QL: fix join masking eval (elastic#126614)

5fc6b3e

elasticsearchmachine pushed a commit that referenced this pull request Apr 18, 2025

ES|QL: fix join masking eval (#126614) (#126860)

d7b78dd

luigidellaquila mentioned this pull request Apr 18, 2025

ES|QL: Tests for column pruning after JOIN #127059

Merged

luigidellaquila added a commit to luigidellaquila/elasticsearch that referenced this pull request Apr 22, 2025

ES|QL: fix join masking eval (elastic#126614)

8a31781

luigidellaquila mentioned this pull request Apr 22, 2025

ES|QL: fix join masking eval (#126614) #127149

Merged

elasticsearchmachine pushed a commit that referenced this pull request Apr 22, 2025

ES|QL: fix join masking eval (#126614) (#127149)

8c57db8
