Skip to content

FORK - allow EVAL/DISSECT/STATS in branches #125937

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 11 commits into from
Apr 10, 2025

Conversation

ioanatia
Copy link
Contributor

@ioanatia ioanatia commented Mar 31, 2025

#121950

The FORK branches can now have different schema outputs.
Allows more than just WHERE, LIMIT and SORT in FORK branches.

The commands for which we added support in the FORK branches are:

  • EVAL
  • STATS
  • DISSECT.

There is a parsing issue that at the moment does not allow us to add more commands.
This will be addressed in a follow up PR.

FROM employees
| WHERE emp_no == 10048 OR emp_no == 10081
| FORK ( EVAL a = CONCAT(first_name, " ", emp_no::keyword, " ", last_name)
        | DISSECT a "%{x} %{y} %{z}"
        | EVAL y = y::keyword )
       ( STATS x = COUNT(*)::keyword, y = MAX(emp_no)::keyword, z = MIN(emp_no)::keyword )
       ( SORT emp_no ASC | LIMIT 2 | EVAL x = last_name )
       ( EVAL x = "abc" | EVAL y = "aaa" )
| KEEP _fork, emp_no, x, y, z, a
| SORT _fork, emp_no

output:

_fork:keyword | emp_no:integer | x:keyword | y:keyword | z:keyword | a:keyword
fork1         | 10048          | Florian   | 10048     | Syrotiuk  | Florian 10048 Syrotiuk
fork1         | 10081          | Zhongwei  | 10081     | Rosen     | Zhongwei 10081 Rosen
fork2         | null           | 2         | 10081     | 10048     | null
fork3         | 10048          | Syrotiuk  | null      | null      | null
fork3         | 10081          | Rosen     | null      | null      | null
fork4         | 10048          | abc       | aaa       | null      | null
fork4         | 10081          | abc       | aaa       | null      | null

If the sub plans schemas contain a conflicting types for the same column a verification error will be returned.

FORK will align the outputs by adding null columns in each subplan.
Example:

FROM employees
| WHERE emp_no == 10048 OR emp_no == 10081
| FORK ( EVAL a = 1 | EVAL b = 2)
       ( EVAL c = 3 | EVAL d = 4)

The planner will align the columns of each subplans, the previous example is equivalent to:

FROM employees
| WHERE emp_no == 10048 OR emp_no == 10081
| FORK ( EVAL a = 1 | EVAL b = 2 | EVAL c = NULL, d = NULL | KEEP a, b, c, d, emp_no, first_name ...)
       ( EVAL c = 3 | EVAL d = 4 | EVAL a = NULL, b = NULL | KEEP a, b, c, d, emp_no, first_name ...) 

Note on the grammar parser changes

As supported sub commands I only added the ones that I tested and see that work without a parsing exception.
I am not sure why but if I try to expand that list to other commands like MV_EXPAND I get a parsing error:

FROM search-movies
| FORK (STATS x = COUNT(*), y = VALUES(title) | MV_EXPAND y)
       (WHERE title:"Journey")
error:

{
  "error": {
    "root_cause": [
      {
        "type": "parsing_exception",
        "reason": "line 3:66: token recognition error at: ')'"
      }
    ],
    "type": "parsing_exception",
    "reason": "line 3:66: token recognition error at: ')'",
    "caused_by": {
      "type": "lexer_no_viable_alt_exception",
      "reason": null
    }
  },
  "status": 400
}

I also don't want to be blocked by this.
If we can fix the grammar problem here or in a follow up, I am more than happy to add all the tests we think are needed to cover the usage of all commands in FORK branches.
If we can support FORK branches to include STATS/EVALS as part of this PR, we should be able to extend support for other commands when we fix the grammar change.

@ioanatia ioanatia added >non-issue :Analytics/ES|QL AKA ESQL ES|QL-ui Impacts ES|QL UI Team:Search Relevance Meta label for the Search Relevance team in Elasticsearch v9.1.0 labels Mar 31, 2025
@@ -280,9 +280,17 @@ forkSubQueryCommand
;

forkSubQueryProcessingCommand
: whereCommand
| sortCommand
: evalCommand
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can use processingCommand and exclude the commands that are not supported by Fork.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am actually having a bit of trouble with the grammar.
Even if I use processingCommand here, there are some combinations that fail with a parsing exception:

this works:

ROW a=[1,2,3]
| FORK (EVAL a = [2,3 ] | MV_EXPAND a | WHERE a == 2)
       (MV_EXPAND a | WHERE a == 2 )

this fails with a parsing exception:

FROM search-movies
| FORK (STATS x = COUNT(*), y = VALUES(title) | MV_EXPAND y)
       (WHERE title:"Journey")

error:

{
  "error": {
    "root_cause": [
      {
        "type": "parsing_exception",
        "reason": "line 3:66: token recognition error at: ')'"
      }
    ],
    "type": "parsing_exception",
    "reason": "line 3:66: token recognition error at: ')'",
    "caused_by": {
      "type": "lexer_no_viable_alt_exception",
      "reason": null
    }
  },
  "status": 400
}

I am able to use WHERE/LIMIT/SORT/DISSECT/EVAL/STATS without issues.
But using commands like MV_EXPAND/KEEP/RENAME/DROP/GROK in FORK branches fails with a parsing errors.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that ultimately we should be able to effectively remove this list and replace it with processingCommand (that was my original intention when I added it), but this PR is a good step forward in that direction. Let's decouple this, as it will need even more extensive and new testing which is better in a subsequent PR.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am able to use WHERE/LIMIT/SORT/DISSECT/EVAL/STATS without issues. But using commands like MV_EXPAND/KEEP/RENAME/DROP/GROK in FORK branches fails with a parsing errors.

GROK does not throw ParsingException for me if it is added under forkSubQueryProcessingCommand. I wonder if it is related to the mode of the commands, the commands that can be recognized under FORK have EXPRESSION_MODE.

+ curl -u elastic:password -v -X POST 'localhost:9200/_query?format=txt&pretty' -H 'Content-Type: application/json' '-d
{
  "query": "from sample_data | fork (grok message \"%{WORD:x} %{WORD:y}\") (dissect message \"%{x} %{y}\") | keep message, x, y, _fork"
}

       message       |       x       |       y       |     _fork     
---------------------+---------------+---------------+---------------
Connected to 10.1.0.3|Connected      |to             |fork1          
Connected to 10.1.0.2|Connected      |to             |fork1          
Disconnected         |null           |null           |fork1          
Connection error     |Connection     |error          |fork1          
Connection error     |Connection     |error          |fork1          
Connection error     |Connection     |error          |fork1          
Connected to 10.1.0.1|Connected      |to             |fork1          
Connected to 10.1.0.3|Connected      |to 10.1.0.3    |fork2          
Connected to 10.1.0.2|Connected      |to 10.1.0.2    |fork2          
Disconnected         |null           |null           |fork2          
Connection error     |Connection     |error          |fork2          
Connection error     |Connection     |error          |fork2          
Connection error     |Connection     |error          |fork2          
Connected to 10.1.0.1|Connected      |to 10.1.0.1    |fork2   

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if it is related to the mode of the commands

you are probably right - I'd like to follow up on the grammar issue separately if that's okay.
if I recall correctly for GROK I was hitting a parsing issue when the FORK subbranch contained multiple commands and not just GROK.

@ioanatia ioanatia requested review from bpintea and ChrisHegarty April 4, 2025 15:11
@ioanatia ioanatia marked this pull request as ready for review April 4, 2025 15:12
@elasticsearchmachine elasticsearchmachine added the Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) label Apr 4, 2025
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-analytical-engine (Team:Analytics)

@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/kibana-esql (ES|QL-ui)

@elasticsearchmachine elasticsearchmachine removed the Team:Search Relevance Meta label for the Search Relevance team in Elasticsearch label Apr 4, 2025
Copy link
Contributor

@ChrisHegarty ChrisHegarty left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@@ -280,9 +280,17 @@ forkSubQueryCommand
;

forkSubQueryProcessingCommand
: whereCommand
| sortCommand
: evalCommand
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that ultimately we should be able to effectively remove this list and replace it with processingCommand (that was my original intention when I added it), but this PR is a good step forward in that direction. Let's decouple this, as it will need even more extensive and new testing which is better in a subsequent PR.

Copy link
Contributor

@bpintea bpintea left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Had a first look.
I think the grammar changes need to be pushed a bit further: FROM employees | FORK (KEEP emp_no) fails with the same token recognition error at: ')'.

Copy link
Contributor

@bpintea bpintea left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, only left optional issues.
I'd maybe update the PR description, to mention which new commands are now allowed.

}
}

List<Alias> aliases = missing.stream()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would use missing.forEach() instead of heavier streams, but optional / preference.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I kept this as it is since I felt using map is more natural here - happy to change it if you have a strong preference to use forEach.

@stratoula
Copy link

@ioanatia which are the additional commands we are going to support with this change? All of them?

@ioanatia
Copy link
Contributor Author

ioanatia commented Apr 9, 2025

@stratoula for now we just add support for DISSECT/STATS/EVAL.
this is just temporary, we have a grammar issue which I haven't manage to fix atm to get more commands supported.
I created an issue to follow up on this since it won't be done in this PR #126553

Copy link
Member

@fang-xing-esql fang-xing-esql left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thank you @ioanatia ! Supporting the other commands(or giving a more friendly error messages of unsupported commands) can be addressed as follow ups.

@ioanatia ioanatia changed the title FORK - allow more command types in sub branches FORK - allow EVAL/DISSECT/STATS in branches Apr 10, 2025
@ioanatia ioanatia merged commit 9b6ce86 into elastic:main Apr 10, 2025
17 checks passed
@ioanatia ioanatia deleted the fork_better_branches branch April 10, 2025 10:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Analytics/ES|QL AKA ESQL ES|QL-ui Impacts ES|QL UI >non-issue Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) v9.1.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants