ES|QL Fork Command #121652

Open
7 tasks
ChrisHegarty opened this issue Feb 4, 2025 · 1 comment
Labels: Meta, priority:high, :Search Relevance/Search, Team:Search Relevance, v9.1.0

ChrisHegarty commented Feb 4, 2025

Fork is a foundational building block to support multiple subqueries, RRF, and much more.

### What is FORK?

Conceptually, fork is:

  1. a bifurcation of the stream, with all data going to each fork branch, followed by
  2. a merge of the branches, enhanced with a discriminator column
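
The two steps above can be modelled in a few lines of Python. This is only a sketch of the conceptual semantics, not the ES|QL implementation: the `fork` helper, the `contains` stand-in for a full-text match, the sample rows, and the `fork1`/`fork2` label format are all assumptions made for illustration.

```python
from typing import Callable, Iterable

Row = dict
Branch = Callable[[list[Row]], Iterable[Row]]


def fork(rows: Iterable[Row], *branches: Branch) -> list[Row]:
    """Conceptual model: every row goes to every branch, then branches are merged."""
    source = list(rows)  # materialise so each branch sees all of the data
    merged = []
    for index, branch in enumerate(branches, start=1):
        label = f"fork{index}"  # assumed label format for the discriminator
        for row in branch(list(source)):
            # The merge adds a discriminator column naming the originating branch.
            merged.append({**row, "_fork": label})
    return merged


def contains(term: str) -> Branch:
    """Hypothetical stand-in for a full-text match such as content:"fox"."""
    return lambda rows: (r for r in rows if term in r["content"].split())


docs = [
    {"id": 1, "content": "the quick brown fox"},
    {"id": 2, "content": "the lazy dog"},
    {"id": 3, "content": "fox and dog"},
]

result = fork(docs, contains("fox"), contains("dog"))
for row in result:
    print(row["_fork"], row["id"], row["content"])
```

A row that matches more than one branch (id 3 here) appears once per branch, distinguished only by its `_fork` value.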

The name, fork, is loosely inspired by the Unix fork and by other streaming frameworks, where the concept of forked execution is quite familiar. Other names considered and rejected are: union, merge, combine, tee, tpipe. While conceptually similar, these names would likely cause confusion with similar (but different) concepts in other languages, e.g. SQL union.

Example:

FROM test
| FORK
    ( WHERE content:"fox" )
    ( WHERE content:"dog" )
| SORT _fork
| KEEP _fork, id, content

Conceptual data flow:

[Image: conceptual data flow diagram]

Actual execution flow:
Planning and execution are free to reorganise the work as long as the result adheres to the conceptual flow of data.

Building upon the previous example, now with a common pre-filter:

FROM test
| WHERE id > 1  // common pre-filter
| FORK
    ( WHERE content:"fox" )
    ( WHERE content:"dog" )
| SORT _fork
| KEEP _fork, id, content

Where the FORK is “pushable”, the common pre-filter and the WHERE of each fork branch are pushed down together, so that each branch effectively becomes its own subquery.
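
To illustrate why this reorganisation is safe, the sketch below compares the conceptual plan (pre-filter once, then fork) with a pushed-down plan in which each branch is an independent subquery that combines the pre-filter with its own WHERE. It is a toy model rather than the planner's logic: the function names, the sample rows, the token-membership stand-in for the full-text match, and the `fork1`/`fork2` labels are assumptions.

```python
docs = [
    {"id": 1, "content": "the quick brown fox"},
    {"id": 2, "content": "the lazy dog"},
    {"id": 3, "content": "fox and dog"},
    {"id": 4, "content": "a red fox"},
]


def matches(row, term):
    # Hypothetical stand-in for a full-text match such as content:"fox".
    return term in row["content"].split()


def conceptual(rows):
    """Apply the common pre-filter once, then send the survivors to each branch."""
    pre = [r for r in rows if r["id"] > 1]
    out = []
    for label, term in (("fork1", "fox"), ("fork2", "dog")):
        out += [{**r, "_fork": label} for r in pre if matches(r, term)]
    return out


def pushed_down(rows):
    """Run each branch as its own subquery: pre-filter AND branch filter."""
    subqueries = {
        "fork1": lambda r: r["id"] > 1 and matches(r, "fox"),
        "fork2": lambda r: r["id"] > 1 and matches(r, "dog"),
    }
    out = []
    for label, predicate in subqueries.items():
        out += [{**r, "_fork": label} for r in rows if predicate(r)]
    return out


# Both plans produce the same rows, so the reorganisation preserves the result.
assert conceptual(docs) == pushed_down(docs)
```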

[Image: execution flow with the fork pushed down]

Where the FORK is not pushable, e.g. after a STATS, the fork implementation will “fan out” and merge within the compute engine. That is, the implementation will be more like the initial conceptual diagram above.

### Initial Restrictions

A number of initial restrictions have been put in place in order to make progress and unblock other development efforts dependent on Fork, e.g. RRF.

The restrictions are:

  1. First-level data retrieval only - not yet general-purpose bifurcation of the stream. This allows us to support multiple different subqueries. To support bifurcation of the stream, the planner will have to determine that the fork is actually being performed in second-stage retrieval. This is a pragmatic limitation that we can lift later.
  2. All branches of the fork must return the same data schema (same columns). This is a pragmatic limitation that we can lift later. For this reason, only WHERE, SORT, and LIMIT are supported within fork subqueries.
  3. No fork within a fork. This is a pragmatic limitation that we can lift later.
  4. Lucene queries are independent - no point-in-time. We can add this later.
  5. Fork branches are automatically named. We can provide the ability to name the branches later.
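
As a rough illustration of restrictions 2, 3, and 5, the sketch below checks a fork whose branches are given as lists of command strings. The `validate_fork` helper, the string-based representation, and the `fork1`, `fork2`, ... naming format are hypothetical and do not reflect how the ES|QL parser or planner is actually structured.

```python
ALLOWED_IN_BRANCH = {"WHERE", "SORT", "LIMIT"}  # commands that keep the columns unchanged


def validate_fork(branches):
    """Return automatically generated branch names, or raise ValueError."""
    names = []
    for index, commands in enumerate(branches, start=1):
        for command in commands:
            keyword = command.split()[0].upper()
            if keyword == "FORK":
                # Restriction 3: no fork within a fork.
                raise ValueError("no fork within a fork")
            if keyword not in ALLOWED_IN_BRANCH:
                # Restriction 2: other commands could change the branch's columns.
                raise ValueError(f"{keyword} is not supported inside a fork branch")
        # Restriction 5: branches are named automatically (format assumed here).
        names.append(f"fork{index}")
    return names


names = validate_fork([
    ['WHERE content:"fox"', "SORT id", "LIMIT 5"],
    ['WHERE content:"dog"'],
])
print(names)
```

Limiting branches to commands that neither add nor remove columns is what guarantees the same schema across branches without any further checking.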

### Development outline and evolution

We will lift all the restrictions as outlined above, but not all at once and not necessarily in the outlined order.

Since FORK is a significant feature, its development will be broken down into several smaller PRs and issues. This section is intended to capture the current state and future plans as we progress towards a complete implementation. As such, consider this section "live": as new PRs and issues are filed, they can be linked here.

ChrisHegarty added the :Search Relevance/Search, Meta, Team:Search Relevance, and v9.1.0 labels on Feb 4, 2025
@elasticsearchmachine

Pinging @elastic/es-search-relevance (Team:Search Relevance)

ChrisHegarty added the priority:high label on Feb 6, 2025