lookup join first draft #123719
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -15,6 +15,13 @@ For example, you can use `ENRICH` to: | |
* Add product information to retail orders based on product IDs | ||
* Supplement contact information based on an email address | ||
|
||
[`ENRICH`](/reference/query-languages/esql/esql-commands.md#esql-enrich) is similar to [`LOOKUP JOIN`](/reference/query-languages/esql/esql-commands.md#esql-lookup-join) in that both help you join data together. You should use `ENRICH` when: | ||
|
||
|
||
* Enrichment data doesn't change frequently | ||
|
||
* You can accept index-time overhead | ||
* You are working with structured enrichment patterns | ||
* You can accept having multiple matches combined into multi-values | ||
* You can accept being limited to predefined match fields | ||
The matching logic of "which lookup/enrich document matches a given input row" is also more lenient for ENRICH compared to LOOKUP JOIN. For instance, for I'll gather more precise information for this, we can maybe add this in a follow-up PR. |
||
|
||
### How the `ENRICH` command works [esql-how-enrich-works] | ||
|
||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,38 +1,63 @@ | ||
--- | ||
navigation_title: "LOOKUP JOIN" | ||
navigation_title: "Correlate data with LOOKUP JOIN" | ||
mapped_pages: | ||
- https://www.elastic.co/guide/en/elasticsearch/reference/current/esql-enrich-data.html | ||
--- | ||
|
||
# LOOKUP JOIN [esql-lookup-join] | ||
|
||
|
||
The {{esql}} [`LOOKUP join`](/reference/query-languages/esql/esql-commands.md#esql-lookup-join) processing command combines, at query-time, data from one or more source indexes with field-value combinations found in an input table. | ||
The {{esql}} [`LOOKUP JOIN`](/reference/query-languages/esql/esql-commands.md#esql-lookup-join) processing command combines, at query time, data from one or more source indices with field-value combinations found in an input table. Teams often have data scattered across multiple indices, such as logs, IPs, user IDs, hosts, and employees. Without a direct way to enrich or correlate each event with reference data, root-cause analysis, security checks, and operational insights become time-consuming. | ||
|
||
|
||
For example, you can use `LOOKUP JOIN` to: | ||
|
||
* Pull in environment or ownership details for each host to enrich your metrics data | ||
* Pull in environment or ownership details for each host to correlate your metrics data. | ||
|
||
* Quickly see if any source IPs match known malicious addresses. | ||
* Tag logs with the owning team or escalation info for faster triage and incident response. | ||
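
As a minimal sketch of the malicious-IP use case above, the following query joins firewall events against a threat list. The index names (`firewall_logs`, `threat_list`) and the fields (`source.ip`, `threat_level`) are hypothetical and only illustrate the shape of the command; the lookup index is assumed to contain a `source.ip` field to match on.

```esql
// Hypothetical indices and fields, for illustration only.
FROM firewall_logs
// Add the columns of threat_list to every row whose source.ip matches.
| LOOKUP JOIN threat_list ON source.ip
// Rows without a match get null values in the joined columns, so filter them out.
| WHERE threat_level IS NOT NULL
| KEEP @timestamp, source.ip, threat_level
```

Rows with no match in the lookup index are kept by the join itself, which is why the `WHERE` clause is needed to narrow the result to known malicious addresses.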
|
||
[`LOOKUP JOIN`](/reference/query-languages/esql/esql-commands.md#esql-lookup-join) is similar to [`ENRICH`](/reference/query-languages/esql/esql-commands.md#esql-enrich) in that both help you join data together. You should use `LOOKUP JOIN` when: | ||
|
||
### How the `LOOKUP JOIN` command works [esql-how-lookup-join-works] | ||
* Your enrichment data changes frequently | ||
* You want to avoid index-time processing | ||
* You are working with regular indices | ||
* You need to preserve distinct matches | ||
* You need to match on any field in a lookup index | ||
|
||
|
||
## How the `LOOKUP JOIN` command works [esql-how-lookup-join-works] | ||
|
||
The `LOOKUP JOIN` command adds new columns to a table, with data from {{es}} indices. It requires a few special components: | ||
|
||
:::{image} ../../../images/esql-lookup-join.png | ||
:alt: esql lookup join | ||
::: | ||
|
||
::::{tip} | ||
`LOOKUP JOIN` does not guarantee the output to be in any particular order. If a certain order is required, users should use a [`SORT`](/reference/query-languages/esql/esql-commands.md#esql-sort) somewhere after the `LOOKUP JOIN`. | ||
|
||
:::: | ||
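
To illustrate the tip above, here is a short sketch that applies `SORT` after the join so the output order is deterministic. The index `system_metrics`, the lookup index `host_inventory`, and the field `host.name` are assumed names, not part of this PR.

```esql
// Hypothetical indices and fields, for illustration only.
FROM system_metrics
| LOOKUP JOIN host_inventory ON host.name
// LOOKUP JOIN gives no ordering guarantee, so sort after the join.
| SORT @timestamp DESC
| LIMIT 10
```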
|
||
$$$esql-source-index$$$ | ||
|
||
Source index | ||
: An index which stores the lookup data that the `LOOKUP JOIN` command can add to input tables. You can create and manage these indices just like a regular {{es}} index. | ||
|
||
|
||
|
||
### Prerequisites [esql-enrich-prereqs] | ||
## Prerequisites [esql-enrich-prereqs] | ||
|
||
|
||
To use `LOOKUP JOIN`, you must have: | ||
|
||
* Compatible data types: the join key and the join field in the lookup index must generally have the same data type, up to widening of data types. For example, `short` and `byte` are considered equal to `integer`. Text fields can be used on the left-hand side if and only if they have an exact subfield whose name is suffixed with `.keyword`. | ||
|
||
|
||
## Limitations | ||
|
||
|
||
`LOOKUP JOIN` currently has the following limitations: | ||
|
||
* `LOOKUP JOIN` is successful if the left and right join fields are both of type `KEYWORD`, or if the left field is of type `TEXT` and the right field is of type `KEYWORD`. | ||
|
||
* Indices in [lookup](elasticsearch/docs/reference/elasticsearch/index-settings/index-modules.md#index-mode-setting) mode are always single-sharded. | ||
* Cross-cluster search is unsupported. Both source and lookup indices must be local. | ||
|
||
* `LOOKUP JOIN` can only use a single match field and a single index. Wildcards, aliases, and data streams are not supported. | ||
|
||
* The name of the match field in `LOOKUP JOIN lu_idx ON match_field` must match an existing field in the query. This may require a `RENAME` or `EVAL` before the join. | ||
|
||
* The query trips a circuit breaker if it fetches too much data in a single page. A large heap is needed to manage results of multiple megabytes. | ||
|
||
* This limit is per page of data, which is about 10,000 rows. | ||
I don't know if pages of data are a well-defined concept (at least user-facing). I also don't know if we can say this is about 10k rows. (@nik9000 may have more precise ideas on this.)

It would be good to know the exacts here, I took a stab based on what I read but unsure what the limit is. Also will they get a specific message?

I'll check in with Nik to get a better, but still precise wording here.

Also, I don't think they will get a specific error message, probably just a generic circuit breaker exception.

Suggestion:
@nik9000 , could you please keep me honest here?

I like Alex's proposal though I might say "about" instead of "normally". It is a bit fuzzy, but that's what you get when you don't describe pages to users precisely. One thing that we could add is that larger nodes will allow bigger fetches. This is fairly temporary - we should switch to a stream of results at some point. |
||
* Matching many rows per incoming row will count against this limit. | ||
* This limit is approximately the same as for [`ENRICH`](/reference/query-languages/esql/esql-commands.md#esql-enrich). | ||
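
The match-field limitation in the list above can be worked around roughly as follows. This is a sketch under assumptions: `app_logs`, `service_owners`, `service.name`, `service_id`, and `owner_team` are hypothetical names, and the lookup index is assumed to call its key field `service_id` while the source data calls it `service.name`.

```esql
// Hypothetical indices and fields, for illustration only.
FROM app_logs
// Create a column whose name matches the key field in the lookup index.
| EVAL service_id = service.name
| LOOKUP JOIN service_owners ON service_id
| KEEP @timestamp, service_id, owner_team
```

Using `RENAME service.name AS service_id` instead of `EVAL` has the same effect on the join but drops the original column name from the result.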
|