Skip to content

Text field block loader properly handles null values from delegate #127525

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 11 commits into from
May 6, 2025

Conversation

lkts
Copy link
Contributor

@lkts lkts commented Apr 29, 2025

See analysis in #126427 (comment).

This change makes text fields that have a keyword multi-field with null_value correctly return null_value instead of null when there are only null values in the document. This is already true for nulls mixed with regular value (not considering whether that is intentional or not).

F.e. indexing ["hey", null] with a multi-field that has null_value: "NULL" results in text block loader returning ["hey", "NULL"]. This PR makes [null, null] returns ["NULL", "NULL"] instead of nothing.

@lkts lkts added >bug auto-backport Automatically create backport pull requests when merged :Analytics/ES|QL AKA ESQL v8.19.0 labels Apr 29, 2025
@lkts lkts requested review from nik9000 and dnhatn April 29, 2025 17:55
@elasticsearchmachine elasticsearchmachine added v9.1.0 Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) labels Apr 29, 2025
@elasticsearchmachine
Copy link
Collaborator

Hi @lkts, I've created a changelog YAML for you.

@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-analytical-engine (Team:Analytics)

@lkts
Copy link
Contributor Author

lkts commented Apr 29, 2025

Closes #126427.

Copy link
Member

@nik9000 nik9000 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's worth updating the description to say that this will return the null_value from the text fields and probably putting a WARNING about this somewhere in the docs for synthetic source or text fields. It's weird, but it's the least weird choice.

@lkts lkts added the >docs General docs changes label Apr 29, 2025
@elasticsearchmachine elasticsearchmachine added the Team:Docs Meta label for docs team label Apr 29, 2025
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-docs (Team:Docs)

Copy link
Member

@dnhatn dnhatn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks Sasha!

@lkts
Copy link
Contributor Author

lkts commented Apr 30, 2025

I'll wait for a review from docs team.

@lkts lkts requested a review from shainaraskas May 5, 2025 17:20
Copy link
Contributor

@shainaraskas shainaraskas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

some small suggestions for you from a docs pov

Comment on lines 109 to 115
::::{note}
Synthetic source of the `text` field will have the same [modifications](/reference/elasticsearch/mapping-reference/mapping-source-field.md#synthetic-source) as a `keyword` field in this case.

These modifications can impact usage of `text` fields:
* Reordering text fields can have an effect on [phrase](/reference/query-languages/query-dsl/query-dsl-match-query-phrase.md) and [span](/reference/query-languages/query-dsl/span-queries.md) queries. See the discussion about [`position_increment_gap`](/reference/elasticsearch/mapping-reference/position-increment-gap.md) for more detail. You can avoid this by making sure the `slop` parameter on the phrase queries is lower than the `position_increment_gap`. This is the default.
* Handling of `null` values is different. `text` fields ignore `null` values but `keyword` fields support replacing `null`s with a value specified in the `null_value` parameter. This replacement will be represented in synthetic source.
::::
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you've reframed the entire section to be about keyword, so I'd take this out of a note

Suggested change
::::{note}
Synthetic source of the `text` field will have the same [modifications](/reference/elasticsearch/mapping-reference/mapping-source-field.md#synthetic-source) as a `keyword` field in this case.
These modifications can impact usage of `text` fields:
* Reordering text fields can have an effect on [phrase](/reference/query-languages/query-dsl/query-dsl-match-query-phrase.md) and [span](/reference/query-languages/query-dsl/span-queries.md) queries. See the discussion about [`position_increment_gap`](/reference/elasticsearch/mapping-reference/position-increment-gap.md) for more detail. You can avoid this by making sure the `slop` parameter on the phrase queries is lower than the `position_increment_gap`. This is the default.
* Handling of `null` values is different. `text` fields ignore `null` values but `keyword` fields support replacing `null`s with a value specified in the `null_value` parameter. This replacement will be represented in synthetic source.
::::
In this case, the synthetic source of the `text` field will have the same [modifications](/reference/elasticsearch/mapping-reference/mapping-source-field.md#synthetic-source) as a `keyword` field.
These modifications can impact usage of `text` fields:
* Reordering text fields can have an effect on [phrase](/reference/query-languages/query-dsl/query-dsl-match-query-phrase.md) and [span](/reference/query-languages/query-dsl/span-queries.md) queries. See the discussion about [`position_increment_gap`](/reference/elasticsearch/mapping-reference/position-increment-gap.md) for more details. You can avoid this by making sure the `slop` parameter on the phrase queries is lower than the `position_increment_gap`. This is the default.
* Handling of `null` values is different. `text` fields ignore `null` values, but `keyword` fields support replacing `null` values with a value specified in the `null_value` parameter. This replacement is represented in synthetic source.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

imo, it would be good to keep an example here because it's the actual synthetic part. not critical, but the store example is less important I think

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't want to repeat the same example as keyword has. Do you think it's helpful?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it's intuitive for people to look for the "core" example on the keyword page ... they'd likely just scroll down to the synthetics section and copy whatever is there/assume whatever example is there represents the happy path. you could consider using a snippet to avoid maintaining two examples.

I'm also ok with this merging as is if you're not super worried about it 🤷

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that we don't actually have an example with a multi-field i'll go ahead and add it. Thanks for the feedback.

@lkts lkts merged commit 0df9d1c into elastic:main May 6, 2025
16 checks passed
@lkts lkts deleted the fix_text_block_loader_nulls branch May 6, 2025 19:29
@elasticsearchmachine
Copy link
Collaborator

💔 Backport failed

Status Branch Result
8.19 Commit could not be cherrypicked due to conflicts

You can use sqren/backport to manually backport by running backport --upstream elastic/elasticsearch --pr 127525

@lkts
Copy link
Contributor Author

lkts commented May 6, 2025

💚 All backports created successfully

Status Branch Result
8.19

Questions ?

Please refer to the Backport tool documentation

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Analytics/ES|QL AKA ESQL auto-backport Automatically create backport pull requests when merged backport pending >bug >docs General docs changes Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) Team:Docs Meta label for docs team v8.19.0 v9.1.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants