Skip to content

Adding check for isIndexed in text fields when generating field exists queries to avoid ISE when field is stored but not indexed or with doc_values #130531

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

pmpailis
Copy link
Contributor

@pmpailis pmpailis commented Jul 3, 2025

As per #129973 (comment), it could be possible that if a text field is not indexed, but stored (i.e.

                    "b_field": {
                        "type": "text",
                        "index": false,
                        "store": true
                    }

we would be met with the following ISE exception from Lucene

Caused by: java.lang.IllegalStateException: FieldExistsQuery requires that the field indexes doc values, norms or vectors, but field 'b_field' exists and indexes neither of these data structures
at [email protected]/org.apache.lucene.search.FieldExistsQuery.rewrite(FieldExistsQuery.java:156)
at [email protected]/org.apache.lucene.search.ConstantScoreQuery.rewrite(ConstantScoreQuery.java:44)
at [email protected]/org.apache.lucene.search.BooleanQuery.rewrite(BooleanQuery.java:311)
at [email protected]/org.apache.lucene.search.BooleanQuery.rewrite(BooleanQuery.java:316)
at [email protected]/org.apache.lucene.search.IndexSearcher.rewrite(IndexSearcher.java:876)

As per @martijnvg description in the linked issue (thank you :) ):

the issue is that FieldExistsQuery gets used because TextSearchInfo indicates that norms are enabled. However because no inverted index is stored (index is false) we assume that there are norms and use FieldExistsQuery in MappedFieldType#existsQuery(). However FieldExistsQuery can only be returned if the field is also indexed and has norms.

Closes: #129973

…s queries to avoid ISE when field is stored but not indexed or with doc_values
@pmpailis pmpailis added >bug :Search Relevance/Analysis How text is split into tokens labels Jul 3, 2025
@elasticsearchmachine elasticsearchmachine added Team:Search Relevance Meta label for the Search Relevance team in Elasticsearch v9.2.0 labels Jul 3, 2025
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-search-relevance (Team:Search Relevance)

@elasticsearchmachine
Copy link
Collaborator

Hi @pmpailis, I've created a changelog YAML for you.

Copy link
Member

@martijnvg martijnvg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 👍

@pmpailis
Copy link
Contributor Author

pmpailis commented Jul 3, 2025

There is a test failing in KeywordFieldTypeTests that I try to understand if it's testing a viable scenario, and whether the introduced change might actually affect results.

        {
            FieldType fieldType = new FieldType();
            fieldType.setOmitNorms(false);
            KeywordFieldType ft = new KeywordFieldType("field", fieldType);
            assertEquals(new FieldExistsQuery("field"), ft.existsQuery(MOCK_CONTEXT));
        }

The above now fails (we instead get a TermQuery) because we have the default doc_values: false, and index: false, whereas we explicitly state that norms: true. So, by adding the isIndexed check as a precondition for checking norms, we now force the mapper to generate a TermQuery instead. It's also worth noting that the ctor with the custom FieldType is only used from the test.

FWIW, adding the isIndexed precondition also prevents an ISE coming from a mapping as follows (that mimics the above with the addition of store: true):

            "keyword": {
                "type": "keyword",
                "index": false,
                "doc_values": false,
                "store": true,
                "norms": true
            }

I guess it's ok to update the test to the new generated query (and add a yaml test to ensure that we do get back results in other cases) ?

@martijnvg
Copy link
Member

I guess it's ok to update the test to the new generated query (and add a yaml test to ensure that we do get back results in other cases) ?

I think so, given that this case was also prone to ISE.

@pmpailis
Copy link
Contributor Author

pmpailis commented Jul 3, 2025

Updated tests in 7d05df9df3bdc8e6192f979b0f394c8a6314c697

@pmpailis pmpailis enabled auto-merge (squash) July 4, 2025 08:53
@pmpailis pmpailis merged commit dc34f4d into elastic:main Jul 4, 2025
32 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>bug :Search Relevance/Analysis How text is split into tokens Team:Search Relevance Meta label for the Search Relevance team in Elasticsearch v9.2.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

java.lang.IllegalStateException: FieldExistsQuery requires that the field indexes doc values, norms or vectors
3 participants