Adding support to exclude semantic_text subfields #127664

Samiul-TheSoccerFan · 2025-05-02T20:33:13Z

Update the fieldCaps API to exclude semantic_text subfields in both legacy and new formats.

Legacy format:

Setup:

PUT test-field-caps-with-legacy
{
    "settings": {
        "index.mapping.semantic_text.use_legacy_format": true
    },
    "mappings": {
        "properties": {
            "test_field_legacy": {
                "type": "semantic_text",
                "inference_id": ".elser-2-elasticsearch"
            },
            "non_infer_field_legacy": {
                "type": "text"
            },
            "sparse_vector_legacy": {
                "type": "sparse_vector"
            },
            "dense_vector_legacy": {
                "type": "dense_vector",
                "dims": 3,
                "similarity": "l2_norm"
            }
        }
    }
}

PUT test-field-caps-with-legacy/_doc/doc1
{
    "test_field_legacy": "these are not the droids you're looking for. He's free to go around",
    "sparse_vector_legacy": {
        "these": 1,
        "are": 2,
        "not": 3
    },
    "dense_vector_legacy": [1, 2, 3]
}

Query:

GET /_field_caps?allow_no_indices=true&fields=*&index=test*&ignore_unavailable=true&expand_wildcards=open

Response before update (trimmed):

{
  "indices": [
    "test-field-caps-with-legacy"
  ],
  "fields": {
    "non_infer_field_legacy": {
      "text": {
        "type": "text",
        "metadata_field": false,
        "searchable": true,
        "aggregatable": false
      }
    },
    "test_field_legacy.inference.chunks.text": {
      "keyword": {
        "type": "keyword",
        "metadata_field": false,
        "searchable": false,
        "aggregatable": false
      }
    },
    "test_field_legacy.inference": {
      "object": {
        "type": "object",
        "metadata_field": false,
        "searchable": false,
        "aggregatable": false
      }
    },
    "sparse_vector_legacy": {
      "sparse_vector": {
        "type": "sparse_vector",
        "metadata_field": false,
        "searchable": true,
        "aggregatable": false
      }
    },
    "test_field_legacy": {
      "text": {
        "type": "text",
        "metadata_field": false,
        "searchable": true,
        "aggregatable": false
      }
    },
    "test_field_legacy.inference.chunks.embeddings": {
      "sparse_vector": {
        "type": "sparse_vector",
        "metadata_field": false,
        "searchable": true,
        "aggregatable": false
      }
    },
    "dense_vector_legacy": {
      "dense_vector": {
        "type": "dense_vector",
        "metadata_field": false,
        "searchable": true,
        "aggregatable": false
      }
    },
    "test_field_legacy.inference.chunks": {
      "nested": {
        "type": "nested",
        "metadata_field": false,
        "searchable": false,
        "aggregatable": false
      }
    }
  }
}

Response after update (trimmed):

{
  "indices": [
    "test-field-caps-with-legacy"
  ],
  "fields": {
    "non_infer_field_legacy": {
      "text": {
        "type": "text",
        "metadata_field": false,
        "searchable": true,
        "aggregatable": false
      }
    },
    "sparse_vector_legacy": {
      "sparse_vector": {
        "type": "sparse_vector",
        "metadata_field": false,
        "searchable": true,
        "aggregatable": false
      }
    },
    "test_field_legacy": {
      "text": {
        "type": "text",
        "metadata_field": false,
        "searchable": true,
        "aggregatable": false
      }
    },
    "dense_vector_legacy": {
      "dense_vector": {
        "type": "dense_vector",
        "metadata_field": false,
        "searchable": true,
        "aggregatable": false
      }
    }
  }
}

New format:

Setup:

PUT test-field-caps
{
    "mappings": {
        "properties": {
            "test_field": {
                "type": "semantic_text",
                "inference_id": ".elser-2-elasticsearch"
            },
            "non_infer_field": {
                "type": "text"
            },
            "sparse_vector": {
                "type": "sparse_vector"
            },
            "dense_vector": {
                "type": "dense_vector",
                "dims": 3,
                "similarity": "l2_norm"
            }
        }
    }
}

PUT test-field-caps/_doc/doc1
{
    "test_field": "these are not the droids you're looking for. He's free to go around",
    "sparse_vector": {
        "these": 1,
        "are": 2,
        "not": 3
    },
    "dense_vector": [1, 2, 3]
}

Query:

GET /_field_caps?allow_no_indices=true&fields=*&index=test*&ignore_unavailable=true&expand_wildcards=open

Response before update (trimmed):

{
  "indices": [
    "test-field-caps"
  ],
  "fields": {
    "_ignored_source": {
      "_ignored_source": {
        "type": "_ignored_source",
        "metadata_field": true,
        "searchable": false,
        "aggregatable": false
      }
    },
    "non_infer_field": {
      "text": {
        "type": "text",
        "metadata_field": false,
        "searchable": true,
        "aggregatable": false
      }
    },
    "_index": {
      "_index": {
        "type": "_index",
        "metadata_field": true,
        "searchable": true,
        "aggregatable": true
      }
    },
    "_feature": {
      "_feature": {
        "type": "_feature",
        "metadata_field": true,
        "searchable": false,
        "aggregatable": false
      }
    },
    "sparse_vector": {
      "sparse_vector": {
        "type": "sparse_vector",
        "metadata_field": false,
        "searchable": true,
        "aggregatable": false
      }
    },
    "test_field.inference.chunks.embeddings": {
      "sparse_vector": {
        "type": "sparse_vector",
        "metadata_field": false,
        "searchable": true,
        "aggregatable": false
      }
    },
    "test_field.inference.chunks.offset": {
      "offset_source": {
        "type": "offset_source",
        "metadata_field": false,
        "searchable": false,
        "aggregatable": false
      }
    },
    "test_field": {
      "text": {
        "type": "text",
        "metadata_field": false,
        "searchable": true,
        "aggregatable": false
      }
    },
    "_inference_fields": {
      "_inference_fields": {
        "type": "_inference_fields",
        "metadata_field": true,
        "searchable": false,
        "aggregatable": false
      }
    },
    "test_field.inference": {
      "object": {
        "type": "object",
        "metadata_field": false,
        "searchable": false,
        "aggregatable": false
      }
    },
    "dense_vector": {
      "dense_vector": {
        "type": "dense_vector",
        "metadata_field": false,
        "searchable": true,
        "aggregatable": false
      }
    },
    "test_field.inference.chunks": {
      "nested": {
        "type": "nested",
        "metadata_field": false,
        "searchable": false,
        "aggregatable": false
      }
    }
  }
}

Response after update (trimmed):

{
  "indices": [
    "test-field-caps"
  ],
  "fields": {
    "test_field": {
      "text": {
        "type": "text",
        "metadata_field": false,
        "searchable": true,
        "aggregatable": false
      }
    },
    "_inference_fields": {
      "_inference_fields": {
        "type": "_inference_fields",
        "metadata_field": true,
        "searchable": false,
        "aggregatable": false
      }
    },
    "non_infer_field": {
      "text": {
        "type": "text",
        "metadata_field": false,
        "searchable": true,
        "aggregatable": false
      }
    },
    "sparse_vector": {
      "sparse_vector": {
        "type": "sparse_vector",
        "metadata_field": false,
        "searchable": true,
        "aggregatable": false
      }
    },
    "dense_vector": {
      "dense_vector": {
        "type": "dense_vector",
        "metadata_field": false,
        "searchable": true,
        "aggregatable": false
      }
    }
  }
}

elasticsearchmachine · 2025-05-02T20:35:10Z

Hi @Samiul-TheSoccerFan, I've created a changelog YAML for you.

Samiul-TheSoccerFan · 2025-05-02T20:45:55Z

...e/src/yamlRestTest/resources/rest-api-spec/test/inference/10_semantic_text_field_mapping.yml

+  - requires:
+      cluster_features: "gte_v8.16.0"
+      reason: field_caps support for semantic_text added in 8.16.0


Do we need to define a new cluster feature? As I understand it, these fields are not expected in the field_caps API response, so excluding them should not affect the API or Discover. Backward compatibility is also covered by another YAML file.

kderusso

Agreed with @Mikep86's comments in Slack, but good start!

kderusso · 2025-05-05T12:43:12Z

...e/src/yamlRestTest/resources/rest-api-spec/test/inference/10_semantic_text_field_mapping.yml

+  - requires:
+      cluster_features: "gte_v8.16.0"
+      reason: field_caps support for semantic_text added in 8.16.0


I think it would be good to create a test feature for these tests.
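
For illustration, a minimal sketch of how a test-only feature could gate these YAML tests in place of a version-based requirement. The `NodeFeature` and `FeatureSpecification` types below are simplified stand-ins written for this example, and the feature id is hypothetical; nothing here is taken from the PR itself.

```java
import java.util.List;
import java.util.Set;

// Sketch only: simplified stand-ins, not the Elasticsearch classes.
class TestFeatureSketch {

    // Stand-in for a node feature identifier.
    record NodeFeature(String id) {}

    // Stand-in for a specification that advertises test-only features.
    interface FeatureSpecification {
        Set<NodeFeature> getTestFeatures();
    }

    // Hypothetical feature id, chosen for this example.
    static final NodeFeature EXCLUDE_SUB_FIELDS =
        new NodeFeature("semantic_text.exclude_sub_fields_from_field_caps");

    // A plugin would advertise the feature once the fix is present.
    static final FeatureSpecification INFERENCE_FEATURES = () -> Set.of(EXCLUDE_SUB_FIELDS);

    // A test that lists required cluster features runs only if every id is present.
    static boolean shouldRun(List<String> requiredIds, Set<NodeFeature> clusterFeatures) {
        return requiredIds.stream().allMatch(id -> clusterFeatures.contains(new NodeFeature(id)));
    }

    public static void main(String[] args) {
        List<String> required = List.of(EXCLUDE_SUB_FIELDS.id());
        // Cluster that includes the fix: the test runs.
        System.out.println(shouldRun(required, INFERENCE_FEATURES.getTestFeatures()));
        // Older cluster without the feature: the test is skipped.
        System.out.println(shouldRun(required, Set.of()));
    }
}
```

With a gate like this, the YAML `requires` block would name the feature id rather than a version such as `gte_v8.16.0`.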

Mikep86 · 2025-05-05T12:58:19Z

server/src/main/java/org/elasticsearch/action/fieldcaps/FieldCapabilitiesFetcher.java

+    /**
+     * Returns true if the field should be excluded from the field capabilities response.
+     * This is used to exclude fields that are not useful for the user, such as
+     * offset_source and inference chunk embeddings.
+     */
+    private static boolean shouldExcludeField(MappedFieldType ft) {
+        return ft.typeName().equals("offset_source")
+            || ((ft instanceof SparseVectorFieldMapper.SparseVectorFieldType
+                || ft instanceof DenseVectorFieldMapper.DenseVectorFieldType
+                || ft instanceof KeywordFieldMapper.KeywordFieldType) && ft.name().contains(".inference.chunks"));
+    }


Reiterating my message offline, this is a brittle solution. We shouldn't be hard-coding field names to exclude from field caps. Instead, I recommend investigating a solution where we add a flag to MappedFieldType to control if a field is excluded from field caps.
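
To make the concern concrete, here is a rough sketch comparing a name-based check with a flag-based one. `SketchFieldType` is a hypothetical stand-in for `MappedFieldType`, the `excludeFromFieldCaps` flag is an assumed name, and the `docs.inference.chunks_summary` field is an invented user field; none of these come from the actual change.

```java
import java.util.List;
import java.util.function.Predicate;

// Sketch only: SketchFieldType is a stand-in for MappedFieldType, and the
// excludeFromFieldCaps flag is an assumption illustrating the suggestion above.
class FieldCapsExclusionSketch {

    record SketchFieldType(String name, String typeName, boolean excludeFromFieldCaps) {}

    // Name-based check, loosely mirroring the predicate in the diff above.
    static boolean excludedByName(SketchFieldType ft) {
        boolean vectorOrKeyword = ft.typeName().equals("sparse_vector")
            || ft.typeName().equals("dense_vector")
            || ft.typeName().equals("keyword");
        return ft.typeName().equals("offset_source")
            || (vectorOrKeyword && ft.name().contains(".inference.chunks"));
    }

    // Flag-based check: the mapper that builds the field decides, not the fetcher.
    static boolean excludedByFlag(SketchFieldType ft) {
        return ft.excludeFromFieldCaps();
    }

    // Names of the fields that survive the given exclusion predicate, in input order.
    static List<String> visibleFields(List<SketchFieldType> fields, Predicate<SketchFieldType> excluded) {
        return fields.stream().filter(excluded.negate()).map(SketchFieldType::name).toList();
    }

    static List<SketchFieldType> sampleFields() {
        return List.of(
            new SketchFieldType("test_field", "text", false),
            // Subfields created by the semantic_text mapper set the flag themselves.
            new SketchFieldType("test_field.inference", "object", true),
            new SketchFieldType("test_field.inference.chunks", "nested", true),
            new SketchFieldType("test_field.inference.chunks.embeddings", "sparse_vector", true),
            new SketchFieldType("test_field.inference.chunks.offset", "offset_source", true),
            // Invented user-defined keyword field whose name happens to match the pattern.
            new SketchFieldType("docs.inference.chunks_summary", "keyword", false)
        );
    }

    public static void main(String[] args) {
        // Name matching hides the user field and still exposes the object and nested subfields.
        System.out.println(visibleFields(sampleFields(), FieldCapsExclusionSketch::excludedByName));
        // The flag hides exactly the fields whose mapper opted out.
        System.out.println(visibleFields(sampleFields(), FieldCapsExclusionSketch::excludedByFlag));
    }
}
```

In this sketch the name-based predicate drops the unrelated user field while leaving `test_field.inference` and `test_field.inference.chunks` visible, whereas the flag leaves only `test_field` and the user field.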

Adding support to exclude semantic_text subfields

1a3bb97

Samiul-TheSoccerFan added labels on May 2, 2025: >enhancement, v9.1.0, :Search Foundations/Mapping, :Search Relevance/Vectors, :SearchOrg/Relevance

Update docs/changelog/127664.yaml

522730e

Updating changelog file

db64ad2

remove duplicate test from yaml file

e333e78

Adding support to exclude semantic_text subfields from mapper builders

06caf66
