Check array size at parse time #129997
Conversation
@@ -749,8 +749,20 @@ private static void parseNonDynamicArray(
        XContentParser.Token token;
        XContentParser.Token previousToken = parser.currentToken();
        int elements = 0;
        int countArray = 0;
        long nestedDocsLimit = context.indexSettings().getMappingNestedDocsLimit();
I am not sure you are looking at the correct limit for the scenario you are covering. Could you review https://www.elastic.co/guide/en/elasticsearch/reference/8.18/mapping-settings-limit.html and its different limits, please? The one you are checking is specific to the nested type, if I am not mistaken.
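For context, the limits on that page are per-index settings; below is a minimal sketch of how two of them are applied (the index name and values are illustrative only, not recommendations):

PUT /my-index
{
  "settings": {
    "index.mapping.nested_objects.limit": 10000,
    "index.mapping.depth.limit": 20
  }
}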
I used this setting because it controls the number of elements allowed in arrays when the field is defined as a nested array. I couldn't find a more suitable existing one. If we follow this approach, should we create a new one?
PUT /index-large-array
{
  "mappings": {
    "properties": {
      "array": {
        "type": "nested",
        "properties": {
          "value": { "type": "integer" }
        }
      }
    }
  }
}
The issue I see with the setting you used is that it's specific to the nested type, and your change reuses it for something unrelated. I am not sure that's a good idea. Maybe it deserves discussion with the team. An alternative is index.mapping.depth.limit, but it's also not a great fit. Adding a new one is an option too.
Introduced a new setting, index.mapping.array_objects.limit, to enforce a limit on array size during parsing.
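As an illustration of how the new setting would be used, here is a sketch of applying it at index creation time (the setting name comes from the comment above; the value shown and whether the setting can be updated dynamically are assumptions):

PUT /my-index
{
  "settings": {
    "index.mapping.array_objects.limit": 10000
  }
}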
        while ((token = parser.nextToken()) != XContentParser.Token.END_ARRAY) {
            if (token == XContentParser.Token.START_OBJECT) {
                if (countArray++ >= nestedDocsLimit) {
I worry that this could be a performance hit when parsing documents. We should have micro benchmarks for document parsing under ./benchmarks. It is probably worth double-checking the impact of this change.
We should have micro benchmarks for document parsing under ./benchmarks.
What’s the recommended way to micro-benchmark this functionality?
I would run the existing micro benchmarks for document parsing and check whether they detect regressions around the counting you added, or add a new benchmark specific to this scenario if needed. We may actually need documents with an array containing many inner objects, which we likely don't have in the existing benchmarks.
I did some micro-benchmarking to measure the time it takes to parse documents with varying array sizes. Run 1 is for the main branch and Run 2 is for the current branch (with this change).
Array Size | Run 1 Avg (ns/op) | Run 2 Avg (ns/op) | Difference (ns) | % Change
--------------------------------------------------------------------------------
10 | 1983.711 | 2223.238 | +239.527 | +12.07%
100 | 15356.069 | 15497.797 | +141.728 | +0.92%
1000 | 141146.977 | 145791.315 | +4644.338 | +3.29%
10000 | 1432616.938 | 1443769.196 | +11152.258 | +0.78%
Performance degraded slightly in the second run for all array sizes. For the larger arrays (1000 and 10000) the relative overhead remained small, while the array size of 10 shows the largest relative increase, though the absolute impact there is small.
I executed the benchmark locally, aiming to minimize system load by stopping all resource-intensive processes. However, some system noise or garbage collection activity may have influenced the results.
The code I executed was the following:

@Fork(value = 1)
@Warmup(iterations = 5)
@Measurement(iterations = 5)
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@State(Scope.Benchmark)
public class DocumentParserBenchmark {

    static {
        // Required so document parsing can run outside a full node.
        LogConfigurator.loadLog4jPlugins();
        LogConfigurator.configureESLogging();
    }

    @Param({"10", "100", "1000", "10000"})
    private int arraySize;

    private MapperService mapperService;
    private SourceToParse sourceToParse;

    @Setup
    public void setUp() {
        // Mapping with a plain (non-nested) object field holding the array elements.
        this.mapperService = MapperServiceFactory.create("""
            {
              "properties": {
                "array": {
                  "properties": {
                    "value": { "type": "integer" }
                  }
                }
              }
            }
            """);
        // One source document per @Param value, containing `arraySize` inner objects.
        this.sourceToParse = new SourceToParse(
            UUIDs.randomBase64UUID(),
            new BytesArray(generateDocWithArray(arraySize)),
            XContentType.JSON
        );
    }

    @Benchmark
    public List<LuceneDocument> benchmarkDocumentWithLargeArray() {
        return mapperService.documentMapper().parse(sourceToParse).docs();
    }

    // Builds {"array":[{"value":1}, ... ,{"value":N}]} for the given array size.
    private static String generateDocWithArray(int arraySize) {
        StringBuilder sb = new StringBuilder();
        sb.append("{ \"array\": [");
        for (int i = 1; i <= arraySize; i++) {
            sb.append("{\"value\": ").append(i).append("}");
            if (i < arraySize) {
                sb.append(",");
            }
        }
        sb.append("]}");
        return sb.toString();
    }
}
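For anyone reproducing this: JMH benchmarks in this repository are normally launched from the benchmarks module, e.g. ./gradlew -p benchmarks run --args 'DocumentParserBenchmark' (assuming the class above is added under ./benchmarks; see benchmarks/README.md for the exact invocation).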
For an index with an array field defined as nested, indexing is protected by the index.mapping.nested_objects.limit setting, which restricts the number of nested objects allowed per document. However, when an index is created via dynamic mapping without predefined limits, no adequate control is in place. This can result in the ingestion of a poison document: a document containing an array with a vast number of objects, which can severely degrade Elasticsearch cluster performance or even lead to node crashes due to excessive memory consumption and processing overhead.

This is a preliminary draft that attempts to address that scenario by incorporating safeguards: enforcement of the index.mapping.nested_objects.limit setting and additional validation checks during document parsing. The approach aims to mitigate the risk of ingesting poison documents by combining structural controls and runtime validation to limit the number of objects.
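To make the failure mode concrete, here is a sketch of the kind of poison document this draft targets (index and field names are illustrative); with dynamic mappings there is currently nothing that bounds how many inner objects such an array may contain, so the array below could just as well hold millions of entries:

POST /my-dynamic-index/_doc
{
  "array": [
    { "value": 1 },
    { "value": 2 },
    { "value": 3 }
  ]
}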