Skip to content

Many duplicate dynamic mappers are created when indexing arrays of unknown fields. #117593

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
original-brownbear opened this issue Nov 26, 2024 · 2 comments · May be fixed by #127439
Open

Many duplicate dynamic mappers are created when indexing arrays of unknown fields. #117593

original-brownbear opened this issue Nov 26, 2024 · 2 comments · May be fixed by #127439
Assignees
Labels
>bug priority:normal A label for assessing bug priority to be used by ES engineers :Search Foundations/Mapping Index mappings, including merging and defining field types Team:Search Foundations Meta label for the Search Foundations team in Elasticsearch

Comments

@original-brownbear
Copy link
Member

original-brownbear commented Nov 26, 2024

In org.elasticsearch.index.mapper.DocumentParserContext#addDynamicMapper we have a TODO that asks to optimize the way we dynamically map object arrays:

        // TODO we may want to stop adding object mappers to the dynamic mappers list: most times they will be mapped when parsing their
        // sub-fields (see ObjectMapper.Builder#addDynamic), which causes extra work as the two variants of the same object field
        // will be merged together when creating the final dynamic update. The only cases where object fields need extra treatment are
        // dynamically mapped objects when the incoming document defines no sub-fields in them:
        // 1) by default, they would be empty containers in the mappings, is it then important to map them?
        // 2) they can be the result of applying a dynamic template which may define sub-fields or set dynamic, enabled or subobjects.
        dynamicMappers.computeIfAbsent(mapper.fullPath(), k -> new ArrayList<>()).add(mapper);

We should address this TODO. The current behavior of potentially creating a bunch of duplicate mapper instances causes a massive spike in heap use for duplicate mapper instances if the first document containing a new (sub-)field via an array of objects is mapped.

@original-brownbear original-brownbear added :Search Foundations/Mapping Index mappings, including merging and defining field types >bug labels Nov 26, 2024
@elasticsearchmachine elasticsearchmachine added the Team:Search Foundations Meta label for the Search Foundations team in Elasticsearch label Nov 26, 2024
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-search-foundations (Team:Search Foundations)

@javanna javanna added the priority:normal A label for assessing bug priority to be used by ES engineers label Mar 21, 2025
@original-brownbear original-brownbear self-assigned this Apr 25, 2025
original-brownbear added a commit to original-brownbear/elasticsearch that referenced this issue Apr 27, 2025
We can do the merging more eagerly to save on heap at the cost of some cycles (though the overhead should be reasonably limited
because we save a little down the line if all the mappers are the same instance).
Obviously a bit of a dirty trick but this logic is quite brittle now with things like the
vector mapper special case and this seems to me like the shortest path to avoiding runaway heap use.

This yet again shows the need to implement actual mapper equality checks and stronger deduplication
for them to avoid having mapping merging be our only mechanism for deduplication.

closes elastic#117593
@original-brownbear
Copy link
Member Author

relates #73460 to some degree

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>bug priority:normal A label for assessing bug priority to be used by ES engineers :Search Foundations/Mapping Index mappings, including merging and defining field types Team:Search Foundations Meta label for the Search Foundations team in Elasticsearch
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants