
[Failure store] Remove unused write indices from failure store #126611


Open
gmarouli opened this issue Apr 10, 2025 · 1 comment
Labels
:Data Management/Data streams Data streams and their lifecycles >enhancement Team:Data Management Meta label for data/management team

Comments

@gmarouli
Contributor

Description

The data stream failure store uses lazy rollover to reduce its footprint. Here is how:

  1. The data stream is created with 1 backing index and 0 failure indices. It also has rollover_on_write set to true for the failure store.
  2. When a failure is encountered, the data stream rolls over its failure store because of rollover_on_write, and then writes the failed document.
  3. The data stream now has 1 backing index and 1 failure index, and rollover_on_write is set to false.
  4. As time passes, DLM rolls over the failure index, which creates another failure index.
  5. The data stream now has 1 backing index and 2 failure indices, and rollover_on_write is still set to false.
  6. As time passes, DLM deletes the initial failure index.

So far so good. However, if no further failures occur, the last failure index created cannot be removed: DLM never deletes a write index, it can only roll it over.
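To make the problem concrete, here is a minimal toy model of the steps above. It is a sketch only, not Elasticsearch code: the class `CurrentFailureStore`, its methods, and the `fs-` index names are hypothetical, and retention is simplified so that every non-write index counts as expired.

```python
from dataclasses import dataclass, field


@dataclass
class CurrentFailureStore:
    """Toy model of today's behaviour (hypothetical names, not the real API)."""
    indices: list = field(default_factory=list)  # oldest first; last entry is the write index
    rollover_on_write: bool = True               # step 1: set when the data stream is created
    generation: int = 0

    def _rollover(self):
        self.generation += 1
        self.indices.append(f"fs-{self.generation:06d}")

    def record_failure(self):
        # Step 2: the first failure triggers the pending lazy rollover.
        if self.rollover_on_write:
            self._rollover()
            self.rollover_on_write = False
        return self.indices[-1]

    def dlm_rollover(self):
        # Step 4: DLM rolls over the current write index, if there is one.
        if self.indices:
            self._rollover()

    def dlm_delete_expired(self):
        # Step 6: DLM deletes old indices but never the write index.
        # Simplification: every non-write index is treated as expired.
        if self.indices:
            self.indices = self.indices[-1:]


store = CurrentFailureStore()
store.record_failure()            # steps 2-3: one failure index exists
for _ in range(3):                # no more failures; DLM keeps running
    store.dlm_rollover()          # steps 4-5
    store.dlm_delete_expired()    # step 6
print(store.indices)              # exactly one (empty) write index always remains
```

However many DLM cycles run in this model, the failure store never returns to 0 indices once a single failure has been seen.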

Enhancement
We would like to be able to go back to 0 failure indices if we see no failures for a reasonable time period. This reduces the cost of having a failure store when there are no failures.

Draft idea

Introduce a lazy rollover with conditions: when this request is executed, it sets rollover_on_write only if the conditions are met.

  • DLM can later use this to lazily roll over the failure store when the conditions apply.
  • getWriteFailureIndex should return the write index only if rollover_on_write is set to false.
  • This means that the latest failure index can progress through DLM like any other index, since it is no longer the write index.
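The draft idea could be sketched roughly as follows. This is an illustration under assumptions, not a proposed implementation: the class `ProposedFailureStore`, the `idle_seconds` field, and the `max_idle_seconds` condition are hypothetical stand-ins for whatever rollover conditions would actually be used, and `get_write_failure_index` only mirrors the intent described for getWriteFailureIndex.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ProposedFailureStore:
    """Toy model of the draft idea (hypothetical names, not the real API)."""
    indices: list = field(default_factory=list)  # oldest first
    rollover_on_write: bool = True
    generation: int = 0
    idle_seconds: float = 0.0  # assumption: time since the last recorded failure

    def lazy_rollover_if(self, max_idle_seconds: float) -> bool:
        # Conditional lazy rollover: only flag the store, and only if the
        # (hypothetical) idle-time condition is met. No index is created here.
        if self.indices and self.idle_seconds >= max_idle_seconds:
            self.rollover_on_write = True
        return self.rollover_on_write

    def get_write_failure_index(self) -> Optional[str]:
        # While a rollover is pending, no index counts as the write index.
        if self.rollover_on_write or not self.indices:
            return None
        return self.indices[-1]

    def record_failure(self) -> str:
        # A failure executes the pending rollover, as it does today.
        if self.rollover_on_write:
            self.generation += 1
            self.indices.append(f"fs-{self.generation:06d}")
            self.rollover_on_write = False
        self.idle_seconds = 0.0
        return self.indices[-1]

    def dlm_delete_expired(self) -> None:
        # DLM still protects only the write index. Simplification: every
        # other index is treated as past its retention.
        write_index = self.get_write_failure_index()
        self.indices = [i for i in self.indices if i == write_index]


store = ProposedFailureStore()
store.record_failure()          # one failure index, which is the write index
store.idle_seconds = 3600       # an hour without failures
store.lazy_rollover_if(1800)    # condition met: rollover_on_write becomes true
store.dlm_delete_expired()      # nothing is protected any more
print(store.indices)            # back to 0 failure indices
```

In this model the next failure after the cleanup creates a fresh failure index through the existing rollover_on_write path, so writes are never left without a target.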
@gmarouli gmarouli added :Data Management/Data streams Data streams and their lifecycles >enhancement labels Apr 10, 2025
@elasticsearchmachine elasticsearchmachine added the Team:Data Management Meta label for data/management team label Apr 10, 2025
@elasticsearchmachine
Collaborator

Pinging @elastic/es-data-management (Team:Data Management)
