fix: limit volume and frequency of persisted patterns #18362
What this PR does / why we need it:
This PR reduces the volume of persisted patterns by delaying persistence until patterns are flushed from the ingesters. Previously we persisted patterns during training. That approach works for well-structured logs with few patterns, but it is untenable for poorly structured logs with many patterns because it produces far too much write volume.
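A minimal sketch of the idea of deferring persistence, assuming a hypothetical `PatternBuffer` type and a hypothetical `persist` callback; this is an illustration, not Loki's actual pattern-ingester code. Observations during training only update in-memory counts, and nothing is written until `Flush` runs, so 1,000 occurrences of one pattern cost one write instead of 1,000.

```go
package main

import "sync"

// PatternBuffer is a hypothetical in-memory buffer: patterns observed during
// training are only counted, and nothing is persisted until Flush is called.
type PatternBuffer struct {
	mu     sync.Mutex
	counts map[string]int
	// persist is a hypothetical sink standing in for the real pattern writer.
	persist func(pattern string, count int)
}

// NewPatternBuffer creates an empty buffer that writes through persist on Flush.
func NewPatternBuffer(persist func(string, int)) *PatternBuffer {
	return &PatternBuffer{counts: make(map[string]int), persist: persist}
}

// Observe records a pattern seen during training without persisting it.
func (b *PatternBuffer) Observe(pattern string) {
	b.mu.Lock()
	defer b.mu.Unlock()
	b.counts[pattern]++
}

// Flush writes one aggregated record per pattern, clears the buffer,
// and returns the number of records written.
func (b *PatternBuffer) Flush() int {
	b.mu.Lock()
	defer b.mu.Unlock()
	n := 0
	for p, c := range b.counts {
		b.persist(p, c)
		n++
	}
	b.counts = make(map[string]int)
	return n
}
```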
The pattern ingesters already have logic for handling high pattern churn, such as temporarily disabling detection of new patterns and evicting infrequently used patterns. By delaying persistence until chunks are flushed, we can leverage all of this logic.
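To illustrate why flush-time persistence benefits from the churn handling, here is a rough sketch with two hypothetical helpers: `evictInfrequent` drops low-count patterns before anything is written, and `churnGuard` stands in for temporarily refusing new patterns once too many have appeared in a window. The names, thresholds, and window model are assumptions for illustration only; the real ingester logic may work quite differently.

```go
package main

import "sort"

// evictInfrequent is a hypothetical helper: it keeps only patterns whose
// count is at least minCount and returns them sorted for stable output.
func evictInfrequent(counts map[string]int, minCount int) []string {
	kept := make([]string, 0, len(counts))
	for p, c := range counts {
		if c >= minCount {
			kept = append(kept, p)
		}
	}
	sort.Strings(kept)
	return kept
}

// churnGuard is a hypothetical stand-in for "temporarily disable detection of
// new patterns": after maxNew new patterns are admitted in the current window,
// further new patterns are rejected until Reset starts a new window.
type churnGuard struct {
	maxNew   int
	admitted int
}

// Admit reports whether a newly detected pattern may be tracked.
func (g *churnGuard) Admit() bool {
	if g.admitted >= g.maxNew {
		return false
	}
	g.admitted++
	return true
}

// Reset starts a new admission window.
func (g *churnGuard) Reset() { g.admitted = 0 }
```

Because only the survivors of eviction reach the flush, noisy one-off patterns never hit storage at all.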
The downside, of course, is data durability. Patterns are held in memory and, by default, are not flushed for 3 hours. This PR flushes patterns on graceful shutdown, but unexpected container kills will still cause data loss. Since we're only dealing with patterns, that should be acceptable for now.
Checklist
- CONTRIBUTING.md guide (required)
- feat PRs are unlikely to be accepted unless a case can be made for the feature actually being a bug fix to existing behavior.
- docs/sources/setup/upgrade/_index.md
- deprecated-config.yaml and deleted-config.yaml files respectively in the tools/deprecated-config-checker directory. Example PR