Add ability to redirect ingestion failures on data streams to a failure store #126973
Hi @jbaiera, I've created a changelog YAML for you.
Hi @jbaiera, I've updated the changelog YAML for you. Note that since this PR is labelled
@elasticmachine update branch
@elasticmachine update branch
@elasticmachine update branch
Pinging @elastic/es-data-management (Team:Data Management)
LGTM, I left a few minor comments
// Should be removed after backport
PARSER.declareBoolean(ConstructingObjectParser.optionalConstructorArg(), FAILURE_STORE_FIELD);
Why should this be removed after backport?
This is an old field. We now use the data stream options to determine whether the failure store is enabled. Currently, the field is read and used as a fallback if data stream options are not present, but I think that's mostly for BWC testing during development. Logic-wise, I think we could simply ignore the field if present, because it would always be overridden by data stream options. The only situation where this field is relevant is a mixed cluster with 8.19 nodes and very old nodes running with the feature flag on, which we would not support.
I opened #127071 for this. I also cleaned up the serialization logic a little.
…re store (elastic#126973) Removes the feature flags and guards that prevent the new failure store functionality from operating in production runtimes.
…a failure store (#126973) (#127546)
* Add ability to redirect ingestion failures on data streams to a failure store (#126973)
  Removes the feature flags and guards that prevent the new failure store functionality from operating in production runtimes.
* Fix build
* [CI] Auto commit changes from spotless
* Fix build
* Fix build
* Fix build
* [CI] Auto commit changes from spotless
---------
Co-authored-by: elasticsearchmachine <[email protected]>
Previously, documents that encountered ingest pipeline failures or mapping conflicts were returned to the client as errors in bulk and index operations. Many client applications are not equipped to respond to these failures. As a result, failed documents are often dropped by the client, which cannot hold broken documents indefinitely. In many end user workloads, these failed documents represent events that could be critical signals for observability or security use cases.
To help mitigate this problem, data streams now maintain a "failure store" which is used to accept and hold documents that fail to be ingested due to preventable configuration errors. The data stream's failure store operates like a separate set of backing indices with their own mappings and access patterns that allow Elasticsearch to accept documents that would otherwise be rejected due to unhandled ingest pipeline exceptions or mapping conflicts.
Users can enable redirection of ingest failures to the failure store on new data streams by specifying it in the new `data_stream_options` field inside of a component or index template.

Existing data streams can be configured with the new data stream `_options` endpoint.

When redirection is enabled, any ingestion related failures will be captured in the failure store if the cluster is able to, along with the timestamp that the failure occurred, details about the error encountered, and the document that could not be ingested. Since failure stores are a kind of Elasticsearch index, we can search the data stream for the failures that it has collected. The failures are not shown by default as they are stored in different indices than the normal data stream data. In order to retrieve the failures, we use the `_search` API along with a new bit of index pattern syntax, the `::` selector.

This index syntax informs the search operation to target the indices in its failure store instead of its backing indices. It can be mixed in a number of ways with other index patterns to include their failure store indices in the search operation.
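The three steps described above (enable on new data streams via a template, enable on an existing data stream, then search the failures) can be sketched as plain request builders. This is a minimal illustration, not the PR's own examples: the helper names are hypothetical, and the `failure_store.enabled` body shape, the `::failures` selector spelling, and the data stream names are assumptions that should be checked against the Elasticsearch documentation. Nothing here talks to a cluster; each helper only returns the method, path, and body one would send.

```python
import json

# Assumed spelling of the failure store selector; the text above only names "::".
FAILURES_SELECTOR = "::failures"


def template_request(name, index_patterns):
    """Build an index template request that enables the failure store
    for new data streams through the data_stream_options field."""
    body = {
        "index_patterns": list(index_patterns),
        "data_stream": {},
        "template": {
            # Assumed body shape for enabling redirection.
            "data_stream_options": {"failure_store": {"enabled": True}}
        },
    }
    return "PUT", f"/_index_template/{name}", body


def options_request(data_stream):
    """Build a request against the data stream _options endpoint
    to enable the failure store on an existing data stream."""
    body = {"failure_store": {"enabled": True}}
    return "PUT", f"/_data_stream/{data_stream}/_options", body


def failures_search_request(*patterns, include_data=False):
    """Build a _search request that targets failure store indices.

    With include_data=True each pattern is listed twice: once plain
    (backing indices) and once with the failures selector appended.
    """
    targets = []
    for pattern in patterns:
        if include_data:
            targets.append(pattern)
        targets.append(pattern + FAILURES_SELECTOR)
    path = "/" + ",".join(targets) + "/_search"
    return "POST", path, {"query": {"match_all": {}}}


def to_console(method, path, body):
    """Render a request in a console-like form for copy and paste."""
    return f"{method} {path}\n{json.dumps(body, indent=2)}"
```

For example, `print(to_console(*failures_search_request("logs-*", include_data=True)))` renders a search over both the regular data and the failures of every matching data stream. Keeping the builders free of any HTTP client makes the request shapes easy to inspect and adapt to whichever client is in use.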
This PR removes the feature flags and guards that prevent the new failure store functionality from operating in production runtimes.