Skip to content

Conditional ingest processors should respect ignore_failure for errors in if condition #126005

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
parkertimmins opened this issue Mar 31, 2025 · 3 comments
Labels
>bug :Data Management/Ingest Node Execution or management of Ingest Pipelines including GeoIP low-risk An open issue or test failure that is a low risk to future releases Team:Data Management Meta label for data/management team

Comments

@parkertimmins
Copy link
Contributor

When an error occurs in the if statement of an ingest processor, and the processor has ignore_failure=true, the failure is not ignored, and the on_failure processor is called. This behavior seems incorrect. Example:

Given the following pipeline:

{
    "processors":
    [
        {
            "set":
            {
                "if": "ctx.does_not_exist.length() > 0",
                "ignore_failure": true,
                "field": "wont_get_set",
                "value": true
            }
        },
        {
            "set":
            {
                "field": "set_after",
                "value": true
            }
        }
    ],
    "on_failure":
    [
        {
            "set":
            {
                "field": "error",
                "value": "{{ _ingest.on_failure_message }}"
            }
        }
    ]
}'

And the following simulate ingest:

curl -s -X POST "localhost:9200/_ingest/pipeline/some_pipeline/_simulate?pretty" -H 'Content-Type: application/json' -d'
{
  "docs": [
    {
        "_source": { "field": 1 }
    }
  ]
}
'

We get the following output:

{
  "docs": [
    {
      "doc": {
        "_index": "_index",
        "_version": "-3",
        "_id": "_id",
        "_source": {
          "field": 1,
          "error": "cannot access method/field [length] from a null def reference"
        },
        "_ingest": {
          "timestamp": "2025-03-31T22:42:12.895010791Z"
        }
      }
    }
  ]
}

This seems incorrect. Though the if statement in the first processor should fail because the variable does_not_exist does not exist, ignore_failure was set to true. So the following set processor should have run, and set_after field should be set. The failure processor should not have run.

The results should instead have been:

{
    "field": 1,
    "set_after": true
}

Interestingly, if verbose is set to true in the simulate call, we get the correct result. I did a bit of debugging, and see that without verbose, the top-level CompoundProcessor has ignore_failure=false. Whereas, with verbose, the top-level TrackingResultProcessor has ignore_failure=true. It appears this this is because the tracking result processor unwraps the conditional processor here, bringing the ignore_failure value up to the top-level processor.

@parkertimmins parkertimmins added the :Data Management/Ingest Node Execution or management of Ingest Pipelines including GeoIP label Mar 31, 2025
@elasticsearchmachine elasticsearchmachine added the Team:Data Management Meta label for data/management team label Mar 31, 2025
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-data-management (Team:Data Management)

@parkertimmins parkertimmins added >bug low-risk An open issue or test failure that is a low risk to future releases labels Mar 31, 2025
@joegallo
Copy link
Contributor

joegallo commented Apr 8, 2025

if (onFailureProcessors.size() > 0 || ignoreFailure) {
processor = new OnFailureProcessor(ignoreFailure, processor, onFailureProcessors);
}
if (conditionalScript != null) {
processor = new ConditionalProcessor(tag, description, conditionalScript, scriptService, processor);
}

If we changed the order of these two if statements, then we'd have the OnFailureProcessor around the ConditionalProcessor rather than vice versa (and that would have the effect of the "if" being covered by the "ignore_failure"). There's probably some care that would need to be taken to make sure that we still end up counting stats correctly -- I think there's at least a couple of places in the code where we assume that the wrapping is conditional-around-onfailure rather than the opposite.

I'm on the fence about changing this, though, but I am convincible -- in the above example, the problem is that the "if" is poorly written and that it should be fixed (but probably we could come up with a better example where it isn't poorly written). We might want to have a discussion about what ignore_failure really means.

Also, I'm aghast that when I changed the order of the implementation, the tests that I habitually run for smoketesting ingest all passed -- whether we change this behavior or not, we should have tests that cover the behavior (and I'm not sure we currently do).

@joegallo
Copy link
Contributor

We talked through this one, and given the absence of user complaints, we don't think it's worth changing the behavior (right now). Let's keep this ticket open so that user complaints can find their way here if there are any.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>bug :Data Management/Ingest Node Execution or management of Ingest Pipelines including GeoIP low-risk An open issue or test failure that is a low risk to future releases Team:Data Management Meta label for data/management team
Projects
None yet
Development

No branches or pull requests

3 participants