Skip to content

Better behavior around ignore_missing when databases aren't available #127520

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 6 commits into
base: main
Choose a base branch
from

Conversation

joegallo
Copy link
Contributor

@joegallo joegallo commented Apr 29, 2025

Closes #87345

The documentation for ignore_missing says:

If true and field does not exist, the processor quietly exits without modifying the document

But if the database isn't available, then we unconditionally add tags to the document regardless of the value of the ignore_missing configuration, which doesn't seem correct.

This PR aligns the behavior in the database-unavailable case to better match the behavior in the database-available case (and, by extension, to better match the docs).

and (formatting nit) don't have a period at the end of the message
either
to be more in line with the data-available behavior -- that is,
respect ignore_missing and throw an exception on an invalid source
field.
@joegallo joegallo added >bug :Data Management/Ingest Node Execution or management of Ingest Pipelines including GeoIP Team:Data Management Meta label for data/management team v8.18.1 v8.19.0 v9.0.1 v9.1.0 labels Apr 29, 2025
@joegallo joegallo requested a review from parkertimmins April 29, 2025 16:47
@joegallo
Copy link
Contributor Author

See especially #87345 (comment)

@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-data-management (Team:Data Management)

@elasticsearchmachine
Copy link
Collaborator

Hi @joegallo, I've created a changelog YAML for you.

@joegallo
Copy link
Contributor Author

Per the description on #87345, a good test for this behavior is:

POST /_ingest/pipeline/_simulate
{
  "pipeline": {
    "processors": [
      {
        "geoip": {
          "field": "client.ip",
          "target_field": "client.geo",
          "ignore_missing": true
        }
      }
    ]
  },
  "docs": [
    {
      "_index": "index",
      "_source": {}
    },
    {
      "_index": "index",
      "_source": {"client": {"ip": "8.8.8.8"}}
    }
  ]
}

Without this PR, the result of this is (note the tags on the first document):

{
  "docs" : [
    {
      "doc" : {
        "_index" : "index",
        "_version" : "-3",
        "_id" : "_id",
        "_source" : {
          "tags" : [
            "_geoip_database_unavailable_GeoLite2-City.mmdb"
          ]
        },
        "_ingest" : {
          "timestamp" : "2025-04-29T16:49:01.40054Z"
        }
      }
    },
    {
      "doc" : {
        "_index" : "index",
        "_version" : "-3",
        "_id" : "_id",
        "_source" : {
          "client" : {
            "ip" : "8.8.8.8"
          },
          "tags" : [
            "_geoip_database_unavailable_GeoLite2-City.mmdb"
          ]
        },
        "_ingest" : {
          "timestamp" : "2025-04-29T16:49:01.400909Z"
        }
      }
    }
  ]
}

But with this PR, the result is (note the lack of tags on the first document):

{
  "docs" : [
    {
      "doc" : {
        "_index" : "index",
        "_version" : "-3",
        "_id" : "_id",
        "_source" : { },
        "_ingest" : {
          "timestamp" : "2025-04-29T16:54:23.761992Z"
        }
      }
    },
    {
      "doc" : {
        "_index" : "index",
        "_version" : "-3",
        "_id" : "_id",
        "_source" : {
          "client" : {
            "ip" : "8.8.8.8"
          },
          "tags" : [
            "_geoip_database_unavailable_GeoLite2-City.mmdb"
          ]
        },
        "_ingest" : {
          "timestamp" : "2025-04-29T16:54:23.762687Z"
        }
      }
    }
  ]
}

@joegallo joegallo added the auto-backport Automatically create backport pull requests when merged label Apr 29, 2025
@dakrone
Copy link
Member

dakrone commented Apr 29, 2025

Can you also add a blurb to the docs that clarifies the behavior of ignore_missing when the database is missing?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
auto-backport Automatically create backport pull requests when merged >bug :Data Management/Ingest Node Execution or management of Ingest Pipelines including GeoIP Team:Data Management Meta label for data/management team v8.18.2 v8.19.0 v9.0.2 v9.1.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

geoip processor adds _geoip_database_unavailable_* tag to docs with missing source field
4 participants