-
Notifications
You must be signed in to change notification settings - Fork 25.3k
Add XmlProcessor initial implementation #130337
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
🔍 Preview links for changed docs: 🔔 The preview site may take up to 3 minutes to finish building. These links will become live once it completes. |
5bd50d5
to
95df637
Compare
95df637
to
67dd264
Compare
Pinging @elastic/es-data-management (Team:Data Management) |
Hi @marc-gr, I've created a changelog YAML for you. |
- Replace XMLStreamReader with SAX parser + DOM for XPath support - Add XPath extraction, namespaces, strict parsing, content filtering - New options: force_array, force_content, remove_namespaces, store_xml - Enhanced security with XXE protection and pre-compiled XPath expressions - Full test coverage and updated documentation
🔍 Preview links for changed docs |
- Fix test assertion for remove_namespaces feature - Use StandardCharsets.UTF_8 instead of string literal - Replace string reference comparison with isEmpty() - Move regex pattern to static final field for performance
… into feat/xml-processor
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
Adds an initial implementation of a new XmlProcessor
to parse XML input into JSON-like structures with feature parity to Logstash’s XML filter, alongside configuration options, factory validation, and documentation.
- Introduce
XmlProcessor
with streaming SAX parsing, optional DOM building for XPath, and secure defaults. - Add end-to-end tests (
XmlProcessorTests
) and factory validation tests (XmlProcessorFactoryTests
). - Register the processor in the plugin, update module-info, documentation, and changelog.
Reviewed Changes
Copilot reviewed 9 out of 9 changed files in this pull request and generated 2 comments.
Show a summary per file
File | Description |
---|---|
modules/ingest-common/src/main/java/org/elasticsearch/ingest/common/XmlProcessor.java | New XML parsing processor implementation |
modules/ingest-common/src/test/java/org/elasticsearch/ingest/common/XmlProcessorTests.java | End-to-end tests for XML parsing behavior |
modules/ingest-common/src/test/java/org/elasticsearch/ingest/common/XmlProcessorFactoryTests.java | Tests for factory config and validation |
modules/ingest-common/src/main/java/org/elasticsearch/ingest/common/IngestCommonPlugin.java | Register XmlProcessor in the plugin registry |
modules/ingest-common/src/main/java/module-info.java | Add requires java.xml for XML APIs |
docs/reference/enrich-processor/xml-processor.md | Documentation for XML processor |
docs/reference/enrich-processor/toc.yml | Add entry for xml-processor.md |
docs/reference/enrich-processor/index.md | Include xml processor in the index |
docs/changelog/130337.yaml | Changelog entry for PR |
Comments suppressed due to low confidence (1)
docs/reference/enrich-processor/xml-processor.md:9
- The implementation actually uses a streaming SAX parser with optional DOM building for XPath. Update this description to reflect the streaming-based approach for accurate documentation.
Parses XML documents and converts them to JSON objects using a DOM parser. This processor efficiently handles XML data with a single-parse architecture that supports both structured output and XPath extraction for optimal performance.
This PR creates a new XML processor that achieves feature parity with Logstash's XML filter.
⚙️ Configuration Options
🏗️ Architecture
📚 Documentation
Documentation includes:
Logstash differences
ignore_empty_value
behaves a bit different thansuppress_empty
, but I think it matches better with other processors behavior. It could be adapted, or even add both, but I found it confusing.Closes #97364