Adding `EcsNamespaceProcessor` #125699

eyalkoren · 2025-03-26T17:14:03Z

Namespacing algorithm [EDIT 1]:

Start by checking whether the document is OTel or not. A document is considered OTel if:
- `resource` exists as a key and its value is a map
- `resource` either doesn't contain an `attributes` field, or contains an `attributes` field of type map
- `scope` is either missing or a map
- `attributes` is either missing or a map
- `body` is either missing or a map
- `body` either doesn't contain a `text` field, or contains a `text` field of type String
- `body` either doesn't contain a `structured` field, or contains a `structured` field that is not of type String

If it is OTel, return it as is.

If it is not OTel:
- create a new `attributes` map
- create a new `resource` map with a single entry whose key is `attributes` and whose value is a new map
- move the following top-level fields (if they exist) to the new `attributes` map: `attributes`, `resource`, `span_id`, `body`, `severity_text` and `trace_id`
- add the new `attributes` and `resource` maps as top-level fields
- rename special keys (e.g. `span.id`, `log.level`) to OTel-compliant names: for each, look for a value first in the nested form and, if not found, look for a top-level dotted field. The first value found is used for the renamed field
- move all keys that start with `agent.`, `cloud.` or `host.`, or that are exactly `agent`, `cloud` or `host`, to `resource.attributes`
- move all remaining top-level fields, other than `@timestamp`, `trace_id`, `span_id`, `severity_text`, `body`, `attributes`, `resource` and `scope`, to the new `attributes` map
- flatten all fields that are not arrays in the `attributes` and `resource.attributes` maps
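
The OTel check at the start of the algorithm can be sketched roughly as below. This is only an illustration of the listed conditions, not the processor's code: the class name `OtelShapeCheckSketch` and the method `isOTel` are hypothetical, and the document is assumed to be a plain `Map<String, Object>` of nested maps.

```java
import java.util.Map;

/** Hypothetical sketch of the OTel-shape check; names are illustrative only. */
final class OtelShapeCheckSketch {

    private OtelShapeCheckSketch() {}

    /** Returns true only if the document satisfies every condition listed above. */
    static boolean isOTel(Map<String, Object> doc) {
        // resource must exist and be a map
        Object resource = doc.get("resource");
        if (!(resource instanceof Map)) {
            return false;
        }
        // resource.attributes, if present, must be a map
        Map<?, ?> resourceMap = (Map<?, ?>) resource;
        if (resourceMap.containsKey("attributes") && !(resourceMap.get("attributes") instanceof Map)) {
            return false;
        }
        // scope, attributes and body are each either missing or a map
        if (!missingOrMap(doc, "scope") || !missingOrMap(doc, "attributes") || !missingOrMap(doc, "body")) {
            return false;
        }
        Object body = doc.get("body");
        if (body instanceof Map) {
            Map<?, ?> bodyMap = (Map<?, ?>) body;
            // body.text, if present, must be a String
            if (bodyMap.containsKey("text") && !(bodyMap.get("text") instanceof String)) {
                return false;
            }
            // body.structured, if present, must not be a String
            if (bodyMap.get("structured") instanceof String) {
                return false;
            }
        }
        return true;
    }

    private static boolean missingOrMap(Map<String, Object> doc, String key) {
        return !doc.containsKey(key) || doc.get(key) instanceof Map;
    }
}
```

With this sketch, a document that has only an empty `resource` map counts as OTel, while a document without a `resource` map does not.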

elasticsearchmachine · 2025-03-27T16:07:03Z

Hi @eyalkoren, I've created a changelog YAML for you.

elasticsearchmachine · 2025-03-27T16:07:27Z

Pinging @elastic/es-data-management (Team:Data Management)

dakrone

Thanks Eyal, I left some initial comments. I had a question about the way that we nest a document with an existing attributes field into the OTel attributes. Is this something we want to do? For example this doc:

{
  "attributes": {
    "a": "b",
    "c": [1, 2, 3]
  },
  "log.level": "1234"
}

Becomes, after processing:

{
  "resource": {
    "attributes": {}
  },
  "severity_text": "1234",
  "attributes": {
    "attributes.a": "b",
    "attributes.c": [1, 2, 3]
  }
}

Is that the desired behavior for an existing attributes field?
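
For reference, the flattening that yields keys such as `attributes.a` in the output above can be sketched as follows. This is a minimal illustration, not the processor's implementation: `FlattenSketch` and `flatten` are hypothetical names, nested objects are assumed to be `Map` instances, and arrays (lists) are kept as values without descending into them.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch of dotted-key flattening; names are illustrative only. */
final class FlattenSketch {

    private FlattenSketch() {}

    /** Returns a new map in which nested maps are replaced by dotted keys. */
    static Map<String, Object> flatten(Map<String, Object> source) {
        Map<String, Object> flat = new LinkedHashMap<>();
        flattenInto("", source, flat);
        return flat;
    }

    private static void flattenInto(String prefix, Map<?, ?> source, Map<String, Object> target) {
        for (Map.Entry<?, ?> entry : source.entrySet()) {
            String key = prefix + entry.getKey();
            Object value = entry.getValue();
            if (value instanceof Map) {
                // descend into nested maps, extending the dotted prefix
                // (an empty nested map contributes no keys in this sketch)
                flattenInto(key + ".", (Map<?, ?>) value, target);
            } else {
                // scalars and arrays are copied as they are
                target.put(key, value);
            }
        }
    }
}
```

Applied to a new attributes map that holds the original `attributes` object as a nested entry, this produces the keys `attributes.a` and `attributes.c`, matching the example.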

eyalkoren · 2025-03-30T03:22:21Z

Answering the general question:

> I had a question about the way that we nest a document with an existing attributes field into the OTel attributes. Is this something we want to do?

We started off by trying to be more "clever" about this and merge OTel with non-OTel. That forced us to handle lots of corner cases, like:

- if `attributes` exists and is not a map, it needs to go into a new `attributes` map
- if `resource` exists and is not a map, it needs to go into a new `attributes` map
- if there was an `attributes` map before that included a `resource` entry, and the top-level `resource` is not a map, we need to make sure that the new `attributes.resource` entry's value becomes an array that includes both values
- same for `resource.attributes`
- if both `span.id` and `span_id` exist, we need to do something about it, for example: make the new `span_id` an array with two values

And so forth.

So last week we decided to change the way we think about it: a document is either sent by an OTel-compliant shipper, or not. If not, there is no reason to treat the original fields as if they have OTel semantics. So even if it has a field with an OTel name, we can consider that to be by chance and namespace it. If that's so, there is no reason to complicate things for the unlikely event where non-OTel documents contain fields with intended OTel semantics.

eyalkoren · 2025-04-10T06:34:04Z

Thanks for the review @joegallo 🙏
Note that there are a few more additions I need to make to this PR:

- it was decided to add more namespaces to `resource.attributes`; it's just not finalized yet exactly which
- after adding those, I will update and add tests accordingly
- I need to rename the processor type to `normalize_to_otel` and the processor class name to `NormalizeToOtelProcessor`
- add it to the documentation

I am waiting for a finalized list of keys that need to go into `resource.attributes`, and then I will wrap it up and request a final review, which should be quite trivial.

> When I run `./gradlew ":modules:ingest-ecs:check"` for this branch, it fails -- perhaps there's something incorrect in the build.gradle for it (or something missing, 🤷)?

Gradle is definitely not a strength of mine and Elasticsearch is not the best project to get familiar with its basics 🙂
If you have any idea what doesn't seem right, let me know. Otherwise, I'll find what it is.

flash1293 · 2025-04-24T01:57:01Z

We discussed that we want to avoid making this processor available to users too quickly, so let's not expose it on serverless right away. According to @joegallo it's possible to filter based on the package name to achieve this.

We still plan to make this available eventually, but since discussions around the details of the central /logs endpoint are still ongoing and could affect this, it makes sense not to move unnecessarily quickly here. The value of the processor will only be realized in combination with the logs endpoint and the streams API/UI changes anyway.

joegallo · 2025-04-25T20:18:03Z

+1 to @flash1293's comment that we're pretty sure we can filter this out from serverless for the time being. We'll probably want to flag that this is a tech preview (or some such) in the ordinary documentation (and that documentation may come via a different PR, I'm not sure what the intent is necessarily).

There aren't any tests of this on 8.19, so there's nothing to test this against in a yaml-rest-compat-test way.

joegallo · 2025-04-25T20:29:10Z

> gradle is definitely not a strength of mine and Elasticsearch is not the best project to get familiar with its basics 🙂

Heh, I know exactly what you mean. 😄

I poked and prodded things for a little while this afternoon, though, and I got it to stop being unhappy with us for now at least. I'm going to watch this new build to see if we get to a green check mark from CI -- there was a serverless failure that kept popping up which hopefully will go away now. 🤷

eyalkoren · 2025-05-05T16:04:51Z

@joegallo I applied all requested changes from former reviews in c66663b

In addition, I added the new logic related to more accurate namespacing of resource.attributes in 7d536a7.
The changes include:

- a small algorithm change: we now first move all fields to `attributes`, then flatten all of them, and only then move the required ones to `resource.attributes`
- a static set of all ECS fields that have OTel semconv counterparts that are defined as resource attributes
- a mechanism to automatically detect which ECS fields correspond to semconv resource attributes (through tests)
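
The last step of the reordered algorithm (moving keys out of the already flattened attributes) could look roughly like the sketch below. It is an illustration under stated assumptions, not the code in 7d536a7: `ResourceAttributesMoveSketch`, `RESOURCE_KEYS` and `moveResourceAttributes` are hypothetical names, the three keys in the set are placeholders for the real static set, and keys are matched exactly.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of the "flatten first, then move" step; names are illustrative only. */
final class ResourceAttributesMoveSketch {

    /** Placeholder keys; the real static set is derived from ECS and semconv. */
    static final Set<String> RESOURCE_KEYS = Set.of("host.name", "cloud.region", "agent.name");

    private ResourceAttributesMoveSketch() {}

    /**
     * Removes every resource key from the (already flattened, mutable) attributes map
     * and returns the removed entries as the new resource.attributes map.
     */
    static Map<String, Object> moveResourceAttributes(Map<String, Object> flatAttributes) {
        Map<String, Object> resourceAttributes = new LinkedHashMap<>();
        Iterator<Map.Entry<String, Object>> it = flatAttributes.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Object> entry = it.next();
            if (RESOURCE_KEYS.contains(entry.getKey())) {
                resourceAttributes.put(entry.getKey(), entry.getValue());
                it.remove(); // safe removal while iterating
            }
        }
        return resourceAttributes;
    }
}
```

Because flattening happens first, membership can be tested against full dotted keys instead of walking nested objects.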

The discovery of ECS fields that correspond to semconv resource attributes works as follows:

1. discovering all semconv resource attributes: scanning the entire semconv repository, parsing all its yaml files and extracting attributes from all groups that are defined as type: resource
2. extracting all ECS fields that have OTel semconv counterparts, based on the otel attributes in the ECS field definition
3. finding the intersection between these sets
4. adding the agent.* fields
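
Once the two input sets are available, the intersection and the addition of the agent.* fields can be sketched as below. This is a simplified illustration rather than the test code in the PR: `ResourceAttributeDiscoverySketch` and `discover` are hypothetical names, and the sketch assumes that an ECS field and its semconv counterpart share the same name, so a plain set intersection is enough.

```java
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical sketch of the discovery steps; names are illustrative only. */
final class ResourceAttributeDiscoverySketch {

    private ResourceAttributeDiscoverySketch() {}

    /**
     * @param semconvResourceAttributes attributes from semconv groups of type "resource"
     * @param ecsFieldsWithOtelCounterparts ECS fields whose definition references an OTel attribute
     * @param agentFields the agent.* fields, which are always added
     * @return a sorted set of the ECS fields to treat as resource attributes
     */
    static Set<String> discover(
        Set<String> semconvResourceAttributes,
        Set<String> ecsFieldsWithOtelCounterparts,
        Set<String> agentFields
    ) {
        Set<String> result = new TreeSet<>(ecsFieldsWithOtelCounterparts);
        result.retainAll(semconvResourceAttributes); // intersection of the two sets
        result.addAll(agentFields);                  // agent.* fields are added regardless
        return result;
    }
}
```

Returning a sorted set keeps the output stable, which would make a generated file or constant easier to diff when the upstream definitions change.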

For now, step 1 was implemented by crawling the semconv repo through GitHub APIs. The API usage is straightforward and convenient, but it turned out to have two drawbacks that make it impractical:

- the test needs to be run with a GitHub token provided through an environment variable in order to avoid rate limits
- it requires quite a lot of HTTP requests, and in order to run in a matter of seconds rather than minutes, I had to considerably complicate it with concurrency

I will try a different approach: cloning the entire repository and doing the crawling on the local disk. Let's see if this makes things better.

Some feedback that you can already provide:

- is this method of using a statically compiled set of resource attributes the best way? Should we consider using a file instead? The advantage of a file is that it's easier to create automatically when updates are required
- if a file, where should it be located? What format should it have?
- do we need to maintain versions of this set?
- how do we make sure that CI normally skips all tests in ResourceAttributesTests and periodically (e.g. nightly) runs ResourceAttributesTests#testAttributesSetUpToDate, notifying specific channels on errors? We have something that runs EcsDynamicTemplatesIT nightly and notifies us whenever our dynamic templates are not in sync with the latest ECS. I am not sure how it works in Buildkite; all I found is ecs-dynamic-template-tests.yml, but I am not sure how it is used

Adding EcsNamespacingProcessor

30b0ed3

elasticsearchmachine added the v9.1.0 label Mar 26, 2025

eyalkoren added 8 commits March 27, 2025 07:49

Adding module-info

b96cc0c

Merge remote-tracking branch 'upstream/main' into ECS-namespacing-pro…

469df61

…cessor

Exposing and testing the processor

2bd819f

Add test and some algorithm fixes

1c2a670

Making scope non-mandatory

904c19c

Minimize dependencies

bd75b06

Extending REST tests

50f3c4d

Merge remote-tracking branch 'upstream/main' into ECS-namespacing-pro…

cb0dcee

…cessor

eyalkoren self-assigned this Mar 27, 2025

eyalkoren added >feature :Data Management/Ingest Node Execution or management of Ingest Pipelines including GeoIP labels Mar 27, 2025

eyalkoren marked this pull request as ready for review March 27, 2025 16:06

Update docs/changelog/125699.yaml

f68cf93

elasticsearchmachine added the Team:Data Management Meta label for data/management team label Mar 27, 2025

github-actions bot deployed to docs-preview March 27, 2025 16:07 View deployment

joegallo requested review from dakrone and joegallo March 27, 2025 18:37

dakrone requested changes Mar 27, 2025

eyalkoren and others added 2 commits March 30, 2025 09:53

Merge remote-tracking branch 'upstream/main' into ECS-namespacing-pro…

9dbe94b

…cessor

instanceOf with pattern matching

79bf683

Co-authored-by: Lee Hinman <[email protected]>

github-actions bot had a problem deploying to docs-preview March 30, 2025 07:05 Failure

instanceOf with pattern matching

dbc4d4a

Co-authored-by: Lee Hinman <[email protected]>

github-actions bot had a problem deploying to docs-preview March 30, 2025 07:06 Failure

eyalkoren and others added 2 commits March 30, 2025 10:06

revert constants usage

d160fe6

Co-authored-by: Lee Hinman <[email protected]>

Merge remote-tracking branch 'eyalkoren/ECS-namespacing-processor' in…

dfe33fc

…to ECS-namespacing-processor

Rewrite this test

65f7ea2

Assert that it's the sameInstance, and use an immutable map of immutable maps in order to be sure that nothing changed.

github-actions bot deployed to docs-preview April 9, 2025 21:12 View deployment

joegallo reviewed Apr 9, 2025

modules/ingest-ecs/src/main/java/org/elasticsearch/ingest/ecs/EcsNamespaceProcessor.java

Merge branch 'main' into ECS-namespacing-processor

3639c77

github-actions bot deployed to docs-preview April 25, 2025 20:19 View deployment

Drop yaml-rest-compat-test

f9421ac

There aren't any tests of this on 8.19, so there's nothing to test this against in a yaml-rest-compat-test way.

github-actions bot deployed to docs-preview April 25, 2025 20:27 View deployment

elasticsearchmachine added the serverless-linked Added by automation, don't add manually label Apr 25, 2025

eyalkoren added 2 commits April 28, 2025 11:55

Refactor ECS Namespacing to Normalize to OTel

b2dd61d

Apply review comments

c66663b

github-actions bot deployed to docs-preview April 28, 2025 10:26 View deployment

eyalkoren added 2 commits May 5, 2025 15:55

Adding accurate resource attributes handling

7d536a7

Merge remote-tracking branch 'upstream/main' into ECS-namespacing-pro…

9493a73

…cessor

github-actions bot deployed to docs-preview May 5, 2025 12:57 View deployment

Suppress warning for forbidden usage of System.out in tests

bb538a6

github-actions bot deployed to docs-preview May 5, 2025 14:46 View deployment

Eliminating some more forbidden APIs

e7c1f9d

github-actions bot deployed to docs-preview May 5, 2025 15:27 View deployment

[CI] Auto commit changes from spotless

605bdc9

github-actions bot deployed to docs-preview May 5, 2025 15:36 View deployment

eyalkoren added 4 commits May 6, 2025 07:50

Reverting shameful refactoring errors

e65c0ef

Merge remote-tracking branch 'eyalkoren/ECS-namespacing-processor' in…

916eaca

…to ECS-namespacing-processor

Merge remote-tracking branch 'upstream/main' into ECS-namespacing-pro…

07e78a2

…cessor

Disabling ResourceAttributesTests

5c34f94

github-actions bot deployed to docs-preview May 6, 2025 04:55 View deployment
