-
Notifications
You must be signed in to change notification settings - Fork 25.2k
Adding EcsNamespaceProcessor
#125699
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Adding EcsNamespaceProcessor
#125699
Conversation
Hi @eyalkoren, I've created a changelog YAML for you. |
Pinging @elastic/es-data-management (Team:Data Management) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks Eyal, I left some initial comments. I had a question about the way that we nest a document with an existing attributes
field into the OTel attributes. Is this something we want to do? For example this doc:
{
"attributes": {
"a": "b",
"c": [1, 2, 3]
},
"log.level": "1234"
}
Becomes, after processing:
{
"resource": {
"attributes": {}
},
"severity_text": "1234",
"attributes": {
"attributes.a": "b",
"attributes.c": [1, 2, 3]
}
}
Is that the desired behavior for an existing attributes
field?
modules/ingest-ecs/src/main/java/org/elasticsearch/ingest/ecs/EcsNamespacingProcessor.java
Outdated
Show resolved
Hide resolved
modules/ingest-ecs/src/main/java/org/elasticsearch/ingest/ecs/EcsNamespacingProcessor.java
Outdated
Show resolved
Hide resolved
modules/ingest-ecs/src/main/java/org/elasticsearch/ingest/ecs/EcsNamespacingProcessor.java
Outdated
Show resolved
Hide resolved
modules/ingest-ecs/src/main/java/org/elasticsearch/ingest/ecs/EcsNamespacingProcessor.java
Outdated
Show resolved
Hide resolved
modules/ingest-ecs/src/main/java/org/elasticsearch/ingest/ecs/EcsNamespacingProcessor.java
Outdated
Show resolved
Hide resolved
modules/ingest-ecs/src/main/java/org/elasticsearch/ingest/ecs/EcsNamespacingProcessor.java
Outdated
Show resolved
Hide resolved
modules/ingest-ecs/src/main/java/org/elasticsearch/ingest/ecs/EcsNamespacingProcessor.java
Outdated
Show resolved
Hide resolved
modules/ingest-ecs/src/main/java/org/elasticsearch/ingest/ecs/EcsNamespacingProcessor.java
Outdated
Show resolved
Hide resolved
modules/ingest-ecs/src/main/java/org/elasticsearch/ingest/ecs/EcsNamespacingProcessor.java
Outdated
Show resolved
Hide resolved
modules/ingest-ecs/src/test/java/org/elasticsearch/ingest/ecs/EcsNamespacingProcessorTests.java
Outdated
Show resolved
Hide resolved
Answering the general question:
We started off by trying to be more "clever" about this and merge OTel with non-OTel. Then we had to handle lots of corner cases, like:
And so forth. So last week we decided to change the way we think about it: a document is either sent by an OTel-compliant shipper, or not. If not, no reason to treat the original fields as if they have the OTel sematics. So even if it has a field that has an OTel name, we can consider it to be by chance and namespace it. If that's so- no reason to complicate things for the unlikely event where non-OTel documents contain fields with intended OTel semantics. |
Co-authored-by: Lee Hinman <[email protected]>
Co-authored-by: Lee Hinman <[email protected]>
Co-authored-by: Lee Hinman <[email protected]>
…to ECS-namespacing-processor
Assert that it's the sameInstance, and use an immutable map of immutable maps in order to be sure that nothing changed.
modules/ingest-ecs/src/main/java/org/elasticsearch/ingest/ecs/EcsNamespaceProcessor.java
Outdated
Show resolved
Hide resolved
Thanks for the review @joegallo 🙏
I am waiting for a finalized list of keys that need to go into
gradle is definitely not a strength of mine and Elasticsearch is not the best project to get familiar with its basics 🙂 |
We discussed that we want to avoid making this processor available to users too quickly, so let's not expose it on serverless right away. According to @joegallo it's possible to filter based on the package name to achieve this. We still plan to make this available eventually, but since discussions around details of the central /logs endpoint are still ongoing and could affect this, it makes sense to not move unnecessary quickly here, since the value of the processor will only be achieved with the combination of logs endpoint and streams API/UI changes anyway. |
+1 to @flash1293's comment that we're pretty sure we can filter this out from serverless for the time being. We'll probably want to flag that this is a tech preview (or some such) in the ordinary documentation (and that documentation may come via a different PR, I'm not sure what the intent is necessarily). |
There aren't any tests of this on 8.19, so there's nothing to test this against in a yaml-rest-compat-test way.
Heh, I know exactly what you mean. 😄 I poked and prodded things for a little while this afternoon, though, and I got it to stop being unhappy with us for now at least. I'm going to watch this new build to see if we get to a green check mark from CI -- there was a serverless failure that kept popping up which hopefully will go away now. 🤷 |
@joegallo I applied all requested changes from former reviews in c66663b In addition, I added the new logic related to more accurate namespacing of
The discovery of ECS fields that correspond semconv resource attributes works as follows:
For now, step 1 was implemented through web crawling of the semconv repo through GitHub APIs. The API usage is straightforward and convenient, however it turned out to have two cons in order to be practical:
I will try a different approach - cloning the entire repository and do the crawling on the local disk and let's see if this makes things better. Some feedback I that you can already provide is:
|
Namespacing algorithm [EDIT 1]:
resource
exists as a key and the value is a mapresource
either doesn't contain anattributes
field, or contains anattributes
field of type mapscope
is either missing or a mapattributes
is either missing or a mapbody
is either missing or a mapbody
either doesn't contain atext
field, or contains atext
field of typeString
body
either doesn't contain astructured
field, or contains astructured
field that is not of typeString
attributes
mapresource
map with one entry of whichattributes
is the key and a new map as its valueattributes
map:attributes
,resource
,span_id
,body
,severity_text
andtrace_id
attributes
andresource
maps as top level fieldsspan.id
,log.level
) to OTel-compliant names: for each, look for a value first in the nested form and if not found look for a top level dotted field. The first value that is found is used for the renamed fieldagent.
or isagent
or start withcloud.
or iscloud
or start withhost.
or ishost
toresource.attributes
@timestamp
,trace_id
,span_id
,severity_text
,body
,attributes
,resource
andscope
to the newattributes
mapattributes
andresource.attributes
maps