o365: reduce code complexity, repeated work and garbage generation #14932

efd6 · 2025-08-14T07:09:14Z

Proposed commit message

o365: reduce code complexity, repeated work and garbage generation

This is the first step in making the integration safer to use in a
memory-constrained setting. The current code does the full set of
requests for each element of a set of subscription results, causing a
heavy memory load. The intention is to make use of work lists to reduce
the individual evaluation memory load when the CEL program is running.
The existing code is relatively over-complex, so this reduces that
complexity, while replacing heavily allocating operations with lighter
calls, and omitting completely calls that are not required, and so
eliding the allocation that those calls fruitlessly make.

Rather than making this change a commit in a pull request to convert to
work lists, this optimised version is made a version release since it is
a less risky change, while still improving performance and memory use
characteristics.

The changes that are made include:

* using sprintf to construct strings
* using the exists macro rather than the size of a filter result
* retaining work rather than repeating it
* reducing copying large values
* using optional types to avoid ternaries
* removing redundant calls: flatten and drop_empty on flat, all
  non-empty lists

Checklist

I have reviewed tips for building integrations and this pull request is aligned with them.
I have verified that all data streams collect metrics or logs.
I have added an entry to my package's changelog.yml file.
I have verified that Kibana version constraints are current according to guidelines.
I have verified that any added dashboard complies with Kibana's Dashboard good practices

Author's Checklist

[ ]

How to test this PR locally

Related issues

For o365: rewrite agent CEL collector code to use work lists #14735

Screenshots

elastic-vault-github-plugin-prod · 2025-08-14T07:33:13Z

🚀 Benchmarks report

To see the full report comment with /test benchmark fullreport

elasticmachine · 2025-08-14T08:04:41Z

Pinging @elastic/security-service-integrations (Team:Security-Service Integrations)

kcreddy

LGTM overall. Just minor clarifications.

kcreddy · 2025-08-14T10:00:03Z

packages/o365/data_stream/audit/agent/stream/cel.yml.hbs

+            list_resp.Body.decode_json().as(list_body,
+              (
+                type(list_body) == list && list_body[?0].optMap(e,
+                  e.contentUri != "" && e.contentCreated != ""


Does contentUri and contentCreated needs to be checked for != null as well?

I agree. It looks like it based on the code. I'm not sure if they keys will always be there.

null != absence; does O365 use null to indicate absence? Depending on whether they do, the implementation would be different. I'm going to assume that they don't since the previous code does not tolerate them doing so.

This then becomes

type(list_body) == list && list_body[?0].as(e, e.contentUri.orValue("") != "" && e.contentCreated.orValue("") != "" )

kcreddy · 2025-08-14T10:12:44Z

packages/o365/data_stream/audit/agent/stream/cel.yml.hbs

+                      "content_created_at": (list_resp.StatusCode == 200 && (r_query.?endTime.optMap(t, t.size() > 0)).orValue(false)) ?
+                        (r_query.endTime[0])
+                      :
+                        r_query.?startTime[0].orValue(


Shouldn't r_query.?startTime[0].orValue( also have a condition on .size() > 0 just like endTime has (r_query.?endTime.optMap(t, t.size() > 0)).orValue(false)) ?

Just curious why we handle the same previous size conditions differently here.

I think all the checking is redundant. The API requires either no times (and returns a default time range like last 15 mins or something) or it needs both startTime and endTime. I'm pretty sure we always provide the times.

This is sort of clever, so I was a little reluctant to use it, but the syntax saving I think is worth it. If I missed this elsewhere, I'll fix it.

To clarify the behaviour here, the index into startTime is against the value as an optional type, so the index is allowed event if it would fail, if it's zero-length it would fail, returning an optional.none(), so we don't need to check that the length is safe (the expression here is returning a string and the types agree). In the case of the endTime, we have the same thing, but the types don't agree, on existence, we are a string and on absence we are a bool. This is why we need the comparison. I had briefly thought I'd missed some, but all the "missed" cases are the same shape and so cannot be simplified.

I've gone over the code with Chris' comment in mind, and I believe that is true. Do we want to remove all the checks? While I agree, I'd like to keep the risk level down on this change, so I will not remove them now.

chrisberkhout

Nice refactoring.

I agree with Krishna about the null check probably being needed.

Other points are some minor potential improvements.

chrisberkhout · 2025-08-14T17:16:58Z

packages/o365/data_stream/audit/agent/stream/cel.yml.hbs

+                          code + " " + message
+                        )
+                      )
+                    ).orValue(sub_body),


Would be good to add the content type prefix for the sub_body case. I forgot that last time.

AFAICS that's already there; there is content_type + before the total expression here, which includes this value

That's right. Actually it was right in the previous version too.

I'd seen logs where the combined error messages look like this:

"The subscription was disabled.; Audit.Exchange: StartSubscription IDS AND STACK TRACE...; The subscription was disabled.; DLP.All: StartSubscription IDS AND STACK TRACE..."

It looked like the "The subscription was disabled" part was missing the content type prefix, but now I realize it's not coming from a subscription request. It's there because one of the data requests resulted in an event that coincidentally has the { "error": { "message": ... } structure.

chrisberkhout · 2025-08-14T17:19:48Z

packages/o365/data_stream/audit/agent/stream/cel.yml.hbs

+      )
+    ).do_request().as(sub_resp,
+      sub_resp.Body.decode_json().as(sub_body,
+        (sub_body.?status == optional.of("enabled") || sub_body.?error.code == optional.of("AF20024")) ?


chrisberkhout · 2025-08-14T17:49:30Z

packages/o365/data_stream/audit/agent/stream/cel.yml.hbs

+                      "content_created_at": (list_resp.StatusCode == 200 && (r_query.?endTime.optMap(t, t.size() > 0)).orValue(false)) ?
+                        (r_query.endTime[0])
+                      :
+                        r_query.?startTime[0].orValue(


I think all the checking is redundant. The API requires either no times (and returns a default time range like last 15 mins or something) or it needs both startTime and endTime. I'm pretty sure we always provide the times.

chrisberkhout · 2025-08-14T17:53:26Z

packages/o365/data_stream/audit/agent/stream/cel.yml.hbs

+            list_resp.Body.decode_json().as(list_body,
+              (
+                type(list_body) == list && list_body[?0].optMap(e,
+                  e.contentUri != "" && e.contentCreated != ""


I agree. It looks like it based on the code. I'm not sure if they keys will always be there.

This is the first step in making the integration safer to use in a memory-constrained setting. The current code does the full set of requests for each element of a set of subscription results, causing a heavy memory load. The intention is to make use of work lists to reduce the individual evaluation memory load when the CEL program is running. The existing code is relatively over-complex, so this reduces that complexity, while replacing heavily allocating operations with lighter calls, and omitting completely calls that are not required, and so eliding the allocation that those calls fruitlessly make. Rather than making this change a commit in a pull request to convert to work lists, this optimised version is made a version release since it is a less risky change, while still improving performance and memory use characteristics. The changes that are made include: * using sprintf to construct strings * using the exists macro rather than the size of a filter result * retaining work rather than repeating it * reducing copying large values * using optional types to avoid ternaries * removing redundant calls: flatten and drop_empty on flat, all non-empty lists

elasticmachine · 2025-08-14T22:12:31Z

💚 Build Succeeded

Buildkite Build
Commit: f1b02a8

History

💔 Build #30049 failed 302508b
💚 Build #30013 succeeded 5d01c4b

cc @efd6

elastic-sonarqube · 2025-08-14T22:12:36Z

Quality Gate passed

Issues
0 New issues
0 Fixed issues
0 Accepted issues

Measures
0 Security Hotspots
100.0% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarQube

elastic-vault-github-plugin-prod · 2025-08-14T22:36:10Z

Package o365 - 2.23.0 containing this change is available at https://epr.elastic.co/package/o365/2.23.0/

…lastic#14932) This is the first step in making the integration safer to use in a memory-constrained setting. The current code does the full set of requests for each element of a set of subscription results, causing a heavy memory load. The intention is to make use of work lists to reduce the individual evaluation memory load when the CEL program is running. The existing code is relatively over-complex, so this reduces that complexity, while replacing heavily allocating operations with lighter calls, and omitting completely calls that are not required, and so eliding the allocation that those calls fruitlessly make. Rather than making this change a commit in a pull request to convert to work lists, this optimised version is made a version release since it is a less risky change, while still improving performance and memory use characteristics. The changes that are made include: * using sprintf to construct strings * using the exists macro rather than the size of a filter result * retaining work rather than repeating it * reducing copying large values * using optional types to avoid ternaries * removing redundant calls: flatten and drop_empty on flat, all non-empty lists

efd6 self-assigned this Aug 14, 2025

efd6 added enhancement New feature or request Integration:o365 Microsoft Office 365 Team:Security-Service Integrations Security Service Integrations team [elastic/security-service-integrations] labels Aug 14, 2025

efd6 force-pushed the 14735-o365-optimisations branch from 453cd24 to 5d01c4b Compare August 14, 2025 07:09

efd6 marked this pull request as ready for review August 14, 2025 08:04

efd6 requested a review from a team as a code owner August 14, 2025 08:04

efd6 requested a review from kcreddy August 14, 2025 08:04

kcreddy approved these changes Aug 14, 2025

View reviewed changes

chrisberkhout approved these changes Aug 14, 2025

View reviewed changes

efd6 added 2 commits August 15, 2025 07:17

address pr comment

f1b02a8

efd6 force-pushed the 14735-o365-optimisations branch from 302508b to f1b02a8 Compare August 14, 2025 21:47

efd6 merged commit 288e51b into elastic:main Aug 14, 2025
9 checks passed

o365: reduce code complexity, repeated work and garbage generation #14932

o365: reduce code complexity, repeated work and garbage generation #14932

Conversation

efd6 commented Aug 14, 2025

Proposed commit message

Checklist

Author's Checklist

How to test this PR locally

Related issues

Screenshots

Uh oh!

elastic-vault-github-plugin-prod bot commented Aug 14, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

🚀 Benchmarks report

Uh oh!

elasticmachine commented Aug 14, 2025

Uh oh!

kcreddy left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

efd6 Aug 14, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

chrisberkhout left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

elasticmachine commented Aug 14, 2025

💚 Build Succeeded

History

Uh oh!

elastic-sonarqube bot commented Aug 14, 2025

Quality Gate passed

Uh oh!

Uh oh!

elastic-vault-github-plugin-prod bot commented Aug 14, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

elastic-vault-github-plugin-prod bot commented Aug 14, 2025 •

edited

Loading

efd6 Aug 14, 2025 •

edited

Loading