Skip to content

Conversation

@efd6
Copy link
Contributor

@efd6 efd6 commented Aug 14, 2025

Proposed commit message

o365: reduce code complexity, repeated work and garbage generation

This is the first step in making the integration safer to use in a
memory-constrained setting. The current code does the full set of
requests for each element of a set of subscription results, causing a
heavy memory load. The intention is to make use of work lists to reduce
the individual evaluation memory load when the CEL program is running.
The existing code is relatively over-complex, so this reduces that
complexity, while replacing heavily allocating operations with lighter
calls, and omitting completely calls that are not required, and so
eliding the allocation that those calls fruitlessly make.

Rather than making this change a commit in a pull request to convert to
work lists, this optimised version is made a version release since it is
a less risky change, while still improving performance and memory use
characteristics.

The changes that are made include:

* using sprintf to construct strings
* using the exists macro rather than the size of a filter result
* retaining work rather than repeating it
* reducing copying large values
* using optional types to avoid ternaries
* removing redundant calls: flatten and drop_empty on flat, all
  non-empty lists

Checklist

  • I have reviewed tips for building integrations and this pull request is aligned with them.
  • I have verified that all data streams collect metrics or logs.
  • I have added an entry to my package's changelog.yml file.
  • I have verified that Kibana version constraints are current according to guidelines.
  • I have verified that any added dashboard complies with Kibana's Dashboard good practices

Author's Checklist

  • [ ]

How to test this PR locally

Related issues

Screenshots

@efd6 efd6 self-assigned this Aug 14, 2025
@efd6 efd6 added enhancement New feature or request Integration:o365 Microsoft Office 365 Team:Security-Service Integrations Security Service Integrations team [elastic/security-service-integrations] labels Aug 14, 2025
@efd6 efd6 force-pushed the 14735-o365-optimisations branch from 453cd24 to 5d01c4b Compare August 14, 2025 07:09
@elastic-vault-github-plugin-prod
Copy link

elastic-vault-github-plugin-prod bot commented Aug 14, 2025

🚀 Benchmarks report

To see the full report comment with /test benchmark fullreport

@efd6 efd6 marked this pull request as ready for review August 14, 2025 08:04
@efd6 efd6 requested a review from a team as a code owner August 14, 2025 08:04
@elasticmachine
Copy link

Pinging @elastic/security-service-integrations (Team:Security-Service Integrations)

@efd6 efd6 requested a review from kcreddy August 14, 2025 08:04
Copy link
Contributor

@kcreddy kcreddy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM overall. Just minor clarifications.

list_resp.Body.decode_json().as(list_body,
(
type(list_body) == list && list_body[?0].optMap(e,
e.contentUri != "" && e.contentCreated != ""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does contentUri and contentCreated needs to be checked for != null as well?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree. It looks like it based on the code. I'm not sure if they keys will always be there.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

null != absence; does O365 use null to indicate absence? Depending on whether they do, the implementation would be different. I'm going to assume that they don't since the previous code does not tolerate them doing so.

This then becomes

                type(list_body) == list && list_body[?0].as(e,
                  e.contentUri.orValue("") != "" && e.contentCreated.orValue("") != ""
                )

Comment on lines +217 to +221
"content_created_at": (list_resp.StatusCode == 200 && (r_query.?endTime.optMap(t, t.size() > 0)).orValue(false)) ?
(r_query.endTime[0])
:
r_query.?startTime[0].orValue(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't r_query.?startTime[0].orValue( also have a condition on .size() > 0 just like endTime has (r_query.?endTime.optMap(t, t.size() > 0)).orValue(false)) ?

Just curious why we handle the same previous size conditions differently here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think all the checking is redundant. The API requires either no times (and returns a default time range like last 15 mins or something) or it needs both startTime and endTime. I'm pretty sure we always provide the times.

Copy link
Contributor Author

@efd6 efd6 Aug 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is sort of clever, so I was a little reluctant to use it, but the syntax saving I think is worth it. If I missed this elsewhere, I'll fix it.

To clarify the behaviour here, the index into startTime is against the value as an optional type, so the index is allowed event if it would fail, if it's zero-length it would fail, returning an optional.none(), so we don't need to check that the length is safe (the expression here is returning a string and the types agree). In the case of the endTime, we have the same thing, but the types don't agree, on existence, we are a string and on absence we are a bool. This is why we need the comparison. I had briefly thought I'd missed some, but all the "missed" cases are the same shape and so cannot be simplified.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've gone over the code with Chris' comment in mind, and I believe that is true. Do we want to remove all the checks? While I agree, I'd like to keep the risk level down on this change, so I will not remove them now.

Copy link
Contributor

@chrisberkhout chrisberkhout left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice refactoring.

I agree with Krishna about the null check probably being needed.

Other points are some minor potential improvements.

code + " " + message
)
)
).orValue(sub_body),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be good to add the content type prefix for the sub_body case. I forgot that last time.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AFAICS that's already there; there is content_type + before the total expression here, which includes this value

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's right. Actually it was right in the previous version too.

I'd seen logs where the combined error messages look like this:

"The subscription was disabled.; Audit.Exchange: StartSubscription IDS AND STACK TRACE...; The subscription was disabled.; DLP.All: StartSubscription IDS AND STACK TRACE..."

It looked like the "The subscription was disabled" part was missing the content type prefix, but now I realize it's not coming from a subscription request. It's there because one of the data requests resulted in an event that coincidentally has the { "error": { "message": ... } structure.

)
).do_request().as(sub_resp,
sub_resp.Body.decode_json().as(sub_body,
(sub_body.?status == optional.of("enabled") || sub_body.?error.code == optional.of("AF20024")) ?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice.

Comment on lines +217 to +221
"content_created_at": (list_resp.StatusCode == 200 && (r_query.?endTime.optMap(t, t.size() > 0)).orValue(false)) ?
(r_query.endTime[0])
:
r_query.?startTime[0].orValue(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think all the checking is redundant. The API requires either no times (and returns a default time range like last 15 mins or something) or it needs both startTime and endTime. I'm pretty sure we always provide the times.

list_resp.Body.decode_json().as(list_body,
(
type(list_body) == list && list_body[?0].optMap(e,
e.contentUri != "" && e.contentCreated != ""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree. It looks like it based on the code. I'm not sure if they keys will always be there.

efd6 added 2 commits August 15, 2025 07:17
This is the first step in making the integration safer to use in a
memory-constrained setting. The current code does the full set of
requests for each element of a set of subscription results, causing a
heavy memory load. The intention is to make use of work lists to reduce
the individual evaluation memory load when the CEL program is running.
The existing code is relatively over-complex, so this reduces that
complexity, while replacing heavily allocating operations with lighter
calls, and omitting completely calls that are not required, and so
eliding the allocation that those calls fruitlessly make.

Rather than making this change a commit in a pull request to convert to
work lists, this optimised version is made a version release since it is
a less risky change, while still improving performance and memory use
characteristics.

The changes that are made include:

* using sprintf to construct strings
* using the exists macro rather than the size of a filter result
* retaining work rather than repeating it
* reducing copying large values
* using optional types to avoid ternaries
* removing redundant calls: flatten and drop_empty on flat, all
  non-empty lists
@efd6 efd6 force-pushed the 14735-o365-optimisations branch from 302508b to f1b02a8 Compare August 14, 2025 21:47
@elasticmachine
Copy link

💚 Build Succeeded

History

cc @efd6

@elastic-sonarqube
Copy link

@efd6 efd6 merged commit 288e51b into elastic:main Aug 14, 2025
9 checks passed
@elastic-vault-github-plugin-prod

Package o365 - 2.23.0 containing this change is available at https://epr.elastic.co/package/o365/2.23.0/

tehbooom pushed a commit to tehbooom/integrations that referenced this pull request Nov 19, 2025
…lastic#14932)

This is the first step in making the integration safer to use in a
memory-constrained setting. The current code does the full set of
requests for each element of a set of subscription results, causing a
heavy memory load. The intention is to make use of work lists to reduce
the individual evaluation memory load when the CEL program is running.
The existing code is relatively over-complex, so this reduces that
complexity, while replacing heavily allocating operations with lighter
calls, and omitting completely calls that are not required, and so
eliding the allocation that those calls fruitlessly make.

Rather than making this change a commit in a pull request to convert to
work lists, this optimised version is made a version release since it is
a less risky change, while still improving performance and memory use
characteristics.

The changes that are made include:

* using sprintf to construct strings
* using the exists macro rather than the size of a filter result
* retaining work rather than repeating it
* reducing copying large values
* using optional types to avoid ternaries
* removing redundant calls: flatten and drop_empty on flat, all
  non-empty lists
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request Integration:o365 Microsoft Office 365 Team:Security-Service Integrations Security Service Integrations team [elastic/security-service-integrations]

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants