-
Notifications
You must be signed in to change notification settings - Fork 467
fix(profiling): track running asyncio loop if it exists #15120
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
fix(profiling): track running asyncio loop if it exists #15120
Conversation
|
|
c464371 to
c22f6d7
Compare
Bootstrap import analysisComparison of import times between this PR and base. SummaryThe average import time from this PR is: 253 ± 3 ms. The average import time from base is: 239 ± 3 ms. The import time difference between this PR and base is: 13.9 ± 0.1 ms. Import time breakdownThe following import paths have grown:
|
Performance SLOsComparing candidate kowalski/bug-profiling-track-existing-asyncio-loop-if-it-exists (fc60cfd) with baseline main (2555353) 🟡 Near SLO Breach (6 suites)🟡 djangosimple - 30/30✅ appsecTime: ✅ 20.405ms (SLO: <22.300ms -8.5%) vs baseline: ~same Memory: ✅ 66.168MB (SLO: <67.000MB 🟡 -1.2%) vs baseline: +5.0% ✅ exception-replay-enabledTime: ✅ 1.343ms (SLO: <1.450ms -7.4%) vs baseline: +0.5% Memory: ✅ 64.282MB (SLO: <67.000MB -4.1%) vs baseline: +4.8% ✅ iastTime: ✅ 20.440ms (SLO: <22.250ms -8.1%) vs baseline: -0.4% Memory: ✅ 66.218MB (SLO: <67.000MB 🟡 -1.2%) vs baseline: +5.0% ✅ profilerTime: ✅ 15.501ms (SLO: <16.550ms -6.3%) vs baseline: +0.4% Memory: ✅ 54.089MB (SLO: <54.500MB 🟡 -0.8%) vs baseline: +5.1% ✅ resource-renamingTime: ✅ 20.551ms (SLO: <21.750ms -5.5%) vs baseline: +0.3% Memory: ✅ 66.296MB (SLO: <67.000MB 🟡 -1.1%) vs baseline: +5.1% ✅ span-code-originTime: ✅ 25.330ms (SLO: <28.200ms 📉 -10.2%) vs baseline: -0.4% Memory: ✅ 68.119MB (SLO: <69.500MB 🟡 -2.0%) vs baseline: +6.3% ✅ tracerTime: ✅ 20.375ms (SLO: <21.750ms -6.3%) vs baseline: -0.7% Memory: ✅ 66.129MB (SLO: <67.000MB 🟡 -1.3%) vs baseline: +4.9% ✅ tracer-and-profilerTime: ✅ 22.746ms (SLO: <23.500ms -3.2%) vs baseline: +0.4% Memory: ✅ 67.731MB (SLO: <68.000MB 🟡 -0.4%) vs baseline: +4.6% ✅ tracer-dont-create-db-spansTime: ✅ 19.331ms (SLO: <21.500ms 📉 -10.1%) vs baseline: ~same Memory: ✅ 66.188MB (SLO: <67.000MB 🟡 -1.2%) vs baseline: +5.0% ✅ tracer-minimalTime: ✅ 16.650ms (SLO: <17.500ms -4.9%) vs baseline: +0.2% Memory: ✅ 66.178MB (SLO: <67.000MB 🟡 -1.2%) vs baseline: +5.0% ✅ tracer-nativeTime: ✅ 20.467ms (SLO: <21.750ms -5.9%) vs baseline: ~same Memory: ✅ 67.725MB (SLO: <72.500MB -6.6%) vs baseline: +4.8% ✅ tracer-no-cachesTime: ✅ 18.466ms (SLO: <19.650ms -6.0%) vs baseline: +0.4% Memory: ✅ 66.188MB (SLO: <67.000MB 🟡 -1.2%) vs baseline: +5.0% ✅ tracer-no-databasesTime: ✅ 18.801ms (SLO: <20.100ms -6.5%) vs baseline: +0.3% Memory: ✅ 65.856MB (SLO: <67.000MB 🟡 -1.7%) vs baseline: +4.5% ✅ tracer-no-middlewareTime: ✅ 20.169ms (SLO: <21.500ms -6.2%) vs baseline: +0.5% Memory: ✅ 66.257MB (SLO: <67.000MB 🟡 -1.1%) vs baseline: +5.2% ✅ tracer-no-templatesTime: ✅ 20.329ms (SLO: <22.000ms -7.6%) vs baseline: ~same Memory: ✅ 66.257MB (SLO: <67.000MB 🟡 -1.1%) vs baseline: +5.3% 🟡 errortrackingdjangosimple - 6/6✅ errortracking-enabled-allTime: ✅ 18.296ms (SLO: <19.850ms -7.8%) vs baseline: +1.3% Memory: ✅ 66.060MB (SLO: <66.500MB 🟡 -0.7%) vs baseline: +4.7% ✅ errortracking-enabled-userTime: ✅ 18.026ms (SLO: <19.400ms -7.1%) vs baseline: +0.3% Memory: ✅ 66.159MB (SLO: <66.500MB 🟡 -0.5%) vs baseline: +4.9% ✅ tracer-enabledTime: ✅ 18.072ms (SLO: <19.450ms -7.1%) vs baseline: +0.5% Memory: ✅ 65.844MB (SLO: <66.500MB 🟡 -1.0%) vs baseline: +4.7% 🟡 errortrackingflasksqli - 6/6✅ errortracking-enabled-allTime: ✅ 2.062ms (SLO: <2.300ms 📉 -10.4%) vs baseline: -0.5% Memory: ✅ 52.612MB (SLO: <53.500MB 🟡 -1.7%) vs baseline: +5.1% ✅ errortracking-enabled-userTime: ✅ 2.080ms (SLO: <2.250ms -7.6%) vs baseline: +0.4% Memory: ✅ 52.652MB (SLO: <53.500MB 🟡 -1.6%) vs baseline: +4.9% ✅ tracer-enabledTime: ✅ 2.069ms (SLO: <2.300ms 📉 -10.0%) vs baseline: -0.4% Memory: ✅ 52.671MB (SLO: <53.500MB 🟡 -1.5%) vs baseline: +4.8% 🟡 flasksimple - 18/18✅ appsec-getTime: ✅ 4.602ms (SLO: <4.750ms -3.1%) vs baseline: +0.6% Memory: ✅ 62.417MB (SLO: <65.000MB -4.0%) vs baseline: +4.9% ✅ appsec-postTime: ✅ 6.634ms (SLO: <6.750ms 🟡 -1.7%) vs baseline: +0.1% Memory: ✅ 62.387MB (SLO: <65.000MB -4.0%) vs baseline: +4.9% ✅ appsec-telemetryTime: ✅ 4.603ms (SLO: <4.750ms -3.1%) vs baseline: +0.3% Memory: ✅ 62.283MB (SLO: <65.000MB -4.2%) vs baseline: +4.7% ✅ debuggerTime: ✅ 1.859ms (SLO: <2.000ms -7.0%) vs baseline: +0.2% Memory: ✅ 45.355MB (SLO: <47.000MB -3.5%) vs baseline: +5.1% ✅ iast-getTime: ✅ 1.856ms (SLO: <2.000ms -7.2%) vs baseline: ~same Memory: ✅ 42.239MB (SLO: <49.000MB 📉 -13.8%) vs baseline: +4.8% ✅ profilerTime: ✅ 1.911ms (SLO: <2.100ms -9.0%) vs baseline: +0.2% Memory: ✅ 46.751MB (SLO: <47.000MB 🟡 -0.5%) vs baseline: +5.1% ✅ resource-renamingTime: ✅ 3.369ms (SLO: <3.650ms -7.7%) vs baseline: +0.4% Memory: ✅ 52.640MB (SLO: <53.500MB 🟡 -1.6%) vs baseline: +5.0% ✅ tracerTime: ✅ 3.350ms (SLO: <3.650ms -8.2%) vs baseline: ~same Memory: ✅ 52.549MB (SLO: <53.500MB 🟡 -1.8%) vs baseline: +4.9% ✅ tracer-nativeTime: ✅ 3.354ms (SLO: <3.650ms -8.1%) vs baseline: -0.3% Memory: ✅ 54.085MB (SLO: <60.000MB -9.9%) vs baseline: +4.8% 🟡 flasksqli - 6/6✅ appsec-enabledTime: ✅ 3.979ms (SLO: <4.200ms -5.3%) vs baseline: +0.3% Memory: ✅ 62.305MB (SLO: <66.000MB -5.6%) vs baseline: +4.8% ✅ iast-enabledTime: ✅ 2.453ms (SLO: <2.800ms 📉 -12.4%) vs baseline: +0.8% Memory: ✅ 59.179MB (SLO: <60.000MB 🟡 -1.4%) vs baseline: +4.8% ✅ tracer-enabledTime: ✅ 2.059ms (SLO: <2.250ms -8.5%) vs baseline: -0.6% Memory: ✅ 52.632MB (SLO: <54.500MB -3.4%) vs baseline: +4.9% 🟡 recursivecomputation - 8/8✅ deepTime: ✅ 309.428ms (SLO: <320.950ms -3.6%) vs baseline: +0.2% Memory: ✅ 32.755MB (SLO: <34.500MB -5.1%) vs baseline: +4.9% ✅ deep-profiledTime: ✅ 328.033ms (SLO: <359.150ms -8.7%) vs baseline: -0.2% Memory: ✅ 38.641MB (SLO: <39.000MB 🟡 -0.9%) vs baseline: +5.5% ✅ mediumTime: ✅ 7.054ms (SLO: <7.400ms -4.7%) vs baseline: +0.6% Memory: ✅ 32.008MB (SLO: <34.000MB -5.9%) vs baseline: +5.1% ✅ shallowTime: ✅ 0.947ms (SLO: <1.050ms -9.9%) vs baseline: +0.3% Memory: ✅ 31.949MB (SLO: <34.000MB -6.0%) vs baseline: +4.9%
|
c22f6d7 to
4b6068d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know you've already checked that this works on some example app or service, but would it be too hard to add a regression test for this using the sample code you put together?
Oh yes, good point. I was planning to add one but then I got carried away and forgot. Will do 🙏 |
9e60d80 to
f8a44eb
Compare
|
Okay I wasn't able to run my test locally yet and it's failing in the CI. I'll fix and merge tomorrow! 🙇 |
f8a44eb to
1324408
Compare
7b0f9ab to
57ec9b2
Compare
57ec9b2 to
5558205
Compare
5558205 to
234570a
Compare
234570a to
fc60cfd
Compare
Description
https://datadoghq.atlassian.net/browse/PROF-12842
This PR updates the wrapping and thread-registering logic for the Profiler in order to track the running loop when it exists.
This is needed because otherwise, importing/starting the Profiler after starting a Task (or a loop more generally) will make us blind to the existing running loop.
Currently, we
wraptheasyncio.set_event_loopfunction to capture when the Event Loop is first set (or is swapped). However, if the_asynciomodule that sets up wrapping is imported/executed after the loop has been set, we will miss that first call toset_event_loopand be blind toasyncioTasks until the Event Loop is changed (which in many cases never happens).Note that we also need to execute the "find loop and track it" logic when we start the Profiler generally speaking, as in this case we may have tried (earlier) to call
track_event_loopbut that would have failed as no thread was registered in the Profiler.I added four tests that account for various edge cases. Unfortunately, currently, two of them fail (marked them as
xfail) and there is no way to correctly fix them. The issue is that we can only get the current running loop and not the current (non-running) event loop.In other words, if an event loop is created and set in
asyncio, and immediately after the Profiler is started without a Task having first been started, we will not be able to see that loop from the initialisation code and we will thus not be able to observe it from the Profiler thread.In short, what works is the most common case:
It is OK to start with that as I really consider the latter two to be edge cases.
Example: today we miss all
asynciodata with the following codeTesting
I have tested this in
prof-correctness(initially just replicated that it did not work) and it now works as expected. I will be adding more correctness tests, one with a "top of file" import and Profiler start, one with a "top of file import" and "in-code Profiler start", and one with both an "in-code file import" and "in-code Profiler start".I also added four new tests to make sure we catch different edge cases with order of imports and order of task/profiler starts. Currently, two of them are marked as
XFAILEDbecause there is no way to reliably make them pass.