-
Notifications
You must be signed in to change notification settings - Fork 467
feat(profiling): add sample count to internal payload #15177
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
feat(profiling): add sample count to internal payload #15177
Conversation
|
|
Bootstrap import analysisComparison of import times between this PR and base. SummaryThe average import time from this PR is: 207 ± 3 ms. The average import time from base is: 212 ± 3 ms. The import time difference between this PR and base is: -4.9 ± 0.1 ms. Import time breakdownThe following import paths have shrunk:
|
Performance SLOsComparing candidate kowalski/feat-profiling-add-sample-count-to-internal-payload (59704b9) with baseline main (b7a2d2a) 🟡 Near SLO Breach (5 suites)🟡 djangosimple - 30/30✅ appsecTime: ✅ 20.476ms (SLO: <22.300ms -8.2%) vs baseline: -0.2% Memory: ✅ 66.218MB (SLO: <67.000MB 🟡 -1.2%) vs baseline: +5.0% ✅ exception-replay-enabledTime: ✅ 1.344ms (SLO: <1.450ms -7.3%) vs baseline: +0.4% Memory: ✅ 64.311MB (SLO: <67.000MB -4.0%) vs baseline: +5.0% ✅ iastTime: ✅ 20.497ms (SLO: <22.250ms -7.9%) vs baseline: +0.2% Memory: ✅ 66.090MB (SLO: <67.000MB 🟡 -1.4%) vs baseline: +5.0% ✅ profilerTime: ✅ 15.553ms (SLO: <16.550ms -6.0%) vs baseline: ~same Memory: ✅ 53.950MB (SLO: <54.500MB 🟡 -1.0%) vs baseline: +4.9% ✅ resource-renamingTime: ✅ 20.536ms (SLO: <21.750ms -5.6%) vs baseline: +0.3% Memory: ✅ 66.188MB (SLO: <67.000MB 🟡 -1.2%) vs baseline: +4.8% ✅ span-code-originTime: ✅ 25.474ms (SLO: <28.200ms -9.7%) vs baseline: +0.4% Memory: ✅ 67.250MB (SLO: <69.500MB -3.2%) vs baseline: +5.2% ✅ tracerTime: ✅ 20.506ms (SLO: <21.750ms -5.7%) vs baseline: ~same Memory: ✅ 66.080MB (SLO: <67.000MB 🟡 -1.4%) vs baseline: +4.9% ✅ tracer-and-profilerTime: ✅ 22.728ms (SLO: <23.500ms -3.3%) vs baseline: ~same Memory: ✅ 67.898MB (SLO: <68.000MB 🟡 -0.1%) vs baseline: +4.9% ✅ tracer-dont-create-db-spansTime: ✅ 19.261ms (SLO: <21.500ms 📉 -10.4%) vs baseline: -0.2% Memory: ✅ 66.190MB (SLO: <67.000MB 🟡 -1.2%) vs baseline: +5.2% ✅ tracer-minimalTime: ✅ 16.628ms (SLO: <17.500ms -5.0%) vs baseline: +0.2% Memory: ✅ 66.077MB (SLO: <67.000MB 🟡 -1.4%) vs baseline: +4.9% ✅ tracer-nativeTime: ✅ 20.450ms (SLO: <21.750ms -6.0%) vs baseline: ~same Memory: ✅ 67.751MB (SLO: <72.500MB -6.6%) vs baseline: +4.8% ✅ tracer-no-cachesTime: ✅ 18.440ms (SLO: <19.650ms -6.2%) vs baseline: -0.3% Memory: ✅ 66.208MB (SLO: <67.000MB 🟡 -1.2%) vs baseline: +5.1% ✅ tracer-no-databasesTime: ✅ 18.745ms (SLO: <20.100ms -6.7%) vs baseline: ~same Memory: ✅ 65.852MB (SLO: <67.000MB 🟡 -1.7%) vs baseline: +4.6% ✅ tracer-no-middlewareTime: ✅ 20.122ms (SLO: <21.500ms -6.4%) vs baseline: -0.5% Memory: ✅ 66.139MB (SLO: <67.000MB 🟡 -1.3%) vs baseline: +5.0% ✅ tracer-no-templatesTime: ✅ 20.236ms (SLO: <22.000ms -8.0%) vs baseline: -0.5% Memory: ✅ 66.178MB (SLO: <67.000MB 🟡 -1.2%) vs baseline: +5.1% 🟡 errortrackingdjangosimple - 6/6✅ errortracking-enabled-allTime: ✅ 18.021ms (SLO: <19.850ms -9.2%) vs baseline: -0.3% Memory: ✅ 66.198MB (SLO: <66.500MB 🟡 -0.5%) vs baseline: +5.1% ✅ errortracking-enabled-userTime: ✅ 18.033ms (SLO: <19.400ms -7.0%) vs baseline: -0.2% Memory: ✅ 66.198MB (SLO: <66.500MB 🟡 -0.5%) vs baseline: +5.0% ✅ tracer-enabledTime: ✅ 18.024ms (SLO: <19.450ms -7.3%) vs baseline: ~same Memory: ✅ 66.001MB (SLO: <66.500MB 🟡 -0.7%) vs baseline: +5.1% 🟡 errortrackingflasksqli - 6/6✅ errortracking-enabled-allTime: ✅ 2.076ms (SLO: <2.300ms -9.7%) vs baseline: ~same Memory: ✅ 52.652MB (SLO: <53.500MB 🟡 -1.6%) vs baseline: +4.9% ✅ errortracking-enabled-userTime: ✅ 2.093ms (SLO: <2.250ms -7.0%) vs baseline: +1.4% Memory: ✅ 52.652MB (SLO: <53.500MB 🟡 -1.6%) vs baseline: +4.9% ✅ tracer-enabledTime: ✅ 2.110ms (SLO: <2.300ms -8.3%) vs baseline: +2.5% Memory: ✅ 52.671MB (SLO: <53.500MB 🟡 -1.5%) vs baseline: +5.2% 🟡 flasksimple - 18/18✅ appsec-getTime: ✅ 4.601ms (SLO: <4.750ms -3.1%) vs baseline: +0.2% Memory: ✅ 62.394MB (SLO: <65.000MB -4.0%) vs baseline: +4.9% ✅ appsec-postTime: ✅ 6.645ms (SLO: <6.750ms 🟡 -1.5%) vs baseline: ~same Memory: ✅ 62.454MB (SLO: <65.000MB -3.9%) vs baseline: +5.0% ✅ appsec-telemetryTime: ✅ 4.592ms (SLO: <4.750ms -3.3%) vs baseline: ~same Memory: ✅ 62.376MB (SLO: <65.000MB -4.0%) vs baseline: +4.8% ✅ debuggerTime: ✅ 1.857ms (SLO: <2.000ms -7.2%) vs baseline: -0.3% Memory: ✅ 45.286MB (SLO: <47.000MB -3.6%) vs baseline: +4.6% ✅ iast-getTime: ✅ 1.856ms (SLO: <2.000ms -7.2%) vs baseline: +0.2% Memory: ✅ 42.153MB (SLO: <49.000MB 📉 -14.0%) vs baseline: +4.9% ✅ profilerTime: ✅ 1.917ms (SLO: <2.100ms -8.7%) vs baseline: ~same Memory: ✅ 46.797MB (SLO: <47.000MB 🟡 -0.4%) vs baseline: +5.1% ✅ resource-renamingTime: ✅ 3.368ms (SLO: <3.650ms -7.7%) vs baseline: +0.1% Memory: ✅ 52.598MB (SLO: <53.500MB 🟡 -1.7%) vs baseline: +4.9% ✅ tracerTime: ✅ 3.353ms (SLO: <3.650ms -8.1%) vs baseline: -0.3% Memory: ✅ 52.581MB (SLO: <53.500MB 🟡 -1.7%) vs baseline: +4.8% ✅ tracer-nativeTime: ✅ 3.349ms (SLO: <3.650ms -8.3%) vs baseline: ~same Memory: ✅ 54.192MB (SLO: <60.000MB -9.7%) vs baseline: +4.9% 🟡 flasksqli - 6/6✅ appsec-enabledTime: ✅ 3.965ms (SLO: <4.200ms -5.6%) vs baseline: -0.3% Memory: ✅ 62.403MB (SLO: <66.000MB -5.4%) vs baseline: +5.0% ✅ iast-enabledTime: ✅ 2.445ms (SLO: <2.800ms 📉 -12.7%) vs baseline: +0.3% Memory: ✅ 59.218MB (SLO: <60.000MB 🟡 -1.3%) vs baseline: +4.8% ✅ tracer-enabledTime: ✅ 2.062ms (SLO: <2.250ms -8.4%) vs baseline: +0.2% Memory: ✅ 52.691MB (SLO: <54.500MB -3.3%) vs baseline: +4.9%
|
|
|
||
| namespace Datadog { | ||
|
|
||
| class ProfilerState |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ProfilerState is strange something that has stats. did you mean stats ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I definitely meant Stats, yes. Thanks, I'll fix that!
| class ProfilerState | ||
| { | ||
| private: | ||
| static inline size_t sample_count = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is the sample count static ?
Is this guaranteed to be on a single thread ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As far as I know, it is guaranteed to be on a single thread. I meant to double check at some point, but just wanted "validation" that I wasn't doing terribly stupid design-wise / regarding how to make the data flow from one place to the other.
| ProfilerState() = default; | ||
| ~ProfilerState() = default; | ||
|
|
||
| static void increment_sample_count(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand the aim was to have something stateless.
NIT: I'd assume the API would have been increment(k_sample_count);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So I think of that, but I'm unsure whether there are cases where we'd want to increment by more than one – after all we should be calling this every time we sample, and I don't know why we'd want to increment by more than one in that context 🤔
Description
This PR adds data to the
internalpayload we send tolibdatadog. In thisinternalpayload, we can push custom metrics (that are not exposed to customers) but that we can access for analytics.For the time being, I only added the number of sampling events for the current profile (note: this is different from the number of Samples we send in the Profile, as we can have at least one Sample object per Thread, but we only count one sampling event every time we sample all the Threads) but we may want to add more in the short term (e.g. number of times adaptive sampling was adjusted, history of adaptive sampling intervals, etc.)
Open questions:
This is an example of querying through the Events UI.