-
Notifications
You must be signed in to change notification settings - Fork 40
Introduce a Single Threaded Stack Trace Sampler #2341
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Introduce a Single Threaded Stack Trace Sampler #2341
Conversation
…samples for all threads associated with traces that have been selected for snapshotting.
… of StackTrace instances.
…DaemonThreadStackTraceSampler.
…ging in PeriodicallyExportingStagingArea.
…eadStackTraceSampler.
.../src/main/java/com/splunk/opentelemetry/profiler/snapshot/DaemonThreadStackTraceSampler.java
Outdated
Show resolved
Hide resolved
} else if (command.action == Action.START) { | ||
startSampling(command, traceThreads); | ||
} else if (command.action == Action.STOP) { | ||
stopSampling(command, traceThreads); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Scheduling the start and end events to a different thread means that you will not be able to accurately capture the start and end time of the trace. If you are unlucky then the trace taken for start/end could happen when thread is already servicing a different request. Would that be an issue?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is how I understand the the hypothetical:
- Trace 1 starts
- Thread 1 is requested to be sampled for Trace 1 (
SnapshotProfilingSpanProcessor
) - Thread 1 is associated with Trace 1:Span 1 (
ActiveSpanTraker
) - Trace 1 ends
- Thread 1 is unassociated with Trace1:Span 1 ('ActiveSpanTracker`)
- Thread 1 is requested to no longer be sampled (
SnapshotProfilingSpanProcessor
) - Thread 1 added for sampling (
PeriodicStackTraceSampler
) - Thread 1 is removed from sampling (
PeriodicStackTraceSampler
)
In the above a sample would be taken at both steps 7 and 8, at which point Thread 1 is doing whatever else (maybe idle, maybe processing a new request). Once the sample is taken a StackTrace
instance is created using Trace 1's trace id and either an invalid span id or a span id from the next trace.
This is a problem. I'll figure out how to model it in a test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@laurit, I think I've sorted out this scenario with test ensureStartAndStopSamplesAreAssociatedWithCorrectTraceAndSpanId
and doNotStageStackTraceWhenThreadNoLongerAssociatedWithSameTraceId
in PeriodicStackTraceSamplerTest
.
.../src/main/java/com/splunk/opentelemetry/profiler/snapshot/DaemonThreadStackTraceSampler.java
Outdated
Show resolved
Hide resolved
…PeriodicallyExportingStagingArea.
...iler/src/main/java/com/splunk/opentelemetry/profiler/snapshot/PeriodicStackTraceSampler.java
Show resolved
Hide resolved
...iler/src/main/java/com/splunk/opentelemetry/profiler/snapshot/PeriodicStackTraceSampler.java
Outdated
Show resolved
Hide resolved
...iler/src/main/java/com/splunk/opentelemetry/profiler/snapshot/PeriodicStackTraceSampler.java
Outdated
Show resolved
Hide resolved
...iler/src/main/java/com/splunk/opentelemetry/profiler/snapshot/PeriodicStackTraceSampler.java
Outdated
Show resolved
Hide resolved
String traceId, | ||
String spanId, | ||
long currentTimestamp) { | ||
Duration samplingPeriod = Duration.ofNanos(currentTimestamp - context.timestamp); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since the context timestamp is updated both from the periodic thread and from the remove
method you could probably end up with a negative period here. Imo you really shouldn't be using this for the source.event.period
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What would you suggest as an alternative? Ultimately the source.event.period
is used as the stack frame execution time. The start and stop samples are very likely to be less than the periodic sampling period which keeps the total reported execution time relatively close to the reported span execution time.
I've switched the underlying field to an AtomicLong
and added a check to prevent timestamps from the past from overwriting newer ones. If I understand the interaction correctly that should work. If the two sampling threads are so close that they are racing to set the field then the difference is probably negligible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What would you suggest as an alternative? Ultimately the source.event.period is used as the stack frame execution time.
In https://github.com/signalfx/gdi-specification/blob/main/specification/semantic_conventions.md#logrecord-message-text-data-format-specific-attributes the description for source.event.period
says
MUST contain the sampling period in milliseconds if this LogRecord represents a periodic event
now here we have a combination of periodic and non periodic activity which imo does not align with the existing description. Both source.event.period
and source.event.time
are with millisecond precision, is this enough for your use cases? Perhaps you should consider adding a new attribute. Or you could consider altering the description for the existing attribute.
I've switched the underlying field to an AtomicLong and added a check to prevent timestamps from the past from overwriting newer ones.
Lets assume that context timestamp is set to T0
. When stop
and periodic timer run at the same time they'll capture time as T1
and T2
respectively so that T1 < T2
. Now lets assume that the periodic timer completes first for some reason and updates the context timestamp. Now the sampling period for stop is T2 - T1
which is a negative value.
I believe you could resolve this with some locking.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've been able to reproduce this inconsistently with the existing test and indeed the sampling time is negative. What's interesting to is I've only been able to reproduce it in with the test when using a lock. Without locks (using the same code as in this diff) the test doesn't fail.
I have been able to force the scenario with breakpoints and manually advancing the two competing threads so it is possible.
I'm curious how you think a lock may be used here. I've experimented with a couple of solutions. I'd like to avoid locking the sampling mechanism; it doesn't seem ideal to hold up the periodic sampling thread because a new thread is being added but maybe that's the only way?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm curious how you think a lock may be used here.
If you lock around taking the end sample and taking the periodic sample then they can't run concurrently and see inconsistent state. For locking you could use a ReentrantReadWriteLock
where end sample uses the shaded lock (read lock) and periodic sample use the exclusive lock (write lock). Or you could synchronize on the sampling context or add a lock into it if you wish to make it more fine grained. My understanding is that you need to ensure is that the periodic sample is finalized before the end sample is added or that the periodic sample is not added at all.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you wish to avoid locking I guess you could keep state (e.g sample time) in SamplingContext
. Before you take the sample you read the state and after you use cas to update it. If cas succeeds all is fine. If it does not when you are taking a periodic sample you can assume that end sample was already added and discard the periodic sample. If the cas fails for end sample you can retry taking the end sample (or recompute the sample period) or perhaps when the periodic sample is later than the end sample discard the end sample.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not convinced we need both samples, and I don't think there is any preference as to which one to prioritize. If the two threads are racing to the point where one of the computed sampling periods is negative it MAY be OK to drop that sample. Both samples should be using the same previous sample timestamp so the "newer" sample would fully encompass the time range of the "older" sample.
...iler/src/main/java/com/splunk/opentelemetry/profiler/snapshot/PeriodicStackTraceSampler.java
Outdated
Show resolved
Hide resolved
…ce sampling thread.
…id a sampling race condition between the periodic sampling thread and the removal thread.
...iler/src/main/java/com/splunk/opentelemetry/profiler/snapshot/PeriodicStackTraceSampler.java
Outdated
Show resolved
Hide resolved
…th negative sampling periods from being staged.
This PR introduces a
StackTraceSampler
that utilizes a single thread for taking samples of all threads associated with trace ids that have been selected for snapshotting. This newStackTraceSampler
replaces the existingScheduledExecutorStackTraceSampler
implementation which used a 1:1 ratio of sampling thread to trace id approach.