ref(aci): use enqueue time to query Snuba in delayed workflow #93882
base: master
Conversation
🔍 Existing Issues For Review: Your pull request is modifying functions with the following pre-existing issues:
📄 File: src/sentry/workflow_engine/processors/workflow.py
Use the latest timestamp for a set of group IDs with the same Snuba query.
We will query backwards in time from this point.
"""
if self.timestamp is None or (timestamp is not None and timestamp > self.timestamp):
if timestamp is not None:
    self.timestamp = timestamp if self.timestamp is None else max(timestamp, self.timestamp)
perhaps.
@dataclass
class TimeAndGroups:
Is BulkQueryParameters a more accurate name? Or perhaps GroupQueryParameters? There's a distinction between the unique queries and what we're associating them with that I'm not sure I'm capturing accurately, but TimeAndGroups strikes me as a bit too literal.
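For context, the diff fragments above suggest a shape roughly like the following. This is a reconstruction, not the PR's exact code: the GroupId alias, the field defaults, and the method name update_timestamp are assumptions, while the docstring and the condition come from the diff.

```python
from dataclasses import dataclass, field
from datetime import datetime

GroupId = int  # assumed alias; the actual codebase may define this differently


@dataclass
class TimeAndGroups:
    group_ids: set[GroupId] = field(default_factory=set)  # groups sharing one Snuba query
    timestamp: datetime | None = None  # latest enqueue time seen so far

    def update_timestamp(self, timestamp: datetime | None) -> None:
        """
        Use the latest timestamp for a set of group IDs with the same Snuba query.
        We will query backwards in time from this point.
        """
        if self.timestamp is None or (timestamp is not None and timestamp > self.timestamp):
            self.timestamp = timestamp
```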
def dcg_to_timestamp(self) -> dict[int, datetime | None]:
    """
    Uses the latest timestamp each DataConditionGroup was enqueued with.
    All groups enqueued for a DataConditionGroup will have the same query, hence the same max timestamp.
I feel like I should know what this means, but I don't.
handler = unique_condition.handler()
group_ids = time_and_groups.group_ids
Another option is to make groups a dict[GroupId, datetime | None] and do time = max((ts for ts in groups.values() if ts), default=current_time). Stores a bit more data, but lets the summarizing happen where it is being forced, which has a certain appeal.
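A minimal sketch of that alternative, assuming the same hypothetical GroupId alias as above and a caller-supplied current_time:

```python
from dataclasses import dataclass, field
from datetime import datetime

GroupId = int  # assumed alias


@dataclass
class GroupQueryParameters:  # one of the names floated above
    # Per-group enqueue times instead of one pre-summarized timestamp.
    groups: dict[GroupId, datetime | None] = field(default_factory=dict)

    def query_end_time(self, current_time: datetime) -> datetime:
        # Summarize only at the point of use: latest known enqueue time,
        # falling back to "now" when no group carried a timestamp.
        return max((ts for ts in self.groups.values() if ts), default=current_time)
```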
@@ -75,6 +77,7 @@ class DelayedWorkflowItem:
    delayed_conditions: list[DataCondition]
    event: GroupEvent
    source: WorkflowDataConditionGroupType
    timestamp: datetime
worth explaining what this timestamp is and what it should correspond to. Or, rename the field to make the comment pointless.
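One illustrative way to act on that comment, using the enqueue-time meaning from the PR description (the rename is a suggestion, not what the PR actually does):

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class DelayedWorkflowItem:
    # ... other fields from the diff above ...
    # Renamed from `timestamp`: the time this event was enqueued into the
    # delayed workflow buffer, used as the end of the Snuba query window.
    enqueue_timestamp: datetime
```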
{
    "event_id": self.event.event_id,
    "occurrence_id": self.event.occurrence_id,
    "timestamp": self.timestamp,
TIL we can dumps a datetime.
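For reference, the standard-library json.dumps raises TypeError on a bare datetime unless given a fallback serializer, so the code presumably goes through an encoder that handles it. A minimal illustration with an explicit default hook:

```python
import json
from datetime import datetime, timezone

payload = {"timestamp": datetime(2025, 6, 1, tzinfo=timezone.utc)}

# Without `default`, json.dumps(payload) raises TypeError for the datetime.
print(json.dumps(payload, default=lambda o: o.isoformat()))
# {"timestamp": "2025-06-01T00:00:00+00:00"}
```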
@@ -79,6 +79,7 @@
class EventInstance(BaseModel):
    event_id: str
    occurrence_id: str | None = None
    timestamp: datetime | None = None
You should probably add a test that requires us to parse this value correctly from the expected format. I strongly suspect we don't.
Need test to verify we can parse it and pydantic won't freak out.
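A sketch of the kind of test being asked for, assuming the buffer stores the timestamp as an ISO 8601 string and pydantic v1 (parse_obj; v2 would use model_validate):

```python
from datetime import datetime, timezone

from pydantic import BaseModel


class EventInstance(BaseModel):
    event_id: str
    occurrence_id: str | None = None
    timestamp: datetime | None = None


def test_event_instance_parses_timestamp() -> None:
    # Pydantic coerces an ISO 8601 string into an aware datetime.
    parsed = EventInstance.parse_obj(
        {"event_id": "abc123", "timestamp": "2025-06-01T00:00:00+00:00"}
    )
    assert parsed.timestamp == datetime(2025, 6, 1, tzinfo=timezone.utc)
    # And the field stays optional for older buffer entries without it.
    assert EventInstance.parse_obj({"event_id": "abc123"}).timestamp is None
```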
Codecov Report
Attention: Patch coverage is
✅ All tests successful. No failed tests found.
Additional details and impacted files

@@           Coverage Diff            @@
##           master   #93882    +/-   ##
========================================
  Coverage   88.03%   88.04%
========================================
  Files       10337    10337
  Lines      596922   597052   +130
  Branches    23196    23196
========================================
+ Hits       525525   525647   +122
- Misses      70938    70946     +8
  Partials      459      459
For a slow alert condition (e.g. "> 10 events in the past 1 minute"), we always use the time the query is being made as the time to start looking backwards from. This is not optimal when there is a delay between an event being enqueued and the query actually running. A better solution would be to use the time the event is enqueued into the delayed workflow buffer!

However, we also batch queries such that all alerts that end up making the same query are grouped together, and we run a single query for multiple groups. Groups in the same batch may have been enqueued at different times, so we need to decide which enqueue time to use. We should use the latest enqueue time out of the list of groups, as it will cover all groups being queried for in the Snuba query (see the sketch below).
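A sketch of that decision rule (names illustrative, not the PR's actual code): take the latest enqueue time across the groups sharing a query and look backwards from it.

```python
from datetime import datetime, timedelta


def batched_query_window(
    enqueue_times: dict[int, datetime],  # group_id -> enqueue time (illustrative)
    lookback: timedelta,  # e.g. timedelta(minutes=1) for "past 1 minute"
) -> tuple[datetime, datetime]:
    # Per the PR description, anchor the single batched query at the latest
    # enqueue time so it covers all groups in the batch.
    end = max(enqueue_times.values())
    return end - lookback, end
```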