feat(code-mappings): Add code mappings task to post process #40882
Conversation
src/sentry/killswitches.py
Outdated
@@ -163,6 +163,17 @@ class KillswitchInfo:
"project_id": "A project ID to filter events by.",
},
),
"post_process.derive-code-mappings": KillswitchInfo(
How does this functionality work? Is there more documentation for this file? Is there a UI for modifying killswitches on the fly?
see WOR-2359
from sentry.tasks.derive_code_mappings import derive_code_mappings

try:
    event = job["event"]
Are the stacktraces available at this point? If all stacktraces match a code mapping, we will not need to schedule anything.
We should be able to get stacktraces here like this. How would we check that it matches a code mapping?
Thinking about this more: we should not put the logic here; otherwise, we hit the DB for every event.
cache_key = f"code-mappings:{project.id}"
project_queued = cache.get(cache_key)
if project_queued is None:
    cache.set(cache_key, True, 3600)
If we use the project_id as the controlling mechanism here, we will need to control fetching the trees per org in the other task at the org ID level. Otherwise, two events from two projects will trigger get_trees_for_org twice in the same hour. Now that I see this here, I can look into adding it tomorrow morning, or feel free to add it yourself. Let's just coordinate on it.
It may make sense to add logging in case we want to debug the system:
logger.info(f"derive_code_mappings: Events from {project.id} will not have code derivation until {date_time_here}")
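As a rough illustration of the logging suggestion above, the placeholder timestamp can be computed from the cache timeout. This is only a sketch: the function name `log_derivation_pause` and its parameters are hypothetical, and it uses the standard library rather than Django's `timezone` helper.

```python
import logging
from datetime import datetime, timedelta, timezone
from typing import Optional

logger = logging.getLogger(__name__)


def log_derivation_pause(
    project_id: int, ttl_seconds: int = 3600, now: Optional[datetime] = None
) -> datetime:
    """Log when derivation resumes for a project and return that time.

    Hypothetical helper: `ttl_seconds` is assumed to match the cache timeout.
    """
    now = now or datetime.now(timezone.utc)
    resume_at = now + timedelta(seconds=ttl_seconds)
    # Lazy %-style formatting avoids building the string when INFO is disabled.
    logger.info(
        "derive_code_mappings: Events from %s will not have code mapping derivation until %s",
        project_id,
        resume_at.isoformat(),
    )
    return resume_at
```

Returning the computed time keeps the helper easy to check in a unit test without capturing log output.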
Added logging, good point!
Let's talk about your other point tomorrow. Our worst-case scenario here is an org with a large number of projects (N) that all get events each hour. I'm not sure how to prevent get_trees_for_org from being called N times.
What could happen within an hour for events for the same org:
- Event A1 for project A
  - We try to derive code and get_trees_for_org gets called
- Event A2 for project A (cache prevents calling task)
- Event B1 for project B
  - We try to derive code and get_trees_for_org gets called a 2nd time
We could check the cache key here (or an extra one) to only allow one event processed per hour per org for now so we can go live.
I will work today on adding a way to track the current GH API limit and to control when get_trees_for_org gets called. I need to research memcache and the potential for OOM, since the trees-for-org object becomes quite large.
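The "extra cache key" idea from the scenario above could look roughly like the following. This is a minimal sketch, not the code in this PR: `TTLCache` is a tiny in-memory stand-in for the shared cache, and the org-level key name and the `should_schedule` helper are assumptions made for illustration. Its `add` method mimics the set-if-absent semantics of Django's `cache.add`.

```python
import time
from typing import Callable, Dict, Tuple


class TTLCache:
    """In-memory stand-in for a shared cache (hypothetical, illustration only)."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._store: Dict[str, Tuple[object, float]] = {}

    def add(self, key: str, value: object, timeout: float) -> bool:
        """Store the value only if the key is absent or expired; return True if stored."""
        now = self._clock()
        entry = self._store.get(key)
        if entry is not None and entry[1] > now:
            return False
        self._store[key] = (value, now + timeout)
        return True


ONE_HOUR = 3600


def should_schedule(cache: TTLCache, org_id: int, project_id: int) -> bool:
    """Return True only for the first event per project and per org within the hour."""
    # Project-level gate, mirroring the cache key used in this PR.
    if not cache.add(f"code-mappings:{project_id}", True, ONE_HOUR):
        return False
    # Org-level gate (hypothetical key name): a second project in the same org
    # must not trigger another get_trees_for_org call within the hour.
    return cache.add(f"code-mappings:org:{org_id}", True, ONE_HOUR)
```

With this ordering, event B1 sets project B's key before failing the org gate, so project B also waits out the hour. That matches the "one event per hour per org" stopgap, but it would need revisiting once the tree fetching itself is rate-limited.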
src/sentry/tasks/post_process.py
Outdated
    {"project_id": project.id},
):
    return
derive_code_mappings.delay(project.organization_id, project.id, event.data)
I prefer only sending event.data so that all the logic is derived in there. I know it feels like repeating code over there, but I find it clearer since it saves looking up the caller's code.
I think this is alright since the only additional param is now the project ID. Otherwise, we would need to send the whole event to derive the project, since the project_id isn't contained in event.data.
src/sentry/tasks/post_process.py
Outdated
    f"derive_code_mappings: Events from {project.id} will not have code mapping derivation until {timezone.now() + timedelta(hours=1)}"
)

if project_queued or not features.has(
This feature check will incorporate the option / killswitch added in https://github.com/getsentry/getsentry/pull/8777.
We don't need to add separate logic here, which is pretty neat!
Should we check this earlier, or only check once an hour?
It's probably good here.
Actually, this brings up a good point. We should add the killswitch check in derive_code_mappings.py as well. Even though it's a duplicate, we might want to stop all processing, and right now anything that's already queued will still go through. I'll send a PR for that.
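To make the point about already-queued work concrete, here is a hedged sketch of re-checking the killswitch when the task runs. Everything in it is a stand-in: `KILLED_PROJECT_IDS`, `killswitch_active`, and `derive_code_mappings_task` are hypothetical names, not Sentry's killswitch API or the real task.

```python
from typing import Any, Dict, List, Set

# Hypothetical stand-in for the killswitch option: project IDs to skip.
KILLED_PROJECT_IDS: Set[int] = set()


def killswitch_active(project_id: int) -> bool:
    """Hypothetical check; the real one would consult the killswitch option."""
    return project_id in KILLED_PROJECT_IDS


def derive_code_mappings_task(
    project_id: int, data: Dict[str, Any], processed: List[int]
) -> bool:
    """Run the (placeholder) derivation unless the killswitch is on.

    The check happens at execution time because the task may have been
    queued before the killswitch was flipped.
    """
    if killswitch_active(project_id):
        return False
    processed.append(project_id)  # placeholder for the real derivation work
    return True
```

Checking in both places is intentionally redundant: the check in post process avoids enqueueing new work, while the check in the task drains whatever is already in the queue without doing the expensive part.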
Removing the approval just temporarily until we make sure we have a plan to prevent exhausting the GH API rate limit.
🎉
Add a new task to post process to derive code mappings for a single event per project per hour.
FIXES WOR-2356