feat(code-mappings): Add code mappings task to post process #40882

Merged
merged 7 commits into from
Nov 3, 2022

Conversation

snigdhas
Member

@snigdhas snigdhas commented Nov 2, 2022

Add a new task to post process to derive code mappings for a single event per project per hour.

FIXES WOR-2356

@snigdhas snigdhas changed the base branch from master to snigdha/post-process-code-mapping November 2, 2022 05:44
@github-actions github-actions bot added the Scope: Backend Automatically applied to PRs that change backend components label Nov 2, 2022
@snigdhas snigdhas requested review from armenzg and a team November 2, 2022 05:44
@snigdhas snigdhas marked this pull request as ready for review November 2, 2022 06:22
@@ -163,6 +163,17 @@ class KillswitchInfo:
"project_id": "A project ID to filter events by.",
},
),
"post_process.derive-code-mappings": KillswitchInfo(
Member

How does this functionality work? Is there more documentation for this file? Is there an associated UI for modifying killswitches on the fly?

Member Author

see WOR-2359

from sentry.tasks.derive_code_mappings import derive_code_mappings

try:
    event = job["event"]
Member

Are the stack traces available at this point? If all stack traces already match a code mapping, we will not need to schedule anything.

Member Author

We should be able to get stacktraces here like this. How would we check that it matches a code mapping?

Member

Thinking about this more: we should not put the logic here; otherwise we hit the DB for every event.
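For the question above about how to check whether a stack trace already matches a code mapping, here is a minimal sketch. It assumes a code mapping can be reduced to a stack-root prefix and that each frame exposes a filename. `CodeMapping`, `frame_matches`, and `all_frames_mapped` are hypothetical names for illustration, not Sentry's API. The mappings are passed in as an argument, so the sketch itself does no per-event DB lookup.

```python
from dataclasses import dataclass
from typing import Iterable, Sequence


@dataclass(frozen=True)
class CodeMapping:
    # Hypothetical stand-in: a path prefix in the stack trace and its repo prefix.
    stack_root: str
    source_root: str


def frame_matches(filename: str, mappings: Iterable[CodeMapping]) -> bool:
    """Return True if any mapping's stack_root is a prefix of the filename."""
    return any(filename.startswith(m.stack_root) for m in mappings)


def all_frames_mapped(filenames: Sequence[str], mappings: Sequence[CodeMapping]) -> bool:
    """True only when there is at least one frame and every frame is covered."""
    return bool(filenames) and all(frame_matches(f, mappings) for f in filenames)


mappings = [CodeMapping(stack_root="sentry/", source_root="src/sentry/")]
print(all_frames_mapped(["sentry/tasks/post_process.py"], mappings))
print(all_frames_mapped(["sentry/api/base.py", "getsentry/app.py"], mappings))
```

If every frame is already covered, the task would not need to be scheduled. Where the mappings come from (and how they are cached) is left open here.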

cache_key = f"code-mappings:{project.id}"
project_queued = cache.get(cache_key)
if project_queued is None:
    cache.set(cache_key, True, 3600)
Member

If we use the project_id as the controlling mechanism here, the other task will need to control fetching the trees at the org ID level. Otherwise, two events from two projects will trigger get_trees_for_org twice in the same hour. Now that I see this here, I can look into adding it tomorrow morning, or feel free to add it yourself. Let's just coordinate on it.

Member

It may make sense to add logging in case we want to debug the system:
logger.info(f"derive_code_mappings: Events from {project.id} will not have code derivation until {date_time_here}")
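A small sketch of how the `date_time_here` placeholder in the suggestion could be computed. It assumes the one-hour cache timeout from the diff. `throttle_message` and the logger name are hypothetical, and the standard library's `datetime` stands in for Django's `timezone.now()` to keep the example self-contained.

```python
import logging
from datetime import datetime, timedelta, timezone

logger = logging.getLogger("derive_code_mappings_sketch")

CACHE_TTL_SECONDS = 3600  # assumed to match the one-hour cache timeout in the diff


def throttle_message(project_id: int, now: datetime) -> str:
    """Build the debug message, stating when derivation can run again."""
    next_run = now + timedelta(seconds=CACHE_TTL_SECONDS)
    return (
        f"derive_code_mappings: Events from {project_id} will not have "
        f"code mapping derivation until {next_run.isoformat()}"
    )


message = throttle_message(42, datetime(2022, 11, 2, 6, 0, tzinfo=timezone.utc))
logger.info(message)
print(message)
```

Deriving the timestamp from the same constant as the cache timeout keeps the log line from drifting if the timeout changes.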

Member Author

Added logging, good point!

Let's talk about your other point tomorrow. Our worst case scenario here is an org with a large number of projects (N) that all get events each hour. I'm not sure how to prevent get_trees_for_org being called N times.

Member

What could happen within an hour for events for the same org:

  • Event A1 for project A
    • We try to derive code and get_trees_for_org gets called
  • Event A2 for project A (cache prevents calling task)
  • Event B1 for project B
    • We try to derive code and get_trees_for_org gets called a 2nd time

We could check the cache key here (or an extra one) to allow only one event to be processed per hour per org for now, so we can go live.

Today I will work on a way to track the current GH API rate limit and to control when get_trees_for_org gets called. I also need to research memcache and the potential for OOM, since the trees-for-org object becomes quite large.
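The per-org throttle proposed above can be sketched as follows. This is an illustration under assumptions, not the merged implementation: `TTLCache` is a tiny in-memory stand-in rather than Django's cache, and `should_derive` and the `code-mappings:org:` key format are hypothetical.

```python
import time
from typing import Callable, Dict, Tuple


class TTLCache:
    """Tiny in-memory stand-in for a cache with per-key expiry."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._data: Dict[str, Tuple[float, object]] = {}

    def add(self, key: str, value: object, timeout: float) -> bool:
        """Store value only if key is absent or expired; return True if stored."""
        now = self._clock()
        entry = self._data.get(key)
        if entry is not None and entry[0] > now:
            return False
        self._data[key] = (now + timeout, value)
        return True


def should_derive(cache: TTLCache, org_id: int, ttl: float = 3600) -> bool:
    """Allow at most one derivation per org within the ttl window."""
    return cache.add(f"code-mappings:org:{org_id}", True, ttl)


cache = TTLCache()
results = [
    should_derive(cache, org_id=1),  # Event A1 (project A): allowed
    should_derive(cache, org_id=1),  # Event A2 (project A): throttled
    should_derive(cache, org_id=1),  # Event B1 (project B, same org): throttled
]
print(results)  # [True, False, False]
```

The trade-off is that project B gets no derivation until the org key expires, which bounds get_trees_for_org to one call per org per hour at the cost of slower coverage for orgs with many projects.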

    {"project_id": project.id},
):
    return
derive_code_mappings.delay(project.organization_id, project.id, event.data)
Member

I prefer sending only event.data so that all the logic is derived in the task. I know it feels like repeating code over there, but I find it clearer since it saves looking up the caller's code.

Member Author

I think this is alright since the only additional param is now the project ID. Otherwise, we would need to send the whole event to derive the project since the project_id isn't contained in event.data.

Base automatically changed from snigdha/post-process-code-mapping to master November 2, 2022 21:56
f"derive_code_mappings: Events from {project.id} will not have code mapping derivation until {timezone.now() + timedelta(hours=1)}"
)

if project_queued or not features.has(
Member Author

This feature check will incorporate the option / killswitch added in https://github.com/getsentry/getsentry/pull/8777.

We don't need to add separate logic here, which is pretty neat!

Member

Should we check this earlier? Or only check once an hour?
It's probably good here.

Member Author

Actually, this brings up a good point. We should add the killswitch check in derive_code_mappings.py as well. Even though it's a duplicate, we might want to stop all processing, and right now anything that's already queued will still go through. I'll send a PR for that.
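The point about already-queued work can be illustrated with a self-contained sketch. It checks the killswitch both when enqueuing and when running, so flipping the switch also stops jobs queued earlier. `enqueue`, `run_queued`, and the `is_killed` callable are hypothetical; a plain list stands in for the task queue.

```python
from typing import Callable, List, Tuple

Job = Tuple[int, int]  # (organization_id, project_id)


def enqueue(queue: List[Job], job: Job, is_killed: Callable[[], bool]) -> bool:
    """Producer side: skip queuing while the killswitch is on."""
    if is_killed():
        return False
    queue.append(job)
    return True


def run_queued(queue: List[Job], is_killed: Callable[[], bool]) -> List[Job]:
    """Consumer side: re-check the killswitch so already-queued jobs are dropped too."""
    processed: List[Job] = []
    while queue:
        job = queue.pop(0)
        if is_killed():
            continue  # drop the job without doing any work
        processed.append(job)
    return processed


state = {"killed": False}
queue: List[Job] = []
enqueue(queue, (1, 10), lambda: state["killed"])
enqueue(queue, (1, 20), lambda: state["killed"])
state["killed"] = True  # the switch is flipped after the jobs were queued
processed = run_queued(queue, lambda: state["killed"])
print(processed)  # []
```

Without the consumer-side check, both queued jobs would still run after the switch was flipped.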

@armenzg armenzg self-requested a review November 3, 2022 16:36
Member

@armenzg armenzg left a comment

Removing the approval temporarily until we make sure we have a plan to prevent exhausting the GH API rate limit.

Member

@armenzg armenzg left a comment

🎉

@snigdhas snigdhas merged commit 281e26b into master Nov 3, 2022
@snigdhas snigdhas deleted the snigdha/post-process-new branch November 3, 2022 19:51
@github-actions github-actions bot locked and limited conversation to collaborators Nov 25, 2022