feat(code-mappings): Add code mappings task to post process #40882

snigdhas · 2022-11-02T05:43:53Z

Add a new task to post process to derive code mappings for a single event per project per hour.

FIXES WOR-2356

src/sentry/tasks/post_process.py

armenzg · 2022-11-02T19:38:04Z

src/sentry/killswitches.py

@@ -163,6 +163,17 @@ class KillswitchInfo:
            "project_id": "A project ID to filter events by.",
        },
    ),
+    "post_process.derive-code-mappings": KillswitchInfo(


How does this functionality work? Is there more documentation for this file? Is there an associated UI to modify killswitches on the fly?

see WOR-2359

src/sentry/tasks/post_process.py

armenzg · 2022-11-02T19:39:57Z

src/sentry/tasks/post_process.py

+    from sentry.tasks.derive_code_mappings import derive_code_mappings
+
+    try:
+        event = job["event"]


Are the stacktraces available at this point? If all stacktraces match a code mapping, we will not need to schedule anything.

We should be able to get stacktraces here like this. How would we check that it matches a code mapping?

Thinking about this more: we should not put the logic here; otherwise, we hit the DB for every event.
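One way to picture the "do all stacktraces match a code mapping?" question is a prefix check of frame filenames against known stack roots. This is only a sketch under assumptions: `frame_filenames` and `all_frames_mapped` are hypothetical helpers, the event payload layout is assumed, and in practice the stack roots would come from the database, which is exactly the per-event cost raised above.

```python
def frame_filenames(event_data):
    """Collect filenames from every exception stack frame (payload layout is assumed)."""
    filenames = []
    for exc in event_data.get("exception", {}).get("values", []):
        frames = (exc.get("stacktrace") or {}).get("frames") or []
        for frame in frames:
            filename = frame.get("filename")
            if filename:
                filenames.append(filename)
    return filenames


def all_frames_mapped(event_data, stack_roots):
    """Return True only if every frame filename starts with some known stack root."""
    filenames = frame_filenames(event_data)
    return bool(filenames) and all(
        any(name.startswith(root) for root in stack_roots) for name in filenames
    )


event_data = {
    "exception": {
        "values": [
            {
                "stacktrace": {
                    "frames": [
                        {"filename": "sentry/tasks/post_process.py"},
                        {"filename": "sentry/utils/cache.py"},
                    ]
                }
            }
        ]
    }
}
print(all_frames_mapped(event_data, ["sentry/"]))        # True
print(all_frames_mapped(event_data, ["sentry/tasks/"]))  # False
```

If every frame is already covered, nothing would need to be scheduled; the open question in this thread is where to run such a check without querying mappings for each event.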

armenzg · 2022-11-02T19:43:09Z

src/sentry/tasks/post_process.py

+        cache_key = f"code-mappings:{project.id}"
+        project_queued = cache.get(cache_key)
+        if project_queued is None:
+            cache.set(cache_key, True, 3600)


If we use the project_id as the controlling mechanism here, we will need to control fetching the trees per org in the other task at the org ID level. Otherwise, two events from two projects will trigger get_trees_for_org twice in the same hour. Now that I see this here, I can look into adding it tomorrow morning, or feel free to add it. Let's just coordinate on it.

It may make sense to add logging in case we want to debug the system:
logger.info(f"derive_code_mappings: Events from {project.id} will not have code derivation until {date_time_here}")

Added logging, good point!

Let's talk about your other point tomorrow. Our worst case scenario here is an org with a large number of projects (N) that all get events each hour. I'm not sure how to prevent get_trees_for_org being called N times.

What could happen within an hour for events for the same org:

Event A1 for project A

We try to derive code and get_trees_for_org gets called

Event A2 for project A (cache prevents calling task)

Event B1 for project B

We try to derive code and get_trees_for_org gets called a 2nd time

We could check the cache key here (or an extra one) to only allow one event processed per hour per org for now so we can go live.

I will work today on adding a way to track the current GH API limit and to control when get_trees_for_org gets called. I need to research memcache and the potential for OOM, since the trees-for-org object becomes quite large.
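The A1/A2/B1 scenario above can be replayed against an org-level key with a small in-memory stand-in for the cache. This is a minimal sketch, not the code in this PR: `TTLCache`, `should_derive`, and the `code-mappings:org:` key format are hypothetical, and the real code uses Sentry's cache backend.

```python
import time


class TTLCache:
    """Tiny in-memory stand-in for a cache with per-key expiry (hypothetical)."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._store = {}

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            # Entry has expired; drop it so the next caller may proceed.
            del self._store[key]
            return None
        return value

    def set(self, key, value, timeout):
        self._store[key] = (value, self._clock() + timeout)


def should_derive(cache, org_id, timeout=3600):
    """Return True at most once per `timeout` seconds for a given org."""
    cache_key = f"code-mappings:org:{org_id}"  # assumed key format
    if cache.get(cache_key) is not None:
        return False
    cache.set(cache_key, True, timeout)
    return True


cache = TTLCache()
# Events A1 and A2 (project A) and B1 (project B) all belong to org 1.
decisions = [should_derive(cache, org_id=1) for _ in ("A1", "A2", "B1")]
print(decisions)  # [True, False, False]
```

With the key at the org level, event B1 no longer triggers a second get_trees_for_org call within the hour. Note that the separate get and set are not atomic; with Django's cache API, `cache.add` (which only sets a key that is absent) would be one way to narrow that race.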

src/sentry/tasks/post_process.py

armenzg · 2022-11-02T19:58:15Z

src/sentry/tasks/post_process.py

+            {"project_id": project.id},
+        ):
+            return
+        derive_code_mappings.delay(project.organization_id, project.id, event.data)


I prefer only sending event.data so all the logic is derived in there (even though I know it feels like repeating code over there); I find it clearer since it saves looking up the caller's code.

I think this is alright since the only additional param is now the project ID. Otherwise, we would need to send the whole event to derive the project since the project_id isn't contained in event.data.

snigdhas · 2022-11-02T23:48:35Z

src/sentry/tasks/post_process.py

+                f"derive_code_mappings: Events from {project.id} will not have code mapping derivation until {timezone.now() + timedelta(hours=1)}"
+            )
+
+        if project_queued or not features.has(


This feature check will incorporate the option / killswitch added in https://github.com/getsentry/getsentry/pull/8777.

We don't need to add separate logic here, which is pretty neat!

Should we check this earlier, or only check once an hour?
It's probably good here.

Actually, this brings up a good point. We should add the killswitch check in derive_code_mappings.py as well. Even though it's a duplicate, we might want to stop all processing and right now, anything that's queued will still go through. I'll send a PR for that.
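The reason for duplicating the check can be shown with a toy queue. This is an illustrative sketch only: the `flags` dict, `enqueue`, and `run_task` are hypothetical stand-ins for the feature/killswitch check and the Celery task, not Sentry APIs.

```python
from collections import deque

flags = {"derive-code-mappings": True}  # hypothetical flag store
queue = deque()
processed = []


def flag_enabled():
    return flags["derive-code-mappings"]


def enqueue(org_id, project_id):
    """Caller-side check: do not schedule new work while the flag is off."""
    if not flag_enabled():
        return False
    queue.append((org_id, project_id))
    return True


def run_task(org_id, project_id):
    """Task-side check: drop work that was queued before the flag was turned off."""
    if not flag_enabled():
        return
    processed.append((org_id, project_id))


enqueue(1, 10)
enqueue(1, 11)
run_task(*queue.popleft())            # runs while the flag is still on
flags["derive-code-mappings"] = False
run_task(*queue.popleft())            # already queued, but now skipped
print(processed)  # [(1, 10)]
```

Without the task-side check, the second item would still be processed even though the flag had already been turned off.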


armenzg

Removing the approval temporarily until we make sure we have a plan to prevent exhausting the GH API rate limit.

armenzg

🎉


Add a new task to the post process pipeline

ee08e24

snigdhas changed the base branch from master to snigdha/post-process-code-mapping November 2, 2022 05:44

github-actions bot added the Scope: Backend Automatically applied to PRs that change backend components label Nov 2, 2022

snigdhas requested review from armenzg and a team November 2, 2022 05:44


Add a killswitch

308414c


snigdhas mentioned this pull request Nov 2, 2022

feat(code-mappings): Modify derive-code-mappings task to handle a single event #40881

Merged

snigdhas commented Nov 2, 2022


snigdhas marked this pull request as ready for review November 2, 2022 06:22

armenzg added this to the Derived code mappings (Internal Release) milestone Nov 2, 2022

armenzg reviewed Nov 2, 2022


Base automatically changed from snigdha/post-process-code-mapping to master November 2, 2022 21:56

Merge branch 'master' into snigdha/post-process-new

799ce53


Snigdha Sharma added 2 commits November 2, 2022 15:08

Remove killswitch logic

cc24c06

Add logging

8af2d70


Merge branch 'master' into snigdha/post-process-new

0522e91


snigdhas commented Nov 2, 2022


armenzg approved these changes Nov 3, 2022


armenzg self-requested a review November 3, 2022 16:36

armenzg requested changes Nov 3, 2022


Switch cache to org-level

a0e1ee4


snigdhas requested a review from armenzg November 3, 2022 17:54

snigdhas mentioned this pull request Nov 3, 2022

feat(code-mappings): Add flag check before deriving code mappings #40973

Merged

armenzg approved these changes Nov 3, 2022


snigdhas added a commit that referenced this pull request Nov 3, 2022

feat(code-mappings): Add flag check before deriving code mappings (#40973)

a956eae

From #40882 (comment)

snigdhas merged commit 281e26b into master Nov 3, 2022

snigdhas deleted the snigdha/post-process-new branch November 3, 2022 19:51

armenzg assigned snigdhas Nov 9, 2022

github-actions bot locked and limited conversation to collaborators Nov 25, 2022
