Skip to content

fix(ebpf) fix stderr reading deadlock #1292

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Jul 22, 2022
Merged

fix(ebpf) fix stderr reading deadlock #1292

merged 1 commit into from
Jul 22, 2022

Conversation

korniltsev
Copy link
Collaborator

Here we read until eof synchronously
https://github.com/pyroscope-io/pyroscope/blob/9cb58b167c45a667da1ebb1010d28bff2f454672/pkg/agent/ebpfspy/session_linux.go#L103

Here we read one line of stdout asynchronously and block because the used channel is not buffered
https://github.com/pyroscope-io/pyroscope/blob/9cb58b167c45a667da1ebb1010d28bff2f454672/pkg/agent/ebpfspy/session_linux.go#L87

When profile-bpfcc receives SIGINT it tries to write a bunch of lines and blocks hitting kernel limit (I suppose 64k) and blocks. When it receives SIGINT it retries.

sudo strace -fp $(pgrep bpf)
strace: Process 63089 attached
write(1, "ThreadPoolForeg;start_thread;bas"..., 6969) = ? ERESTARTSYS (To be restarted if SA_RESTART is set)
--- SIGINT {si_signo=SIGINT, si_code=SI_KERNEL} ---
rt_sigreturn({mask=[]})                 = -1 EINTR (Interrupted system call)
write(1, "ThreadPoolForeg;start_thread;bas"..., 6969) = ? ERESTARTSYS (To be restarted if SA_RESTART is set)
--- SIGINT {si_signo=SIGINT, si_code=SI_KERNEL} ---
rt_sigreturn({mask=[]})                 = -1 EINTR (Interrupted system call)
write(1, "ThreadPoolForeg;start_thread;bas"..., 6969

tldr

  1. Pyroscope is reading bpfcc's full stderr synchronously - means to wait for bpfcc process to die.
  2. The bpfcc is not dying because it is trying to write profiles to it's stdout and blocks.
  3. Pyroscope read one line of bpfcc's stdout, tries to put it into chan and blocks
  4. The blocked chan is read after bpfcc's stderr [1] so it is never read

With this PR we read stderr asynchronously

@github-actions
Copy link
Contributor

size-limit report 📦

Path Size Loading time (3g) Running time (snapdragon) Total time
webapp/public/assets/app.js 399.01 KB (0%) 8 s (0%) 3.3 s (+15.23% 🔺) 11.3 s
webapp/public/assets/app.css 12.93 KB (0%) 259 ms (0%) 0 ms (+100% 🔺) 259 ms
webapp/public/assets/styles.css 9.8 KB (0%) 197 ms (0%) 0 ms (+100% 🔺) 197 ms
packages/pyroscope-flamegraph/dist/index.js 89.24 KB (0%) 1.8 s (0%) 1.3 s (+15.79% 🔺) 3.1 s
packages/pyroscope-flamegraph/dist/index.node.js 80.62 KB (0%) 1.7 s (0%) 408 ms (-11.23% 🔽) 2.1 s
packages/pyroscope-flamegraph/dist/index.css 5.84 KB (0%) 117 ms (0%) 0 ms (+100% 🔺) 117 ms

@codecov
Copy link

codecov bot commented Jul 21, 2022

Codecov Report

Merging #1292 (d663b84) into main (5f5b911) will decrease coverage by 1.36%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##             main    #1292      +/-   ##
==========================================
- Coverage   69.88%   68.52%   -1.35%     
==========================================
  Files         105      112       +7     
  Lines        3419     3713     +294     
  Branches      897      842      -55     
==========================================
+ Hits         2389     2544     +155     
- Misses       1027     1167     +140     
+ Partials        3        2       -1     
Impacted Files Coverage Δ
...ages/pyroscope-flamegraph/src/FlameGraph/decode.ts 56.76% <0.00%> (-33.24%) ⬇️
packages/pyroscope-models/src/index.ts 76.93% <0.00%> (-23.07%) ⬇️
...egraph/src/FlameGraph/FlameGraphComponent/color.ts 79.13% <0.00%> (-1.77%) ⬇️
...graph/src/FlameGraph/FlameGraphComponent/index.tsx 83.08% <0.00%> (-1.32%) ⬇️
packages/pyroscope-flamegraph/src/format/format.ts 68.43% <0.00%> (-1.04%) ⬇️
webapp/javascript/components/Sidebar.tsx 87.81% <0.00%> (-0.29%) ⬇️
webapp/javascript/ui/Sidebar.tsx 92.31% <0.00%> (-0.19%) ⬇️
...avascript/components/AppSelector/SelectorModal.tsx 88.24% <0.00%> (-0.13%) ⬇️
webapp/javascript/ui/Box.tsx 60.00% <0.00%> (ø)
webapp/javascript/services/render.ts 30.56% <0.00%> (ø)
... and 23 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 5f5b911...d663b84. Read the comment docs.

@petethepig petethepig merged commit c163b37 into main Jul 22, 2022
@petethepig petethepig deleted the fix/ebpf_deadlock branch July 22, 2022 06:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants