This week’s CloudPro is a guest special from Kaiwan N Billimoria, the author of Linux Kernel Programming. Kaiwan runs world-class, high-value technical Linux OS training programs (corporate and individual online) at https://kaiwantech.com.
In today’s issue, Kaiwan walks us through Flame Graphs: a powerful tool to visualize which call paths dominate at runtime and uncover performance bottlenecks.
If you want to go deeper, his book Linux Kernel Programming is available for just $9.99 as part of Packt’s Summer Sale.
Cheers,
Editor-in-Chief
P.S. If you’re into platform engineering, check out Platform Weekly: the world’s largest newsletter for platform engineers with 100,000+ readers. Subscribe here.
P.P.S. DeployCon is happening June 25. An engineer-first GenAI summit featuring teams from Meta, Tinder, DoorDash, and more. Join in person at the AWS Loft SF or online. Register now.
Analyzing workloads is something all engineers end up doing at some point or another (or it’s their job description!). An obvious reason is performance analysis; for example, CPU usage may spike at times, causing issues or even outages.
The need of the hour: observe, analyze, and figure out the root cause of the performance issue! Of course, that’s often easier said than done; this kind of work can bog down even experienced professionals...
Borrowing from Brendan Gregg’s wonderful presentation (though old, it’s still relevant):
In general, answering the ‘Who’ and the ‘How’ are simple(r):
The harder questions tend to be the ‘Why?’ and ‘What?’:
The following slide illustrates this (again, from Brendan Gregg):
Right. So what the heck’s this Flame Graph thingy? Let’s explore!
We’ll abbreviate Flame Graphs as FG.
There are several types of FGs (CPU, GPU, memory, off-cpu, etc.); here we keep the focus on just one: CPU FGs via Linux’s powerful perf CPU profiler.
The moment a tool can generate profiling data that includes stack traces, it implies that FGs can be generated! Thus, there are several tools besides perf that generate FGs:
We’ll focus only on using Linux perf; it’s considered one of the best modern CPU profiling tools on the platform.
With perf, you can indeed profile your workload and see where exactly CPU usage shoots up. It’s easy: record something, get the report, and analyze it (well… it sounds easy at least).
Example:
sudo perf record -F 99 -a --call-graph dwarf -- sleep 10
(Instead of the -a option switch, you can use the -p PID option to profile a particular process. The generated perf.data file’s owned by root; do a chown to place its ownership under your account if you wish.)
sudo perf report --stdio # or --tui
…
(Try it!).
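A quick sanity check on how much data such a recording can yield: sampling at 99 Hz for 10 seconds gives at most 990 samples per CPU. Here's a minimal sketch of that arithmetic. The function name max_samples is made up for illustration, and the result is only an upper bound, since perf typically records fewer samples when CPUs sit idle.

```shell
#!/bin/sh
# Upper bound on the samples a 'perf record -F <freq>' run can collect.
# max_samples is a hypothetical helper for illustration; it is not part of perf.
max_samples() {
    freq_hz=$1      # the value passed to -F
    seconds=$2      # how long the profiled command runs (e.g. 'sleep 10')
    ncpus=$3        # number of CPUs being sampled
    echo $(( freq_hz * seconds * ncpus ))
}

max_samples 99 10 1    # a single CPU, 10 seconds
max_samples 99 10 4    # e.g. system-wide (-a) on an assumed 4-CPU box
```

If the report looks suspiciously thin compared to this bound, the workload was probably mostly off-CPU during the recording.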
This raises the question: why not just use perf on its own? Ah, that’s the thing: on non-trivial workloads, the report can be simply humongous, even running into dozens of (printed) pages! Are you really going to read through all of it, trying to spot the outliers?
It’s why we use the so-called Flame Graph (FG) – to visualize dense textual data and make sense of it; it’s so much clearer (so much more humane, literally).
Installation
First off, ensure both the perf utility and the FlameGraph scripts are installed.
Quick note: to install perf on Ubuntu/Debian, you typically need to be on a distro kernel (not a custom one). Why? Because, unusually for an app, it’s tightly coupled to the kernel it runs on! On Ubuntu, do this: sudo apt install linux-tools-$(uname -r) linux-tools-generic (even the linux-tools-generic package might be sufficient). On Debian, the package is named linux-perf.
If you’re on a custom-built kernel, build perf yourself (it’s easy): cd <kernel-src-tree>/tools/perf ; make
Install the FlameGraph scripts from their GitHub repo; simply clone it (in an empty folder):
git clone --depth 1 https://github.com/brendangregg/FlameGraph.git
perf record -F 99 --call-graph dwarf [-a]|[-p pid]
Generates the perf.data binary file.
perf script > perfscript_out.dat
The FG repo includes several stackcollapse-* scripts; we use the stackcollapse-perf.pl one:
cat perfscript_out.dat | FlameGraph/stackcollapse-perf.pl \
| FlameGraph/flamegraph.pl > out.svg
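To make the middle step less opaque: stackcollapse-perf.pl turns each sampled stack into a single line of semicolon-separated frames followed by a sample count (the "folded" format), and flamegraph.pl draws its rectangles from those lines. The sketch below mimics only the merging of identical stacks, using awk. The helper name fold_stacks and the input stacks are invented for illustration; the real script does a lot more (such as parsing the perf script output itself).

```shell
#!/bin/sh
# Merge identical stacks (one stack per input line, root first, leaf last)
# into "stack count" lines, i.e. the folded format that flamegraph.pl reads.
# fold_stacks is a hypothetical helper; the sample stacks below are made up.
fold_stacks() {
    awk '{ count[$0]++ } END { for (s in count) print s, count[s] }' | sort
}

# Four samples, three of which hit the very same call path:
printf '%s\n' \
    'app;main;parse;read_file' \
    'app;main;parse;read_file' \
    'app;main;render' \
    'app;main;parse;read_file' | fold_stacks
```

The more often a call path shows up in the samples, the larger its count, and the wider its rectangles end up in the SVG.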
We’ll assume you’ve installed both perf and the Flame Graph GitHub repo (the latter under your home dir).
sudo perf record -F 99 -a --call-graph dwarf -- sleep 10
sudo chown ${LOGNAME}:${LOGNAME} perf.data
perf script > perfscript_out.dat
cat perfscript_out.dat | ~/FlameGraph/stackcollapse-perf.pl |
~/FlameGraph/flamegraph.pl > out.svg
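The run above samples the whole system; to look at just one process, swap -a for -p PID, as mentioned earlier. Here's a small sketch that only prints the record command for either case, so you can eyeball it before running it. The helper perf_record_cmd is hypothetical (it's not part of perf or of the FlameGraph repo), and the PID 1234 is just a placeholder.

```shell
#!/bin/sh
# Print (do not run) the perf record command for a given target.
# perf_record_cmd is a hypothetical helper for illustration only.
#   $1: 'all' for system-wide sampling (-a), or a numeric PID (-p PID)
#   $2: seconds to sample for (defaults to 10)
perf_record_cmd() {
    target=$1
    secs=${2:-10}
    case $target in
        all)          scope='-a' ;;
        ''|*[!0-9]*)  echo "usage: perf_record_cmd all|PID [seconds]" >&2
                      return 1 ;;
        *)            scope="-p $target" ;;
    esac
    echo "sudo perf record -F 99 $scope --call-graph dwarf -- sleep $secs"
}

perf_record_cmd all        # system-wide, 10 seconds
perf_record_cmd 1234 5     # only PID 1234 (placeholder), 5 seconds
```

Once the printed command looks right, run it, then continue with the chown, perf script and flamegraph.pl steps exactly as shown above.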
Hmm, better if we zoom in… so I click on one of the rectangles on the lower-left (say on the gnome-shell one):
Ah, better.
Some really key points regarding how to interpret the Flame Graph:
In effect: the hottest code paths (the ones that dominate) are the widest rectangles!
The top edge (the rectangle at the very top of a tower) is the function that was on-CPU; the rectangles beneath it are its ancestry (how it was invoked).
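You can cross-check what your eyes tell you against the numbers. Assuming you have folded stacks at hand (the output of stackcollapse-perf.pl: frames separated by semicolons, then a sample count), the sketch below totals the samples per top-edge function, that is, per function that was actually on-CPU. The helper name hottest_leaves and the sample data are made up for illustration.

```shell
#!/bin/sh
# For folded stacks on stdin ("frame1;frame2;...;leaf count"), print each
# leaf function with its sample count and share of all samples, hottest first.
# hottest_leaves is a hypothetical helper; the data below is made up.
hottest_leaves() {
    awk '{
        n = $NF                        # last field: the sample count
        stack = $0
        sub(/ [0-9]+$/, "", stack)     # strip the trailing count
        k = split(stack, frames, ";")
        leaf[frames[k]] += n           # top edge: the function that was on-CPU
        total += n
    }
    END {
        for (f in leaf)
            printf "%s %d %.1f%%\n", f, leaf[f], 100 * leaf[f] / total
    }' | sort -k2,2nr
}

printf '%s\n' \
    'app;main;parse;read_file 60' \
    'app;main;parse;tokenize 25' \
    'app;main;render 15' | hottest_leaves
```

With real data you'd feed it the folded file instead, for example the intermediate output of stackcollapse-perf.pl saved to a file of your choosing.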
Here’s another FG I captured while SSH was running (truncated screenshot showing the interesting portion):
Interesting; the “towers” seem to be inverted! Yes, they’ve become top-down (downward-growing stacks) instead of bottom-up… they’re called icicles!
An option to the flamegraph.pl script (--inverted) sets this up.
A fantastic thing about the FG is that both userspace and kernel-space functions are captured! It’s thus called a mixed-mode FG. For example, with the ‘ssh’ FG, you can clearly see the call path leading down to the kernel network protocol stack code: functions from the socket/INET layer sock_*(), followed by L4 tcp_*(), followed by the L3 ip_*() functions; even the invocation of the (network) device transmit routines, dev_hard_start_xmit() and others, is visible!
Next, to make this a bit easier to use (no need to remember the syntax, easier options), I wrote a wrapper over the original Flame Graph scripts; the top-level one’s named flame_grapher.sh: https://github.com/kaiwan/L5_user_debug/tree/main/flamegraph (it forms a portion of my ‘Linux Userspace Debugging – Tools & Techniques’ training repo).
Its help screen reveals how you can (very easily!) use it to generate FGs:
$ ./flame_grapher.sh
Usage:
flame_grapher.sh -o svg-out-filename(without .svg) [options ...]
-o svg-out-filename(without .svg): name of SVG file to generate (saved under /tmp/flamegraphs/)
Optional switches:
[-p PID]: PID = generate a FlameGraph for ONLY this process or thread
If not passed, the *entire system* is sampled...
[-s <style>]: normal = draw the stack frames growing upward [default]
icicle = draw the stack frames growing downward
[-t <type>]: graph= produce a flame graph (X axis is NOT time, merges stacks) [default]
Good for performance outliers (who's eating CPU? using max stack?); works well for multi-threaded apps
chart= produce a flame chart (sort by time, do not merge stacks)
Good for seeing all calls; works well for single-threaded apps
[-f <freq>]: frequency (HZ) to have perf sample the system/process at [default=99]
Too high a value here can cause issues
-h|-?: show this help screen.
Note:
Notice a few points:
(Do read README.md as well. Hey, this wrapper’s lightly tested; please help me (and everyone!) out by raising Issues, as and when you come across them!)
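If you're curious how a wrapper's style and type choices could translate to the underlying flamegraph.pl flags, here's a rough sketch. To be clear, this is not code from flame_grapher.sh; the helper fg_flags is hypothetical. It relies only on two flags listed in flamegraph.pl's own help text: --inverted (icicle graph) and --flamechart (flame chart).

```shell
#!/bin/sh
# Translate wrapper-style choices into flamegraph.pl flags.
# fg_flags is a hypothetical helper, NOT code taken from flame_grapher.sh.
#   $1: style: normal | icicle
#   $2: type:  graph  | chart
fg_flags() {
    flags=''
    case $1 in
        normal) ;;                                   # default: stacks grow upward
        icicle) flags='--inverted' ;;                # stacks grow downward
        *) echo "unknown style: $1" >&2; return 1 ;;
    esac
    case $2 in
        graph) ;;                                    # default: merged stacks
        chart) flags="${flags:+$flags }--flamechart" ;;
        *) echo "unknown type: $2" >&2; return 1 ;;
    esac
    echo "$flags"
}

fg_flags icicle chart
# Possible use, assuming the FlameGraph repo is cloned under your home dir:
#   perf script | ~/FlameGraph/stackcollapse-perf.pl | \
#       ~/FlameGraph/flamegraph.pl $(fg_flags icicle graph) > out.svg
```

One caveat: a flame chart also needs its input kept in time order, which this sketch doesn't address; it only picks the flags.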
Tip: Try the speedscope.app site to interact with your FlameGraph!
B Gregg’s Linux Performance Observability Tools diagram across the stack!
alias ptop='sudo perf top --sort pid,comm,dso,symbol 2>/dev/null'
alias ptopv='sudo perf top -r 80 -f 99 --sort pid,comm,dso,symbol \
--demangle-kernel -v --call-graph dwarf,fractal 2>/dev/null'
📢 If your company is interested in reaching an audience of developers, technical professionals, and decision makers, you may want to advertise with us.
If you have any comments or feedback, just reply back to this email.
Thanks for reading and have a great day!