Skip to content

kallsyms: optimize things a bit #1785

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Jun 2, 2025
Merged

kallsyms: optimize things a bit #1785

merged 3 commits into from
Jun 2, 2025

Conversation

lmb
Copy link
Collaborator

@lmb lmb commented May 20, 2025

Two small optimisations which drop CPU time by 25% on my machine, plus a lot less allocations.

@lmb lmb force-pushed the kallsyms-faster branch 3 times, most recently from b83350a to 1f92af2 Compare May 20, 2025 16:12
@lmb lmb marked this pull request as ready for review May 20, 2025 16:32
@lmb lmb requested a review from a team as a code owner May 20, 2025 16:32
Copy link
Collaborator

@ti-mo ti-mo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Making parseSymbol return an iterator makes its implementation a little awkward since it itself contains a loop over an iterator type. I'd probably drop that change, I think the optimization stands well on its own.

@lmb lmb force-pushed the kallsyms-faster branch from 1f92af2 to 303d68e Compare May 21, 2025 15:42
ti-mo and others added 3 commits June 2, 2025 13:48
Add a few fast benchmarks to make optimization easier.

Signed-off-by: Timo Beckers <[email protected]>
It's possible for /proc/kallsyms to contain utf8, by virtue of the kernel
treating symbol names as bags of bytes. Needless to say these symbols are
going to be rare in the wild, if they exist at all.

Do not pessimise the common case of parsing an all-ASCII kallsyms table
by checking for invalid utf8. This means that invalid utf8 will not be
rejected anymore, while valid encodings continue to work as before.

```
goos: linux
goarch: amd64
pkg: github.com/cilium/ebpf/internal/kallsyms
cpu: AMD Ryzen 7 3700X 8-Core Processor
                           │   old.txt   │              new.txt               │
                           │   sec/op    │   sec/op     vs base               │
AssignModules-16             4.740µ ± 3%   3.326µ ± 0%  -29.85% (p=0.002 n=6)
AssignAddresses-16           6.327µ ± 1%   4.187µ ± 1%  -33.82% (p=0.002 n=6)
AssignModulesKallsyms-16     376.3m ± 1%   307.1m ± 3%  -18.40% (p=0.002 n=6)
AssignAddressesKallsyms-16   405.5m ± 5%   322.7m ± 2%  -20.41% (p=0.002 n=6)
geomean                      1.463m        1.084m       -25.90%

                           │   old.txt    │              new.txt               │
                           │     B/op     │     B/op      vs base              │
AssignModules-16             4.242Ki ± 0%   4.258Ki ± 0%  +0.37% (p=0.002 n=6)
AssignAddresses-16           4.438Ki ± 0%   4.453Ki ± 0%  +0.35% (p=0.002 n=6)
AssignModulesKallsyms-16     8.260Mi ± 0%   8.260Mi ± 0%       ~ (p=0.777 n=6)
AssignAddressesKallsyms-16   12.29Mi ± 0%   12.29Mi ± 0%       ~ (p=0.167 n=6)
geomean                      211.6Ki        212.0Ki       +0.18%

                           │   old.txt   │               new.txt                │
                           │  allocs/op  │  allocs/op   vs base                 │
AssignModules-16              9.000 ± 0%   11.000 ± 0%  +22.22% (p=0.002 n=6)
AssignAddresses-16            19.00 ± 0%    21.00 ± 0%  +10.53% (p=0.002 n=6)
AssignModulesKallsyms-16     323.6k ± 0%   323.6k ± 0%        ~ (p=1.000 n=6) ¹
AssignAddressesKallsyms-16   536.6k ± 0%   536.6k ± 0%        ~ (p=1.000 n=6)
geomean                      2.334k        2.517k        +7.81%
¹ all samples are equal
```

Signed-off-by: Lorenz Bauer <[email protected]>
Co-authored-by: Timo Beckers <[email protected]>
We currently allocate a lot of small strings to hold symbol and
module name. This happens even if we aren't interested in that
specific symbol.

Use the map[string([]byte)] optimization to remove temporary
allocations.

```
goos: linux
goarch: amd64
pkg: github.com/cilium/ebpf/internal/kallsyms
cpu: AMD Ryzen 7 3700X 8-Core Processor
                           │   old.txt   │              new.txt               │
                           │   sec/op    │   sec/op     vs base               │
AssignModules-16             3.326µ ± 0%   3.009µ ± 2%   -9.53% (p=0.002 n=6)
AssignAddresses-16           4.187µ ± 1%   3.351µ ± 1%  -19.98% (p=0.002 n=6)
AssignModulesKallsyms-16     307.1m ± 3%   294.3m ± 7%   -4.15% (p=0.002 n=6)
AssignAddressesKallsyms-16   322.7m ± 2%   301.7m ± 2%   -6.52% (p=0.002 n=6)
geomean                      1.084m        972.6µ       -10.26%

                           │     old.txt      │               new.txt               │
                           │       B/op       │     B/op      vs base               │
AssignModules-16                 4.258Ki ± 0%   4.172Ki ± 0%   -2.02% (p=0.002 n=6)
AssignAddresses-16               4.453Ki ± 0%   4.141Ki ± 0%   -7.02% (p=0.002 n=6)
AssignModulesKallsyms-16      8458.543Ki ± 0%   4.312Ki ± 0%  -99.95% (p=0.002 n=6)
AssignAddressesKallsyms-16   12586.352Ki ± 0%   4.219Ki ± 0%  -99.97% (p=0.002 n=6)
geomean                          212.0Ki        4.210Ki       -98.01%

                           │     old.txt     │              new.txt               │
                           │    allocs/op    │ allocs/op   vs base                │
AssignModules-16                 11.000 ± 0%   5.000 ± 0%   -54.55% (p=0.002 n=6)
AssignAddresses-16               21.000 ± 0%   3.000 ± 0%   -85.71% (p=0.002 n=6)
AssignModulesKallsyms-16      323560.00 ± 0%   12.00 ± 0%  -100.00% (p=0.002 n=6)
AssignAddressesKallsyms-16   536565.000 ± 0%   7.000 ± 0%  -100.00% (p=0.002 n=6)
geomean                          2.517k        5.958        -99.76%
```

Signed-off-by: Lorenz Bauer <[email protected]>
Co-authored-by: Timo Beckers <[email protected]>
@ti-mo ti-mo force-pushed the kallsyms-faster branch from 303d68e to 9a48c33 Compare June 2, 2025 12:55
@ti-mo ti-mo merged commit 1171085 into cilium:main Jun 2, 2025
17 of 18 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants