Commit dbffe91
committed
Add PGO+LTO Makefile
Adds a convenient way to enable PGO+LTO on Julia and LLVM together:
1. `cd contrib/pgo-lto`
2. `make -j$(nproc) stage1`
3. `make clean-profiles`
4. `./stage1.build/julia -O3 -e 'using Pkg; Pkg.add("LoopVectorization"); Pkg.test("LoopVectorization")'`
5. `make -j$(nproc) stage2`
This results quite often in spectacular speedups for time to first X as
it reduces the time spent in LLVM optimization passes by 25 or even 30%.
Example 1:
```julia
using LoopVectorization
function f!(a, b)
@turbo for i in eachindex(a)
a[i] *= b[i]
end
return a
end
f!(rand(1), rand(1))
```
```console
$ time ./julia -O3 lv.jl
```
Without PGO+LTO: 14.801s
With PGO+LTO: 11.978s (-19%)
Example 2:
```console
$ time ./julia -e 'using Pkg; Pkg.test("Unitful");'
```
Without PGO+LTO: 1m47.688s
With PGO+LTO: 1m35.704s (-11%)
Example 3 (taken from issue #45395, which is almost only LLVM):
```console
$ JULIA_LLVM_ARGS=-time-passes ./julia script-45395.jl
```
Without PGO+LTO:
```
===-------------------------------------------------------------------------===
... Pass execution timing report ...
===-------------------------------------------------------------------------===
Total Execution Time: 101.0130 seconds (98.6253 wall clock)
---User Time--- --System Time-- --User+System-- ---Wall Time--- --- Name ---
53.6961 ( 54.7%) 0.1050 ( 3.8%) 53.8012 ( 53.3%) 53.8045 ( 54.6%) Unroll loops
25.5423 ( 26.0%) 0.0072 ( 0.3%) 25.5495 ( 25.3%) 25.5444 ( 25.9%) Global Value Numbering
7.1995 ( 7.3%) 0.0526 ( 1.9%) 7.2521 ( 7.2%) 7.2517 ( 7.4%) Induction Variable Simplification
5.0541 ( 5.1%) 0.0098 ( 0.3%) 5.0639 ( 5.0%) 5.0561 ( 5.1%)
Combine redundant instructions #2
```
Wit PGO+LTO:
```
===-------------------------------------------------------------------------===
... Pass execution timing report ...
===-------------------------------------------------------------------------===
Total Execution Time: 72.6507 seconds (70.1337 wall clock)
---User Time--- --System Time-- --User+System-- ---Wall Time--- --- Name ---
36.0894 ( 51.7%) 0.0825 ( 2.9%) 36.1719 ( 49.8%) 36.1738 ( 51.6%) Unroll loops
16.5713 ( 23.7%) 0.0129 ( 0.5%) 16.5843 ( 22.8%) 16.5794 ( 23.6%) Global Value Numbering
5.9047 ( 8.5%) 0.0395 ( 1.4%) 5.9442 ( 8.2%) 5.9438 ( 8.5%) Induction Variable Simplification
4.7566 ( 6.8%) 0.0078 ( 0.3%) 4.7645 ( 6.6%) 4.7575 ( 6.8%) Combine redundant instructions #2
```
Or -28% time spent in LLVM.
---
Finally there's a significant reduction in binary sizes. For libLLVM.so:
```
79M usr/lib/libLLVM-13jl.so (before)
67M usr/lib/libLLVM-13jl.so (after)
```
And it can be reduced by another 2MB with `--icf=safe` when using LLD as
a linker anways.
Turn into makefile
Newline
Use two out of source builds
Ignore profiles + build dirs
Add --icf=safe
stage0 setup prebuilt clang with [cd]tors->init/fini patch1 parent 4015e0d commit dbffe91
2 files changed
+84
-0
lines changed| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
| 1 | + | |
| 2 | + | |
| 3 | + | |
| 4 | + | |
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
| 1 | + | |
| 2 | + | |
| 3 | + | |
| 4 | + | |
| 5 | + | |
| 6 | + | |
| 7 | + | |
| 8 | + | |
| 9 | + | |
| 10 | + | |
| 11 | + | |
| 12 | + | |
| 13 | + | |
| 14 | + | |
| 15 | + | |
| 16 | + | |
| 17 | + | |
| 18 | + | |
| 19 | + | |
| 20 | + | |
| 21 | + | |
| 22 | + | |
| 23 | + | |
| 24 | + | |
| 25 | + | |
| 26 | + | |
| 27 | + | |
| 28 | + | |
| 29 | + | |
| 30 | + | |
| 31 | + | |
| 32 | + | |
| 33 | + | |
| 34 | + | |
| 35 | + | |
| 36 | + | |
| 37 | + | |
| 38 | + | |
| 39 | + | |
| 40 | + | |
| 41 | + | |
| 42 | + | |
| 43 | + | |
| 44 | + | |
| 45 | + | |
| 46 | + | |
| 47 | + | |
| 48 | + | |
| 49 | + | |
| 50 | + | |
| 51 | + | |
| 52 | + | |
| 53 | + | |
| 54 | + | |
| 55 | + | |
| 56 | + | |
| 57 | + | |
| 58 | + | |
| 59 | + | |
| 60 | + | |
| 61 | + | |
| 62 | + | |
| 63 | + | |
| 64 | + | |
| 65 | + | |
| 66 | + | |
| 67 | + | |
| 68 | + | |
| 69 | + | |
| 70 | + | |
| 71 | + | |
| 72 | + | |
| 73 | + | |
| 74 | + | |
| 75 | + | |
| 76 | + | |
| 77 | + | |
| 78 | + | |
| 79 | + | |
| 80 | + | |
0 commit comments