Commit 51c51f6

fix(sci-biology/cmdock): update pgo task to ebola target
1 parent b7b3d25 commit 51c51f6

6 files changed, 64806 insertions(+), 80943 deletions(-)

sci-biology/cmdock/cmdock-0.2.0-r14.ebuild renamed to sci-biology/cmdock/cmdock-0.2.0-r15.ebuild

Lines changed: 14 additions & 4 deletions
@@ -2,20 +2,28 @@
 # Distributed under the terms of the GNU General Public License v2
 
 # notes about optimization
-# CXXFLAGS="-O3" recommended
+# CXXFLAGS="-march=native -O3" recommended
 # LTO recommended with clang, but hit or miss with gcc
 
 # USE=pgo implements traditional compile => train => recompile
 # trains on static data from an actual cmdock boinc job
-# env PGO_TIMEOUT=2h to change training time limit
+# env PGO_TIMEOUT=2h to train longer but it does not help much
+# training data needs updated whenever sidock switches target disease
 
 # perfdata-sample implements live sampling PGO
 # see https://clang.llvm.org/docs/UsersManual.html#using-sampling-profilers
 # clang only - gcc tooling is not really usable
 # may require special CPU features for branch sampling
 # traditional pgo builds can be sampled but not both applied to the same build
 # can be repeated indefinitely, as any build with debug symbols can be sampled
-# adds about 10% runtime sample conversion overhead (todo: reduce)
+# adds about 20% runtime sample conversion overhead (todo: reduce)
+# no noticeable overhead unless perfdata is actually sampling
+# todo: might get better results when gathering and applying samples on same CPU
+
+# top performers from tests on bdver2 - rough comparison with official project binaries
+# 1. 11% faster - gcc-13.3.1 USE="-clang pgo" CXXFLAGS="-march=native -O3 -flto -fno-profile-partial-training"
+# 2. 06% faster - gcc-13.3.1 USE="-clang -pgo" CXXFLAGS="-march=native -O3"
+# 3. 02% faster - clang-18.1.8 USE="clang perfdata-sample-use" CXXFLAGS="-march=native -O3 -flto -fno-profile-sample-accurate -fno-sample-profile-use-profi" with samples from skylake
 #
 
 EAPI=8
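The "compile => train => recompile" cycle named in the notes above can be sketched outside Portage as follows. This is a minimal illustration, not the ebuild's actual implementation: `pgo_flags` and the profile directory argument are hypothetical names, and it assumes the `-fprofile-generate=DIR` / `-fprofile-use=DIR` spellings that both gcc and clang accept (with clang, the raw profiles would additionally need merging via `llvm-profdata` into `DIR/default.profdata` before the second build).

```shell
# Hypothetical helper (not in the ebuild): print the extra compiler flag
# for one stage of a traditional PGO cycle.
#   $1 = stage: "generate" (instrumented first build) or "use" (final build)
#   $2 = directory holding the profile data (an assumption of this sketch)
pgo_flags() {
    stage=$1
    dir=$2
    case "$stage" in
        generate) printf '%s\n' "-fprofile-generate=${dir}" ;;
        use)      printf '%s\n' "-fprofile-use=${dir}" ;;
        *)        return 1 ;;   # unknown stage
    esac
}

# Stage 1: instrumented build
printf 'build 1 flags: %s\n' "$(pgo_flags generate /tmp/cmdock-pgo)"
# Stage 2: run the training workload with the instrumented binary (omitted)
# Stage 3: rebuild using the collected profile
printf 'build 2 flags: %s\n' "$(pgo_flags use /tmp/cmdock-pgo)"
```

The training run in stage 2 is where the static docking job and `PGO_TIMEOUT` from the notes would apply; it is left out here because the ebuild's training command is not shown in this hunk.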
@@ -161,7 +169,9 @@ src_configure() {
 
 if use pgo || use perfdata-sample-use; then
 # do not assume all code paths are exercised during pgo training
-tc-is-clang && prepend-flags '-fno-profile-sample-accurate' || prepend-flags '-fprofile-partial-training'
+# without this flag, unused paths are optimized for size rather than speed
+# gcc has similar -fprofile-partial-training but it hurts slightly rather than help slightly
+tc-is-clang && prepend-flags '-fno-profile-sample-accurate'
 fi
 
 if use perfdata-sample-gen; then
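The effect of this hunk on flag selection can be illustrated with a standalone sketch. `tc-is-clang` and `prepend-flags` are eclass helpers that only exist inside Portage, so the function below is a hypothetical stand-in that takes the compiler name and the existing flags as arguments; matching on the substring `clang` is an assumption of this sketch, not how the eclass detects the compiler.

```shell
# Hypothetical stand-in for the post-commit logic: only clang builds get the
# extra flag prepended; gcc builds keep their flags unchanged.
#   $1 = compiler name (e.g. "clang++" or "g++")
#   $2 = existing CXXFLAGS
partial_training_flags() {
    compiler=$1
    cxxflags=$2
    case "$compiler" in
        *clang*) printf '%s\n' "-fno-profile-sample-accurate ${cxxflags}" ;;
        *)       printf '%s\n' "${cxxflags}" ;;
    esac
}

partial_training_flags clang++ "-march=native -O3"
partial_training_flags g++ "-march=native -O3"
```

Before this commit the gcc branch would have prepended `-fprofile-partial-training`; the sketch reflects the new behaviour, where gcc receives no additional flag.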
Lines changed: 7 additions & 6 deletions
@@ -1,7 +1,8 @@
-4
-if - -10 SCORE.INTER 1.0 if - SCORE.NRUNS 6 0.0 -1.0,
-if - -15 SCORE.INTER 1.0 if - SCORE.NRUNS 14 0.0 -1.0,
-if - -20 SCORE.INTER 1.0 if - SCORE.NRUNS 24 0.0 -1.0,
-if - SCORE.NRUNS 50 0.0 -1.0,
+5
+if - -8 SCORE.INTER 1.0 if - SCORE.NRUNS 9 0.0 -1.0,
+if - -15 SCORE.INTER 1.0 if - SCORE.NRUNS 19 0.0 -1.0,
+if - -20 SCORE.INTER 1.0 if - SCORE.NRUNS 29 0.0 -1.0,
+if - -25 SCORE.INTER 1.0 if - SCORE.NRUNS 49 0.0 -1.0,
+if - SCORE.NRUNS 75 0.0 -1.0,
 1
-- SCORE.INTER -20,
+- SCORE.INTER -20.0,
