Commit 969c5bb

[BACKEND] Hotfix for perf regression (triton-lang#2822)
When a PTX file contains the directive `.target sm_80, debug`, `ptxas` cannot apply compiler optimizations. To confirm this, add `-O3` to the compilation command: `ptxas` then reports a conflict between the `debug` constraint and the optimization flag. To fix the problem, this PR rewrites `.target sm_<arch>, debug` to `.target sm_<arch>` before invoking `ptxas`.
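The rewrite described above can be sketched in a few lines of Python. The regular expression is the one added by this commit; the helper name `strip_debug_flag` and the sample PTX header are hypothetical and exist only for illustration.

```python
import re


def strip_debug_flag(ptx: str) -> str:
    """Drop the `debug` option from PTX text (hypothetical helper)."""
    # Same pattern as in the commit: removes ", debug" or "debug, ".
    return re.sub(r",\s*debug|debug,\s*", "", ptx)


# Hypothetical PTX header, used only to show the effect of the substitution.
sample = "\n".join([
    ".version 8.0",
    ".target sm_80, debug",
    ".address_size 64",
])

print(strip_debug_flag(sample))
```

After the substitution, the second line of the sample reads `.target sm_80`, and the other lines are unchanged.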
1 parent 56e7a3a commit 969c5bb

1 file changed: +2 -0 lines changed

python/triton/compiler/backends/cuda.py

Lines changed: 2 additions & 0 deletions
@@ -184,6 +184,8 @@ def make_ptx(src, metadata, opt, capability):
         ptx_version = ptx_get_version(cuda_version)
         ptx_version = f'{ptx_version//10}.{ptx_version%10}'
         ret = re.sub(r'\.version \d+\.\d+', f'.version {ptx_version}', ret, flags=re.MULTILINE)
+        # Remove the debug flag that prevents ptxas from optimizing the code
+        ret = re.sub(r",\s*debug|debug,\s*", "", ret)
         return ret
 
     @staticmethod
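The pattern in the diff is not anchored, so it removes `, debug` or `debug, ` wherever they occur in the PTX text, not only on the `.target` line. The sketch below compares it with a stricter variant that only touches `.target` directives. This variant is an assumption for illustration and is not part of the commit; the names `strip_anywhere` and `strip_target_debug` and the sample text are hypothetical, and the stricter pattern handles only a `debug` option that follows a comma.

```python
import re

# Pattern from the commit: matches anywhere in the text.
COMMIT_PATTERN = re.compile(r",\s*debug|debug,\s*")

# Hypothetical stricter pattern: only a ", debug" option on a `.target` line.
TARGET_DEBUG = re.compile(
    r"^([ \t]*\.target[ \t]+[^\n]*?),[ \t]*debug\b",
    flags=re.MULTILINE,
)


def strip_anywhere(ptx: str) -> str:
    """Apply the commit's substitution to the whole PTX text."""
    return COMMIT_PATTERN.sub("", ptx)


def strip_target_debug(ptx: str) -> str:
    """Remove `debug` only from `.target` directives (illustrative variant)."""
    return TARGET_DEBUG.sub(r"\1", ptx)


# Hypothetical input: a target directive plus a comment that mentions debug.
text = ".target sm_80, debug\n// info, debug build"

print(strip_anywhere(text))
print(strip_target_debug(text))
```

Both functions rewrite the directive to `.target sm_80`; only the unanchored pattern also alters the comment line.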
