[AMDGPU] si-peephole-sdwa: Fix cndmask vcc use for wave32 #139541

frederik-h · 2025-05-12T12:14:46Z

Before V_CNDMASK_B32_e64 is converted to SDWA form, it is first converted to V_CNDMASK_B32_e32.
On wave32, the implicit vcc use of the e32 instruction must be rewritten to a vcc_lo use. Previously, this rewrite
happened only after the final conversion to SDWA form, which led to a compiler error whenever the SDWA conversion was aborted.

Apply the vcc fix even if the SDWA conversion is not completed.
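
The control-flow problem described above can be illustrated with a small standalone model. This is only a sketch: `ToyInstr`, `runPeephole`, and the `FixEarly` and `SDWASucceeds` flags are hypothetical names invented for illustration, the `fixImplicitOperands` here is a toy free function rather than the real `SIInstrInfo` method, and nothing in it uses LLVM's actual `MachineInstr` API.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy stand-in for a machine instruction; NOT llvm::MachineInstr.
struct ToyInstr {
  std::string Opcode;
  std::vector<std::string> ImplicitUses;
};

// Toy analogue of the implicit-operand fix: on wave32 the condition mask is
// 32 bits wide, so an implicit "vcc" use is rewritten to "vcc_lo".
void fixImplicitOperands(ToyInstr &MI, bool IsWave32) {
  if (!IsWave32)
    return;
  for (std::string &Reg : MI.ImplicitUses)
    if (Reg == "vcc")
      Reg = "vcc_lo";
}

// Models the e64 -> e32 step. The e32 form reads its condition implicitly.
// FixEarly (hypothetical flag) selects the behavior with the patch applied.
ToyInstr convertToVOP2(const ToyInstr &MI, bool IsWave32, bool FixEarly) {
  assert(MI.Opcode == "V_CNDMASK_B32_e64" && "unexpected opcode");
  ToyInstr Converted{"V_CNDMASK_B32_e32", {"vcc"}};
  if (FixEarly)
    fixImplicitOperands(Converted, IsWave32);
  return Converted;
}

// Models the whole peephole. The SDWA step may give up, in which case the
// e32 instruction is left in place and any late-only fix never runs.
ToyInstr runPeephole(const ToyInstr &MI, bool IsWave32, bool SDWASucceeds,
                     bool FixEarly) {
  ToyInstr Result = convertToVOP2(MI, IsWave32, FixEarly);
  if (!SDWASucceeds)
    return Result;
  Result.Opcode = "V_CNDMASK_B32_sdwa";
  fixImplicitOperands(Result, IsWave32);
  return Result;
}
```

In this model, calling `runPeephole` with `SDWASucceeds == false` and `FixEarly == false` on wave32 leaves a plain `vcc` use behind, which corresponds to the failure this PR addresses; with `FixEarly == true` the use is already `vcc_lo` when the SDWA step aborts.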

llvmbot · 2025-05-12T12:15:16Z

@llvm/pr-subscribers-backend-amdgpu

Author: Frederik Harwath (frederik-h)

Full diff: https://github.com/llvm/llvm-project/pull/139541.diff

2 Files Affected:

(modified) llvm/lib/Target/AMDGPU/SIPeepholeSDWA.cpp (+1)
(added) llvm/test/CodeGen/AMDGPU/sdwa-peephole-cndmask-fail.ll (+51)

diff --git a/llvm/lib/Target/AMDGPU/SIPeepholeSDWA.cpp b/llvm/lib/Target/AMDGPU/SIPeepholeSDWA.cpp
index 8eb1d7253cd48..bd8baaaa3df20 100644
--- a/llvm/lib/Target/AMDGPU/SIPeepholeSDWA.cpp
+++ b/llvm/lib/Target/AMDGPU/SIPeepholeSDWA.cpp
@@ -1105,6 +1105,7 @@ void SIPeepholeSDWA::convertVcndmaskToVOP2(MachineInstr &MI,
                        .add(*TII->getNamedOperand(MI, AMDGPU::OpName::src0))
                        .add(*TII->getNamedOperand(MI, AMDGPU::OpName::src1))
                        .setMIFlags(MI.getFlags());
+  TII->fixImplicitOperands(*Converted);
   LLVM_DEBUG(dbgs() << "Converted to VOP2: " << *Converted);
   (void)Converted;
   MI.eraseFromParent();
diff --git a/llvm/test/CodeGen/AMDGPU/sdwa-peephole-cndmask-fail.ll b/llvm/test/CodeGen/AMDGPU/sdwa-peephole-cndmask-fail.ll
new file mode 100644
index 0000000000000..9ab5a31b52441
--- /dev/null
+++ b/llvm/test/CodeGen/AMDGPU/sdwa-peephole-cndmask-fail.ll
@@ -0,0 +1,51 @@
+; RUN: llc %s -march=amdgcn -mcpu=gfx1030 -o - 2>&1 | FileCheck %s
+
+; In this test, V_CNDMASK_B32_e64 gets converted to V_CNDMASK_B32_e32,
+; but the expected conversion to SDWA does not occur.  This led to a
+; compilation error, because the use of $vcc in the resulting
+; instruction must be fixed to $vcc_lo for wave32 which only happened
+; after the full conversion to SDWA.
+
+
+; CHECK-NOT: {{.*}}V_CNDMASK_B32_e32{{.*}}$vcc
+; CHECK-NOT: {{.*}}Bad machine code: Virtual register defs don't dominate all uses
+; CHECK: {{.*}}v_cndmask_b32_e32{{.*}}vcc_lo
+
+; ModuleID = 'test.ll'
+source_filename = "test.ll"
+target datalayout = "e-p:64:64-p1:64:64-p2:32:32-p3:32:32-p4:64:64-p5:32:32-p6:32:32-p7:160:256:256:32-p8:128:128-p9:192:256:256:32-i64:64-v16:16-v24:32-v32:32-v48:64-v96:128-v192:256-v256:256-v512:512-v1024:1024-v2048:2048-n32:64-S32-A5-G1-ni:7:8:9"
+target triple = "amdgcn-amd-amdhsa"
+
+define amdgpu_kernel void @quux(i32 %arg, i1 %arg1, i1 %arg2) #0 {
+bb:
+  br i1 %arg1, label %bb9, label %bb3
+
+bb3:                                              ; preds = %bb
+  %call = tail call i32 @llvm.amdgcn.workitem.id.x()
+  %mul = mul i32 %call, 5
+  %zext = zext i32 %mul to i64
+  %getelementptr = getelementptr i8, ptr addrspace(1) null, i64 %zext
+  %getelementptr4 = getelementptr i8, ptr addrspace(1) %getelementptr, i64 4
+  %load = load i8, ptr addrspace(1) %getelementptr4, align 1
+  %getelementptr5 = getelementptr i8, ptr addrspace(1) %getelementptr, i64 3
+  %load6 = load i8, ptr addrspace(1) %getelementptr5, align 1
+  %insertelement = insertelement <5 x i8> poison, i8 %load, i64 4
+  %select = select i1 %arg2, <5 x i8> %insertelement, <5 x i8> <i8 poison, i8 poison, i8 poison, i8 poison, i8 0>
+  %insertelement7 = insertelement <5 x i8> %select, i8 %load6, i64 0
+  %icmp = icmp ult i32 0, %arg
+  %select8 = select i1 %icmp, <5 x i8> zeroinitializer, <5 x i8> %insertelement7
+  %shufflevector = shufflevector <5 x i8> zeroinitializer, <5 x i8> %select8, <5 x i32> <i32 0, i32 1, i32 7, i32 8, i32 9>
+  br label %bb9
+
+bb9:                                              ; preds = %bb3, %bb
+  %phi = phi <5 x i8> [ %shufflevector, %bb3 ], [ zeroinitializer, %bb ]
+  %extractelement = extractelement <5 x i8> %phi, i64 0
+  store i8 %extractelement, ptr addrspace(1) null, align 1
+  ret void
+}
+
+; Function Attrs: nocallback nofree nosync nounwind speculatable willreturn memory(none)
+declare noundef range(i32 0, 1024) i32 @llvm.amdgcn.workitem.id.x() #1
+
+attributes #0 = { "target-cpu"="gfx1030" }
+attributes #1 = { nocallback nofree nosync nounwind speculatable willreturn memory(none) "target-cpu"="gfx1030" }

frederik-h added 2 commits May 12, 2025 11:20

[AMDGPU] Add tests that demonstrates si-peephole-sdwa failure on V_CNDMASK

d32f060

[AMDGPU] si-peephole-sdwa: Fix cndmask vcc use for wave32

85e8efa

frederik-h requested a review from arsenm May 12, 2025 12:14

llvmbot added the backend:AMDGPU label May 12, 2025

Clean up test

dc740ce

arsenm reviewed May 12, 2025

frederik-h and others added 2 commits May 12, 2025 14:46

Update llvm/test/CodeGen/AMDGPU/sdwa-peephole-cndmask-fail.ll

6f24263

Co-authored-by: Matt Arsenault <[email protected]>

Add mir test

4b52ec5

frederik-h requested a review from arsenm May 12, 2025 15:32

arsenm reviewed May 12, 2025

frederik-h and others added 3 commits May 13, 2025 09:15

Update llvm/test/CodeGen/AMDGPU/sdwa-peephole-cndmask-wave32.mir

b5f3df9

Co-authored-by: Matt Arsenault <[email protected]>

Update llvm/test/CodeGen/AMDGPU/sdwa-peephole-cndmask-fail.ll

d68738f

Co-authored-by: Matt Arsenault <[email protected]>

Simplify and correct ll test

5b1a158

- Must use -mtriple to reproduce the bug on the unfixed branch
- Function does not need to be a kernel

arsenm reviewed May 13, 2025

arsenm approved these changes May 13, 2025

frederik-h added 3 commits May 13, 2025 10:01

Update test expectations

3a1e535

Update mir test

245792d

Merge remote-tracking branch 'upstream/main' into peephole-sdwa-cndmask-fix-vcc

2416eff

frederik-h merged commit 1377535 into llvm:main May 14, 2025
11 checks passed

frederik-h deleted the peephole-sdwa-cndmask-fix-vcc branch May 14, 2025 05:37
