[Codegen] Remove redundant instruction using machinelateCleanup #139716

Open
wants to merge 15 commits into
base: main
Changes from 2 commits
4 changes: 3 additions & 1 deletion llvm/lib/CodeGen/MachineLateInstrsCleanup.cpp
@@ -186,11 +186,13 @@ static bool isCandidate(const MachineInstr *MI, Register &DefedReg,
for (unsigned i = 0, e = MI->getNumOperands(); i != e; ++i) {
const MachineOperand &MO = MI->getOperand(i);
if (MO.isReg()) {
if (MO.isDef()) {
if (MO.isDef() && DefedReg == MCRegister::NoRegister) {
Contributor

I don't understand why this is necessary; in general this would be a malformed instruction.

Contributor Author

In some cases, MO.isDef() is true for more than one operand of an instruction, i.e. the instruction has more than one def. So the patch adds a check: if DefedReg has not been assigned yet, take the register and assign it to DefedReg. Once it is assigned, it is not reassigned.

As I understand it, among an instruction's MOs there is only one def and all the others are uses.

Please correct me if my understanding is incorrect.

Contributor

Instructions can have multiple defs, but this code specifically only wants to handle instructions with a single def. This should probably exit early if it encounters a second def.
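
The early-exit suggestion can be sketched roughly as follows. This is a minimal, self-contained model and not the code in MachineLateInstrsCleanup.cpp: `Operand` and `findSingleDef` are hypothetical names invented for illustration, and `Operand` mirrors only the few operand properties mentioned in this thread (def, implicit, dead). The point it shows is that a second def rejects the instruction outright instead of being silently skipped.

```cpp
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical, simplified stand-in for a machine operand. This is NOT
// LLVM's MachineOperand; it models only what the review discussion needs.
struct Operand {
  unsigned Reg;    // register number
  bool IsDef;      // operand defines Reg
  bool IsImplicit; // def is implicit
  bool IsDead;     // def is dead
};

// Returns the single defined register, or std::nullopt when the
// instruction is not a candidate (no usable def, or more than one def).
std::optional<unsigned> findSingleDef(const std::vector<Operand> &Ops) {
  std::optional<unsigned> DefedReg;
  for (std::size_t I = 0; I != Ops.size(); ++I) {
    const Operand &MO = Ops[I];
    if (!MO.IsDef)
      continue; // uses are ignored in this sketch
    // Exit early on a second def rather than keeping the first one.
    if (DefedReg)
      return std::nullopt;
    // Only an explicit, live def in operand slot 0 is remembered.
    if (I != 0 || MO.IsImplicit || MO.IsDead)
      return std::nullopt;
    DefedReg = MO.Reg;
  }
  return DefedReg;
}
```

Under these assumptions, an operand list with one explicit live def in slot 0 yields that register, while a list with two defs yields `std::nullopt`. Guarding the assignment with "only if DefedReg is still unset" would instead accept the two-def case and track just the first def.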

Contributor Author

I have updated the patch. Please check.

if (i == 0 && !MO.isImplicit() && !MO.isDead())
DefedReg = MO.getReg();
Contributor

I'd appreciate some comments to explain what's going on, for the existing cases and the new cases. Example:
// If the first def is explicit and not dead, remember it.
I'm not sure why explicit matters here.

Contributor Author

Sure

else
return false;
} else if (MI->isMoveImmediate()) {
return DefedReg.isValid();
} else if (MO.getReg() && MO.getReg() != FrameReg)
return false;
} else if (!(MO.isImm() || MO.isCImm() || MO.isFPImm() || MO.isCPI() ||
@@ -80,7 +80,6 @@ define void @widget(i32 %arg, i32 %arg1, ptr %arg2, ptr %arg3, ptr %arg4, i32 %a
; CHECK-NEXT: ; in Loop: Header=BB0_2 Depth=1
; CHECK-NEXT: mov x0, xzr
; CHECK-NEXT: mov x1, xzr
; CHECK-NEXT: mov w8, #1 ; =0x1
; CHECK-NEXT: stp xzr, xzr, [sp]
; CHECK-NEXT: stp x8, xzr, [sp, #16]
; CHECK-NEXT: bl _fprintf
1 change: 0 additions & 1 deletion llvm/test/CodeGen/AMDGPU/call-waitcnt.ll
@@ -41,7 +41,6 @@ define amdgpu_kernel void @call_memory_no_dep(ptr addrspace(1) %ptr, i32) #0 {
; GCN-NEXT: s_waitcnt lgkmcnt(0)
; GCN-NEXT: global_store_dword v0, v0, s[6:7]
; GCN-NEXT: s_mov_b64 s[6:7], s[4:5]
; GCN-NEXT: v_mov_b32_e32 v0, 0
; GCN-NEXT: s_mov_b32 s32, 0
; GCN-NEXT: s_swappc_b64 s[30:31], s[8:9]
; GCN-NEXT: s_endpgm
3 changes: 1 addition & 2 deletions llvm/test/CodeGen/AMDGPU/captured-frame-index.ll
@@ -113,8 +113,7 @@ define amdgpu_kernel void @stored_fi_to_fi() #0 {

; GCN-LABEL: {{^}}stored_fi_to_global:
; GCN: buffer_store_dword v{{[0-9]+}}, off, s{{\[[0-9]+:[0-9]+\]}}, 0{{$}}
; GCN: v_mov_b32_e32 [[FI:v[0-9]+]], 0{{$}}
; GCN: buffer_store_dword [[FI]]
Comment on lines -116 to -117
Contributor

This lost the point of the test?

Contributor Author

I will check

; GCN: buffer_store_dword v{{[0-9]+}}
define amdgpu_kernel void @stored_fi_to_global(ptr addrspace(1) %ptr) #0 {
%tmp = alloca float, addrspace(5)
store float 0.0, ptr addrspace(5) %tmp
1 change: 0 additions & 1 deletion llvm/test/CodeGen/AMDGPU/cgp-addressing-modes-gfx1030.ll
@@ -31,7 +31,6 @@ define amdgpu_kernel void @test_sink_small_offset_global_atomic_csub_i32(ptr add
; GCN-NEXT: v_cmpx_ne_u32_e32 0, v1
; GCN-NEXT: s_cbranch_execz .LBB0_2
; GCN-NEXT: ; %bb.1: ; %if
; GCN-NEXT: v_mov_b32_e32 v0, 0
; GCN-NEXT: v_mov_b32_e32 v1, 2
; GCN-NEXT: s_waitcnt lgkmcnt(0)
; GCN-NEXT: global_atomic_csub v0, v0, v1, s[2:3] offset:28 glc
1 change: 0 additions & 1 deletion llvm/test/CodeGen/AMDGPU/cgp-addressing-modes-gfx908.ll
@@ -33,7 +33,6 @@ define amdgpu_kernel void @test_sink_small_offset_global_atomic_fadd_f32(ptr add
; GCN-NEXT: s_and_saveexec_b64 s[4:5], vcc
; GCN-NEXT: s_cbranch_execz .LBB0_2
; GCN-NEXT: ; %bb.1: ; %if
; GCN-NEXT: v_mov_b32_e32 v0, 0
; GCN-NEXT: v_mov_b32_e32 v1, 2.0
; GCN-NEXT: s_waitcnt lgkmcnt(0)
; GCN-NEXT: global_atomic_add_f32 v0, v1, s[2:3] offset:28
3 changes: 1 addition & 2 deletions llvm/test/CodeGen/AMDGPU/cgp-addressing-modes.ll
@@ -72,8 +72,7 @@ done:
; GCN-LABEL: {{^}}test_sink_global_small_max_mubuf_offset:
; GCN: s_and_saveexec_b64
; SICIVI: buffer_load_sbyte {{v[0-9]+}}, off, {{s\[[0-9]+:[0-9]+\]}}, 0 offset:4095{{$}}
; GFX9: v_mov_b32_e32 [[ZERO:v[0-9]+]], 0{{$}}
; GFX9: global_load_sbyte {{v[0-9]+}}, [[ZERO]], {{s\[[0-9]+:[0-9]+\]}} offset:4095{{$}}
; GFX9: global_load_sbyte {{v[0-9]+}}, {{v[0-9]+}}, {{s\[[0-9]+:[0-9]+\]}} offset:4095{{$}}
Contributor

This still needs the zero check; the mov is just above here now?

; GCN: {{^}}.LBB2_2:
; GCN: s_or_b64 exec
define amdgpu_kernel void @test_sink_global_small_max_mubuf_offset(ptr addrspace(1) %out, ptr addrspace(1) %in) {
12 changes: 0 additions & 12 deletions llvm/test/CodeGen/AMDGPU/div_v2i128.ll
@@ -323,8 +323,6 @@ define <2 x i128> @v_sdiv_v2i128_vv(<2 x i128> %lhs, <2 x i128> %rhs) {
; SDAG-NEXT: v_subrev_i32_e32 v36, vcc, 64, v30
; SDAG-NEXT: v_lshr_b64 v[37:38], v[6:7], v30
; SDAG-NEXT: v_add_i32_e32 v34, vcc, -1, v29
; SDAG-NEXT: v_mov_b32_e32 v12, 0
; SDAG-NEXT: v_mov_b32_e32 v13, 0
; SDAG-NEXT: v_mov_b32_e32 v14, 0
; SDAG-NEXT: v_mov_b32_e32 v15, 0
; SDAG-NEXT: s_mov_b64 s[10:11], 0
@@ -1107,8 +1105,6 @@ define <2 x i128> @v_udiv_v2i128_vv(<2 x i128> %lhs, <2 x i128> %rhs) {
; SDAG-NEXT: v_subrev_i32_e32 v28, vcc, 64, v22
; SDAG-NEXT: v_lshr_b64 v[29:30], v[6:7], v22
; SDAG-NEXT: v_add_i32_e32 v26, vcc, -1, v12
; SDAG-NEXT: v_mov_b32_e32 v20, 0
; SDAG-NEXT: v_mov_b32_e32 v21, 0
; SDAG-NEXT: v_mov_b32_e32 v10, 0
; SDAG-NEXT: v_mov_b32_e32 v11, 0
; SDAG-NEXT: s_mov_b64 s[10:11], 0
@@ -1679,8 +1675,6 @@ define <2 x i128> @v_srem_v2i128_vv(<2 x i128> %lhs, <2 x i128> %rhs) {
; SDAG-NEXT: v_subrev_i32_e32 v37, vcc, 64, v32
; SDAG-NEXT: v_lshr_b64 v[24:25], v[0:1], v32
; SDAG-NEXT: v_add_i32_e32 v36, vcc, -1, v31
; SDAG-NEXT: v_mov_b32_e32 v18, 0
; SDAG-NEXT: v_mov_b32_e32 v19, 0
; SDAG-NEXT: v_mov_b32_e32 v22, 0
; SDAG-NEXT: v_mov_b32_e32 v23, 0
; SDAG-NEXT: s_mov_b64 s[10:11], 0
@@ -1874,8 +1868,6 @@ define <2 x i128> @v_srem_v2i128_vv(<2 x i128> %lhs, <2 x i128> %rhs) {
; SDAG-NEXT: v_subrev_i32_e32 v51, vcc, 64, v38
; SDAG-NEXT: v_lshr_b64 v[22:23], v[4:5], v38
; SDAG-NEXT: v_add_i32_e32 v50, vcc, -1, v37
; SDAG-NEXT: v_mov_b32_e32 v18, 0
; SDAG-NEXT: v_mov_b32_e32 v19, 0
; SDAG-NEXT: v_mov_b32_e32 v20, 0
; SDAG-NEXT: v_mov_b32_e32 v21, 0
; SDAG-NEXT: s_mov_b64 s[10:11], 0
@@ -2562,8 +2554,6 @@ define <2 x i128> @v_urem_v2i128_vv(<2 x i128> %lhs, <2 x i128> %rhs) {
; SDAG-NEXT: v_subrev_i32_e32 v35, vcc, 64, v30
; SDAG-NEXT: v_lshr_b64 v[26:27], v[2:3], v30
; SDAG-NEXT: v_add_i32_e32 v34, vcc, -1, v8
; SDAG-NEXT: v_mov_b32_e32 v20, 0
; SDAG-NEXT: v_mov_b32_e32 v21, 0
; SDAG-NEXT: v_mov_b32_e32 v24, 0
; SDAG-NEXT: v_mov_b32_e32 v25, 0
; SDAG-NEXT: s_mov_b64 s[10:11], 0
@@ -2737,8 +2727,6 @@ define <2 x i128> @v_urem_v2i128_vv(<2 x i128> %lhs, <2 x i128> %rhs) {
; SDAG-NEXT: v_subrev_i32_e32 v39, vcc, 64, v34
; SDAG-NEXT: v_lshr_b64 v[26:27], v[6:7], v34
; SDAG-NEXT: v_add_i32_e32 v38, vcc, -1, v12
; SDAG-NEXT: v_mov_b32_e32 v22, 0
; SDAG-NEXT: v_mov_b32_e32 v23, 0
; SDAG-NEXT: v_mov_b32_e32 v24, 0
; SDAG-NEXT: v_mov_b32_e32 v25, 0
; SDAG-NEXT: s_mov_b64 s[10:11], 0
@@ -53,7 +53,6 @@ define void @callee_with_stack_and_call() #0 {
; NO-SPILL-TO-VGPR-NEXT: buffer_load_dword v0, off, s[0:3], s33 offset:16
; NO-SPILL-TO-VGPR-NEXT: s_waitcnt vmcnt(0)
; NO-SPILL-TO-VGPR-NEXT: s_mov_b64 exec, s[4:5]
; NO-SPILL-TO-VGPR-NEXT: s_mov_b64 s[4:5], exec
; NO-SPILL-TO-VGPR-NEXT: s_mov_b64 exec, 1
; NO-SPILL-TO-VGPR-NEXT: buffer_store_dword v0, off, s[0:3], s33 offset:16
; NO-SPILL-TO-VGPR-NEXT: v_writelane_b32 v0, s31, 0
@@ -77,7 +76,6 @@ define void @callee_with_stack_and_call() #0 {
; NO-SPILL-TO-VGPR-NEXT: buffer_load_dword v0, off, s[0:3], s33 offset:16
; NO-SPILL-TO-VGPR-NEXT: s_waitcnt vmcnt(0)
; NO-SPILL-TO-VGPR-NEXT: s_mov_b64 exec, s[4:5]
; NO-SPILL-TO-VGPR-NEXT: s_mov_b64 s[4:5], exec
; NO-SPILL-TO-VGPR-NEXT: s_mov_b64 exec, 1
; NO-SPILL-TO-VGPR-NEXT: buffer_store_dword v0, off, s[0:3], s33 offset:16
; NO-SPILL-TO-VGPR-NEXT: buffer_load_dword v0, off, s[0:3], s33 offset:4 ; 4-byte Folded Reload
28 changes: 14 additions & 14 deletions llvm/test/CodeGen/AMDGPU/gfx-callable-return-types.ll
@@ -3234,20 +3234,20 @@ define amdgpu_gfx void @call_72xi32() #1 {
; GFX11-NEXT: scratch_store_b128 off, v[0:3], s1
; GFX11-NEXT: v_dual_mov_b32 v0, s2 :: v_dual_mov_b32 v3, 0
; GFX11-NEXT: v_dual_mov_b32 v1, 0 :: v_dual_mov_b32 v2, 0
; GFX11-NEXT: v_dual_mov_b32 v5, 0 :: v_dual_mov_b32 v4, 0
; GFX11-NEXT: v_dual_mov_b32 v7, 0 :: v_dual_mov_b32 v6, 0
; GFX11-NEXT: v_dual_mov_b32 v9, 0 :: v_dual_mov_b32 v8, 0
; GFX11-NEXT: v_dual_mov_b32 v11, 0 :: v_dual_mov_b32 v10, 0
; GFX11-NEXT: v_dual_mov_b32 v13, 0 :: v_dual_mov_b32 v12, 0
; GFX11-NEXT: v_dual_mov_b32 v15, 0 :: v_dual_mov_b32 v14, 0
; GFX11-NEXT: v_dual_mov_b32 v17, 0 :: v_dual_mov_b32 v16, 0
; GFX11-NEXT: v_dual_mov_b32 v19, 0 :: v_dual_mov_b32 v18, 0
; GFX11-NEXT: v_dual_mov_b32 v21, 0 :: v_dual_mov_b32 v20, 0
; GFX11-NEXT: v_dual_mov_b32 v23, 0 :: v_dual_mov_b32 v22, 0
; GFX11-NEXT: v_dual_mov_b32 v25, 0 :: v_dual_mov_b32 v24, 0
; GFX11-NEXT: v_dual_mov_b32 v27, 0 :: v_dual_mov_b32 v26, 0
; GFX11-NEXT: v_dual_mov_b32 v29, 0 :: v_dual_mov_b32 v28, 0
; GFX11-NEXT: v_dual_mov_b32 v31, 0 :: v_dual_mov_b32 v30, 0
; GFX11-NEXT: v_dual_mov_b32 v5, 0 :: v_dual_mov_b32 v6, 0
; GFX11-NEXT: v_dual_mov_b32 v7, 0 :: v_dual_mov_b32 v8, 0
; GFX11-NEXT: v_dual_mov_b32 v9, 0 :: v_dual_mov_b32 v10, 0
; GFX11-NEXT: v_dual_mov_b32 v11, 0 :: v_dual_mov_b32 v12, 0
; GFX11-NEXT: v_dual_mov_b32 v13, 0 :: v_dual_mov_b32 v14, 0
; GFX11-NEXT: v_dual_mov_b32 v15, 0 :: v_dual_mov_b32 v16, 0
; GFX11-NEXT: v_dual_mov_b32 v17, 0 :: v_dual_mov_b32 v18, 0
; GFX11-NEXT: v_dual_mov_b32 v19, 0 :: v_dual_mov_b32 v20, 0
; GFX11-NEXT: v_dual_mov_b32 v21, 0 :: v_dual_mov_b32 v22, 0
; GFX11-NEXT: v_dual_mov_b32 v23, 0 :: v_dual_mov_b32 v24, 0
; GFX11-NEXT: v_dual_mov_b32 v25, 0 :: v_dual_mov_b32 v26, 0
; GFX11-NEXT: v_dual_mov_b32 v27, 0 :: v_dual_mov_b32 v28, 0
; GFX11-NEXT: v_dual_mov_b32 v29, 0 :: v_dual_mov_b32 v30, 0
; GFX11-NEXT: v_mov_b32_e32 v31, 0
; GFX11-NEXT: s_mov_b32 s1, return_72xi32@abs32@hi
; GFX11-NEXT: s_mov_b32 s0, return_72xi32@abs32@lo
; GFX11-NEXT: v_writelane_b32 v60, s31, 1
@@ -225,12 +225,10 @@ define amdgpu_kernel void @local_stack_offset_uses_sp_flat(ptr addrspace(1) %out
; MUBUF-NEXT: ; %bb.2: ; %split
; MUBUF-NEXT: v_mov_b32_e32 v1, 0x4000
; MUBUF-NEXT: v_or_b32_e32 v0, 0x12d4, v1
; MUBUF-NEXT: v_mov_b32_e32 v1, 0x4000
; MUBUF-NEXT: s_movk_i32 s4, 0x4000
; MUBUF-NEXT: buffer_load_dword v5, v0, s[0:3], 0 offen glc
; MUBUF-NEXT: s_waitcnt vmcnt(0)
; MUBUF-NEXT: v_or_b32_e32 v0, 0x12d0, v1
; MUBUF-NEXT: v_mov_b32_e32 v1, 0x4000
; MUBUF-NEXT: s_or_b32 s4, s4, 0x12c0
; MUBUF-NEXT: buffer_load_dword v4, v0, s[0:3], 0 offen glc
; MUBUF-NEXT: s_waitcnt vmcnt(0)
2 changes: 0 additions & 2 deletions llvm/test/CodeGen/AMDGPU/preserve-wwm-copy-dst-reg.ll
@@ -395,7 +395,6 @@ define void @preserve_wwm_copy_dstreg(ptr %parg0, ptr %parg1, ptr %parg2) #0 {
; GFX908-NEXT: buffer_load_dword v2, off, s[0:3], s33 offset:168
; GFX908-NEXT: s_waitcnt vmcnt(0)
; GFX908-NEXT: s_mov_b64 exec, s[16:17]
; GFX908-NEXT: s_mov_b64 s[16:17], exec
; GFX908-NEXT: s_mov_b64 exec, 1
; GFX908-NEXT: buffer_store_dword v2, off, s[0:3], s33 offset:168
; GFX908-NEXT: v_writelane_b32 v2, s31, 0
@@ -743,7 +742,6 @@ define void @preserve_wwm_copy_dstreg(ptr %parg0, ptr %parg1, ptr %parg2) #0 {
; GFX908-NEXT: buffer_load_dword v0, off, s[0:3], s33 offset:168
; GFX908-NEXT: s_waitcnt vmcnt(0)
; GFX908-NEXT: s_mov_b64 exec, s[4:5]
; GFX908-NEXT: s_mov_b64 s[4:5], exec
; GFX908-NEXT: s_mov_b64 exec, 1
; GFX908-NEXT: buffer_store_dword v0, off, s[0:3], s33 offset:168
; GFX908-NEXT: buffer_load_dword v0, off, s[0:3], s33 offset:4 ; 4-byte Folded Reload
1 change: 0 additions & 1 deletion llvm/test/CodeGen/AMDGPU/required-export-priority.ll
@@ -267,7 +267,6 @@ define amdgpu_ps void @test_export_across_store_load(i32 %idx, float %v) #0 {
; GCN-NEXT: v_cmp_eq_u32_e32 vcc_lo, 1, v0
; GCN-NEXT: s_delay_alu instid0(VALU_DEP_2)
; GCN-NEXT: v_cndmask_b32_e32 v0, 16, v2, vcc_lo
; GCN-NEXT: v_mov_b32_e32 v2, 0
; GCN-NEXT: scratch_store_b32 v0, v1, off
; GCN-NEXT: scratch_load_b32 v0, off, off
; GCN-NEXT: v_mov_b32_e32 v1, 1.0
15 changes: 6 additions & 9 deletions llvm/test/CodeGen/AMDGPU/sibling-call.ll
@@ -388,11 +388,7 @@ define fastcc i32 @no_sibling_call_callee_more_stack_space(i32 %a, i32 %b) #1 {
; GCN-NEXT: s_add_u32 s4, s4, i32_fastcc_i32_i32_a32i32@gotpcrel32@lo+4
; GCN-NEXT: s_addc_u32 s5, s5, i32_fastcc_i32_i32_a32i32@gotpcrel32@hi+12
; GCN-NEXT: s_load_dwordx2 s[4:5], s[4:5], 0x0
; GCN-NEXT: v_mov_b32_e32 v2, 0
; GCN-NEXT: v_writelane_b32 v40, s30, 0
; GCN-NEXT: buffer_store_dword v2, off, s[0:3], s32
; GCN-NEXT: buffer_store_dword v2, off, s[0:3], s32 offset:4
; GCN-NEXT: buffer_store_dword v2, off, s[0:3], s32 offset:8
; GCN-NEXT: v_mov_b32_e32 v2, 0
; GCN-NEXT: v_mov_b32_e32 v3, 0
; GCN-NEXT: v_mov_b32_e32 v4, 0
@@ -423,6 +419,9 @@ define fastcc i32 @no_sibling_call_callee_more_stack_space(i32 %a, i32 %b) #1 {
; GCN-NEXT: v_mov_b32_e32 v29, 0
; GCN-NEXT: v_mov_b32_e32 v30, 0
; GCN-NEXT: v_writelane_b32 v40, s31, 1
; GCN-NEXT: buffer_store_dword v2, off, s[0:3], s32
; GCN-NEXT: buffer_store_dword v2, off, s[0:3], s32 offset:4
; GCN-NEXT: buffer_store_dword v2, off, s[0:3], s32 offset:8
; GCN-NEXT: s_waitcnt lgkmcnt(0)
; GCN-NEXT: s_swappc_b64 s[30:31], s[4:5]
; GCN-NEXT: v_readlane_b32 s31, v40, 1
@@ -528,10 +527,6 @@ define fastcc i32 @sibling_call_stack_objecti32_fastcc_i32_i32_a32i32_larger_arg
; GCN-NEXT: buffer_store_dword v2, off, s[0:3], s32 offset:48
; GCN-NEXT: s_waitcnt vmcnt(0)
; GCN-NEXT: v_mov_b32_e32 v2, 0
; GCN-NEXT: buffer_store_dword v2, off, s[0:3], s32
; GCN-NEXT: buffer_store_dword v2, off, s[0:3], s32 offset:4
; GCN-NEXT: buffer_store_dword v2, off, s[0:3], s32 offset:8
; GCN-NEXT: v_mov_b32_e32 v2, 0
; GCN-NEXT: v_mov_b32_e32 v3, 0
; GCN-NEXT: v_mov_b32_e32 v4, 0
; GCN-NEXT: v_mov_b32_e32 v5, 0
@@ -560,6 +555,9 @@ define fastcc i32 @sibling_call_stack_objecti32_fastcc_i32_i32_a32i32_larger_arg
; GCN-NEXT: v_mov_b32_e32 v28, 0
; GCN-NEXT: v_mov_b32_e32 v29, 0
; GCN-NEXT: v_mov_b32_e32 v30, 0
; GCN-NEXT: buffer_store_dword v2, off, s[0:3], s32
; GCN-NEXT: buffer_store_dword v2, off, s[0:3], s32 offset:4
; GCN-NEXT: buffer_store_dword v2, off, s[0:3], s32 offset:8
; GCN-NEXT: s_waitcnt lgkmcnt(0)
; GCN-NEXT: s_setpc_b64 s[4:5]
entry:
@@ -928,7 +926,6 @@ define fastcc void @sibling_call_byval_and_stack_passed(i32 %stack.out.arg, [64
; GCN-NEXT: s_add_u32 s16, s16, void_fastcc_byval_and_stack_passed@rel32@lo+4
; GCN-NEXT: s_addc_u32 s17, s17, void_fastcc_byval_and_stack_passed@rel32@hi+12
; GCN-NEXT: v_mov_b32_e32 v0, 0
; GCN-NEXT: v_mov_b32_e32 v1, 0
; GCN-NEXT: v_mov_b32_e32 v2, 0
; GCN-NEXT: v_mov_b32_e32 v3, 0
; GCN-NEXT: v_mov_b32_e32 v4, 0
8 changes: 0 additions & 8 deletions llvm/test/CodeGen/AMDGPU/spill-scavenge-offset.ll
@@ -9971,7 +9971,6 @@ define amdgpu_kernel void @test_limited_sgpr(ptr addrspace(1) %out, ptr addrspac
; GFX6-NEXT: buffer_load_dword v4, off, s[40:43], 0
; GFX6-NEXT: s_waitcnt vmcnt(0)
; GFX6-NEXT: s_mov_b64 exec, s[6:7]
; GFX6-NEXT: s_mov_b64 s[6:7], exec
; GFX6-NEXT: s_mov_b64 exec, 0xff
; GFX6-NEXT: buffer_store_dword v4, off, s[40:43], 0
; GFX6-NEXT: s_mov_b32 s34, 0x80c00
@@ -9989,7 +9988,6 @@ define amdgpu_kernel void @test_limited_sgpr(ptr addrspace(1) %out, ptr addrspac
; GFX6-NEXT: buffer_load_dword v4, off, s[40:43], 0
; GFX6-NEXT: s_waitcnt vmcnt(0)
; GFX6-NEXT: s_mov_b64 exec, s[6:7]
; GFX6-NEXT: s_mov_b64 s[6:7], exec
; GFX6-NEXT: s_mov_b64 exec, 0xff
; GFX6-NEXT: buffer_store_dword v4, off, s[40:43], 0
; GFX6-NEXT: s_waitcnt expcnt(0)
@@ -10007,7 +10005,6 @@ define amdgpu_kernel void @test_limited_sgpr(ptr addrspace(1) %out, ptr addrspac
; GFX6-NEXT: buffer_load_dword v4, off, s[40:43], 0
; GFX6-NEXT: s_waitcnt vmcnt(0)
; GFX6-NEXT: s_mov_b64 exec, s[6:7]
; GFX6-NEXT: s_mov_b64 s[6:7], exec
; GFX6-NEXT: s_mov_b64 exec, 0xff
; GFX6-NEXT: buffer_store_dword v4, off, s[40:43], 0
; GFX6-NEXT: s_mov_b32 s34, 0x81400
@@ -10025,7 +10022,6 @@ define amdgpu_kernel void @test_limited_sgpr(ptr addrspace(1) %out, ptr addrspac
; GFX6-NEXT: buffer_load_dword v4, off, s[40:43], 0
; GFX6-NEXT: s_waitcnt vmcnt(0)
; GFX6-NEXT: s_mov_b64 exec, s[6:7]
; GFX6-NEXT: s_mov_b64 s[6:7], exec
; GFX6-NEXT: s_mov_b64 exec, 0xff
; GFX6-NEXT: buffer_store_dword v4, off, s[40:43], 0
; GFX6-NEXT: s_waitcnt expcnt(0)
@@ -10043,7 +10039,6 @@ define amdgpu_kernel void @test_limited_sgpr(ptr addrspace(1) %out, ptr addrspac
; GFX6-NEXT: buffer_load_dword v4, off, s[40:43], 0
; GFX6-NEXT: s_waitcnt vmcnt(0)
; GFX6-NEXT: s_mov_b64 exec, s[6:7]
; GFX6-NEXT: s_mov_b64 s[6:7], exec
; GFX6-NEXT: s_mov_b64 exec, 0xff
; GFX6-NEXT: buffer_store_dword v4, off, s[40:43], 0
; GFX6-NEXT: s_mov_b32 s34, 0x81c00
@@ -10061,7 +10056,6 @@ define amdgpu_kernel void @test_limited_sgpr(ptr addrspace(1) %out, ptr addrspac
; GFX6-NEXT: buffer_load_dword v4, off, s[40:43], 0
; GFX6-NEXT: s_waitcnt vmcnt(0)
; GFX6-NEXT: s_mov_b64 exec, s[6:7]
; GFX6-NEXT: s_mov_b64 s[6:7], exec
; GFX6-NEXT: s_mov_b64 exec, 15
; GFX6-NEXT: buffer_store_dword v4, off, s[40:43], 0
; GFX6-NEXT: s_waitcnt expcnt(0)
@@ -10105,7 +10099,6 @@ define amdgpu_kernel void @test_limited_sgpr(ptr addrspace(1) %out, ptr addrspac
; GFX6-NEXT: buffer_load_dword v4, off, s[40:43], 0
; GFX6-NEXT: s_waitcnt vmcnt(0)
; GFX6-NEXT: s_mov_b64 exec, s[34:35]
; GFX6-NEXT: s_mov_b64 s[34:35], exec
; GFX6-NEXT: s_mov_b64 exec, 15
; GFX6-NEXT: buffer_store_dword v4, off, s[40:43], 0
; GFX6-NEXT: s_mov_b32 s44, 0x82c00
@@ -10165,7 +10158,6 @@ define amdgpu_kernel void @test_limited_sgpr(ptr addrspace(1) %out, ptr addrspac
; GFX6-NEXT: s_waitcnt vmcnt(0)
; GFX6-NEXT: s_mov_b64 exec, s[4:5]
; GFX6-NEXT: s_mov_b64 s[36:37], s[0:1]
; GFX6-NEXT: s_mov_b64 s[4:5], exec
; GFX6-NEXT: s_mov_b64 exec, 15
; GFX6-NEXT: buffer_store_dword v4, off, s[40:43], 0
; GFX6-NEXT: s_mov_b32 s6, 0x80800
1 change: 0 additions & 1 deletion llvm/test/CodeGen/X86/2007-11-30-LoadFolding-Bug.ll
@@ -35,7 +35,6 @@ define fastcc void @mp_sqrt(i32 %n, i32 %radix, ptr %in, ptr %out, ptr %tmp1, pt
; CHECK-NEXT: andl $1, %ebp
; CHECK-NEXT: xorpd %xmm0, %xmm0
; CHECK-NEXT: xorl %eax, %eax
; CHECK-NEXT: xorl %ecx, %ecx
; CHECK-NEXT: xorpd %xmm1, %xmm1
; CHECK-NEXT: .p2align 4
; CHECK-NEXT: .LBB0_7: # %bb.i28.i
2 changes: 0 additions & 2 deletions llvm/test/CodeGen/X86/AMX/amx-ldtilecfg-insert.ll
@@ -174,7 +174,6 @@ define dso_local void @test4(i16 signext %0, i16 signext %1) nounwind {
; CHECK-NEXT: incl %edi
; CHECK-NEXT: movb %dil, -{{[0-9]+}}(%rsp)
; CHECK-NEXT: ldtilecfg -{{[0-9]+}}(%rsp)
; CHECK-NEXT: xorl %eax, %eax
; CHECK-NEXT: testb %al, %al
; CHECK-NEXT: jne .LBB3_4
; CHECK-NEXT: .LBB3_2: # %amx2
@@ -190,7 +189,6 @@ define dso_local void @test4(i16 signext %0, i16 signext %1) nounwind {
; CHECK-NEXT: decl %edi
; CHECK-NEXT: movb %dil, -{{[0-9]+}}(%rsp)
; CHECK-NEXT: ldtilecfg -{{[0-9]+}}(%rsp)
; CHECK-NEXT: xorl %eax, %eax
; CHECK-NEXT: testb %al, %al
; CHECK-NEXT: jne .LBB3_2
; CHECK-NEXT: .LBB3_4: # %amx1
1 change: 0 additions & 1 deletion llvm/test/CodeGen/X86/avx-load-store.ll
@@ -216,7 +216,6 @@ define void @f_f() nounwind {
; CHECK-NEXT: jne .LBB9_2
; CHECK-NEXT: # %bb.1: # %cif_mask_all
; CHECK-NEXT: .LBB9_2: # %cif_mask_mixed
; CHECK-NEXT: xorl %eax, %eax
; CHECK-NEXT: testb %al, %al
; CHECK-NEXT: jne .LBB9_4
; CHECK-NEXT: # %bb.3: # %cif_mixed_test_all
14 changes: 7 additions & 7 deletions llvm/test/CodeGen/X86/avx512-i1test.ll
@@ -14,13 +14,13 @@ define void @func() {
; CHECK-NEXT: retq
; CHECK-NEXT: .p2align 4
; CHECK-NEXT: .LBB0_1: # %bb33
; CHECK-NEXT: # =>This Inner Loop Header: Depth=1
; CHECK-NEXT: testb %al, %al
; CHECK-NEXT: jne .LBB0_1
; CHECK-NEXT: # %bb.2: # %bb35
; CHECK-NEXT: # in Loop: Header=BB0_1 Depth=1
; CHECK-NEXT: testb %al, %al
; CHECK-NEXT: jmp .LBB0_1
; CHECK-NEXT: # =>This Inner Loop Header: Depth=1
; CHECK-NEXT: testb %al, %al
; CHECK-NEXT: jne .LBB0_1
; CHECK-NEXT: # %bb.2: # %bb35
; CHECK-NEXT: # in Loop: Header=BB0_1 Depth=1
; CHECK-NEXT: testb %al, %al
; CHECK-NEXT: jmp .LBB0_1
bb1:
br i1 poison, label %L_10, label %L_10
