[AMDGPU] Improve s_delay_alu insertion for instructions with multiple defs

See https://github.com/llvm/llvm-project/blob/main/llvm/test/CodeGen/AMDGPU/fcopysign.bf16.ll#L1233

The VOPD pair `v_dual_mov_b32 v0, s2 :: v_dual_mov_b32 v1, s3` is treated like a single instruction that writes to both v0 and v1.

`s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_2)` tells the hardware to wait for the VOPD pair to complete before the use of v0, and then to wait for the same VOPD pair again before the use of v1. The second wait is redundant, because the pair has already completed by then, and encoding it potentially decreases code density.
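To illustrate the redundancy, here is a minimal toy model in Python. It is not the LLVM pass. It assumes a straight-line sequence in which every instruction is a VALU instruction, so the dependency distance is just the difference in instruction index. The `Inst` class, the `required_delays` function, and the instruction names `use_v0` and `use_v1` are all hypothetical names made up for this sketch. It compares tracking dependencies per register with tracking them per producing instruction.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Inst:
    """Hypothetical stand-in for a VALU instruction (not an LLVM type)."""
    name: str
    defs: tuple = ()
    uses: tuple = ()


def required_delays(insts, dedupe_by_producer):
    """Return (consumer name, dependency) pairs for a straight-line sequence.

    Assumption: every instruction is VALU, so the VALU_DEP_N distance is
    simply the index difference between consumer and producer.
    """
    last_def = {}   # register -> index of the instruction that last wrote it
    waited = set()  # producer indices already covered by an earlier delay
    delays = []
    for idx, inst in enumerate(insts):
        # Collapse all registers read by this instruction to their producers.
        producers = {last_def[r] for r in inst.uses if r in last_def}
        for p in sorted(producers):
            if dedupe_by_producer and p in waited:
                continue  # this producer was already waited for
            waited.add(p)
            delays.append((inst.name, f"VALU_DEP_{idx - p}"))
        for r in inst.defs:
            last_def[r] = idx
    return delays


# The VOPD pair is modelled as one instruction that defines both v0 and v1.
program = [
    Inst("vopd", defs=("v0", "v1")),
    Inst("use_v0", defs=("v0",), uses=("v0",)),
    Inst("use_v1", defs=("v1",), uses=("v1",)),
]

naive = required_delays(program, dedupe_by_producer=False)
deduped = required_delays(program, dedupe_by_producer=True)
print(naive)    # two waits, both on the same VOPD pair
print(deduped)  # only the first wait remains
```

In this model the per-register version reproduces the `VALU_DEP_1` and `VALU_DEP_2` pair from the issue, while remembering which producing instruction has already been waited for drops the second entry. A real fix would also have to account for non-VALU instructions, control flow, and the limited range of dependency distances that `s_delay_alu` can encode, none of which this sketch handles.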

[AMDGPU] Improve s_delay_alu insertion for instructions with multiple defs #163589