[AMDGPU] Improve s_delay_alu insertion for instructions with multiple defs #163589

@jayfoad

See https://github.com/llvm/llvm-project/blob/main/llvm/test/CodeGen/AMDGPU/fcopysign.bf16.ll#L1233

The VOPD pair v_dual_mov_b32 v0, s2 :: v_dual_mov_b32 v1, s3 is treated like a single instruction that writes to both v0 and v1.

s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_2) says to wait for the VOPD pair to complete before the use of v0, and then to wait for the same VOPD pair again before the use of v1. The second wait is redundant: once the pair has completed for the use of v0, its write to v1 has completed as well. It also potentially decreases code density.
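The redundancy can be illustrated with a toy model. The sketch below is not LLVM's actual s_delay_alu insertion code; `Instr`, `computeWaits`, and the `clearAllDefsOfProducer` flag are hypothetical names invented for illustration, and the model ignores real details such as instruction types, cycle counts, and how waits are packed into the instid0/instskip/instid1 fields. It only shows the bookkeeping difference between tracking pending results per register and treating a wait on a producer as completing all of that producer's defs.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical toy instruction: just the registers it writes and reads.
struct Instr {
  std::vector<std::string> defs;
  std::vector<std::string> uses;
};

// For each instruction, return the indices of earlier instructions it would
// have to wait on. With clearAllDefsOfProducer == false, a wait only retires
// the one register that was read; with true, it retires every register
// written by the same producer (the behaviour this issue asks for).
std::vector<std::vector<int>> computeWaits(const std::vector<Instr> &prog,
                                           bool clearAllDefsOfProducer) {
  // Register name -> index of the producer whose result is still un-waited.
  std::map<std::string, int> pending;
  std::vector<std::vector<int>> waits(prog.size());

  for (int i = 0; i < static_cast<int>(prog.size()); ++i) {
    for (const std::string &reg : prog[i].uses) {
      auto it = pending.find(reg);
      if (it == pending.end())
        continue; // No outstanding write to this register.

      const int producer = it->second;
      waits[i].push_back(producer);

      if (clearAllDefsOfProducer) {
        // The producer has completed, so all of its defs are available.
        for (auto p = pending.begin(); p != pending.end();) {
          if (p->second == producer)
            p = pending.erase(p);
          else
            ++p;
        }
      } else {
        pending.erase(it);
      }
    }
    // Record this instruction's own writes as pending.
    for (const std::string &reg : prog[i].defs)
      pending[reg] = i;
  }
  return waits;
}
```

In this model, a three-instruction program where instruction 0 writes v0 and v1, instruction 1 reads v0, and instruction 2 reads v1 yields a wait on instruction 0 at both readers when `clearAllDefsOfProducer` is false, and only at the first reader when it is true.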
