
Conversation

@lxq015 (Collaborator) commented Oct 9, 2025

Add compiler support for the Zba extension, which is mandatory in the
rva22u64 profile. Its instructions can be used to accelerate address computation.

MouseSplinter and others added 3 commits September 30, 2025 16:51
Add compiler support for the Zba extension, which is mandatory in the
rva22u64 profile. Its instructions can be used to accelerate address computation.

Change-Id: Ie58eb5c899612f58dc2ccd0fe75a39ce957c6e3c
@lxq015 marked this pull request as draft October 9, 2025 12:05
@lxq015 marked this pull request as ready for review October 10, 2025 07:49
Comment on lines 856 to 866

// Fold shift-and-add with MOVWUreg (commutative cases)
(ADD x (MOVWUreg y)) && buildcfg.GORISCV64 >= 22 => (ADDUW y x)
(SH1ADD x (MOVWUreg y)) && buildcfg.GORISCV64 >= 22 => (SH1ADDUW y x)
(SH2ADD x (MOVWUreg y)) && buildcfg.GORISCV64 >= 22 => (SH2ADDUW y x)
(SH3ADD x (MOVWUreg y)) && buildcfg.GORISCV64 >= 22 => (SH3ADDUW y x)
(SH1ADD (MOVWUreg x) y) && buildcfg.GORISCV64 >= 22 => (SH1ADDUW x y)
(SH2ADD (MOVWUreg x) y) && buildcfg.GORISCV64 >= 22 => (SH2ADDUW x y)
(SH3ADD (MOVWUreg x) y) && buildcfg.GORISCV64 >= 22 => (SH3ADDUW x y)

// Integer minimum and maximum.
@lxq015 (Collaborator, Author) commented:


I encountered an issue while testing the Zba instruction set support locally (https://go-review.googlesource.com/c/go/+/580276):

The earlier WIP pipeline passed, and Joel Sing had already merged three of the instructions. However, when I tried to merge the remaining Zba instructions, the tests failed: the actually generated IR/instruction patterns differ from what the new optimization rules expect, so the rules never trigger:

                0x0000 00000 (/home/go/src/zte-riscv/go/test/codegen/shift.go:603)     FUNCDATA     $6, command-line-arguments.checkLeftShiftLowerUnsigned32Bits.argliveinfo(SB)
                0x0000 00000 (/home/go/src/zte-riscv/go/test/codegen/shift.go:603)     PCDATA  $3, $1
                0x0000 00000 (/home/go/src/zte-riscv/go/test/codegen/shift.go:605)     SLLI    $32, X10, X5
                0x0004 00004 (/home/go/src/zte-riscv/go/test/codegen/shift.go:605)     SRLI    $26, X5, X10
                0x0008 00008 (/home/go/src/zte-riscv/go/test/codegen/shift.go:605)     JALR    X0, X1
                0x0000 93 12 05 02 13 d5 a2 01 67 80 00 00              ........g...
    testdir_test.go:147: 
        codegen/shift.go:595: linux/riscv64/rva22u64: opcode not found: "^SH1ADDUW"
        codegen/shift.go:597: linux/riscv64/rva22u64: opcode not found: "^SH2ADDUW"
        codegen/shift.go:599: linux/riscv64/rva22u64: opcode not found: "^SH3ADDUW"
        codegen/shift.go:605: linux/riscv64/rva22u64: opcode not found: "^SLLIUW"

1. For the addition-folding test case:

func checkAddWithLeftShiftLowerUnsigned32Bits(a uint64, b int64) uint64 {
    // riscv64/rva22u64: "SH1ADDUW"
    x := a + uint64(uint32(b))<<1
    // riscv64/rva22u64: "SH2ADDUW" 
    y := a + uint64(uint32(b))<<2
    // riscv64/rva22u64: "SH3ADDUW"
    z := a + uint64(uint32(b))<<3
    return x + y + z
}

Rules like (ADD (SLLI [k] x) y) => (SHkADD x y) were triggered first, producing SHkADD instead of the expected SHkADDUW.
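
For illustration, here is roughly how I understand the rewrite sequence for
x := a + uint64(uint32(b))<<1 (schematic SSA, not actual compiler output):

    // lowered SSA before the shift-and-add fold (schematic)
    (ADD a (SLLI [1] (MOVWUreg b)))
    // the generic fold (ADD (SLLI [1] x) y) => (SH1ADD x y) matches first:
    (SH1ADD (MOVWUreg b) a)
    // nothing folds the remaining MOVWUreg into the SH1ADD, so plain SH1ADD
    // is emitted instead of (SH1ADDUW b a)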

2. For the shift-folding test case:

func checkLeftShiftLowerUnsigned32Bits(a int64) uint64 {
    // riscv64/rva22u64: "SLLIUW"
    return uint64(uint32(a)) << 6
}

Zero-extension followed by left shift was compiled into paired shifts SLLI $32; SRLI $(32-k) to implement uint64(uint32(a)) << k. The existing rule (SLLI [c] (MOVWUreg x)) && c <= 32 => (SRLI [32-c] (SLLI <typ.UInt64> [32] x)) took precedence, so SLLIUW was not generated.
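
Schematically, the rewrite for uint64(uint32(a)) << 6 looks to me like this
(again an illustration, not actual compiler output):

    // after lowering (schematic)
    (SLLI [6] (MOVWUreg a))
    // the existing latelower rule rewrites this into a shift pair:
    //   (SLLI [c] (MOVWUreg x)) && c <= 32 => (SRLI [32-c] (SLLI <typ.UInt64> [32] x))
    (SRLI [26] (SLLI [32] a))
    // which is exactly the SLLI $32 / SRLI $26 pair in the assembly above,
    // and leaves no MOVWUreg for an SLLIUW-forming rule to match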

Solutions implemented:

  • Modified RISCV64latelower.rules:

    - (SLLI [c] (MOVWUreg x)) && c <= 32 => (SRLI [32-c] (SLLI <typ.UInt64> [32] x))
    + (SLLI [c] (MOVWUreg x)) && c <= 32 && buildcfg.GORISCV64 < 22 => (SRLI [32-c] (SLLI <typ.UInt64> [32] x))
  • Added rules to RISCV64.rules (how they apply to the example above is sketched after this list):

    // Fold shift-and-add with MOVWUreg (commutative cases)
    (ADD x (MOVWUreg y)) && buildcfg.GORISCV64 >= 22 => (ADDUW y x)
    (SH1ADD x (MOVWUreg y)) && buildcfg.GORISCV64 >= 22 => (SH1ADDUW y x)
    (SH2ADD x (MOVWUreg y)) && buildcfg.GORISCV64 >= 22 => (SH2ADDUW y x)
    (SH3ADD x (MOVWUreg y)) && buildcfg.GORISCV64 >= 22 => (SH3ADDUW y x)
    (SH1ADD (MOVWUreg x) y) && buildcfg.GORISCV64 >= 22 => (SH1ADDUW x y)
    (SH2ADD (MOVWUreg x) y) && buildcfg.GORISCV64 >= 22 => (SH2ADDUW x y)
    (SH3ADD (MOVWUreg x) y) && buildcfg.GORISCV64 >= 22 => (SH3ADDUW x y)
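
With these rules in place, the SH1ADD example above should (schematically) continue as:

    (SH1ADD (MOVWUreg b) a)
    // the new fold (SH1ADD (MOVWUreg x) y) && buildcfg.GORISCV64 >= 22 => (SH1ADDUW x y) applies:
    (SH1ADDUW b a)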

However, I'm puzzled why the earlier WIP pipeline passed — did I miss something in my testing approach?


@ctk-1998 (Collaborator) commented Oct 10, 2025


I think there are some issues with the original code.
In the generated switch statement, OpRISCV64ADD is matched before OpRISCV64SLL.
So for a case like "(ADD (SLLIUW [1] x) y) && buildcfg.GORISCV64 >= 22 => (SH1ADDUW x y)", I think the SLLI will not be rewritten to SLLIUW first; instead the ADD matches "(ADD (SLLI [1] x) y) && buildcfg.GORISCV64 >= 22 => (SH1ADD x y)" and we get the incorrect result "SH1ADD" instead of "SH1ADDUW".
In my view, we should move part of the "// Combine left shift and addition." section into RISCV64latelower.rules.
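
To illustrate, I mean something roughly like this in RISCV64latelower.rules (which
rules exactly should move, and under what conditions, would of course need checking):

    // moved from RISCV64.rules ("Combine left shift and addition."), so that the
    // shift-and-add folds are applied in the latelower pass instead of during lowering
    (ADD (SLLI [1] x) y) && buildcfg.GORISCV64 >= 22 => (SH1ADD x y)
    (ADD (SLLIUW [1] x) y) && buildcfg.GORISCV64 >= 22 => (SH1ADDUW x y)
    // ... and likewise for the [2]/[3] variants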
I am commenting on the original submission, not this submission.

@lxq015 (Collaborator, Author) replied:


Good idea! Thank you!
