-
Notifications
You must be signed in to change notification settings - Fork 0
cmd/compile: support Zba extensions in riscv64 compiler #48
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: go1.25.1-zte-dev
Are you sure you want to change the base?
Conversation
Add compiler support for Zba entensions, which are mandatory in the rva22u64 profile. These can be used to accelerate address computation. Change-Id: Ie58eb5c899612f58dc2ccd0fe75a39ce957c6e3c
|
||
// Fold shift-and-add with MOVWUreg (commutative cases) | ||
(ADD x (MOVWUreg y)) && buildcfg.GORISCV64 >= 22 => (ADDUW y x) | ||
(SH1ADD x (MOVWUreg y)) && buildcfg.GORISCV64 >= 22 => (SH1ADDUW y x) | ||
(SH2ADD x (MOVWUreg y)) && buildcfg.GORISCV64 >= 22 => (SH2ADDUW y x) | ||
(SH3ADD x (MOVWUreg y)) && buildcfg.GORISCV64 >= 22 => (SH3ADDUW y x) | ||
(SH1ADD (MOVWUreg x) y) && buildcfg.GORISCV64 >= 22 => (SH1ADDUW x y) | ||
(SH2ADD (MOVWUreg x) y) && buildcfg.GORISCV64 >= 22 => (SH2ADDUW x y) | ||
(SH3ADD (MOVWUreg x) y) && buildcfg.GORISCV64 >= 22 => (SH3ADDUW x y) | ||
|
||
// Integer minimum and maximum. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I encountered an issue while testing the Zba instruction set support locally (https://go-review.googlesource.com/c/go/+/580276):
The historical WIP pipeline passed successfully, and Joel Sing previously merged three of the instructions. However, when I tried to merge the remaining Zba instructions, I found test failures due to differences in actual generated IR/instruction patterns that prevented optimization rules from triggering:
0x0000 00000 (/home/go/src/zte-riscv/go/test/codegen/shift.go:603) FUNCDATA $6, command-line-arguments.checkLeftShiftLowerUnsigned32Bits.argliveinfo(SB)
0x0000 00000 (/home/go/src/zte-riscv/go/test/codegen/shift.go:603) PCDATA $3, $1
0x0000 00000 (/home/go/src/zte-riscv/go/test/codegen/shift.go:605) SLLI $32, X10, X5
0x0004 00004 (/home/go/src/zte-riscv/go/test/codegen/shift.go:605) SRLI $26, X5, X10
0x0008 00008 (/home/go/src/zte-riscv/go/test/codegen/shift.go:605) JALR X0, X1
0x0000 93 12 05 02 13 d5 a2 01 67 80 00 00 ........g...
testdir_test.go:147:
codegen/shift.go:595: linux/riscv64/rva22u64: opcode not found: "^SH1ADDUW"
codegen/shift.go:597: linux/riscv64/rva22u64: opcode not found: "^SH2ADDUW"
codegen/shift.go:599: linux/riscv64/rva22u64: opcode not found: "^SH3ADDUW"
codegen/shift.go:605: linux/riscv64/rva22u64: opcode not found: "^SLLIUW"
1. For addition folding test case:
func checkAddWithLeftShiftLowerUnsigned32Bits(a uint64, b int64) uint64 {
// riscv64/rva22u64: "SH1ADDUW"
x := a + uint64(uint32(b))<<1
// riscv64/rva22u64: "SH2ADDUW"
y := a + uint64(uint32(b))<<2
// riscv64/rva22u64: "SH3ADDUW"
z := a + uint64(uint32(b))<<3
return x + y + z
}
Rules like (ADD (SLLI [k] x) y) => (SHkADD x y)
were triggered first, producing SHkADD
instead of the expected SHkADDUW
.
2. For shift folding test case:
func checkLeftShiftLowerUnsigned32Bits(a int64) uint64 {
// riscv64/rva22u64: "SLLIUW"
return uint64(uint32(a)) << 6
}
Zero-extension followed by left shift was compiled into paired shifts SLLI $32; SRLI $(32-k)
to implement uint64(uint32(a)) << k
. The existing rule (SLLI [c] (MOVWUreg x)) && c <= 32 => (SRLI [32-c] (SLLI <typ.UInt64> [32] x))
took precedence, so SLLIUW
was not generated.
Solutions implemented:
-
Modified
RISCV64latelower.rules
:- (SLLI [c] (MOVWUreg x)) && c <= 32 => (SRLI [32-c] (SLLI <typ.UInt64> [32] x)) + (SLLI [c] (MOVWUreg x)) && c <= 32 && buildcfg.GORISCV64 < 22 => (SRLI [32-c] (SLLI <typ.UInt64> [32] x))
-
Added rules to
RISCV64.rules
:// Fold shift-and-add with MOVWUreg (commutative cases) (ADD x (MOVWUreg y)) && buildcfg.GORISCV64 >= 22 => (ADDUW y x) (SH1ADD x (MOVWUreg y)) && buildcfg.GORISCV64 >= 22 => (SH1ADDUW y x) (SH2ADD x (MOVWUreg y)) && buildcfg.GORISCV64 >= 22 => (SH2ADDUW y x) (SH3ADD x (MOVWUreg y)) && buildcfg.GORISCV64 >= 22 => (SH3ADDUW y x) (SH1ADD (MOVWUreg x) y) && buildcfg.GORISCV64 >= 22 => (SH1ADDUW x y) (SH2ADD (MOVWUreg x) y) && buildcfg.GORISCV64 >= 22 => (SH2ADDUW x y) (SH3ADD (MOVWUreg x) y) && buildcfg.GORISCV64 >= 22 => (SH3ADDUW x y)
However, I'm puzzled why the earlier WIP pipeline passed — did I miss something in my testing approach?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think there are some issues with the original code.
In the switch statement, OpRISCV64ADD matches before OpRISCV64SLL.
So in the case like "(ADD (SLLIUW [1] x) y) && buildcfg.GORISCV64 >= 22 => (SH1ADDUW x y)", I think it will not rewrite the SLLI to SLLITUW but match the case "(ADD (SLLI [1] x) y) && buildcfg.GORISCV64 >= 22 => (SH1ADD x y)" and get an incorrect result with "SH1ADD" not "SH1ADDUW".
In my perspective, we should put part of "// Combine left shift and addition." into the file RISCV64latelower.rules.
I am commenting on the original submission, not this submission.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea! Thank you!
Add compiler support for Zba entensions, which are mandatory in the
rva22u64 profile. These can be used to accelerate address computation.