[AArch64] Merge scaled and unscaled narrow zero stores #136705
Conversation
@llvm/pr-subscribers-backend-aarch64

Author: Guy David (guy-david)

Changes

Full diff: https://github.com/llvm/llvm-project/pull/136705.diff

2 Files Affected:
llvm/lib/Target/AArch64/AArch64LoadStoreOptimizer.cpp
llvm/test/CodeGen/AArch64/str-narrow-zero-merge.mir
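As a quick illustration (a contrived sketch, not taken from the patch), the kind of pattern this change targets is a pair of adjacent narrow zero stores where one instruction uses a scaled offset and the other an unscaled one:

void zero_adjacent_bytes(char *p) {
  p[4] = 0; // could lower to STRBBui $wzr, $x0, 4 (scaled byte store)
  p[5] = 0; // could lower to STURBBi $wzr, $x0, 5 (unscaled byte store)
}
// With this patch, AArch64LoadStoreOpt can merge the pair into a single
// halfword zero store, STRHHui $wzr, $x0, 2, as the MIR tests below check.

Whether ISel still emits this mixed pattern in practice is discussed in the review comments further down.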
diff --git a/llvm/lib/Target/AArch64/AArch64LoadStoreOptimizer.cpp b/llvm/lib/Target/AArch64/AArch64LoadStoreOptimizer.cpp
index 7c47492cf1a8e..57540eeb5ef92 100644
--- a/llvm/lib/Target/AArch64/AArch64LoadStoreOptimizer.cpp
+++ b/llvm/lib/Target/AArch64/AArch64LoadStoreOptimizer.cpp
@@ -892,11 +892,10 @@ AArch64LoadStoreOpt::mergeNarrowZeroStores(MachineBasicBlock::iterator I,
OffsetImm = IOffsetInBytes;
int NewOpcode = getMatchingWideOpcode(Opc);
- bool FinalIsScaled = !TII->hasUnscaledLdStOffset(NewOpcode);
-
- // Adjust final offset if the result opcode is a scaled store.
- if (FinalIsScaled) {
- int NewOffsetStride = FinalIsScaled ? TII->getMemScale(NewOpcode) : 1;
+ // Adjust final offset on scaled stores because the new instruction
+ // has a different scale.
+ if (!TII->hasUnscaledLdStOffset(NewOpcode)) {
+ int NewOffsetStride = TII->getMemScale(NewOpcode);
assert(((OffsetImm % NewOffsetStride) == 0) &&
"Offset should be a multiple of the store memory scale");
OffsetImm = OffsetImm / NewOffsetStride;
@@ -906,7 +905,7 @@ AArch64LoadStoreOpt::mergeNarrowZeroStores(MachineBasicBlock::iterator I,
DebugLoc DL = I->getDebugLoc();
MachineBasicBlock *MBB = I->getParent();
MachineInstrBuilder MIB;
- MIB = BuildMI(*MBB, InsertionPoint, DL, TII->get(getMatchingWideOpcode(Opc)))
+ MIB = BuildMI(*MBB, InsertionPoint, DL, TII->get(NewOpcode))
.addReg(isNarrowStore(Opc) ? AArch64::WZR : AArch64::XZR)
.add(BaseRegOp)
.addImm(OffsetImm)
@@ -1539,10 +1538,12 @@ static bool areCandidatesToMergeOrPair(MachineInstr &FirstMI, MachineInstr &MI,
if (!PairIsValidLdStrOpc)
return false;
- // FIXME: We don't support merging narrow stores with mixed scaled/unscaled
- // offsets.
+ // Narrow stores do not have pair-wise opcodes, so constrain their merging
+ // iff they are zero stores.
if (isNarrowStore(OpcA) || isNarrowStore(OpcB))
- return false;
+ return getLdStRegOp(FirstMI).getReg() == AArch64::WZR &&
+ getLdStRegOp(MI).getReg() == AArch64::WZR &&
+ TII->getMemScale(FirstMI) == TII->getMemScale(MI);
// The STR<S,D,Q,W,X>pre - STR<S,D,Q,W,X>ui and
// LDR<S,D,Q,W,X,SW>pre-LDR<S,D,Q,W,X,SW>ui
diff --git a/llvm/test/CodeGen/AArch64/str-narrow-zero-merge.mir b/llvm/test/CodeGen/AArch64/str-narrow-zero-merge.mir
index e995c402c50a8..8ba7c1bcf305f 100644
--- a/llvm/test/CodeGen/AArch64/str-narrow-zero-merge.mir
+++ b/llvm/test/CodeGen/AArch64/str-narrow-zero-merge.mir
@@ -29,8 +29,7 @@ name: merge_scaled_str_with_unscaled_8
body: |
bb.0.entry:
; CHECK-LABEL: name: merge_scaled_str_with_unscaled_8
- ; CHECK: STRBBui $wzr, $x0, 4 :: (store (s8))
- ; CHECK-NEXT: STURBBi $wzr, $x0, 5 :: (store (s8))
+ ; CHECK: STRHHui $wzr, $x0, 2 :: (store (s8))
; CHECK-NEXT: RET undef $lr
STRBBui $wzr, $x0, 4 :: (store (s8))
STURBBi $wzr, $x0, 5 :: (store (s8))
@@ -41,8 +40,7 @@ name: merge_unscaled_str_with_scaled_8
body: |
bb.0.entry:
; CHECK-LABEL: name: merge_unscaled_str_with_scaled_8
- ; CHECK: STURBBi $wzr, $x0, 4 :: (store (s8))
- ; CHECK-NEXT: STRBBui $wzr, $x0, 5 :: (store (s8))
+ ; CHECK: STURHHi $wzr, $x0, 4 :: (store (s8))
; CHECK-NEXT: RET undef $lr
STURBBi $wzr, $x0, 4 :: (store (s8))
STRBBui $wzr, $x0, 5 :: (store (s8))
@@ -75,8 +73,7 @@ name: merge_scaled_str_with_unscaled_16
body: |
bb.0.entry:
; CHECK-LABEL: name: merge_scaled_str_with_unscaled_16
- ; CHECK: STRHHui $wzr, $x0, 2 :: (store (s16))
- ; CHECK-NEXT: STURHHi $wzr, $x0, 6 :: (store (s16))
+ ; CHECK: STRWui $wzr, $x0, 1 :: (store (s16))
; CHECK-NEXT: RET undef $lr
STRHHui $wzr, $x0, 2 :: (store (s16))
STURHHi $wzr, $x0, 6 :: (store (s16))
@@ -87,8 +84,7 @@ name: merge_unscaled_str_with_scaled_16
body: |
bb.0.entry:
; CHECK-LABEL: name: merge_unscaled_str_with_scaled_16
- ; CHECK: STURHHi $wzr, $x0, 4 :: (store (s16))
- ; CHECK-NEXT: STRHHui $wzr, $x0, 3 :: (store (s16))
+ ; CHECK: STURWi $wzr, $x0, 4 :: (store (s16))
; CHECK-NEXT: RET undef $lr
STURHHi $wzr, $x0, 4 :: (store (s16))
STRHHui $wzr, $x0, 3 :: (store (s16))
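To make the two functional hunks concrete, here is a minimal standalone C++ sketch of the logic they implement. The names and types are hypothetical simplifications; LLVM operates on MachineInstr via TargetInstrInfo rather than on a struct like this.

#include <cassert>

// Hypothetical stand-in for one narrow store candidate.
struct NarrowStore {
  bool StoresZeroReg; // true when the stored register is WZR
  int MemScale;       // access size in bytes: 1 for byte, 2 for halfword
};

// Second hunk: narrow stores have no pair-wise opcodes, so a pair may only
// be merged when both store zero and share the same memory scale.
bool canMergeNarrowStores(const NarrowStore &A, const NarrowStore &B) {
  return A.StoresZeroReg && B.StoresZeroReg && A.MemScale == B.MemScale;
}

// First hunk: if the wider opcode produced by the merge uses a scaled
// offset, its immediate is encoded in units of the new access size, so the
// byte offset must divide evenly by the new scale.
int rescaleMergedOffset(int ByteOffset, bool NewOpcodeIsScaled,
                        int NewMemScale) {
  if (!NewOpcodeIsScaled)
    return ByteOffset; // unscaled (STUR*) opcodes keep the raw byte offset
  assert(ByteOffset % NewMemScale == 0 &&
         "Offset should be a multiple of the store memory scale");
  return ByteOffset / NewMemScale;
}

For example, merging the zero bytes at offsets 4 and 5 into a halfword store gives a scale of 2, so the scaled immediate becomes 4 / 2 = 2, matching STRHHui $wzr, $x0, 2 in the updated test; when the result is the unscaled STURHHi, the byte offset 4 is kept as-is.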
Reviewer: Does this come up often, or has ISel usually combined them previously? It seems OK so long as the wzr stores are treated OK by the rest of the code.

Author: With recent LLVM, I no longer see any cases of this happening on a very large bitcode repository, so it's possible ISel combines the stores and nothing in the codegen creates this pattern. Nonetheless, I think it's good to have it.

Reviewer: Yeah, it comes up, but quite rarely. Can you add some extra test cases for edge conditions? Otherwise LGTM.
Reviewer: Can you add some test cases for the limits of address ranges? I think it might be 4095 for scaled loads and 255 for unscaled? Otherwise LGTM, thanks.

Author: Can you explain a bit more? Doesn't the new offset always end up being less than or equal to the offset in the first instruction?

Reviewer: Yeah, I just wanted to make sure there were not any problems, especially with unscaled and scaled representing the offsets differently with negative offsets and whatnot. We've had them go wrong in the past. This patch doesn't touch them directly, so I think they should be OK, but it's good to make sure they are tested around the extremes.
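For reference on the ranges mentioned above, here is a sketch of the AArch64 immediate encoding rules with hypothetical helper names: scaled-offset forms (STR*ui) take an unsigned 12-bit immediate in units of the access size, while unscaled forms (STUR*) take a signed 9-bit byte offset.

// Scaled forms: unsigned imm12 in units of the access size, so only byte
// offsets that are multiples of Scale, up to 4095 * Scale.
bool fitsScaledImm(int ByteOffset, int Scale) {
  if (ByteOffset % Scale != 0)
    return false;
  int Imm = ByteOffset / Scale;
  return Imm >= 0 && Imm <= 4095;
}

// Unscaled forms: signed imm9, any byte offset in -256 .. 255.
bool fitsUnscaledImm(int ByteOffset) {
  return ByteOffset >= -256 && ByteOffset <= 255;
}

A merge near these limits must land on an offset the wider opcode can still encode, which is what the requested boundary tests exercise.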