[AArch64] Merge scaled and unscaled narrow zero stores #136705
Conversation
@llvm/pr-subscribers-backend-aarch64

Author: Guy David (guy-david)

Changes

Full diff: https://github.com/llvm/llvm-project/pull/136705.diff

2 Files Affected:
llvm/lib/Target/AArch64/AArch64LoadStoreOptimizer.cpp
llvm/test/CodeGen/AArch64/str-narrow-zero-merge.mir
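As a quick illustration (a contrived sketch, not taken from the patch), the kind of pattern this change targets is a pair of adjacent narrow zero stores where one instruction uses a scaled offset and the other an unscaled one:

void zero_adjacent_bytes(char *p) {
  p[4] = 0; // could lower to STRBBui $wzr, $x0, 4 (scaled byte store)
  p[5] = 0; // could lower to STURBBi $wzr, $x0, 5 (unscaled byte store)
}
// With this patch, AArch64LoadStoreOpt can merge the pair into a single
// halfword zero store, STRHHui $wzr, $x0, 2, as the MIR tests below check.

Whether ISel still emits this mixed pattern in practice is discussed in the review comments further down.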
diff --git a/llvm/lib/Target/AArch64/AArch64LoadStoreOptimizer.cpp b/llvm/lib/Target/AArch64/AArch64LoadStoreOptimizer.cpp
index 7c47492cf1a8e..57540eeb5ef92 100644
--- a/llvm/lib/Target/AArch64/AArch64LoadStoreOptimizer.cpp
+++ b/llvm/lib/Target/AArch64/AArch64LoadStoreOptimizer.cpp
@@ -892,11 +892,10 @@ AArch64LoadStoreOpt::mergeNarrowZeroStores(MachineBasicBlock::iterator I,
OffsetImm = IOffsetInBytes;
int NewOpcode = getMatchingWideOpcode(Opc);
- bool FinalIsScaled = !TII->hasUnscaledLdStOffset(NewOpcode);
-
- // Adjust final offset if the result opcode is a scaled store.
- if (FinalIsScaled) {
- int NewOffsetStride = FinalIsScaled ? TII->getMemScale(NewOpcode) : 1;
+ // Adjust final offset on scaled stores because the new instruction
+ // has a different scale.
+ if (!TII->hasUnscaledLdStOffset(NewOpcode)) {
+ int NewOffsetStride = TII->getMemScale(NewOpcode);
assert(((OffsetImm % NewOffsetStride) == 0) &&
"Offset should be a multiple of the store memory scale");
OffsetImm = OffsetImm / NewOffsetStride;
@@ -906,7 +905,7 @@ AArch64LoadStoreOpt::mergeNarrowZeroStores(MachineBasicBlock::iterator I,
DebugLoc DL = I->getDebugLoc();
MachineBasicBlock *MBB = I->getParent();
MachineInstrBuilder MIB;
- MIB = BuildMI(*MBB, InsertionPoint, DL, TII->get(getMatchingWideOpcode(Opc)))
+ MIB = BuildMI(*MBB, InsertionPoint, DL, TII->get(NewOpcode))
.addReg(isNarrowStore(Opc) ? AArch64::WZR : AArch64::XZR)
.add(BaseRegOp)
.addImm(OffsetImm)
@@ -1539,10 +1538,12 @@ static bool areCandidatesToMergeOrPair(MachineInstr &FirstMI, MachineInstr &MI,
if (!PairIsValidLdStrOpc)
return false;
- // FIXME: We don't support merging narrow stores with mixed scaled/unscaled
- // offsets.
+ // Narrow stores do not have pair-wise opcodes, so constrain their merging
+ // iff they are zero stores.
if (isNarrowStore(OpcA) || isNarrowStore(OpcB))
- return false;
+ return getLdStRegOp(FirstMI).getReg() == AArch64::WZR &&
+ getLdStRegOp(MI).getReg() == AArch64::WZR &&
+ TII->getMemScale(FirstMI) == TII->getMemScale(MI);
// The STR<S,D,Q,W,X>pre - STR<S,D,Q,W,X>ui and
// LDR<S,D,Q,W,X,SW>pre-LDR<S,D,Q,W,X,SW>ui
diff --git a/llvm/test/CodeGen/AArch64/str-narrow-zero-merge.mir b/llvm/test/CodeGen/AArch64/str-narrow-zero-merge.mir
index e995c402c50a8..8ba7c1bcf305f 100644
--- a/llvm/test/CodeGen/AArch64/str-narrow-zero-merge.mir
+++ b/llvm/test/CodeGen/AArch64/str-narrow-zero-merge.mir
@@ -29,8 +29,7 @@ name: merge_scaled_str_with_unscaled_8
body: |
bb.0.entry:
; CHECK-LABEL: name: merge_scaled_str_with_unscaled_8
- ; CHECK: STRBBui $wzr, $x0, 4 :: (store (s8))
- ; CHECK-NEXT: STURBBi $wzr, $x0, 5 :: (store (s8))
+ ; CHECK: STRHHui $wzr, $x0, 2 :: (store (s8))
; CHECK-NEXT: RET undef $lr
STRBBui $wzr, $x0, 4 :: (store (s8))
STURBBi $wzr, $x0, 5 :: (store (s8))
@@ -41,8 +40,7 @@ name: merge_unscaled_str_with_scaled_8
body: |
bb.0.entry:
; CHECK-LABEL: name: merge_unscaled_str_with_scaled_8
- ; CHECK: STURBBi $wzr, $x0, 4 :: (store (s8))
- ; CHECK-NEXT: STRBBui $wzr, $x0, 5 :: (store (s8))
+ ; CHECK: STURHHi $wzr, $x0, 4 :: (store (s8))
; CHECK-NEXT: RET undef $lr
STURBBi $wzr, $x0, 4 :: (store (s8))
STRBBui $wzr, $x0, 5 :: (store (s8))
@@ -75,8 +73,7 @@ name: merge_scaled_str_with_unscaled_16
body: |
bb.0.entry:
; CHECK-LABEL: name: merge_scaled_str_with_unscaled_16
- ; CHECK: STRHHui $wzr, $x0, 2 :: (store (s16))
- ; CHECK-NEXT: STURHHi $wzr, $x0, 6 :: (store (s16))
+ ; CHECK: STRWui $wzr, $x0, 1 :: (store (s16))
; CHECK-NEXT: RET undef $lr
STRHHui $wzr, $x0, 2 :: (store (s16))
STURHHi $wzr, $x0, 6 :: (store (s16))
@@ -87,8 +84,7 @@ name: merge_unscaled_str_with_scaled_16
body: |
bb.0.entry:
; CHECK-LABEL: name: merge_unscaled_str_with_scaled_16
- ; CHECK: STURHHi $wzr, $x0, 4 :: (store (s16))
- ; CHECK-NEXT: STRHHui $wzr, $x0, 3 :: (store (s16))
+ ; CHECK: STURWi $wzr, $x0, 4 :: (store (s16))
; CHECK-NEXT: RET undef $lr
STURHHi $wzr, $x0, 4 :: (store (s16))
STRHHui $wzr, $x0, 3 :: (store (s16))
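To make the two functional hunks concrete, here is a minimal standalone C++ sketch of the logic they implement. The names and types are hypothetical simplifications; LLVM operates on MachineInstr via TargetInstrInfo rather than on a struct like this.

#include <cassert>

// Hypothetical stand-in for one narrow store candidate.
struct NarrowStore {
  bool StoresZeroReg; // true when the stored register is WZR
  int MemScale;       // access size in bytes: 1 for byte, 2 for halfword
};

// Second hunk: narrow stores have no pair-wise opcodes, so a pair may only
// be merged when both store zero and share the same memory scale.
bool canMergeNarrowStores(const NarrowStore &A, const NarrowStore &B) {
  return A.StoresZeroReg && B.StoresZeroReg && A.MemScale == B.MemScale;
}

// First hunk: if the wider opcode produced by the merge uses a scaled
// offset, its immediate is encoded in units of the new access size, so the
// byte offset must divide evenly by the new scale.
int rescaleMergedOffset(int ByteOffset, bool NewOpcodeIsScaled,
                        int NewMemScale) {
  if (!NewOpcodeIsScaled)
    return ByteOffset; // unscaled (STUR*) opcodes keep the raw byte offset
  assert(ByteOffset % NewMemScale == 0 &&
         "Offset should be a multiple of the store memory scale");
  return ByteOffset / NewMemScale;
}

For example, merging the zero bytes at offsets 4 and 5 into a halfword store gives a scale of 2, so the scaled immediate becomes 4 / 2 = 2, matching STRHHui $wzr, $x0, 2 in the updated test; when the result is the unscaled STURHHi, the byte offset 4 is kept as-is.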
Reviewer: Does this come up often, or has ISel usually combined them previously? It seems OK so long as the wzr stores are treated OK by the rest of the code.

Author: With recent LLVM, I no longer see any cases of this happening on a very large bitcode repository, so it's possible ISel combines the stores and nothing in the codegen creates this pattern. Nonetheless, I think it's good to have it.

Reviewer: Yeah, it comes up, but quite rarely. Can you add some extra test cases for edge conditions? Otherwise LGTM.
Reviewer: Can you add some test cases for the limits of address ranges? I think it might be 4095 for scaled loads and 255 for unscaled? Otherwise LGTM, thanks.

Author: Can you explain a bit more? Doesn't the new offset always end up being less than or equal to the offset in the first instruction?

Reviewer: Yeah, I just wanted to make sure there were not any problems, especially with unscaled and scaled representing the offsets differently with negative offsets and whatnot. We've had them go wrong in the past. This patch doesn't touch them directly, so I think they should be OK, but it's good to make sure they are tested around the extremes.
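For reference on the ranges mentioned above, here is a sketch of the AArch64 immediate encoding rules with hypothetical helper names: scaled-offset forms (STR*ui) take an unsigned 12-bit immediate in units of the access size, while unscaled forms (STUR*) take a signed 9-bit byte offset.

// Scaled forms: unsigned imm12 in units of the access size, so only byte
// offsets that are multiples of Scale, up to 4095 * Scale.
bool fitsScaledImm(int ByteOffset, int Scale) {
  if (ByteOffset % Scale != 0)
    return false;
  int Imm = ByteOffset / Scale;
  return Imm >= 0 && Imm <= 4095;
}

// Unscaled forms: signed imm9, any byte offset in -256 .. 255.
bool fitsUnscaledImm(int ByteOffset) {
  return ByteOffset >= -256 && ByteOffset <= 255;
}

A merge near these limits must land on an offset the wider opcode can still encode, which is what the requested boundary tests exercise.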