
[AArch64] Merge scaled and unscaled narrow zero stores #136705


Merged
guy-david merged 1 commit into main from users/guy-david/aarch64-narrow-store on May 8, 2025

Conversation

guy-david
Contributor

No description provided.

@llvmbot
Member

llvmbot commented Apr 22, 2025

@llvm/pr-subscribers-backend-aarch64

Author: Guy David (guy-david)

Changes

Full diff: https://github.com/llvm/llvm-project/pull/136705.diff

2 Files Affected:

  • (modified) llvm/lib/Target/AArch64/AArch64LoadStoreOptimizer.cpp (+10-9)
  • (modified) llvm/test/CodeGen/AArch64/str-narrow-zero-merge.mir (+4-8)
diff --git a/llvm/lib/Target/AArch64/AArch64LoadStoreOptimizer.cpp b/llvm/lib/Target/AArch64/AArch64LoadStoreOptimizer.cpp
index 7c47492cf1a8e..57540eeb5ef92 100644
--- a/llvm/lib/Target/AArch64/AArch64LoadStoreOptimizer.cpp
+++ b/llvm/lib/Target/AArch64/AArch64LoadStoreOptimizer.cpp
@@ -892,11 +892,10 @@ AArch64LoadStoreOpt::mergeNarrowZeroStores(MachineBasicBlock::iterator I,
     OffsetImm = IOffsetInBytes;
 
   int NewOpcode = getMatchingWideOpcode(Opc);
-  bool FinalIsScaled = !TII->hasUnscaledLdStOffset(NewOpcode);
-
-  // Adjust final offset if the result opcode is a scaled store.
-  if (FinalIsScaled) {
-    int NewOffsetStride = FinalIsScaled ? TII->getMemScale(NewOpcode) : 1;
+  // Adjust final offset on scaled stores because the new instruction
+  // has a different scale.
+  if (!TII->hasUnscaledLdStOffset(NewOpcode)) {
+    int NewOffsetStride = TII->getMemScale(NewOpcode);
     assert(((OffsetImm % NewOffsetStride) == 0) &&
            "Offset should be a multiple of the store memory scale");
     OffsetImm = OffsetImm / NewOffsetStride;
@@ -906,7 +905,7 @@ AArch64LoadStoreOpt::mergeNarrowZeroStores(MachineBasicBlock::iterator I,
   DebugLoc DL = I->getDebugLoc();
   MachineBasicBlock *MBB = I->getParent();
   MachineInstrBuilder MIB;
-  MIB = BuildMI(*MBB, InsertionPoint, DL, TII->get(getMatchingWideOpcode(Opc)))
+  MIB = BuildMI(*MBB, InsertionPoint, DL, TII->get(NewOpcode))
             .addReg(isNarrowStore(Opc) ? AArch64::WZR : AArch64::XZR)
             .add(BaseRegOp)
             .addImm(OffsetImm)
@@ -1539,10 +1538,12 @@ static bool areCandidatesToMergeOrPair(MachineInstr &FirstMI, MachineInstr &MI,
   if (!PairIsValidLdStrOpc)
     return false;
 
-  // FIXME: We don't support merging narrow stores with mixed scaled/unscaled
-  // offsets.
+  // Narrow stores do not have pair-wise opcodes, so only allow merging them
+  // when both are zero stores.
   if (isNarrowStore(OpcA) || isNarrowStore(OpcB))
-    return false;
+    return getLdStRegOp(FirstMI).getReg() == AArch64::WZR &&
+           getLdStRegOp(MI).getReg() == AArch64::WZR &&
+           TII->getMemScale(FirstMI) == TII->getMemScale(MI);
 
   // The STR<S,D,Q,W,X>pre - STR<S,D,Q,W,X>ui and
   // LDR<S,D,Q,W,X,SW>pre-LDR<S,D,Q,W,X,SW>ui
diff --git a/llvm/test/CodeGen/AArch64/str-narrow-zero-merge.mir b/llvm/test/CodeGen/AArch64/str-narrow-zero-merge.mir
index e995c402c50a8..8ba7c1bcf305f 100644
--- a/llvm/test/CodeGen/AArch64/str-narrow-zero-merge.mir
+++ b/llvm/test/CodeGen/AArch64/str-narrow-zero-merge.mir
@@ -29,8 +29,7 @@ name:  merge_scaled_str_with_unscaled_8
 body:             |
   bb.0.entry:
     ; CHECK-LABEL: name: merge_scaled_str_with_unscaled_8
-    ; CHECK: STRBBui $wzr, $x0, 4 :: (store (s8))
-    ; CHECK-NEXT: STURBBi $wzr, $x0, 5 :: (store (s8))
+    ; CHECK: STRHHui $wzr, $x0, 2 :: (store (s8))
     ; CHECK-NEXT: RET undef $lr
     STRBBui $wzr, $x0, 4 :: (store (s8))
     STURBBi $wzr, $x0, 5 :: (store (s8))
@@ -41,8 +40,7 @@ name:  merge_unscaled_str_with_scaled_8
 body:             |
   bb.0.entry:
     ; CHECK-LABEL: name: merge_unscaled_str_with_scaled_8
-    ; CHECK: STURBBi $wzr, $x0, 4 :: (store (s8))
-    ; CHECK-NEXT: STRBBui $wzr, $x0, 5 :: (store (s8))
+    ; CHECK: STURHHi $wzr, $x0, 4 :: (store (s8))
     ; CHECK-NEXT: RET undef $lr
     STURBBi $wzr, $x0, 4 :: (store (s8))
     STRBBui $wzr, $x0, 5 :: (store (s8))
@@ -75,8 +73,7 @@ name:  merge_scaled_str_with_unscaled_16
 body:             |
   bb.0.entry:
     ; CHECK-LABEL: name: merge_scaled_str_with_unscaled_16
-    ; CHECK: STRHHui $wzr, $x0, 2 :: (store (s16))
-    ; CHECK-NEXT: STURHHi $wzr, $x0, 6 :: (store (s16))
+    ; CHECK: STRWui $wzr, $x0, 1 :: (store (s16))
     ; CHECK-NEXT: RET undef $lr
     STRHHui $wzr, $x0, 2 :: (store (s16))
     STURHHi $wzr, $x0, 6 :: (store (s16))
@@ -87,8 +84,7 @@ name:  merge_unscaled_str_with_scaled_16
 body:             |
   bb.0.entry:
     ; CHECK-LABEL: name: merge_unscaled_str_with_scaled_16
-    ; CHECK: STURHHi $wzr, $x0, 4 :: (store (s16))
-    ; CHECK-NEXT: STRHHui $wzr, $x0, 3 :: (store (s16))
+    ; CHECK: STURWi $wzr, $x0, 4 :: (store (s16))
     ; CHECK-NEXT: RET undef $lr
     STURHHi $wzr, $x0, 4 :: (store (s16))
     STRHHui $wzr, $x0, 3 :: (store (s16))

@guy-david guy-david requested a review from zjaffal April 22, 2025 14:46
@fhahn fhahn requested review from fhahn, jroelofs and davemgreen April 23, 2025 10:10
@guy-david guy-david force-pushed the users/guy-david/aarch64-narrow-store branch 2 times, most recently from 95377b3 to 7ccf02a Compare April 23, 2025 19:43
@guy-david guy-david requested a review from jroelofs April 23, 2025 19:43
@guy-david guy-david force-pushed the users/guy-david/aarch64-narrow-store branch 2 times, most recently from 3e8eb9d to d674cbe Compare May 4, 2025 19:10
@davemgreen davemgreen (Collaborator) left a comment

Does this come up often, or has ISel usually combined them previously? It seems OK so long as the wzr stores are treated OK by the rest of the code.

@guy-david
Contributor Author

With recent LLVM, I no longer see any cases of this happening on a very large bitcode repository, so it's possible ISel combines the stores and nothing in the codegen creates this pattern. Nonetheless, I think it's good to have it.

@davemgreen
Collaborator

Yeah it comes up but quite rarely. Can you add some extra test cases for edge conditions? Otherwise LGTM.

@guy-david guy-david force-pushed the users/guy-david/aarch64-narrow-store branch from d674cbe to 960bc5f Compare May 7, 2025 09:37
@guy-david guy-david requested a review from davemgreen May 7, 2025 09:37
@davemgreen davemgreen (Collaborator) left a comment

Can you add some test cases for the limits of address ranges? I think it might be 4095 for scaled loads and 255 for unscaled? Otherwise LGTM thanks.

@guy-david
Contributor Author

Can you explain a bit more? Doesn't the new offset always end up being less than or equal to the offset in the first instruction?

@davemgreen
Collaborator

Can you explain a bit more? Doesn't the new offset always end up being less than or equal to the offset in the first instruction?

Yeah, I just wanted to make sure there were not any problems, especially with scaled and unscaled instructions representing negative offsets differently; we've had them go wrong in the past. This patch doesn't touch them directly, so I think they should be OK, but it's good to make sure they are tested around the extremes.

@guy-david guy-david force-pushed the users/guy-david/aarch64-narrow-store branch from 960bc5f to 378d212 Compare May 8, 2025 08:39
@guy-david guy-david merged commit ae6e127 into main May 8, 2025
9 of 11 checks passed
@guy-david guy-david deleted the users/guy-david/aarch64-narrow-store branch May 8, 2025 18:34
lenary added a commit to lenary/llvm-project that referenced this pull request May 9, 2025
* main: (420 commits)
  [AArch64] Merge scaled and unscaled narrow zero stores (llvm#136705)
  ...