[LoongArch] Introduce 32s target feature for LA32S ISA extensions #139695

Open
wants to merge 1 commit into main

Conversation

@heiher (Member) commented May 13, 2025

According to the official LoongArch reference manual, the 32-bit LoongArch is divided into two variants: the Reduced version (LA32R) and the Standard version (LA32S). LA32S extends LA32R with additional instructions, and the 64-bit version (LA64) fully includes the LA32S instruction set.

This patch introduces a new target feature 32s for the LoongArch backend, enabling support for instructions specific to the LA32S variant.

The LA32S extension includes the following additional instructions:

  • ALSL.W
  • {AND,OR}N
  • B{EQ,NE}Z
  • BITREV.{4B,W}
  • BSTR{INS,PICK}.W
  • BYTEPICK.W
  • CL{O,Z}.W
  • CPUCFG
  • CT{O,Z}.W
  • EXT.W.{B,H}
  • F{LD,ST}X.{D,S}
  • MASK{EQ,NE}Z
  • PC{ADDI,ALAU12I}
  • REVB.2H
  • ROTR{I}.W

Additionally, LA32R defines three new instruction aliases:

  • RDCNTID.W RJ => RDTIMEL.W ZERO, RJ
  • RDCNTVH.W RD => RDTIMEH.W RD, ZERO
  • RDCNTVL.W RD => RDTIMEL.W RD, ZERO
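
To make the new 32s feature concrete, here is a minimal llc/FileCheck sketch (illustration only, not taken from the patch; the RUN lines, check prefixes, and the choice of a rotate as the probe instruction are assumptions). Since ROTR{I}.W is listed above as an LA32S instruction, the rotate pattern should only be selected when the feature is enabled:

; RUN: llc --mtriple=loongarch32 -mattr=+32s < %s | FileCheck %s --check-prefix=LA32S
; RUN: llc --mtriple=loongarch32 -mattr=-32s < %s | FileCheck %s --check-prefix=LA32R
define i32 @rotate_right(i32 %x, i32 %y) {
; With 32s, funnel-shift-by-same-operand becomes a hardware rotate.
; LA32S: rotr.w
; Without 32s, ROTR is expanded to shifts and an or, so no rotr.w appears.
; LA32R-NOT: rotr.w
  %r = call i32 @llvm.fshr.i32(i32 %x, i32 %x, i32 %y)
  ret i32 %r
}
declare i32 @llvm.fshr.i32(i32, i32, i32)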

@llvmbot added the mc (Machine (object) code) and backend:loongarch labels on May 13, 2025
@llvmbot (Member) commented May 13, 2025

@llvm/pr-subscribers-mc

@llvm/pr-subscribers-backend-loongarch

Author: hev (heiher)

Changes

Patch is 979.81 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/139695.diff

60 Files Affected:

  • (modified) llvm/lib/Target/LoongArch/LoongArch.td (+12)
  • (modified) llvm/lib/Target/LoongArch/LoongArchExpandAtomicPseudoInsts.cpp (+12-6)
  • (modified) llvm/lib/Target/LoongArch/LoongArchISelDAGToDAG.h (+22)
  • (modified) llvm/lib/Target/LoongArch/LoongArchISelLowering.cpp (+506-11)
  • (modified) llvm/lib/Target/LoongArch/LoongArchISelLowering.h (+4)
  • (modified) llvm/lib/Target/LoongArch/LoongArchInstrInfo.td (+126-49)
  • (modified) llvm/test/CodeGen/LoongArch/alloca.ll (+148-71)
  • (modified) llvm/test/CodeGen/LoongArch/alsl.ll (+273-118)
  • (modified) llvm/test/CodeGen/LoongArch/annotate-tablejump.ll (+1-1)
  • (modified) llvm/test/CodeGen/LoongArch/atomicrmw-cond-sub-clamp.ll (+8-8)
  • (modified) llvm/test/CodeGen/LoongArch/atomicrmw-uinc-udec-wrap.ll (+8-8)
  • (modified) llvm/test/CodeGen/LoongArch/bitreverse.ll (+573-72)
  • (modified) llvm/test/CodeGen/LoongArch/bnez-beqz.ll (+66-31)
  • (modified) llvm/test/CodeGen/LoongArch/branch-relaxation.ll (+101-53)
  • (modified) llvm/test/CodeGen/LoongArch/bstrins_w.ll (+1-1)
  • (modified) llvm/test/CodeGen/LoongArch/bstrpick_w.ll (+1-1)
  • (modified) llvm/test/CodeGen/LoongArch/bswap-bitreverse.ll (+225-39)
  • (modified) llvm/test/CodeGen/LoongArch/bswap.ll (+229-65)
  • (modified) llvm/test/CodeGen/LoongArch/bytepick.ll (+1-1)
  • (modified) llvm/test/CodeGen/LoongArch/ctlz-cttz-ctpop.ll (+803-200)
  • (modified) llvm/test/CodeGen/LoongArch/ctpop-with-lsx.ll (+80-37)
  • (modified) llvm/test/CodeGen/LoongArch/exception-pointer-register.ll (+1-1)
  • (modified) llvm/test/CodeGen/LoongArch/fabs.ll (+2-2)
  • (modified) llvm/test/CodeGen/LoongArch/fcopysign.ll (+2-2)
  • (modified) llvm/test/CodeGen/LoongArch/feature-32bit.ll (+1)
  • (modified) llvm/test/CodeGen/LoongArch/intrinsic-csr-side-effects.ll (+1-1)
  • (modified) llvm/test/CodeGen/LoongArch/ir-instruction/and.ll (+303-136)
  • (modified) llvm/test/CodeGen/LoongArch/ir-instruction/ashr.ll (+20-22)
  • (modified) llvm/test/CodeGen/LoongArch/ir-instruction/atomic-cmpxchg-128.ll (+10-10)
  • (modified) llvm/test/CodeGen/LoongArch/ir-instruction/atomic-cmpxchg.ll (+44-44)
  • (modified) llvm/test/CodeGen/LoongArch/ir-instruction/atomicrmw-fp.ll (+40-40)
  • (modified) llvm/test/CodeGen/LoongArch/ir-instruction/atomicrmw-lam-bh.ll (+557-591)
  • (modified) llvm/test/CodeGen/LoongArch/ir-instruction/atomicrmw-lamcas.ll (+90-90)
  • (modified) llvm/test/CodeGen/LoongArch/ir-instruction/atomicrmw-minmax.ll (+40-40)
  • (modified) llvm/test/CodeGen/LoongArch/ir-instruction/atomicrmw.ll (+4752-2266)
  • (modified) llvm/test/CodeGen/LoongArch/ir-instruction/br.ll (+201-90)
  • (modified) llvm/test/CodeGen/LoongArch/ir-instruction/double-convert.ll (+9-8)
  • (modified) llvm/test/CodeGen/LoongArch/ir-instruction/fcmp-dbl.ll (+1-1)
  • (modified) llvm/test/CodeGen/LoongArch/ir-instruction/fcmp-flt.ll (+1-1)
  • (modified) llvm/test/CodeGen/LoongArch/ir-instruction/float-convert.ll (+18-16)
  • (modified) llvm/test/CodeGen/LoongArch/ir-instruction/load-store-fp.ll (+2-2)
  • (modified) llvm/test/CodeGen/LoongArch/ir-instruction/load-store.ll (+818-390)
  • (modified) llvm/test/CodeGen/LoongArch/ir-instruction/lshr.ll (+132-57)
  • (modified) llvm/test/CodeGen/LoongArch/ir-instruction/mul.ll (+1276-594)
  • (modified) llvm/test/CodeGen/LoongArch/ir-instruction/sdiv-udiv-srem-urem.ll (+1041-490)
  • (modified) llvm/test/CodeGen/LoongArch/ir-instruction/select-bare-int.ll (+36-25)
  • (modified) llvm/test/CodeGen/LoongArch/ir-instruction/select-fpcc-int.ll (+112-84)
  • (modified) llvm/test/CodeGen/LoongArch/ir-instruction/select-icc-int.ll (+50-46)
  • (modified) llvm/test/CodeGen/LoongArch/ir-instruction/sext-zext-trunc.ll (+281-121)
  • (modified) llvm/test/CodeGen/LoongArch/ir-instruction/shl.ll (+13-11)
  • (modified) llvm/test/CodeGen/LoongArch/jump-table.ll (+2-2)
  • (modified) llvm/test/CodeGen/LoongArch/rotl-rotr.ll (+711-320)
  • (modified) llvm/test/CodeGen/LoongArch/select-to-shiftand.ll (+4-4)
  • (modified) llvm/test/CodeGen/LoongArch/shift-masked-shamt.ll (+40-40)
  • (modified) llvm/test/CodeGen/LoongArch/smul-with-overflow.ll (+132-140)
  • (modified) llvm/test/CodeGen/LoongArch/stack-realignment-with-variable-sized-objects.ll (+1-1)
  • (modified) llvm/test/CodeGen/LoongArch/typepromotion-overflow.ll (+468-222)
  • (modified) llvm/test/MC/LoongArch/Basic/Integer/atomic.s (+8-8)
  • (modified) llvm/test/tools/UpdateTestChecks/update_llc_test_checks/Inputs/loongarch_generated_funcs.ll.generated.expected (+3-3)
  • (modified) llvm/test/tools/UpdateTestChecks/update_llc_test_checks/Inputs/loongarch_generated_funcs.ll.nogenerated.expected (+3-3)
diff --git a/llvm/lib/Target/LoongArch/LoongArch.td b/llvm/lib/Target/LoongArch/LoongArch.td
index 5fd52babfc6ec..707d2de23cdfe 100644
--- a/llvm/lib/Target/LoongArch/LoongArch.td
+++ b/llvm/lib/Target/LoongArch/LoongArch.td
@@ -32,6 +32,14 @@ def IsLA32
 defvar LA32 = DefaultMode;
 def LA64 : HwMode<"+64bit", [IsLA64]>;
 
+// LoongArch 32-bit is divided into variants, the reduced 32-bit variant (LA32R)
+// and the standard 32-bit variant (LA32S).
+def Feature32S
+    : SubtargetFeature<"32s", "Has32S", "true",
+                       "LA32 Standard Basic Instruction Extension">;
+def Has32S : Predicate<"Subtarget->has32S()">;
+def Not32S : Predicate<"!Subtarget->has32S()">;
+
 // Single Precision floating point
 def FeatureBasicF
     : SubtargetFeature<"f", "HasBasicF", "true",
@@ -159,11 +167,13 @@ include "LoongArchInstrInfo.td"
 
 def : ProcessorModel<"generic-la32", NoSchedModel, [Feature32Bit]>;
 def : ProcessorModel<"generic-la64", NoSchedModel, [Feature64Bit,
+                                                    Feature32S,
                                                     FeatureUAL,
                                                     FeatureExtLSX]>;
 
 // Generic 64-bit processor with double-precision floating-point support.
 def : ProcessorModel<"loongarch64", NoSchedModel, [Feature64Bit,
+                                                   Feature32S,
                                                    FeatureUAL,
                                                    FeatureBasicD]>;
 
@@ -172,12 +182,14 @@ def : ProcessorModel<"loongarch64", NoSchedModel, [Feature64Bit,
 def : ProcessorModel<"generic", NoSchedModel, []>;
 
 def : ProcessorModel<"la464", NoSchedModel, [Feature64Bit,
+                                             Feature32S,
                                              FeatureUAL,
                                              FeatureExtLASX,
                                              FeatureExtLVZ,
                                              FeatureExtLBT]>;
 
 def : ProcessorModel<"la664", NoSchedModel, [Feature64Bit,
+                                             Feature32S,
                                              FeatureUAL,
                                              FeatureExtLASX,
                                              FeatureExtLVZ,
diff --git a/llvm/lib/Target/LoongArch/LoongArchExpandAtomicPseudoInsts.cpp b/llvm/lib/Target/LoongArch/LoongArchExpandAtomicPseudoInsts.cpp
index 27d20390eb6ae..3be012feb2385 100644
--- a/llvm/lib/Target/LoongArch/LoongArchExpandAtomicPseudoInsts.cpp
+++ b/llvm/lib/Target/LoongArch/LoongArchExpandAtomicPseudoInsts.cpp
@@ -214,8 +214,9 @@ static void doAtomicBinOpExpansion(const LoongArchInstrInfo *TII,
       .addReg(ScratchReg)
       .addReg(AddrReg)
       .addImm(0);
-  BuildMI(LoopMBB, DL, TII->get(LoongArch::BEQZ))
+  BuildMI(LoopMBB, DL, TII->get(LoongArch::BEQ))
       .addReg(ScratchReg)
+      .addReg(LoongArch::R0)
       .addMBB(LoopMBB);
 }
 
@@ -296,8 +297,9 @@ static void doMaskedAtomicBinOpExpansion(
       .addReg(ScratchReg)
       .addReg(AddrReg)
       .addImm(0);
-  BuildMI(LoopMBB, DL, TII->get(LoongArch::BEQZ))
+  BuildMI(LoopMBB, DL, TII->get(LoongArch::BEQ))
       .addReg(ScratchReg)
+      .addReg(LoongArch::R0)
       .addMBB(LoopMBB);
 }
 
@@ -454,8 +456,9 @@ bool LoongArchExpandAtomicPseudo::expandAtomicMinMaxOp(
       .addReg(Scratch1Reg)
       .addReg(AddrReg)
       .addImm(0);
-  BuildMI(LoopTailMBB, DL, TII->get(LoongArch::BEQZ))
+  BuildMI(LoopTailMBB, DL, TII->get(LoongArch::BEQ))
       .addReg(Scratch1Reg)
+      .addReg(LoongArch::R0)
       .addMBB(LoopHeadMBB);
 
   NextMBBI = MBB.end();
@@ -529,8 +532,9 @@ bool LoongArchExpandAtomicPseudo::expandAtomicCmpXchg(
         .addReg(ScratchReg)
         .addReg(AddrReg)
         .addImm(0);
-    BuildMI(LoopTailMBB, DL, TII->get(LoongArch::BEQZ))
+    BuildMI(LoopTailMBB, DL, TII->get(LoongArch::BEQ))
         .addReg(ScratchReg)
+        .addReg(LoongArch::R0)
         .addMBB(LoopHeadMBB);
     BuildMI(LoopTailMBB, DL, TII->get(LoongArch::B)).addMBB(DoneMBB);
   } else {
@@ -569,8 +573,9 @@ bool LoongArchExpandAtomicPseudo::expandAtomicCmpXchg(
         .addReg(ScratchReg)
         .addReg(AddrReg)
         .addImm(0);
-    BuildMI(LoopTailMBB, DL, TII->get(LoongArch::BEQZ))
+    BuildMI(LoopTailMBB, DL, TII->get(LoongArch::BEQ))
         .addReg(ScratchReg)
+        .addReg(LoongArch::R0)
         .addMBB(LoopHeadMBB);
     BuildMI(LoopTailMBB, DL, TII->get(LoongArch::B)).addMBB(DoneMBB);
   }
@@ -677,8 +682,9 @@ bool LoongArchExpandAtomicPseudo::expandAtomicCmpXchg128(
       .addReg(ScratchReg)
       .addReg(NewValHiReg)
       .addReg(AddrReg);
-  BuildMI(LoopTailMBB, DL, TII->get(LoongArch::BEQZ))
+  BuildMI(LoopTailMBB, DL, TII->get(LoongArch::BEQ))
       .addReg(ScratchReg)
+      .addReg(LoongArch::R0)
       .addMBB(LoopHeadMBB);
   BuildMI(LoopTailMBB, DL, TII->get(LoongArch::B)).addMBB(DoneMBB);
   int hint;
diff --git a/llvm/lib/Target/LoongArch/LoongArchISelDAGToDAG.h b/llvm/lib/Target/LoongArch/LoongArchISelDAGToDAG.h
index 8a7eba418d804..e94f249c14be2 100644
--- a/llvm/lib/Target/LoongArch/LoongArchISelDAGToDAG.h
+++ b/llvm/lib/Target/LoongArch/LoongArchISelDAGToDAG.h
@@ -64,6 +64,28 @@ class LoongArchDAGToDAGISel : public SelectionDAGISel {
   bool selectVSplatUimmInvPow2(SDValue N, SDValue &SplatImm) const;
   bool selectVSplatUimmPow2(SDValue N, SDValue &SplatImm) const;
 
+  // Return the LoongArch branch opcode that matches the given DAG integer
+  // condition code. The CondCode must be one of those supported by the
+  // LoongArch ISA (see translateSetCCForBranch).
+  static unsigned getBranchOpcForIntCC(ISD::CondCode CC) {
+    switch (CC) {
+    default:
+      llvm_unreachable("Unsupported CondCode");
+    case ISD::SETEQ:
+      return LoongArch::BEQ;
+    case ISD::SETNE:
+      return LoongArch::BNE;
+    case ISD::SETLT:
+      return LoongArch::BLT;
+    case ISD::SETGE:
+      return LoongArch::BGE;
+    case ISD::SETULT:
+      return LoongArch::BLTU;
+    case ISD::SETUGE:
+      return LoongArch::BGEU;
+    }
+  }
+
 // Include the pieces autogenerated from the target description.
 #include "LoongArchGenDAGISel.inc"
 };
diff --git a/llvm/lib/Target/LoongArch/LoongArchISelLowering.cpp b/llvm/lib/Target/LoongArch/LoongArchISelLowering.cpp
index b729b4ea6f9b4..6e3e1396e6aeb 100644
--- a/llvm/lib/Target/LoongArch/LoongArchISelLowering.cpp
+++ b/llvm/lib/Target/LoongArch/LoongArchISelLowering.cpp
@@ -18,6 +18,7 @@
 #include "LoongArchSubtarget.h"
 #include "MCTargetDesc/LoongArchBaseInfo.h"
 #include "MCTargetDesc/LoongArchMCTargetDesc.h"
+#include "llvm/ADT/SmallSet.h"
 #include "llvm/ADT/Statistic.h"
 #include "llvm/ADT/StringExtras.h"
 #include "llvm/CodeGen/ISDOpcodes.h"
@@ -102,15 +103,26 @@ LoongArchTargetLowering::LoongArchTargetLowering(const TargetMachine &TM,
 
   setOperationAction(ISD::PREFETCH, MVT::Other, Custom);
 
-  // Expand bitreverse.i16 with native-width bitrev and shift for now, before
-  // we get to know which of sll and revb.2h is faster.
-  setOperationAction(ISD::BITREVERSE, MVT::i8, Custom);
-  setOperationAction(ISD::BITREVERSE, GRLenVT, Legal);
-
-  // LA32 does not have REVB.2W and REVB.D due to the 64-bit operands, and
-  // the narrower REVB.W does not exist. But LA32 does have REVB.2H, so i16
-  // and i32 could still be byte-swapped relatively cheaply.
-  setOperationAction(ISD::BSWAP, MVT::i16, Custom);
+  // BITREV/REVB requires the 32S feature.
+  if (STI.has32S()) {
+    // Expand bitreverse.i16 with native-width bitrev and shift for now, before
+    // we get to know which of sll and revb.2h is faster.
+    setOperationAction(ISD::BITREVERSE, MVT::i8, Custom);
+    setOperationAction(ISD::BITREVERSE, GRLenVT, Legal);
+
+    // LA32 does not have REVB.2W and REVB.D due to the 64-bit operands, and
+    // the narrower REVB.W does not exist. But LA32 does have REVB.2H, so i16
+    // and i32 could still be byte-swapped relatively cheaply.
+    setOperationAction(ISD::BSWAP, MVT::i16, Custom);
+  } else {
+    setOperationAction(ISD::BSWAP, GRLenVT, Expand);
+    setOperationAction(ISD::CTTZ, GRLenVT, Expand);
+    setOperationAction(ISD::CTLZ, GRLenVT, Expand);
+    setOperationAction(ISD::ROTR, GRLenVT, Expand);
+    setOperationAction(ISD::SELECT, GRLenVT, Custom);
+    setOperationAction(ISD::SIGN_EXTEND_INREG, MVT::i8, Expand);
+    setOperationAction(ISD::SIGN_EXTEND_INREG, MVT::i16, Expand);
+  }
 
   setOperationAction(ISD::BR_JT, MVT::Other, Expand);
   setOperationAction(ISD::BR_CC, GRLenVT, Expand);
@@ -476,6 +488,8 @@ SDValue LoongArchTargetLowering::LowerOperation(SDValue Op,
     return lowerSCALAR_TO_VECTOR(Op, DAG);
   case ISD::PREFETCH:
     return lowerPREFETCH(Op, DAG);
+  case ISD::SELECT:
+    return lowerSELECT(Op, DAG);
   }
   return SDValue();
 }
@@ -492,6 +506,327 @@ SDValue LoongArchTargetLowering::lowerPREFETCH(SDValue Op,
   return Op;
 }
 
+// Return true if Val is equal to (setcc LHS, RHS, CC).
+// Return false if Val is the inverse of (setcc LHS, RHS, CC).
+// Otherwise, return std::nullopt.
+static std::optional<bool> matchSetCC(SDValue LHS, SDValue RHS,
+                                      ISD::CondCode CC, SDValue Val) {
+  assert(Val->getOpcode() == ISD::SETCC);
+  SDValue LHS2 = Val.getOperand(0);
+  SDValue RHS2 = Val.getOperand(1);
+  ISD::CondCode CC2 = cast<CondCodeSDNode>(Val.getOperand(2))->get();
+
+  if (LHS == LHS2 && RHS == RHS2) {
+    if (CC == CC2)
+      return true;
+    if (CC == ISD::getSetCCInverse(CC2, LHS2.getValueType()))
+      return false;
+  } else if (LHS == RHS2 && RHS == LHS2) {
+    CC2 = ISD::getSetCCSwappedOperands(CC2);
+    if (CC == CC2)
+      return true;
+    if (CC == ISD::getSetCCInverse(CC2, LHS2.getValueType()))
+      return false;
+  }
+
+  return std::nullopt;
+}
+
+static SDValue combineSelectToBinOp(SDNode *N, SelectionDAG &DAG,
+                                    const LoongArchSubtarget &Subtarget) {
+  SDValue CondV = N->getOperand(0);
+  SDValue TrueV = N->getOperand(1);
+  SDValue FalseV = N->getOperand(2);
+  MVT VT = N->getSimpleValueType(0);
+  SDLoc DL(N);
+
+  // (select c, -1, y) -> -c | y
+  if (isAllOnesConstant(TrueV)) {
+    SDValue Neg = DAG.getNegative(CondV, DL, VT);
+    return DAG.getNode(ISD::OR, DL, VT, Neg, DAG.getFreeze(FalseV));
+  }
+  // (select c, y, -1) -> (c-1) | y
+  if (isAllOnesConstant(FalseV)) {
+    SDValue Neg =
+        DAG.getNode(ISD::ADD, DL, VT, CondV, DAG.getAllOnesConstant(DL, VT));
+    return DAG.getNode(ISD::OR, DL, VT, Neg, DAG.getFreeze(TrueV));
+  }
+
+  // (select c, 0, y) -> (c-1) & y
+  if (isNullConstant(TrueV)) {
+    SDValue Neg =
+        DAG.getNode(ISD::ADD, DL, VT, CondV, DAG.getAllOnesConstant(DL, VT));
+    return DAG.getNode(ISD::AND, DL, VT, Neg, DAG.getFreeze(FalseV));
+  }
+  // (select c, y, 0) -> -c & y
+  if (isNullConstant(FalseV)) {
+    SDValue Neg = DAG.getNegative(CondV, DL, VT);
+    return DAG.getNode(ISD::AND, DL, VT, Neg, DAG.getFreeze(TrueV));
+  }
+
+  // select c, ~x, x --> xor -c, x
+  if (isa<ConstantSDNode>(TrueV) && isa<ConstantSDNode>(FalseV)) {
+    const APInt &TrueVal = TrueV->getAsAPIntVal();
+    const APInt &FalseVal = FalseV->getAsAPIntVal();
+    if (~TrueVal == FalseVal) {
+      SDValue Neg = DAG.getNegative(CondV, DL, VT);
+      return DAG.getNode(ISD::XOR, DL, VT, Neg, FalseV);
+    }
+  }
+
+  // Try to fold (select (setcc lhs, rhs, cc), truev, falsev) into bitwise ops
+  // when both truev and falsev are also setcc.
+  if (CondV.getOpcode() == ISD::SETCC && TrueV.getOpcode() == ISD::SETCC &&
+      FalseV.getOpcode() == ISD::SETCC) {
+    SDValue LHS = CondV.getOperand(0);
+    SDValue RHS = CondV.getOperand(1);
+    ISD::CondCode CC = cast<CondCodeSDNode>(CondV.getOperand(2))->get();
+
+    // (select x, x, y) -> x | y
+    // (select !x, x, y) -> x & y
+    if (std::optional<bool> MatchResult = matchSetCC(LHS, RHS, CC, TrueV)) {
+      return DAG.getNode(*MatchResult ? ISD::OR : ISD::AND, DL, VT, TrueV,
+                         DAG.getFreeze(FalseV));
+    }
+    // (select x, y, x) -> x & y
+    // (select !x, y, x) -> x | y
+    if (std::optional<bool> MatchResult = matchSetCC(LHS, RHS, CC, FalseV)) {
+      return DAG.getNode(*MatchResult ? ISD::AND : ISD::OR, DL, VT,
+                         DAG.getFreeze(TrueV), FalseV);
+    }
+  }
+
+  return SDValue();
+}
+
+// Transform `binOp (select cond, x, c0), c1` where `c0` and `c1` are constants
+// into `select cond, binOp(x, c1), binOp(c0, c1)` if profitable.
+// For now we only consider transformation profitable if `binOp(c0, c1)` ends up
+// being `0` or `-1`. In such cases we can replace `select` with `and`.
+// TODO: Should we also do this if `binOp(c0, c1)` is cheaper to materialize
+// than `c0`?
+static SDValue
+foldBinOpIntoSelectIfProfitable(SDNode *BO, SelectionDAG &DAG,
+                                const LoongArchSubtarget &Subtarget) {
+  unsigned SelOpNo = 0;
+  SDValue Sel = BO->getOperand(0);
+  if (Sel.getOpcode() != ISD::SELECT || !Sel.hasOneUse()) {
+    SelOpNo = 1;
+    Sel = BO->getOperand(1);
+  }
+
+  if (Sel.getOpcode() != ISD::SELECT || !Sel.hasOneUse())
+    return SDValue();
+
+  unsigned ConstSelOpNo = 1;
+  unsigned OtherSelOpNo = 2;
+  if (!isa<ConstantSDNode>(Sel->getOperand(ConstSelOpNo))) {
+    ConstSelOpNo = 2;
+    OtherSelOpNo = 1;
+  }
+  SDValue ConstSelOp = Sel->getOperand(ConstSelOpNo);
+  ConstantSDNode *ConstSelOpNode = dyn_cast<ConstantSDNode>(ConstSelOp);
+  if (!ConstSelOpNode || ConstSelOpNode->isOpaque())
+    return SDValue();
+
+  SDValue ConstBinOp = BO->getOperand(SelOpNo ^ 1);
+  ConstantSDNode *ConstBinOpNode = dyn_cast<ConstantSDNode>(ConstBinOp);
+  if (!ConstBinOpNode || ConstBinOpNode->isOpaque())
+    return SDValue();
+
+  SDLoc DL(Sel);
+  EVT VT = BO->getValueType(0);
+
+  SDValue NewConstOps[2] = {ConstSelOp, ConstBinOp};
+  if (SelOpNo == 1)
+    std::swap(NewConstOps[0], NewConstOps[1]);
+
+  SDValue NewConstOp =
+      DAG.FoldConstantArithmetic(BO->getOpcode(), DL, VT, NewConstOps);
+  if (!NewConstOp)
+    return SDValue();
+
+  const APInt &NewConstAPInt = NewConstOp->getAsAPIntVal();
+  if (!NewConstAPInt.isZero() && !NewConstAPInt.isAllOnes())
+    return SDValue();
+
+  SDValue OtherSelOp = Sel->getOperand(OtherSelOpNo);
+  SDValue NewNonConstOps[2] = {OtherSelOp, ConstBinOp};
+  if (SelOpNo == 1)
+    std::swap(NewNonConstOps[0], NewNonConstOps[1]);
+  SDValue NewNonConstOp = DAG.getNode(BO->getOpcode(), DL, VT, NewNonConstOps);
+
+  SDValue NewT = (ConstSelOpNo == 1) ? NewConstOp : NewNonConstOp;
+  SDValue NewF = (ConstSelOpNo == 1) ? NewNonConstOp : NewConstOp;
+  return DAG.getSelect(DL, VT, Sel.getOperand(0), NewT, NewF);
+}
+
+// Changes the condition code and swaps operands if necessary, so the SetCC
+// operation matches one of the comparisons supported directly by branches
+// in the LoongArch ISA. May adjust compares to favor compare with 0 over
+// compare with 1/-1.
+static void translateSetCCForBranch(const SDLoc &DL, SDValue &LHS, SDValue &RHS,
+                                    ISD::CondCode &CC, SelectionDAG &DAG) {
+  // If this is a single bit test that can't be handled by ANDI, shift the
+  // bit to be tested to the MSB and perform a signed compare with 0.
+  if (isIntEqualitySetCC(CC) && isNullConstant(RHS) &&
+      LHS.getOpcode() == ISD::AND && LHS.hasOneUse() &&
+      isa<ConstantSDNode>(LHS.getOperand(1))) {
+    uint64_t Mask = LHS.getConstantOperandVal(1);
+    if ((isPowerOf2_64(Mask) || isMask_64(Mask)) && !isInt<12>(Mask)) {
+      unsigned ShAmt = 0;
+      if (isPowerOf2_64(Mask)) {
+        CC = CC == ISD::SETEQ ? ISD::SETGE : ISD::SETLT;
+        ShAmt = LHS.getValueSizeInBits() - 1 - Log2_64(Mask);
+      } else {
+        ShAmt = LHS.getValueSizeInBits() - llvm::bit_width(Mask);
+      }
+
+      LHS = LHS.getOperand(0);
+      if (ShAmt != 0)
+        LHS = DAG.getNode(ISD::SHL, DL, LHS.getValueType(), LHS,
+                          DAG.getConstant(ShAmt, DL, LHS.getValueType()));
+      return;
+    }
+  }
+
+  if (auto *RHSC = dyn_cast<ConstantSDNode>(RHS)) {
+    int64_t C = RHSC->getSExtValue();
+    switch (CC) {
+    default:
+      break;
+    case ISD::SETGT:
+      // Convert X > -1 to X >= 0.
+      if (C == -1) {
+        RHS = DAG.getConstant(0, DL, RHS.getValueType());
+        CC = ISD::SETGE;
+        return;
+      }
+      break;
+    case ISD::SETLT:
+      // Convert X < 1 to 0 >= X.
+      if (C == 1) {
+        RHS = LHS;
+        LHS = DAG.getConstant(0, DL, RHS.getValueType());
+        CC = ISD::SETGE;
+        return;
+      }
+      break;
+    }
+  }
+
+  switch (CC) {
+  default:
+    break;
+  case ISD::SETGT:
+  case ISD::SETLE:
+  case ISD::SETUGT:
+  case ISD::SETULE:
+    CC = ISD::getSetCCSwappedOperands(CC);
+    std::swap(LHS, RHS);
+    break;
+  }
+}
+
+SDValue LoongArchTargetLowering::lowerSELECT(SDValue Op,
+                                             SelectionDAG &DAG) const {
+  SDValue CondV = Op.getOperand(0);
+  SDValue TrueV = Op.getOperand(1);
+  SDValue FalseV = Op.getOperand(2);
+  SDLoc DL(Op);
+  MVT VT = Op.getSimpleValueType();
+  MVT GRLenVT = Subtarget.getGRLenVT();
+
+  if (SDValue V = combineSelectToBinOp(Op.getNode(), DAG, Subtarget))
+    return V;
+
+  if (Op.hasOneUse()) {
+    unsigned UseOpc = Op->user_begin()->getOpcode();
+    if (isBinOp(UseOpc) && DAG.isSafeToSpeculativelyExecute(UseOpc)) {
+      SDNode *BinOp = *Op->user_begin();
+      if (SDValue NewSel = foldBinOpIntoSelectIfProfitable(*Op->user_begin(),
+                                                           DAG, Subtarget)) {
+        DAG.ReplaceAllUsesWith(BinOp, &NewSel);
+        // Opcode check is necessary because foldBinOpIntoSelectIfProfitable
+        // may return a constant node and cause crash in lowerSELECT.
+        if (NewSel.getOpcode() == ISD::SELECT)
+          return lowerSELECT(NewSel, DAG);
+        return NewSel;
+      }
+    }
+  }
+
+  // If the condition is not an integer SETCC which operates on GRLenVT, we need
+  // to emit a LoongArchISD::SELECT_CC comparing the condition to zero. i.e.:
+  // (select condv, truev, falsev)
+  // -> (loongarchisd::select_cc condv, zero, setne, truev, falsev)
+  if (CondV.getOpcode() != ISD::SETCC ||
+      CondV.getOperand(0).getSimpleValueType() != GRLenVT) {
+    SDValue Zero = DAG.getConstant(0, DL, GRLenVT);
+    SDValue SetNE = DAG.getCondCode(ISD::SETNE);
+
+    SDValue Ops[] = {CondV, Zero, SetNE, TrueV, FalseV};
+
+    return DAG.getNode(LoongArchISD::SELECT_CC, DL, VT, Ops);
+  }
+
+  // If the CondV is the output of a SETCC node which operates on GRLenVT
+  // inputs, then merge the SETCC node into the lowered LoongArchISD::SELECT_CC
+  // to take advantage of the integer compare+branch instructions. i.e.: (select
+  // (setcc lhs, rhs, cc), truev, falsev)
+  // -> (loongarchisd::select_cc lhs, rhs, cc, truev, falsev)
+  SDValue LHS = CondV.getOperand(0);
+  SDValue RHS = CondV.getOperand(1);
+  ISD::CondCode CCVal = cast<CondCodeSDNode>(CondV.getOperand(2))->get();
+
+  // Special case for a select of 2 constants that have a difference of 1.
+  // Normally this is done by DAGCombine, but if the select is introduced by
+  // type legalization or op legalization, we miss it. Restricting to SETLT
+  // case for now because that is what signed saturating add/sub need.
+  // FIXME: We don't need the condition to be SETLT or even a SETCC,
+  // but we would probably want to swap the true/false values if the condition
+  // is SETGE/SETLE to avoid an XORI.
+  if (isa<ConstantSDNode>(TrueV) && isa<ConstantSDNode>(FalseV) &&
+      CCVal == ISD::SETLT) {
+    const APInt &TrueVal = TrueV->getAsAPIntVal();
+    const APInt &FalseVal = FalseV->getAsAPIntVal();
+    if (TrueVal - 1 == FalseVal)
+      return DAG.getNode(ISD::ADD, DL, VT, CondV, FalseV);
+    if (TrueVal + 1 == FalseVal)
+      return DAG.getNode(ISD::SUB, DL, VT, FalseV, CondV);
+  }
+
+  translateSetCCForBranch(DL, LHS, RHS, CCVal, DAG);
+  // 1 < x ? x : 1 -> 0 < x ? x : 1
+  if (isOneConstant(LHS) && (CCVal == ISD::SETLT || CCVal == ISD::SETULT) &&
+      RHS == TrueV && LHS == FalseV) {
+    LHS = DAG.getConstant(0, DL, VT);
+    // 0 <u x is the same as x != 0.
+    if (CCVal == ISD::SETULT) {
+      std::swap(LHS, RHS);
+      CCVal = ISD::SETNE;
+    }
+  }
+
+  // x <s -1 ? x : -1 -> x <s 0 ? x : -1
+  if (isAllOnesConstant(RH...
[truncated]

@heiher (Member Author) commented May 13, 2025

cc @FlyGoat @xry111 @xen0n

@@ -214,8 +214,9 @@ static void doAtomicBinOpExpansion(const LoongArchInstrInfo *TII,
       .addReg(ScratchReg)
       .addReg(AddrReg)
       .addImm(0);
-  BuildMI(LoopMBB, DL, TII->get(LoongArch::BEQZ))
+  BuildMI(LoopMBB, DL, TII->get(LoongArch::BEQ))
Contributor

BEQZ has the advantage over BEQ in that its reach is broader by 5 bits (the width of a GPR slot), so if some of the changed bits potentially refer to a remote MBB, we may want to preserve them? I'm not looking at this code as closely as I'd prefer because I'm just battling my procrastination and doing a quick review here.

Member Author

I agree. While BEQZ does offer a wider branch range than BEQ, their latency and throughput are the same. For expanding the pseudo-atomic sequences, I believe BEQ's range is sufficient. I chose not to split it further to avoid unnecessary divergence between the 32-bit and 64-bit code paths.

+    setOperationAction(ISD::CTTZ, GRLenVT, Expand);
+    setOperationAction(ISD::CTLZ, GRLenVT, Expand);
+    setOperationAction(ISD::ROTR, GRLenVT, Expand);
+    setOperationAction(ISD::SELECT, GRLenVT, Custom);
Contributor

Is the custom expansion for select absolutely necessary for the LA32 bringup? A lot of the added code is presumably for optimizing select performance in the absence of the maskeqz + masknez combo, but I'd suggest splitting those changes into another PR if they're not essential to the bringup, to keep the bringup here focused.

Member Author

Yes, this part is required. Without MASK{EQ,NE}Z, there's no way to lower ISD::SELECT properly, so the custom expansion is essential for the bringup.
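
For readers following along, here is a rough sketch of the difference under discussion (the RUN lines and check prefixes are assumptions for illustration, not tests from this patch): with 32s a scalar select stays branchless via the masknez/maskeqz pair, while without it the new lowering routes through LoongArchISD::SELECT_CC and is expanded to a compare-and-branch sequence.

; RUN: llc --mtriple=loongarch32 -mattr=+32s < %s | FileCheck %s --check-prefix=LA32S
; RUN: llc --mtriple=loongarch32 -mattr=-32s < %s | FileCheck %s --check-prefix=LA32R
define i32 @sel(i1 %c, i32 %t, i32 %f) {
; With 32s the select is lowered branchlessly using the mask instructions.
; LA32S-DAG: masknez
; LA32S-DAG: maskeqz
; Without 32s those instructions do not exist, so none may appear.
; LA32R-NOT: masknez
; LA32R-NOT: maskeqz
  %r = select i1 %c, i32 %t, i32 %f
  ret i32 %r
}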

def BNEZ : BrCCZ_1RI21<0x44000000>;

// Other Miscellaneous Instructions
def CPUCFG : ALU_2R<0x00006c00>;
Contributor

Given CPUCFG is one of the few ways an application can query its runtime CPU's capabilities, and the only way if running without ELF HWCAP support, we may want to allow it despite the spec not mandating it. I know, and personally insist, that the LA32R spec is wrong in not doing so, but maybe acting against the spec this time will do users more good than harm.

Member Author

I personally agree. IIRC, @FlyGoat has considered emulating the CPUCFG instruction for LA32R in the Linux kernel, so compiler-side support would be necessary. Unless there are objections, I plan to go ahead with this.


Yup, I think the same for IOCSR (probably all privileged instructions).

@@ -1054,6 +1068,8 @@ def AMCAS__DB_D : AMCAS_3R<0x385b8000>;
def LL_D : LLBase<0x22000000>;
def SC_D : SCBase<0x23000000>;
def SC_Q : SCBase_128<0x38570000>;
def LLACQ_W : LLBase_ACQ<0x38578000>;
def SCREL_W : SCBase_REL<0x38578400>;
Contributor

This can be a separate change that can be fast-tracked?

Member Author

They aren't new additions; rather, they are being moved out of the Has32S predicates.
