[LoongArch] Introduce 32s target feature for LA32S ISA extensions #139695

Open
wants to merge 1 commit into main

Conversation

@heiher (Member) commented May 13, 2025

According to the official LoongArch reference manual, the 32-bit LoongArch is divided into two variants: the Reduced version (LA32R) and the Standard version (LA32S). LA32S extends LA32R with additional instructions, and the 64-bit version (LA64) fully includes the LA32S instruction set.

This patch introduces a new target feature 32s for the LoongArch backend, enabling support for instructions specific to the LA32S variant.

The LA32S extension includes the following additional instructions:

  • ALSL.W
  • {AND,OR}N
  • B{EQ,NE}Z
  • BITREV.{4B,W}
  • BSTR{INS,PICK}.W
  • BYTEPICK.W
  • CL{O,Z}.W
  • CPUCFG
  • CT{O,Z}.W
  • EXT.W.{B,H}
  • F{LD,ST}X.{D,S}
  • MASK{EQ,NE}Z
  • PC{ADDI,ALAU12I}
  • REVB.2H
  • ROTR{I}.W

Additionally, LA32R defines three new instruction aliases:

  • RDCNTID.W RJ => RDTIMEL.W ZERO, RJ
  • RDCNTVH.W RD => RDTIMEH.W RD, ZERO
  • RDCNTVL.W RD => RDTIMEL.W RD, ZERO
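
To make the new 32s feature concrete, here is a minimal llc/FileCheck sketch (illustration only, not taken from the patch; the RUN lines, check prefixes, and the choice of a rotate as the probe instruction are assumptions). Since ROTR{I}.W is listed above as an LA32S instruction, the rotate pattern should only be selected when the feature is enabled:

; RUN: llc --mtriple=loongarch32 -mattr=+32s < %s | FileCheck %s --check-prefix=LA32S
; RUN: llc --mtriple=loongarch32 -mattr=-32s < %s | FileCheck %s --check-prefix=LA32R
define i32 @rotate_right(i32 %x, i32 %y) {
; With 32s, funnel-shift-by-same-operand becomes a hardware rotate.
; LA32S: rotr.w
; Without 32s, ROTR is expanded to shifts and an or, so no rotr.w appears.
; LA32R-NOT: rotr.w
  %r = call i32 @llvm.fshr.i32(i32 %x, i32 %x, i32 %y)
  ret i32 %r
}
declare i32 @llvm.fshr.i32(i32, i32, i32)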

@llvmbot added the mc (Machine (object) code) and backend:loongarch labels on May 13, 2025
@llvmbot (Member) commented May 13, 2025

@llvm/pr-subscribers-mc

@llvm/pr-subscribers-backend-loongarch

Author: hev (heiher)

Changes

Patch is 979.81 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/139695.diff

60 Files Affected:

  • (modified) llvm/lib/Target/LoongArch/LoongArch.td (+12)
  • (modified) llvm/lib/Target/LoongArch/LoongArchExpandAtomicPseudoInsts.cpp (+12-6)
  • (modified) llvm/lib/Target/LoongArch/LoongArchISelDAGToDAG.h (+22)
  • (modified) llvm/lib/Target/LoongArch/LoongArchISelLowering.cpp (+506-11)
  • (modified) llvm/lib/Target/LoongArch/LoongArchISelLowering.h (+4)
  • (modified) llvm/lib/Target/LoongArch/LoongArchInstrInfo.td (+126-49)
  • (modified) llvm/test/CodeGen/LoongArch/alloca.ll (+148-71)
  • (modified) llvm/test/CodeGen/LoongArch/alsl.ll (+273-118)
  • (modified) llvm/test/CodeGen/LoongArch/annotate-tablejump.ll (+1-1)
  • (modified) llvm/test/CodeGen/LoongArch/atomicrmw-cond-sub-clamp.ll (+8-8)
  • (modified) llvm/test/CodeGen/LoongArch/atomicrmw-uinc-udec-wrap.ll (+8-8)
  • (modified) llvm/test/CodeGen/LoongArch/bitreverse.ll (+573-72)
  • (modified) llvm/test/CodeGen/LoongArch/bnez-beqz.ll (+66-31)
  • (modified) llvm/test/CodeGen/LoongArch/branch-relaxation.ll (+101-53)
  • (modified) llvm/test/CodeGen/LoongArch/bstrins_w.ll (+1-1)
  • (modified) llvm/test/CodeGen/LoongArch/bstrpick_w.ll (+1-1)
  • (modified) llvm/test/CodeGen/LoongArch/bswap-bitreverse.ll (+225-39)
  • (modified) llvm/test/CodeGen/LoongArch/bswap.ll (+229-65)
  • (modified) llvm/test/CodeGen/LoongArch/bytepick.ll (+1-1)
  • (modified) llvm/test/CodeGen/LoongArch/ctlz-cttz-ctpop.ll (+803-200)
  • (modified) llvm/test/CodeGen/LoongArch/ctpop-with-lsx.ll (+80-37)
  • (modified) llvm/test/CodeGen/LoongArch/exception-pointer-register.ll (+1-1)
  • (modified) llvm/test/CodeGen/LoongArch/fabs.ll (+2-2)
  • (modified) llvm/test/CodeGen/LoongArch/fcopysign.ll (+2-2)
  • (modified) llvm/test/CodeGen/LoongArch/feature-32bit.ll (+1)
  • (modified) llvm/test/CodeGen/LoongArch/intrinsic-csr-side-effects.ll (+1-1)
  • (modified) llvm/test/CodeGen/LoongArch/ir-instruction/and.ll (+303-136)
  • (modified) llvm/test/CodeGen/LoongArch/ir-instruction/ashr.ll (+20-22)
  • (modified) llvm/test/CodeGen/LoongArch/ir-instruction/atomic-cmpxchg-128.ll (+10-10)
  • (modified) llvm/test/CodeGen/LoongArch/ir-instruction/atomic-cmpxchg.ll (+44-44)
  • (modified) llvm/test/CodeGen/LoongArch/ir-instruction/atomicrmw-fp.ll (+40-40)
  • (modified) llvm/test/CodeGen/LoongArch/ir-instruction/atomicrmw-lam-bh.ll (+557-591)
  • (modified) llvm/test/CodeGen/LoongArch/ir-instruction/atomicrmw-lamcas.ll (+90-90)
  • (modified) llvm/test/CodeGen/LoongArch/ir-instruction/atomicrmw-minmax.ll (+40-40)
  • (modified) llvm/test/CodeGen/LoongArch/ir-instruction/atomicrmw.ll (+4752-2266)
  • (modified) llvm/test/CodeGen/LoongArch/ir-instruction/br.ll (+201-90)
  • (modified) llvm/test/CodeGen/LoongArch/ir-instruction/double-convert.ll (+9-8)
  • (modified) llvm/test/CodeGen/LoongArch/ir-instruction/fcmp-dbl.ll (+1-1)
  • (modified) llvm/test/CodeGen/LoongArch/ir-instruction/fcmp-flt.ll (+1-1)
  • (modified) llvm/test/CodeGen/LoongArch/ir-instruction/float-convert.ll (+18-16)
  • (modified) llvm/test/CodeGen/LoongArch/ir-instruction/load-store-fp.ll (+2-2)
  • (modified) llvm/test/CodeGen/LoongArch/ir-instruction/load-store.ll (+818-390)
  • (modified) llvm/test/CodeGen/LoongArch/ir-instruction/lshr.ll (+132-57)
  • (modified) llvm/test/CodeGen/LoongArch/ir-instruction/mul.ll (+1276-594)
  • (modified) llvm/test/CodeGen/LoongArch/ir-instruction/sdiv-udiv-srem-urem.ll (+1041-490)
  • (modified) llvm/test/CodeGen/LoongArch/ir-instruction/select-bare-int.ll (+36-25)
  • (modified) llvm/test/CodeGen/LoongArch/ir-instruction/select-fpcc-int.ll (+112-84)
  • (modified) llvm/test/CodeGen/LoongArch/ir-instruction/select-icc-int.ll (+50-46)
  • (modified) llvm/test/CodeGen/LoongArch/ir-instruction/sext-zext-trunc.ll (+281-121)
  • (modified) llvm/test/CodeGen/LoongArch/ir-instruction/shl.ll (+13-11)
  • (modified) llvm/test/CodeGen/LoongArch/jump-table.ll (+2-2)
  • (modified) llvm/test/CodeGen/LoongArch/rotl-rotr.ll (+711-320)
  • (modified) llvm/test/CodeGen/LoongArch/select-to-shiftand.ll (+4-4)
  • (modified) llvm/test/CodeGen/LoongArch/shift-masked-shamt.ll (+40-40)
  • (modified) llvm/test/CodeGen/LoongArch/smul-with-overflow.ll (+132-140)
  • (modified) llvm/test/CodeGen/LoongArch/stack-realignment-with-variable-sized-objects.ll (+1-1)
  • (modified) llvm/test/CodeGen/LoongArch/typepromotion-overflow.ll (+468-222)
  • (modified) llvm/test/MC/LoongArch/Basic/Integer/atomic.s (+8-8)
  • (modified) llvm/test/tools/UpdateTestChecks/update_llc_test_checks/Inputs/loongarch_generated_funcs.ll.generated.expected (+3-3)
  • (modified) llvm/test/tools/UpdateTestChecks/update_llc_test_checks/Inputs/loongarch_generated_funcs.ll.nogenerated.expected (+3-3)
diff --git a/llvm/lib/Target/LoongArch/LoongArch.td b/llvm/lib/Target/LoongArch/LoongArch.td
index 5fd52babfc6ec..707d2de23cdfe 100644
--- a/llvm/lib/Target/LoongArch/LoongArch.td
+++ b/llvm/lib/Target/LoongArch/LoongArch.td
@@ -32,6 +32,14 @@ def IsLA32
 defvar LA32 = DefaultMode;
 def LA64 : HwMode<"+64bit", [IsLA64]>;
 
+// LoongArch 32-bit is divided into variants, the reduced 32-bit variant (LA32R)
+// and the standard 32-bit variant (LA32S).
+def Feature32S
+    : SubtargetFeature<"32s", "Has32S", "true",
+                       "LA32 Standard Basic Instruction Extension">;
+def Has32S : Predicate<"Subtarget->has32S()">;
+def Not32S : Predicate<"!Subtarget->has32S()">;
+
 // Single Precision floating point
 def FeatureBasicF
     : SubtargetFeature<"f", "HasBasicF", "true",
@@ -159,11 +167,13 @@ include "LoongArchInstrInfo.td"
 
 def : ProcessorModel<"generic-la32", NoSchedModel, [Feature32Bit]>;
 def : ProcessorModel<"generic-la64", NoSchedModel, [Feature64Bit,
+                                                    Feature32S,
                                                     FeatureUAL,
                                                     FeatureExtLSX]>;
 
 // Generic 64-bit processor with double-precision floating-point support.
 def : ProcessorModel<"loongarch64", NoSchedModel, [Feature64Bit,
+                                                   Feature32S,
                                                    FeatureUAL,
                                                    FeatureBasicD]>;
 
@@ -172,12 +182,14 @@ def : ProcessorModel<"loongarch64", NoSchedModel, [Feature64Bit,
 def : ProcessorModel<"generic", NoSchedModel, []>;
 
 def : ProcessorModel<"la464", NoSchedModel, [Feature64Bit,
+                                             Feature32S,
                                              FeatureUAL,
                                              FeatureExtLASX,
                                              FeatureExtLVZ,
                                              FeatureExtLBT]>;
 
 def : ProcessorModel<"la664", NoSchedModel, [Feature64Bit,
+                                             Feature32S,
                                              FeatureUAL,
                                              FeatureExtLASX,
                                              FeatureExtLVZ,
diff --git a/llvm/lib/Target/LoongArch/LoongArchExpandAtomicPseudoInsts.cpp b/llvm/lib/Target/LoongArch/LoongArchExpandAtomicPseudoInsts.cpp
index 27d20390eb6ae..3be012feb2385 100644
--- a/llvm/lib/Target/LoongArch/LoongArchExpandAtomicPseudoInsts.cpp
+++ b/llvm/lib/Target/LoongArch/LoongArchExpandAtomicPseudoInsts.cpp
@@ -214,8 +214,9 @@ static void doAtomicBinOpExpansion(const LoongArchInstrInfo *TII,
       .addReg(ScratchReg)
       .addReg(AddrReg)
       .addImm(0);
-  BuildMI(LoopMBB, DL, TII->get(LoongArch::BEQZ))
+  BuildMI(LoopMBB, DL, TII->get(LoongArch::BEQ))
       .addReg(ScratchReg)
+      .addReg(LoongArch::R0)
       .addMBB(LoopMBB);
 }
 
@@ -296,8 +297,9 @@ static void doMaskedAtomicBinOpExpansion(
       .addReg(ScratchReg)
       .addReg(AddrReg)
       .addImm(0);
-  BuildMI(LoopMBB, DL, TII->get(LoongArch::BEQZ))
+  BuildMI(LoopMBB, DL, TII->get(LoongArch::BEQ))
       .addReg(ScratchReg)
+      .addReg(LoongArch::R0)
       .addMBB(LoopMBB);
 }
 
@@ -454,8 +456,9 @@ bool LoongArchExpandAtomicPseudo::expandAtomicMinMaxOp(
       .addReg(Scratch1Reg)
       .addReg(AddrReg)
       .addImm(0);
-  BuildMI(LoopTailMBB, DL, TII->get(LoongArch::BEQZ))
+  BuildMI(LoopTailMBB, DL, TII->get(LoongArch::BEQ))
       .addReg(Scratch1Reg)
+      .addReg(LoongArch::R0)
       .addMBB(LoopHeadMBB);
 
   NextMBBI = MBB.end();
@@ -529,8 +532,9 @@ bool LoongArchExpandAtomicPseudo::expandAtomicCmpXchg(
         .addReg(ScratchReg)
         .addReg(AddrReg)
         .addImm(0);
-    BuildMI(LoopTailMBB, DL, TII->get(LoongArch::BEQZ))
+    BuildMI(LoopTailMBB, DL, TII->get(LoongArch::BEQ))
         .addReg(ScratchReg)
+        .addReg(LoongArch::R0)
         .addMBB(LoopHeadMBB);
     BuildMI(LoopTailMBB, DL, TII->get(LoongArch::B)).addMBB(DoneMBB);
   } else {
@@ -569,8 +573,9 @@ bool LoongArchExpandAtomicPseudo::expandAtomicCmpXchg(
         .addReg(ScratchReg)
         .addReg(AddrReg)
         .addImm(0);
-    BuildMI(LoopTailMBB, DL, TII->get(LoongArch::BEQZ))
+    BuildMI(LoopTailMBB, DL, TII->get(LoongArch::BEQ))
         .addReg(ScratchReg)
+        .addReg(LoongArch::R0)
         .addMBB(LoopHeadMBB);
     BuildMI(LoopTailMBB, DL, TII->get(LoongArch::B)).addMBB(DoneMBB);
   }
@@ -677,8 +682,9 @@ bool LoongArchExpandAtomicPseudo::expandAtomicCmpXchg128(
       .addReg(ScratchReg)
       .addReg(NewValHiReg)
       .addReg(AddrReg);
-  BuildMI(LoopTailMBB, DL, TII->get(LoongArch::BEQZ))
+  BuildMI(LoopTailMBB, DL, TII->get(LoongArch::BEQ))
       .addReg(ScratchReg)
+      .addReg(LoongArch::R0)
       .addMBB(LoopHeadMBB);
   BuildMI(LoopTailMBB, DL, TII->get(LoongArch::B)).addMBB(DoneMBB);
   int hint;
diff --git a/llvm/lib/Target/LoongArch/LoongArchISelDAGToDAG.h b/llvm/lib/Target/LoongArch/LoongArchISelDAGToDAG.h
index 8a7eba418d804..e94f249c14be2 100644
--- a/llvm/lib/Target/LoongArch/LoongArchISelDAGToDAG.h
+++ b/llvm/lib/Target/LoongArch/LoongArchISelDAGToDAG.h
@@ -64,6 +64,28 @@ class LoongArchDAGToDAGISel : public SelectionDAGISel {
   bool selectVSplatUimmInvPow2(SDValue N, SDValue &SplatImm) const;
   bool selectVSplatUimmPow2(SDValue N, SDValue &SplatImm) const;
 
+  // Return the LoongArch branch opcode that matches the given DAG integer
+  // condition code. The CondCode must be one of those supported by the
+  // LoongArch ISA (see translateSetCCForBranch).
+  static unsigned getBranchOpcForIntCC(ISD::CondCode CC) {
+    switch (CC) {
+    default:
+      llvm_unreachable("Unsupported CondCode");
+    case ISD::SETEQ:
+      return LoongArch::BEQ;
+    case ISD::SETNE:
+      return LoongArch::BNE;
+    case ISD::SETLT:
+      return LoongArch::BLT;
+    case ISD::SETGE:
+      return LoongArch::BGE;
+    case ISD::SETULT:
+      return LoongArch::BLTU;
+    case ISD::SETUGE:
+      return LoongArch::BGEU;
+    }
+  }
+
 // Include the pieces autogenerated from the target description.
 #include "LoongArchGenDAGISel.inc"
 };
diff --git a/llvm/lib/Target/LoongArch/LoongArchISelLowering.cpp b/llvm/lib/Target/LoongArch/LoongArchISelLowering.cpp
index b729b4ea6f9b4..6e3e1396e6aeb 100644
--- a/llvm/lib/Target/LoongArch/LoongArchISelLowering.cpp
+++ b/llvm/lib/Target/LoongArch/LoongArchISelLowering.cpp
@@ -18,6 +18,7 @@
 #include "LoongArchSubtarget.h"
 #include "MCTargetDesc/LoongArchBaseInfo.h"
 #include "MCTargetDesc/LoongArchMCTargetDesc.h"
+#include "llvm/ADT/SmallSet.h"
 #include "llvm/ADT/Statistic.h"
 #include "llvm/ADT/StringExtras.h"
 #include "llvm/CodeGen/ISDOpcodes.h"
@@ -102,15 +103,26 @@ LoongArchTargetLowering::LoongArchTargetLowering(const TargetMachine &TM,
 
   setOperationAction(ISD::PREFETCH, MVT::Other, Custom);
 
-  // Expand bitreverse.i16 with native-width bitrev and shift for now, before
-  // we get to know which of sll and revb.2h is faster.
-  setOperationAction(ISD::BITREVERSE, MVT::i8, Custom);
-  setOperationAction(ISD::BITREVERSE, GRLenVT, Legal);
-
-  // LA32 does not have REVB.2W and REVB.D due to the 64-bit operands, and
-  // the narrower REVB.W does not exist. But LA32 does have REVB.2H, so i16
-  // and i32 could still be byte-swapped relatively cheaply.
-  setOperationAction(ISD::BSWAP, MVT::i16, Custom);
+  // BITREV/REVB requires the 32S feature.
+  if (STI.has32S()) {
+    // Expand bitreverse.i16 with native-width bitrev and shift for now, before
+    // we get to know which of sll and revb.2h is faster.
+    setOperationAction(ISD::BITREVERSE, MVT::i8, Custom);
+    setOperationAction(ISD::BITREVERSE, GRLenVT, Legal);
+
+    // LA32 does not have REVB.2W and REVB.D due to the 64-bit operands, and
+    // the narrower REVB.W does not exist. But LA32 does have REVB.2H, so i16
+    // and i32 could still be byte-swapped relatively cheaply.
+    setOperationAction(ISD::BSWAP, MVT::i16, Custom);
+  } else {
+    setOperationAction(ISD::BSWAP, GRLenVT, Expand);
+    setOperationAction(ISD::CTTZ, GRLenVT, Expand);
+    setOperationAction(ISD::CTLZ, GRLenVT, Expand);
+    setOperationAction(ISD::ROTR, GRLenVT, Expand);
+    setOperationAction(ISD::SELECT, GRLenVT, Custom);
+    setOperationAction(ISD::SIGN_EXTEND_INREG, MVT::i8, Expand);
+    setOperationAction(ISD::SIGN_EXTEND_INREG, MVT::i16, Expand);
+  }
 
   setOperationAction(ISD::BR_JT, MVT::Other, Expand);
   setOperationAction(ISD::BR_CC, GRLenVT, Expand);
@@ -476,6 +488,8 @@ SDValue LoongArchTargetLowering::LowerOperation(SDValue Op,
     return lowerSCALAR_TO_VECTOR(Op, DAG);
   case ISD::PREFETCH:
     return lowerPREFETCH(Op, DAG);
+  case ISD::SELECT:
+    return lowerSELECT(Op, DAG);
   }
   return SDValue();
 }
@@ -492,6 +506,327 @@ SDValue LoongArchTargetLowering::lowerPREFETCH(SDValue Op,
   return Op;
 }
 
+// Return true if Val is equal to (setcc LHS, RHS, CC).
+// Return false if Val is the inverse of (setcc LHS, RHS, CC).
+// Otherwise, return std::nullopt.
+static std::optional<bool> matchSetCC(SDValue LHS, SDValue RHS,
+                                      ISD::CondCode CC, SDValue Val) {
+  assert(Val->getOpcode() == ISD::SETCC);
+  SDValue LHS2 = Val.getOperand(0);
+  SDValue RHS2 = Val.getOperand(1);
+  ISD::CondCode CC2 = cast<CondCodeSDNode>(Val.getOperand(2))->get();
+
+  if (LHS == LHS2 && RHS == RHS2) {
+    if (CC == CC2)
+      return true;
+    if (CC == ISD::getSetCCInverse(CC2, LHS2.getValueType()))
+      return false;
+  } else if (LHS == RHS2 && RHS == LHS2) {
+    CC2 = ISD::getSetCCSwappedOperands(CC2);
+    if (CC == CC2)
+      return true;
+    if (CC == ISD::getSetCCInverse(CC2, LHS2.getValueType()))
+      return false;
+  }
+
+  return std::nullopt;
+}
+
+static SDValue combineSelectToBinOp(SDNode *N, SelectionDAG &DAG,
+                                    const LoongArchSubtarget &Subtarget) {
+  SDValue CondV = N->getOperand(0);
+  SDValue TrueV = N->getOperand(1);
+  SDValue FalseV = N->getOperand(2);
+  MVT VT = N->getSimpleValueType(0);
+  SDLoc DL(N);
+
+  // (select c, -1, y) -> -c | y
+  if (isAllOnesConstant(TrueV)) {
+    SDValue Neg = DAG.getNegative(CondV, DL, VT);
+    return DAG.getNode(ISD::OR, DL, VT, Neg, DAG.getFreeze(FalseV));
+  }
+  // (select c, y, -1) -> (c-1) | y
+  if (isAllOnesConstant(FalseV)) {
+    SDValue Neg =
+        DAG.getNode(ISD::ADD, DL, VT, CondV, DAG.getAllOnesConstant(DL, VT));
+    return DAG.getNode(ISD::OR, DL, VT, Neg, DAG.getFreeze(TrueV));
+  }
+
+  // (select c, 0, y) -> (c-1) & y
+  if (isNullConstant(TrueV)) {
+    SDValue Neg =
+        DAG.getNode(ISD::ADD, DL, VT, CondV, DAG.getAllOnesConstant(DL, VT));
+    return DAG.getNode(ISD::AND, DL, VT, Neg, DAG.getFreeze(FalseV));
+  }
+  // (select c, y, 0) -> -c & y
+  if (isNullConstant(FalseV)) {
+    SDValue Neg = DAG.getNegative(CondV, DL, VT);
+    return DAG.getNode(ISD::AND, DL, VT, Neg, DAG.getFreeze(TrueV));
+  }
+
+  // select c, ~x, x --> xor -c, x
+  if (isa<ConstantSDNode>(TrueV) && isa<ConstantSDNode>(FalseV)) {
+    const APInt &TrueVal = TrueV->getAsAPIntVal();
+    const APInt &FalseVal = FalseV->getAsAPIntVal();
+    if (~TrueVal == FalseVal) {
+      SDValue Neg = DAG.getNegative(CondV, DL, VT);
+      return DAG.getNode(ISD::XOR, DL, VT, Neg, FalseV);
+    }
+  }
+
+  // Try to fold (select (setcc lhs, rhs, cc), truev, falsev) into bitwise ops
+  // when both truev and falsev are also setcc.
+  if (CondV.getOpcode() == ISD::SETCC && TrueV.getOpcode() == ISD::SETCC &&
+      FalseV.getOpcode() == ISD::SETCC) {
+    SDValue LHS = CondV.getOperand(0);
+    SDValue RHS = CondV.getOperand(1);
+    ISD::CondCode CC = cast<CondCodeSDNode>(CondV.getOperand(2))->get();
+
+    // (select x, x, y) -> x | y
+    // (select !x, x, y) -> x & y
+    if (std::optional<bool> MatchResult = matchSetCC(LHS, RHS, CC, TrueV)) {
+      return DAG.getNode(*MatchResult ? ISD::OR : ISD::AND, DL, VT, TrueV,
+                         DAG.getFreeze(FalseV));
+    }
+    // (select x, y, x) -> x & y
+    // (select !x, y, x) -> x | y
+    if (std::optional<bool> MatchResult = matchSetCC(LHS, RHS, CC, FalseV)) {
+      return DAG.getNode(*MatchResult ? ISD::AND : ISD::OR, DL, VT,
+                         DAG.getFreeze(TrueV), FalseV);
+    }
+  }
+
+  return SDValue();
+}
+
+// Transform `binOp (select cond, x, c0), c1` where `c0` and `c1` are constants
+// into `select cond, binOp(x, c1), binOp(c0, c1)` if profitable.
+// For now we only consider transformation profitable if `binOp(c0, c1)` ends up
+// being `0` or `-1`. In such cases we can replace `select` with `and`.
+// TODO: Should we also do this if `binOp(c0, c1)` is cheaper to materialize
+// than `c0`?
+static SDValue
+foldBinOpIntoSelectIfProfitable(SDNode *BO, SelectionDAG &DAG,
+                                const LoongArchSubtarget &Subtarget) {
+  unsigned SelOpNo = 0;
+  SDValue Sel = BO->getOperand(0);
+  if (Sel.getOpcode() != ISD::SELECT || !Sel.hasOneUse()) {
+    SelOpNo = 1;
+    Sel = BO->getOperand(1);
+  }
+
+  if (Sel.getOpcode() != ISD::SELECT || !Sel.hasOneUse())
+    return SDValue();
+
+  unsigned ConstSelOpNo = 1;
+  unsigned OtherSelOpNo = 2;
+  if (!isa<ConstantSDNode>(Sel->getOperand(ConstSelOpNo))) {
+    ConstSelOpNo = 2;
+    OtherSelOpNo = 1;
+  }
+  SDValue ConstSelOp = Sel->getOperand(ConstSelOpNo);
+  ConstantSDNode *ConstSelOpNode = dyn_cast<ConstantSDNode>(ConstSelOp);
+  if (!ConstSelOpNode || ConstSelOpNode->isOpaque())
+    return SDValue();
+
+  SDValue ConstBinOp = BO->getOperand(SelOpNo ^ 1);
+  ConstantSDNode *ConstBinOpNode = dyn_cast<ConstantSDNode>(ConstBinOp);
+  if (!ConstBinOpNode || ConstBinOpNode->isOpaque())
+    return SDValue();
+
+  SDLoc DL(Sel);
+  EVT VT = BO->getValueType(0);
+
+  SDValue NewConstOps[2] = {ConstSelOp, ConstBinOp};
+  if (SelOpNo == 1)
+    std::swap(NewConstOps[0], NewConstOps[1]);
+
+  SDValue NewConstOp =
+      DAG.FoldConstantArithmetic(BO->getOpcode(), DL, VT, NewConstOps);
+  if (!NewConstOp)
+    return SDValue();
+
+  const APInt &NewConstAPInt = NewConstOp->getAsAPIntVal();
+  if (!NewConstAPInt.isZero() && !NewConstAPInt.isAllOnes())
+    return SDValue();
+
+  SDValue OtherSelOp = Sel->getOperand(OtherSelOpNo);
+  SDValue NewNonConstOps[2] = {OtherSelOp, ConstBinOp};
+  if (SelOpNo == 1)
+    std::swap(NewNonConstOps[0], NewNonConstOps[1]);
+  SDValue NewNonConstOp = DAG.getNode(BO->getOpcode(), DL, VT, NewNonConstOps);
+
+  SDValue NewT = (ConstSelOpNo == 1) ? NewConstOp : NewNonConstOp;
+  SDValue NewF = (ConstSelOpNo == 1) ? NewNonConstOp : NewConstOp;
+  return DAG.getSelect(DL, VT, Sel.getOperand(0), NewT, NewF);
+}
+
+// Changes the condition code and swaps operands if necessary, so the SetCC
+// operation matches one of the comparisons supported directly by branches
+// in the LoongArch ISA. May adjust compares to favor compare with 0 over
+// compare with 1/-1.
+static void translateSetCCForBranch(const SDLoc &DL, SDValue &LHS, SDValue &RHS,
+                                    ISD::CondCode &CC, SelectionDAG &DAG) {
+  // If this is a single bit test that can't be handled by ANDI, shift the
+  // bit to be tested to the MSB and perform a signed compare with 0.
+  if (isIntEqualitySetCC(CC) && isNullConstant(RHS) &&
+      LHS.getOpcode() == ISD::AND && LHS.hasOneUse() &&
+      isa<ConstantSDNode>(LHS.getOperand(1))) {
+    uint64_t Mask = LHS.getConstantOperandVal(1);
+    if ((isPowerOf2_64(Mask) || isMask_64(Mask)) && !isInt<12>(Mask)) {
+      unsigned ShAmt = 0;
+      if (isPowerOf2_64(Mask)) {
+        CC = CC == ISD::SETEQ ? ISD::SETGE : ISD::SETLT;
+        ShAmt = LHS.getValueSizeInBits() - 1 - Log2_64(Mask);
+      } else {
+        ShAmt = LHS.getValueSizeInBits() - llvm::bit_width(Mask);
+      }
+
+      LHS = LHS.getOperand(0);
+      if (ShAmt != 0)
+        LHS = DAG.getNode(ISD::SHL, DL, LHS.getValueType(), LHS,
+                          DAG.getConstant(ShAmt, DL, LHS.getValueType()));
+      return;
+    }
+  }
+
+  if (auto *RHSC = dyn_cast<ConstantSDNode>(RHS)) {
+    int64_t C = RHSC->getSExtValue();
+    switch (CC) {
+    default:
+      break;
+    case ISD::SETGT:
+      // Convert X > -1 to X >= 0.
+      if (C == -1) {
+        RHS = DAG.getConstant(0, DL, RHS.getValueType());
+        CC = ISD::SETGE;
+        return;
+      }
+      break;
+    case ISD::SETLT:
+      // Convert X < 1 to 0 >= X.
+      if (C == 1) {
+        RHS = LHS;
+        LHS = DAG.getConstant(0, DL, RHS.getValueType());
+        CC = ISD::SETGE;
+        return;
+      }
+      break;
+    }
+  }
+
+  switch (CC) {
+  default:
+    break;
+  case ISD::SETGT:
+  case ISD::SETLE:
+  case ISD::SETUGT:
+  case ISD::SETULE:
+    CC = ISD::getSetCCSwappedOperands(CC);
+    std::swap(LHS, RHS);
+    break;
+  }
+}
+
+SDValue LoongArchTargetLowering::lowerSELECT(SDValue Op,
+                                             SelectionDAG &DAG) const {
+  SDValue CondV = Op.getOperand(0);
+  SDValue TrueV = Op.getOperand(1);
+  SDValue FalseV = Op.getOperand(2);
+  SDLoc DL(Op);
+  MVT VT = Op.getSimpleValueType();
+  MVT GRLenVT = Subtarget.getGRLenVT();
+
+  if (SDValue V = combineSelectToBinOp(Op.getNode(), DAG, Subtarget))
+    return V;
+
+  if (Op.hasOneUse()) {
+    unsigned UseOpc = Op->user_begin()->getOpcode();
+    if (isBinOp(UseOpc) && DAG.isSafeToSpeculativelyExecute(UseOpc)) {
+      SDNode *BinOp = *Op->user_begin();
+      if (SDValue NewSel = foldBinOpIntoSelectIfProfitable(*Op->user_begin(),
+                                                           DAG, Subtarget)) {
+        DAG.ReplaceAllUsesWith(BinOp, &NewSel);
+        // Opcode check is necessary because foldBinOpIntoSelectIfProfitable
+        // may return a constant node and cause crash in lowerSELECT.
+        if (NewSel.getOpcode() == ISD::SELECT)
+          return lowerSELECT(NewSel, DAG);
+        return NewSel;
+      }
+    }
+  }
+
+  // If the condition is not an integer SETCC which operates on GRLenVT, we need
+  // to emit a LoongArchISD::SELECT_CC comparing the condition to zero. i.e.:
+  // (select condv, truev, falsev)
+  // -> (loongarchisd::select_cc condv, zero, setne, truev, falsev)
+  if (CondV.getOpcode() != ISD::SETCC ||
+      CondV.getOperand(0).getSimpleValueType() != GRLenVT) {
+    SDValue Zero = DAG.getConstant(0, DL, GRLenVT);
+    SDValue SetNE = DAG.getCondCode(ISD::SETNE);
+
+    SDValue Ops[] = {CondV, Zero, SetNE, TrueV, FalseV};
+
+    return DAG.getNode(LoongArchISD::SELECT_CC, DL, VT, Ops);
+  }
+
+  // If the CondV is the output of a SETCC node which operates on GRLenVT
+  // inputs, then merge the SETCC node into the lowered LoongArchISD::SELECT_CC
+  // to take advantage of the integer compare+branch instructions. i.e.: (select
+  // (setcc lhs, rhs, cc), truev, falsev)
+  // -> (loongarchisd::select_cc lhs, rhs, cc, truev, falsev)
+  SDValue LHS = CondV.getOperand(0);
+  SDValue RHS = CondV.getOperand(1);
+  ISD::CondCode CCVal = cast<CondCodeSDNode>(CondV.getOperand(2))->get();
+
+  // Special case for a select of 2 constants that have a difference of 1.
+  // Normally this is done by DAGCombine, but if the select is introduced by
+  // type legalization or op legalization, we miss it. Restricting to SETLT
+  // case for now because that is what signed saturating add/sub need.
+  // FIXME: We don't need the condition to be SETLT or even a SETCC,
+  // but we would probably want to swap the true/false values if the condition
+  // is SETGE/SETLE to avoid an XORI.
+  if (isa<ConstantSDNode>(TrueV) && isa<ConstantSDNode>(FalseV) &&
+      CCVal == ISD::SETLT) {
+    const APInt &TrueVal = TrueV->getAsAPIntVal();
+    const APInt &FalseVal = FalseV->getAsAPIntVal();
+    if (TrueVal - 1 == FalseVal)
+      return DAG.getNode(ISD::ADD, DL, VT, CondV, FalseV);
+    if (TrueVal + 1 == FalseVal)
+      return DAG.getNode(ISD::SUB, DL, VT, FalseV, CondV);
+  }
+
+  translateSetCCForBranch(DL, LHS, RHS, CCVal, DAG);
+  // 1 < x ? x : 1 -> 0 < x ? x : 1
+  if (isOneConstant(LHS) && (CCVal == ISD::SETLT || CCVal == ISD::SETULT) &&
+      RHS == TrueV && LHS == FalseV) {
+    LHS = DAG.getConstant(0, DL, VT);
+    // 0 <u x is the same as x != 0.
+    if (CCVal == ISD::SETULT) {
+      std::swap(LHS, RHS);
+      CCVal = ISD::SETNE;
+    }
+  }
+
+  // x <s -1 ? x : -1 -> x <s 0 ? x : -1
+  if (isAllOnesConstant(RH...
[truncated]

@heiher (Member Author) commented May 13, 2025

cc @FlyGoat @xry111 @xen0n

@@ -214,8 +214,9 @@ static void doAtomicBinOpExpansion(const LoongArchInstrInfo *TII,
       .addReg(ScratchReg)
       .addReg(AddrReg)
       .addImm(0);
-  BuildMI(LoopMBB, DL, TII->get(LoongArch::BEQZ))
+  BuildMI(LoopMBB, DL, TII->get(LoongArch::BEQ))
Contributor

BEQZ has the advantage over BEQ in that its reach is broader by 5 bits (the width of a GPR slot), so if some of the changed bits potentially refer to a remote MBB, we may want to preserve them? I'm not looking at this code as closely as I'd prefer because I'm just battling my procrastination and doing a quick review here.

Member Author

I agree. While BEQZ does offer a wider branch range than BEQ, their latency and throughput are the same. For expanding the pseudo-atomic sequences, I believe BEQ's range is sufficient. I chose not to split it further to avoid unnecessary divergence between the 32-bit and 64-bit code paths.

+    setOperationAction(ISD::CTTZ, GRLenVT, Expand);
+    setOperationAction(ISD::CTLZ, GRLenVT, Expand);
+    setOperationAction(ISD::ROTR, GRLenVT, Expand);
+    setOperationAction(ISD::SELECT, GRLenVT, Custom);
Contributor

Is the custom expansion for select absolutely necessary for the LA32 bringup? A lot of the added code is presumably for optimizing select performance in the absence of the maskeqz + masknez combo, but I'd suggest splitting those changes into another PR if they're not essential to the bringup, to keep the bringup here focused.

Member Author

Yes, this part is required. Without MASK{EQ,NE}Z, there's no way to lower ISD::SELECT properly, so the custom expansion is essential for the bringup.
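
For readers following along, here is a rough sketch of the difference under discussion (the RUN lines and check prefixes are assumptions for illustration, not tests from this patch): with 32s a scalar select stays branchless via the masknez/maskeqz pair, while without it the new lowering routes through LoongArchISD::SELECT_CC and is expanded to a compare-and-branch sequence.

; RUN: llc --mtriple=loongarch32 -mattr=+32s < %s | FileCheck %s --check-prefix=LA32S
; RUN: llc --mtriple=loongarch32 -mattr=-32s < %s | FileCheck %s --check-prefix=LA32R
define i32 @sel(i1 %c, i32 %t, i32 %f) {
; With 32s the select is lowered branchlessly using the mask instructions.
; LA32S-DAG: masknez
; LA32S-DAG: maskeqz
; Without 32s those instructions do not exist, so none may appear.
; LA32R-NOT: masknez
; LA32R-NOT: maskeqz
  %r = select i1 %c, i32 %t, i32 %f
  ret i32 %r
}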

def BNEZ : BrCCZ_1RI21<0x44000000>;

// Other Miscellaneous Instructions
def CPUCFG : ALU_2R<0x00006c00>;
Contributor

Given CPUCFG is one of the few ways an application can query its runtime CPU's capabilities, and the only way if running without ELF HWCAP support, we may want to allow it despite the spec not mandating it. I know, and personally insist, that the LA32R spec is wrong in not doing so, but maybe acting against the spec this time will do users more good than harm.

Member Author

I personally agree. IIRC, @FlyGoat has considered emulating the CPUCFG instruction for LA32R in the Linux kernel, so compiler-side support would be necessary. Unless there are objections, I plan to go ahead with this.


Yup, I think the same for IOCSR (probably all privileged instructions).

@@ -1054,6 +1068,8 @@ def AMCAS__DB_D : AMCAS_3R<0x385b8000>;
def LL_D : LLBase<0x22000000>;
def SC_D : SCBase<0x23000000>;
def SC_Q : SCBase_128<0x38570000>;
def LLACQ_W : LLBase_ACQ<0x38578000>;
def SCREL_W : SCBase_REL<0x38578400>;
Contributor

This can be a separate change that can be fast-tracked?

Member Author

They aren't new additions; rather, they are being moved out of the Has32S predicates.
