[LoongArch] Introduce 32s target feature for LA32S ISA extensions #139695

Conversation
@llvm/pr-subscribers-mc @llvm/pr-subscribers-backend-loongarch

Author: hev (heiher)

Changes

According to the official LoongArch reference manual, 32-bit LoongArch is divided into two variants: the Reduced version (LA32R) and the Standard version (LA32S). LA32S extends LA32R with additional instructions, and the 64-bit version (LA64) fully includes the LA32S instruction set. This patch introduces a new target feature `32s` for the LoongArch backend, enabling support for instructions specific to the LA32S variant. The LA32S extension adds a number of instructions, and LA32R defines three new instruction aliases; the full lists are given in the patch description.
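A quick sketch of how the new feature would be selected from the command line. The `-mattr` spelling follows the `SubtargetFeature<"32s", ...>` definition in the patch; the exact `-mcpu` defaults are an assumption, so treat these invocations as illustrative rather than authoritative:

```shell
# Target the reduced LA32R baseline (assumes generic-la32 does not imply +32s).
llc -mtriple=loongarch32 -mcpu=generic-la32 -mattr=-32s input.ll

# Target LA32S by enabling the new feature explicitly.
llc -mtriple=loongarch32 -mcpu=generic-la32 -mattr=+32s input.ll
```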
Patch is 979.81 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/139695.diff 60 Files Affected:
diff --git a/llvm/lib/Target/LoongArch/LoongArch.td b/llvm/lib/Target/LoongArch/LoongArch.td
index 5fd52babfc6ec..707d2de23cdfe 100644
--- a/llvm/lib/Target/LoongArch/LoongArch.td
+++ b/llvm/lib/Target/LoongArch/LoongArch.td
@@ -32,6 +32,14 @@ def IsLA32
defvar LA32 = DefaultMode;
def LA64 : HwMode<"+64bit", [IsLA64]>;
+// LoongArch 32-bit is divided into variants, the reduced 32-bit variant (LA32R)
+// and the standard 32-bit variant (LA32S).
+def Feature32S
+ : SubtargetFeature<"32s", "Has32S", "true",
+ "LA32 Standard Basic Instruction Extension">;
+def Has32S : Predicate<"Subtarget->has32S()">;
+def Not32S : Predicate<"!Subtarget->has32S()">;
+
// Single Precision floating point
def FeatureBasicF
: SubtargetFeature<"f", "HasBasicF", "true",
@@ -159,11 +167,13 @@ include "LoongArchInstrInfo.td"
def : ProcessorModel<"generic-la32", NoSchedModel, [Feature32Bit]>;
def : ProcessorModel<"generic-la64", NoSchedModel, [Feature64Bit,
+ Feature32S,
FeatureUAL,
FeatureExtLSX]>;
// Generic 64-bit processor with double-precision floating-point support.
def : ProcessorModel<"loongarch64", NoSchedModel, [Feature64Bit,
+ Feature32S,
FeatureUAL,
FeatureBasicD]>;
@@ -172,12 +182,14 @@ def : ProcessorModel<"loongarch64", NoSchedModel, [Feature64Bit,
def : ProcessorModel<"generic", NoSchedModel, []>;
def : ProcessorModel<"la464", NoSchedModel, [Feature64Bit,
+ Feature32S,
FeatureUAL,
FeatureExtLASX,
FeatureExtLVZ,
FeatureExtLBT]>;
def : ProcessorModel<"la664", NoSchedModel, [Feature64Bit,
+ Feature32S,
FeatureUAL,
FeatureExtLASX,
FeatureExtLVZ,
diff --git a/llvm/lib/Target/LoongArch/LoongArchExpandAtomicPseudoInsts.cpp b/llvm/lib/Target/LoongArch/LoongArchExpandAtomicPseudoInsts.cpp
index 27d20390eb6ae..3be012feb2385 100644
--- a/llvm/lib/Target/LoongArch/LoongArchExpandAtomicPseudoInsts.cpp
+++ b/llvm/lib/Target/LoongArch/LoongArchExpandAtomicPseudoInsts.cpp
@@ -214,8 +214,9 @@ static void doAtomicBinOpExpansion(const LoongArchInstrInfo *TII,
.addReg(ScratchReg)
.addReg(AddrReg)
.addImm(0);
- BuildMI(LoopMBB, DL, TII->get(LoongArch::BEQZ))
+ BuildMI(LoopMBB, DL, TII->get(LoongArch::BEQ))
.addReg(ScratchReg)
+ .addReg(LoongArch::R0)
.addMBB(LoopMBB);
}
@@ -296,8 +297,9 @@ static void doMaskedAtomicBinOpExpansion(
.addReg(ScratchReg)
.addReg(AddrReg)
.addImm(0);
- BuildMI(LoopMBB, DL, TII->get(LoongArch::BEQZ))
+ BuildMI(LoopMBB, DL, TII->get(LoongArch::BEQ))
.addReg(ScratchReg)
+ .addReg(LoongArch::R0)
.addMBB(LoopMBB);
}
@@ -454,8 +456,9 @@ bool LoongArchExpandAtomicPseudo::expandAtomicMinMaxOp(
.addReg(Scratch1Reg)
.addReg(AddrReg)
.addImm(0);
- BuildMI(LoopTailMBB, DL, TII->get(LoongArch::BEQZ))
+ BuildMI(LoopTailMBB, DL, TII->get(LoongArch::BEQ))
.addReg(Scratch1Reg)
+ .addReg(LoongArch::R0)
.addMBB(LoopHeadMBB);
NextMBBI = MBB.end();
@@ -529,8 +532,9 @@ bool LoongArchExpandAtomicPseudo::expandAtomicCmpXchg(
.addReg(ScratchReg)
.addReg(AddrReg)
.addImm(0);
- BuildMI(LoopTailMBB, DL, TII->get(LoongArch::BEQZ))
+ BuildMI(LoopTailMBB, DL, TII->get(LoongArch::BEQ))
.addReg(ScratchReg)
+ .addReg(LoongArch::R0)
.addMBB(LoopHeadMBB);
BuildMI(LoopTailMBB, DL, TII->get(LoongArch::B)).addMBB(DoneMBB);
} else {
@@ -569,8 +573,9 @@ bool LoongArchExpandAtomicPseudo::expandAtomicCmpXchg(
.addReg(ScratchReg)
.addReg(AddrReg)
.addImm(0);
- BuildMI(LoopTailMBB, DL, TII->get(LoongArch::BEQZ))
+ BuildMI(LoopTailMBB, DL, TII->get(LoongArch::BEQ))
.addReg(ScratchReg)
+ .addReg(LoongArch::R0)
.addMBB(LoopHeadMBB);
BuildMI(LoopTailMBB, DL, TII->get(LoongArch::B)).addMBB(DoneMBB);
}
@@ -677,8 +682,9 @@ bool LoongArchExpandAtomicPseudo::expandAtomicCmpXchg128(
.addReg(ScratchReg)
.addReg(NewValHiReg)
.addReg(AddrReg);
- BuildMI(LoopTailMBB, DL, TII->get(LoongArch::BEQZ))
+ BuildMI(LoopTailMBB, DL, TII->get(LoongArch::BEQ))
.addReg(ScratchReg)
+ .addReg(LoongArch::R0)
.addMBB(LoopHeadMBB);
BuildMI(LoopTailMBB, DL, TII->get(LoongArch::B)).addMBB(DoneMBB);
int hint;
diff --git a/llvm/lib/Target/LoongArch/LoongArchISelDAGToDAG.h b/llvm/lib/Target/LoongArch/LoongArchISelDAGToDAG.h
index 8a7eba418d804..e94f249c14be2 100644
--- a/llvm/lib/Target/LoongArch/LoongArchISelDAGToDAG.h
+++ b/llvm/lib/Target/LoongArch/LoongArchISelDAGToDAG.h
@@ -64,6 +64,28 @@ class LoongArchDAGToDAGISel : public SelectionDAGISel {
bool selectVSplatUimmInvPow2(SDValue N, SDValue &SplatImm) const;
bool selectVSplatUimmPow2(SDValue N, SDValue &SplatImm) const;
+ // Return the LoongArch branch opcode that matches the given DAG integer
+ // condition code. The CondCode must be one of those supported by the
+ // LoongArch ISA (see translateSetCCForBranch).
+ static unsigned getBranchOpcForIntCC(ISD::CondCode CC) {
+ switch (CC) {
+ default:
+ llvm_unreachable("Unsupported CondCode");
+ case ISD::SETEQ:
+ return LoongArch::BEQ;
+ case ISD::SETNE:
+ return LoongArch::BNE;
+ case ISD::SETLT:
+ return LoongArch::BLT;
+ case ISD::SETGE:
+ return LoongArch::BGE;
+ case ISD::SETULT:
+ return LoongArch::BLTU;
+ case ISD::SETUGE:
+ return LoongArch::BGEU;
+ }
+ }
+
// Include the pieces autogenerated from the target description.
#include "LoongArchGenDAGISel.inc"
};
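The helper above covers only the six condition codes with native compare-and-branch forms; the other four integer comparisons are expected to be canonicalized first by swapping operands (as `translateSetCCForBranch` does later in this patch). A small standalone Python sketch of that two-step mapping — the mnemonic table mirrors the switch above, the swap table is inferred from the canonicalization code:

```python
# Native LoongArch compare-and-branch forms, per the switch above.
NATIVE = {"seteq": "BEQ", "setne": "BNE", "setlt": "BLT",
          "setge": "BGE", "setult": "BLTU", "setuge": "BGEU"}

# Codes without a native form: swap the operands and use the mirrored code.
SWAPPED = {"setgt": "setlt", "setle": "setge",
           "setugt": "setult", "setule": "setuge"}

def branch_for(cc):
    """Return (branch opcode, operands_swapped) for an integer cond code."""
    if cc in NATIVE:
        return NATIVE[cc], False
    return NATIVE[SWAPPED[cc]], True

assert branch_for("seteq") == ("BEQ", False)
assert branch_for("setgt") == ("BLT", True)    # a > b   <=>  b < a
assert branch_for("setule") == ("BGEU", True)  # a <=u b <=> b >=u a
```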
diff --git a/llvm/lib/Target/LoongArch/LoongArchISelLowering.cpp b/llvm/lib/Target/LoongArch/LoongArchISelLowering.cpp
index b729b4ea6f9b4..6e3e1396e6aeb 100644
--- a/llvm/lib/Target/LoongArch/LoongArchISelLowering.cpp
+++ b/llvm/lib/Target/LoongArch/LoongArchISelLowering.cpp
@@ -18,6 +18,7 @@
#include "LoongArchSubtarget.h"
#include "MCTargetDesc/LoongArchBaseInfo.h"
#include "MCTargetDesc/LoongArchMCTargetDesc.h"
+#include "llvm/ADT/SmallSet.h"
#include "llvm/ADT/Statistic.h"
#include "llvm/ADT/StringExtras.h"
#include "llvm/CodeGen/ISDOpcodes.h"
@@ -102,15 +103,26 @@ LoongArchTargetLowering::LoongArchTargetLowering(const TargetMachine &TM,
setOperationAction(ISD::PREFETCH, MVT::Other, Custom);
- // Expand bitreverse.i16 with native-width bitrev and shift for now, before
- // we get to know which of sll and revb.2h is faster.
- setOperationAction(ISD::BITREVERSE, MVT::i8, Custom);
- setOperationAction(ISD::BITREVERSE, GRLenVT, Legal);
-
- // LA32 does not have REVB.2W and REVB.D due to the 64-bit operands, and
- // the narrower REVB.W does not exist. But LA32 does have REVB.2H, so i16
- // and i32 could still be byte-swapped relatively cheaply.
- setOperationAction(ISD::BSWAP, MVT::i16, Custom);
+ // BITREV/REVB requires the 32S feature.
+ if (STI.has32S()) {
+ // Expand bitreverse.i16 with native-width bitrev and shift for now, before
+ // we get to know which of sll and revb.2h is faster.
+ setOperationAction(ISD::BITREVERSE, MVT::i8, Custom);
+ setOperationAction(ISD::BITREVERSE, GRLenVT, Legal);
+
+ // LA32 does not have REVB.2W and REVB.D due to the 64-bit operands, and
+ // the narrower REVB.W does not exist. But LA32 does have REVB.2H, so i16
+ // and i32 could still be byte-swapped relatively cheaply.
+ setOperationAction(ISD::BSWAP, MVT::i16, Custom);
+ } else {
+ setOperationAction(ISD::BSWAP, GRLenVT, Expand);
+ setOperationAction(ISD::CTTZ, GRLenVT, Expand);
+ setOperationAction(ISD::CTLZ, GRLenVT, Expand);
+ setOperationAction(ISD::ROTR, GRLenVT, Expand);
+ setOperationAction(ISD::SELECT, GRLenVT, Custom);
+ setOperationAction(ISD::SIGN_EXTEND_INREG, MVT::i8, Expand);
+ setOperationAction(ISD::SIGN_EXTEND_INREG, MVT::i16, Expand);
+ }
setOperationAction(ISD::BR_JT, MVT::Other, Expand);
setOperationAction(ISD::BR_CC, GRLenVT, Expand);
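The "native-width bitrev and shift" expansion mentioned in the comment above can be modeled in plain Python (this is an illustration of the arithmetic only, not LLVM code): reversing a narrow value with a full-width BITREV.W leaves the result in the top bits, so a right shift recovers it.

```python
def bitrev32(x):
    # Behavioral model of the BITREV.W instruction: reverse all 32 bits.
    return int(format(x & 0xFFFFFFFF, "032b")[::-1], 2)

def bitrev8_via_bitrev32(x):
    # i8 bitreverse expanded as full-width reverse + shift:
    # the reversed low byte lands in bits 31..24.
    return bitrev32(x & 0xFF) >> 24

for x in range(256):
    assert bitrev8_via_bitrev32(x) == int(format(x, "08b")[::-1], 2)
```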
@@ -476,6 +488,8 @@ SDValue LoongArchTargetLowering::LowerOperation(SDValue Op,
return lowerSCALAR_TO_VECTOR(Op, DAG);
case ISD::PREFETCH:
return lowerPREFETCH(Op, DAG);
+ case ISD::SELECT:
+ return lowerSELECT(Op, DAG);
}
return SDValue();
}
@@ -492,6 +506,327 @@ SDValue LoongArchTargetLowering::lowerPREFETCH(SDValue Op,
return Op;
}
+// Return true if Val is equal to (setcc LHS, RHS, CC).
+// Return false if Val is the inverse of (setcc LHS, RHS, CC).
+// Otherwise, return std::nullopt.
+static std::optional<bool> matchSetCC(SDValue LHS, SDValue RHS,
+ ISD::CondCode CC, SDValue Val) {
+ assert(Val->getOpcode() == ISD::SETCC);
+ SDValue LHS2 = Val.getOperand(0);
+ SDValue RHS2 = Val.getOperand(1);
+ ISD::CondCode CC2 = cast<CondCodeSDNode>(Val.getOperand(2))->get();
+
+ if (LHS == LHS2 && RHS == RHS2) {
+ if (CC == CC2)
+ return true;
+ if (CC == ISD::getSetCCInverse(CC2, LHS2.getValueType()))
+ return false;
+ } else if (LHS == RHS2 && RHS == LHS2) {
+ CC2 = ISD::getSetCCSwappedOperands(CC2);
+ if (CC == CC2)
+ return true;
+ if (CC == ISD::getSetCCInverse(CC2, LHS2.getValueType()))
+ return false;
+ }
+
+ return std::nullopt;
+}
+
+static SDValue combineSelectToBinOp(SDNode *N, SelectionDAG &DAG,
+ const LoongArchSubtarget &Subtarget) {
+ SDValue CondV = N->getOperand(0);
+ SDValue TrueV = N->getOperand(1);
+ SDValue FalseV = N->getOperand(2);
+ MVT VT = N->getSimpleValueType(0);
+ SDLoc DL(N);
+
+ // (select c, -1, y) -> -c | y
+ if (isAllOnesConstant(TrueV)) {
+ SDValue Neg = DAG.getNegative(CondV, DL, VT);
+ return DAG.getNode(ISD::OR, DL, VT, Neg, DAG.getFreeze(FalseV));
+ }
+ // (select c, y, -1) -> (c-1) | y
+ if (isAllOnesConstant(FalseV)) {
+ SDValue Neg =
+ DAG.getNode(ISD::ADD, DL, VT, CondV, DAG.getAllOnesConstant(DL, VT));
+ return DAG.getNode(ISD::OR, DL, VT, Neg, DAG.getFreeze(TrueV));
+ }
+
+ // (select c, 0, y) -> (c-1) & y
+ if (isNullConstant(TrueV)) {
+ SDValue Neg =
+ DAG.getNode(ISD::ADD, DL, VT, CondV, DAG.getAllOnesConstant(DL, VT));
+ return DAG.getNode(ISD::AND, DL, VT, Neg, DAG.getFreeze(FalseV));
+ }
+ // (select c, y, 0) -> -c & y
+ if (isNullConstant(FalseV)) {
+ SDValue Neg = DAG.getNegative(CondV, DL, VT);
+ return DAG.getNode(ISD::AND, DL, VT, Neg, DAG.getFreeze(TrueV));
+ }
+
+ // select c, ~x, x --> xor -c, x
+ if (isa<ConstantSDNode>(TrueV) && isa<ConstantSDNode>(FalseV)) {
+ const APInt &TrueVal = TrueV->getAsAPIntVal();
+ const APInt &FalseVal = FalseV->getAsAPIntVal();
+ if (~TrueVal == FalseVal) {
+ SDValue Neg = DAG.getNegative(CondV, DL, VT);
+ return DAG.getNode(ISD::XOR, DL, VT, Neg, FalseV);
+ }
+ }
+
+ // Try to fold (select (setcc lhs, rhs, cc), truev, falsev) into bitwise ops
+ // when both truev and falsev are also setcc.
+ if (CondV.getOpcode() == ISD::SETCC && TrueV.getOpcode() == ISD::SETCC &&
+ FalseV.getOpcode() == ISD::SETCC) {
+ SDValue LHS = CondV.getOperand(0);
+ SDValue RHS = CondV.getOperand(1);
+ ISD::CondCode CC = cast<CondCodeSDNode>(CondV.getOperand(2))->get();
+
+ // (select x, x, y) -> x | y
+ // (select !x, x, y) -> x & y
+ if (std::optional<bool> MatchResult = matchSetCC(LHS, RHS, CC, TrueV)) {
+ return DAG.getNode(*MatchResult ? ISD::OR : ISD::AND, DL, VT, TrueV,
+ DAG.getFreeze(FalseV));
+ }
+ // (select x, y, x) -> x & y
+ // (select !x, y, x) -> x | y
+ if (std::optional<bool> MatchResult = matchSetCC(LHS, RHS, CC, FalseV)) {
+ return DAG.getNode(*MatchResult ? ISD::AND : ISD::OR, DL, VT,
+ DAG.getFreeze(TrueV), FalseV);
+ }
+ }
+
+ return SDValue();
+}
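The constant-operand folds in `combineSelectToBinOp` rely on the condition being a 0/1 value; the algebraic identities can be sanity-checked with modular arithmetic (a standalone model, not LLVM code):

```python
MASK = (1 << 32) - 1          # model i32 two's-complement arithmetic

def neg(x):
    return (-x) & MASK

def select(c, t, f):
    return t if c else f

for c in (0, 1):
    for y in (0, 5, 0xDEADBEEF, MASK):
        assert select(c, MASK, y) == (neg(c) | y)            # (select c, -1, y) -> -c | y
        assert select(c, y, MASK) == (((c - 1) & MASK) | y)  # (select c, y, -1) -> (c-1) | y
        assert select(c, 0, y) == (((c - 1) & MASK) & y)     # (select c, 0, y) -> (c-1) & y
        assert select(c, y, 0) == (neg(c) & y)               # (select c, y, 0) -> -c & y
        x = y
        assert select(c, x ^ MASK, x) == (neg(c) ^ x)        # (select c, ~x, x) -> -c ^ x
```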
+
+// Transform `binOp (select cond, x, c0), c1` where `c0` and `c1` are constants
+// into `select cond, binOp(x, c1), binOp(c0, c1)` if profitable.
+// For now we only consider transformation profitable if `binOp(c0, c1)` ends up
+// being `0` or `-1`. In such cases we can replace `select` with `and`.
+// TODO: Should we also do this if `binOp(c0, c1)` is cheaper to materialize
+// than `c0`?
+static SDValue
+foldBinOpIntoSelectIfProfitable(SDNode *BO, SelectionDAG &DAG,
+ const LoongArchSubtarget &Subtarget) {
+ unsigned SelOpNo = 0;
+ SDValue Sel = BO->getOperand(0);
+ if (Sel.getOpcode() != ISD::SELECT || !Sel.hasOneUse()) {
+ SelOpNo = 1;
+ Sel = BO->getOperand(1);
+ }
+
+ if (Sel.getOpcode() != ISD::SELECT || !Sel.hasOneUse())
+ return SDValue();
+
+ unsigned ConstSelOpNo = 1;
+ unsigned OtherSelOpNo = 2;
+ if (!isa<ConstantSDNode>(Sel->getOperand(ConstSelOpNo))) {
+ ConstSelOpNo = 2;
+ OtherSelOpNo = 1;
+ }
+ SDValue ConstSelOp = Sel->getOperand(ConstSelOpNo);
+ ConstantSDNode *ConstSelOpNode = dyn_cast<ConstantSDNode>(ConstSelOp);
+ if (!ConstSelOpNode || ConstSelOpNode->isOpaque())
+ return SDValue();
+
+ SDValue ConstBinOp = BO->getOperand(SelOpNo ^ 1);
+ ConstantSDNode *ConstBinOpNode = dyn_cast<ConstantSDNode>(ConstBinOp);
+ if (!ConstBinOpNode || ConstBinOpNode->isOpaque())
+ return SDValue();
+
+ SDLoc DL(Sel);
+ EVT VT = BO->getValueType(0);
+
+ SDValue NewConstOps[2] = {ConstSelOp, ConstBinOp};
+ if (SelOpNo == 1)
+ std::swap(NewConstOps[0], NewConstOps[1]);
+
+ SDValue NewConstOp =
+ DAG.FoldConstantArithmetic(BO->getOpcode(), DL, VT, NewConstOps);
+ if (!NewConstOp)
+ return SDValue();
+
+ const APInt &NewConstAPInt = NewConstOp->getAsAPIntVal();
+ if (!NewConstAPInt.isZero() && !NewConstAPInt.isAllOnes())
+ return SDValue();
+
+ SDValue OtherSelOp = Sel->getOperand(OtherSelOpNo);
+ SDValue NewNonConstOps[2] = {OtherSelOp, ConstBinOp};
+ if (SelOpNo == 1)
+ std::swap(NewNonConstOps[0], NewNonConstOps[1]);
+ SDValue NewNonConstOp = DAG.getNode(BO->getOpcode(), DL, VT, NewNonConstOps);
+
+ SDValue NewT = (ConstSelOpNo == 1) ? NewConstOp : NewNonConstOp;
+ SDValue NewF = (ConstSelOpNo == 1) ? NewNonConstOp : NewConstOp;
+ return DAG.getSelect(DL, VT, Sel.getOperand(0), NewT, NewF);
+}
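A concrete instance of the fold above, modeled in Python with made-up constants (the fold is considered profitable here because the folded constant arm becomes 0, so the select can later become an `and`):

```python
def select(c, t, f):
    return t if c else f

# binOp = AND, c0 = 0x0F, c1 = 0xF0, so binOp(c0, c1) == 0.
# `and (select c, x, 0x0F), 0xF0`  ->  `select c, (x & 0xF0), 0`
for c in (0, 1):
    for x in (0x00, 0x3C, 0xFF, 0x12345678):
        before = select(c, x, 0x0F) & 0xF0
        after = select(c, x & 0xF0, 0x0F & 0xF0)  # second arm folds to 0
        assert before == after
```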
+
+// Changes the condition code and swaps operands if necessary, so the SetCC
+// operation matches one of the comparisons supported directly by branches
+// in the LoongArch ISA. May adjust compares to favor compare with 0 over
+// compare with 1/-1.
+static void translateSetCCForBranch(const SDLoc &DL, SDValue &LHS, SDValue &RHS,
+ ISD::CondCode &CC, SelectionDAG &DAG) {
+ // If this is a single bit test that can't be handled by ANDI, shift the
+ // bit to be tested to the MSB and perform a signed compare with 0.
+ if (isIntEqualitySetCC(CC) && isNullConstant(RHS) &&
+ LHS.getOpcode() == ISD::AND && LHS.hasOneUse() &&
+ isa<ConstantSDNode>(LHS.getOperand(1))) {
+ uint64_t Mask = LHS.getConstantOperandVal(1);
+ if ((isPowerOf2_64(Mask) || isMask_64(Mask)) && !isInt<12>(Mask)) {
+ unsigned ShAmt = 0;
+ if (isPowerOf2_64(Mask)) {
+ CC = CC == ISD::SETEQ ? ISD::SETGE : ISD::SETLT;
+ ShAmt = LHS.getValueSizeInBits() - 1 - Log2_64(Mask);
+ } else {
+ ShAmt = LHS.getValueSizeInBits() - llvm::bit_width(Mask);
+ }
+
+ LHS = LHS.getOperand(0);
+ if (ShAmt != 0)
+ LHS = DAG.getNode(ISD::SHL, DL, LHS.getValueType(), LHS,
+ DAG.getConstant(ShAmt, DL, LHS.getValueType()));
+ return;
+ }
+ }
+
+ if (auto *RHSC = dyn_cast<ConstantSDNode>(RHS)) {
+ int64_t C = RHSC->getSExtValue();
+ switch (CC) {
+ default:
+ break;
+ case ISD::SETGT:
+ // Convert X > -1 to X >= 0.
+ if (C == -1) {
+ RHS = DAG.getConstant(0, DL, RHS.getValueType());
+ CC = ISD::SETGE;
+ return;
+ }
+ break;
+ case ISD::SETLT:
+ // Convert X < 1 to 0 >= X.
+ if (C == 1) {
+ RHS = LHS;
+ LHS = DAG.getConstant(0, DL, RHS.getValueType());
+ CC = ISD::SETGE;
+ return;
+ }
+ break;
+ }
+ }
+
+ switch (CC) {
+ default:
+ break;
+ case ISD::SETGT:
+ case ISD::SETLE:
+ case ISD::SETUGT:
+ case ISD::SETULE:
+ CC = ISD::getSetCCSwappedOperands(CC);
+ std::swap(LHS, RHS);
+ break;
+ }
+}
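The single-bit-test rewrite at the top of `translateSetCCForBranch` (shift the tested bit up to the sign bit, then do a signed compare against zero) can be checked over all 32 bit positions with a standalone model:

```python
BITS = 32

def to_signed(v):
    v &= (1 << BITS) - 1
    return v - (1 << BITS) if v & (1 << (BITS - 1)) else v

for k in range(BITS):
    mask = 1 << k
    shamt = BITS - 1 - k                  # shift bit k up to the sign bit
    for x in (0, 1, mask, mask - 1, 0x80000001, 0xFFFFFFFF):
        bit_clear = (x & mask) == 0               # seteq (and x, mask), 0
        msb_test = to_signed(x << shamt) >= 0     # setge (shl x, shamt), 0
        assert bit_clear == msb_test
```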
+
+SDValue LoongArchTargetLowering::lowerSELECT(SDValue Op,
+ SelectionDAG &DAG) const {
+ SDValue CondV = Op.getOperand(0);
+ SDValue TrueV = Op.getOperand(1);
+ SDValue FalseV = Op.getOperand(2);
+ SDLoc DL(Op);
+ MVT VT = Op.getSimpleValueType();
+ MVT GRLenVT = Subtarget.getGRLenVT();
+
+ if (SDValue V = combineSelectToBinOp(Op.getNode(), DAG, Subtarget))
+ return V;
+
+ if (Op.hasOneUse()) {
+ unsigned UseOpc = Op->user_begin()->getOpcode();
+ if (isBinOp(UseOpc) && DAG.isSafeToSpeculativelyExecute(UseOpc)) {
+ SDNode *BinOp = *Op->user_begin();
+ if (SDValue NewSel = foldBinOpIntoSelectIfProfitable(*Op->user_begin(),
+ DAG, Subtarget)) {
+ DAG.ReplaceAllUsesWith(BinOp, &NewSel);
+ // Opcode check is necessary because foldBinOpIntoSelectIfProfitable
+ // may return a constant node and cause crash in lowerSELECT.
+ if (NewSel.getOpcode() == ISD::SELECT)
+ return lowerSELECT(NewSel, DAG);
+ return NewSel;
+ }
+ }
+ }
+
+ // If the condition is not an integer SETCC which operates on GRLenVT, we need
+ // to emit a LoongArchISD::SELECT_CC comparing the condition to zero. i.e.:
+ // (select condv, truev, falsev)
+ // -> (loongarchisd::select_cc condv, zero, setne, truev, falsev)
+ if (CondV.getOpcode() != ISD::SETCC ||
+ CondV.getOperand(0).getSimpleValueType() != GRLenVT) {
+ SDValue Zero = DAG.getConstant(0, DL, GRLenVT);
+ SDValue SetNE = DAG.getCondCode(ISD::SETNE);
+
+ SDValue Ops[] = {CondV, Zero, SetNE, TrueV, FalseV};
+
+ return DAG.getNode(LoongArchISD::SELECT_CC, DL, VT, Ops);
+ }
+
+ // If the CondV is the output of a SETCC node which operates on GRLenVT
+ // inputs, then merge the SETCC node into the lowered LoongArchISD::SELECT_CC
+ // to take advantage of the integer compare+branch instructions. i.e.: (select
+ // (setcc lhs, rhs, cc), truev, falsev)
+ // -> (loongarchisd::select_cc lhs, rhs, cc, truev, falsev)
+ SDValue LHS = CondV.getOperand(0);
+ SDValue RHS = CondV.getOperand(1);
+ ISD::CondCode CCVal = cast<CondCodeSDNode>(CondV.getOperand(2))->get();
+
+ // Special case for a select of 2 constants that have a difference of 1.
+ // Normally this is done by DAGCombine, but if the select is introduced by
+ // type legalization or op legalization, we miss it. Restricting to SETLT
+ // case for now because that is what signed saturating add/sub need.
+ // FIXME: We don't need the condition to be SETLT or even a SETCC,
+ // but we would probably want to swap the true/false values if the condition
+ // is SETGE/SETLE to avoid an XORI.
+ if (isa<ConstantSDNode>(TrueV) && isa<ConstantSDNode>(FalseV) &&
+ CCVal == ISD::SETLT) {
+ const APInt &TrueVal = TrueV->getAsAPIntVal();
+ const APInt &FalseVal = FalseV->getAsAPIntVal();
+ if (TrueVal - 1 == FalseVal)
+ return DAG.getNode(ISD::ADD, DL, VT, CondV, FalseV);
+ if (TrueVal + 1 == FalseVal)
+ return DAG.getNode(ISD::SUB, DL, VT, FalseV, CondV);
+ }
+
+ translateSetCCForBranch(DL, LHS, RHS, CCVal, DAG);
+ // 1 < x ? x : 1 -> 0 < x ? x : 1
+ if (isOneConstant(LHS) && (CCVal == ISD::SETLT || CCVal == ISD::SETULT) &&
+ RHS == TrueV && LHS == FalseV) {
+ LHS = DAG.getConstant(0, DL, VT);
+ // 0 <u x is the same as x != 0.
+ if (CCVal == ISD::SETULT) {
+ std::swap(LHS, RHS);
+ CCVal = ISD::SETNE;
+ }
+ }
+
+ // x <s -1 ? x : -1 -> x <s 0 ? x : -1
+ if (isAllOnesConstant(RH...
[truncated]
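One of the folds in the truncated `lowerSELECT` above handles a select of two constants whose difference is 1, turning it into a plain add/sub on the 0/1 condition value; the identity is easy to confirm with a standalone model:

```python
def select(c, t, f):
    return t if c else f

for c in (0, 1):
    for f in (-7, 0, 41, 1000):
        t = f + 1                         # TrueVal - 1 == FalseVal
        assert select(c, t, f) == c + f   # -> (add CondV, FalseV)
        t2 = f - 1                        # TrueVal + 1 == FalseVal
        assert select(c, t2, f) == f - c  # -> (sub FalseV, CondV)
```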
According to the official LoongArch reference manual, 32-bit LoongArch is divided into two variants: the Reduced version (LA32R) and the Standard version (LA32S). LA32S extends LA32R with additional instructions, and the 64-bit version (LA64) fully includes the LA32S instruction set. This patch introduces a new target feature `32s` for the LoongArch backend, enabling support for instructions specific to the LA32S variant.

The LA32S extension includes the following additional instructions:

- ALSL.W
- {AND,OR}N
- B{EQ,NE}Z
- BITREV.{4B,W}
- BSTR{INS,PICK}.W
- BYTEPICK.W
- CL{O,Z}.W
- CPUCFG
- CT{O,Z}.W
- EXT.W.{B,H}
- F{LD,ST}X.{D,S}
- MASK{EQ,NE}Z
- PC{ADDI,ALAU12I}
- REVB.2H
- ROTR[I].W

Additionally, LA32R defines three new instruction aliases:

- RDCNTID.W RJ => RDTIMEL.W ZERO, RJ
- RDCNTVH.W RD => RDTIMEH.W RD, ZERO
- RDCNTVL.W RD => RDTIMEL.W RD, ZERO
@@ -214,8 +214,9 @@ static void doAtomicBinOpExpansion(const LoongArchInstrInfo *TII,
         .addReg(ScratchReg)
         .addReg(AddrReg)
         .addImm(0);
-    BuildMI(LoopMBB, DL, TII->get(LoongArch::BEQZ))
+    BuildMI(LoopMBB, DL, TII->get(LoongArch::BEQ))
`BEQZ` has the advantage over `BEQ` in that its reach is broader by 5 bits (the width of a GPR slot), so if some of the changed bits potentially refer to a remote MBB, we may want to preserve them? I'm not looking at this code as closely as I'd prefer because I'm just battling my procrastination and doing a quick review here.
I agree. While `BEQZ` does offer a wider branch range than `BEQ`, their latency and throughput are the same. For expanding the pseudo-atomic sequences, I believe `BEQ`'s range is sufficient. I chose not to split it further to avoid unnecessary divergence between the 32-bit and 64-bit code paths.
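For reference, the branch ranges under discussion can be computed from the offset-field widths in the base ISA encodings (16 bits for `BEQ`/`BNE`, 21 bits for `BEQZ`/`BNEZ`, both scaled by the 4-byte instruction size); the figures below are my computation, not from the thread:

```python
def reach_bytes(offset_bits, insn_size=4):
    # Signed offset field, counted in instruction words.
    return (1 << (offset_bits - 1)) * insn_size

beq_reach = reach_bytes(16)    # BEQ/BNE: offs16
beqz_reach = reach_bytes(21)   # BEQZ/BNEZ: offs21 (reuses the rd slot)
assert beq_reach == 128 * 1024          # +/-128 KiB
assert beqz_reach == 4 * 1024 * 1024    # +/-4 MiB
assert beqz_reach // beq_reach == 32    # 5 extra bits => 32x the reach
```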
+      setOperationAction(ISD::CTTZ, GRLenVT, Expand);
+      setOperationAction(ISD::CTLZ, GRLenVT, Expand);
+      setOperationAction(ISD::ROTR, GRLenVT, Expand);
+      setOperationAction(ISD::SELECT, GRLenVT, Custom);
Is the custom expansion for `select` absolutely necessary for the LA32 bringup? I see a lot of added code that is presumably for optimizing `select` performance in the absence of the `maskeqz + masknez` combo, but I'd suggest splitting the changes into another PR if it's not essential to the bringup, and keeping the bringup here focused.
Yes, this part is required. Without `MASK{EQ,NE}Z`, there's no way to lower `ISD::SELECT` properly, so the custom expansion is essential for the bringup.
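For context, the `maskeqz + masknez` lowering being discussed combines two conditional-zeroing instructions with an OR. The semantics below follow my reading of the LoongArch manual; the Python is a behavioral model, not compiler code:

```python
def maskeqz(rj, rk):
    # MASKEQZ rd, rj, rk: rd = 0 if rk == 0, else rj
    return 0 if rk == 0 else rj

def masknez(rj, rk):
    # MASKNEZ rd, rj, rk: rd = 0 if rk != 0, else rj
    return 0 if rk != 0 else rj

def select_grlen(cond, true_v, false_v):
    # (select cond, t, f) -> OR (MASKEQZ t, cond), (MASKNEZ f, cond)
    return maskeqz(true_v, cond) | masknez(false_v, cond)

assert select_grlen(1, 0xAA, 0x55) == 0xAA
assert select_grlen(0, 0xAA, 0x55) == 0x55
```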
 def BNEZ : BrCCZ_1RI21<0x44000000>;

 // Other Miscellaneous Instructions
 def CPUCFG : ALU_2R<0x00006c00>;
Given `CPUCFG` is one of the few ways an application can query its runtime CPU's capabilities, and the only way if running without ELF HWCAP support, we may want to allow it despite the spec not mandating it. I know, and personally insist, that the LA32R spec is wrong in not doing so, but maybe acting against the spec this time will do the users more good than harm.
I personally agree. IIRC, @FlyGoat has considered emulating the `CPUCFG` instruction for LA32R in the Linux kernel, so compiler-side support would be necessary. Unless there are objections, I plan to go ahead with this.
Yup, I think the same for IOCSR (probably all privileged instructions).
@@ -1054,6 +1068,8 @@ def AMCAS__DB_D : AMCAS_3R<0x385b8000>;
 def LL_D : LLBase<0x22000000>;
 def SC_D : SCBase<0x23000000>;
 def SC_Q : SCBase_128<0x38570000>;
+def LLACQ_W : LLBase_ACQ<0x38578000>;
+def SCREL_W : SCBase_REL<0x38578400>;
This can be a separate change that can be fast-tracked?
They aren't new additions; rather, they are being moved out of the `Has32S` predicates.