[SelectionDAG] Add an ISD node for get.active.lane.mask #139084
Conversation
For now, expansion still happens in SelectionDAGBuilder when GET_ACTIVE_LANE_MASK is not legal on the target. This patch also includes changes in AArch64ISelLowering to replace handling of the get.active.lane.mask intrinsic with the new ISD node. Tablegen patterns are added which match to whilelo for scalable types. A follow-up change will add support for more types to be lowered to GET_ACTIVE_LANE_MASK by allowing splitting of the node.
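For reference, the per-lane semantics of the intrinsic (and hence of the new node) can be sketched in scalar C++. The helper below is illustrative only, written for this summary under the LangRef's definition of llvm.get.active.lane.mask; it is not code from the patch:

#include <cstdint>
#include <vector>

// Per the LangRef, lane I of llvm.get.active.lane.mask(Base, TripCount) is
// active iff Base + I < TripCount, where the addition and comparison are
// unsigned and performed as if in a type wide enough that Base + I cannot
// wrap. GET_ACTIVE_LANE_MASK carries the same semantics at the DAG level.
// (__int128 is a Clang/GCC extension, used here to make "no wrap" explicit.)
std::vector<bool> activeLaneMask(uint64_t Base, uint64_t TripCount,
                                 unsigned NumLanes) {
  std::vector<bool> Mask(NumLanes);
  for (unsigned I = 0; I < NumLanes; ++I)
    Mask[I] = static_cast<unsigned __int128>(Base) + I < TripCount;
  return Mask;
}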
@llvm/pr-subscribers-llvm-selectiondag @llvm/pr-subscribers-backend-aarch64

Author: Kerry McLaughlin (kmclaughlin-arm)

Patch is 21.07 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/139084.diff

9 Files Affected:
diff --git a/llvm/include/llvm/CodeGen/ISDOpcodes.h b/llvm/include/llvm/CodeGen/ISDOpcodes.h
index 80ef32aff62ae..ba29c118007dd 100644
--- a/llvm/include/llvm/CodeGen/ISDOpcodes.h
+++ b/llvm/include/llvm/CodeGen/ISDOpcodes.h
@@ -1524,6 +1524,12 @@ enum NodeType {
// Operands: Mask
VECTOR_FIND_LAST_ACTIVE,
+ // Creates a mask representing active and inactive
+ // vector lanes, active while base < trip count.
+ // Operands: Base, Trip Count
+ // Output: Mask
+ GET_ACTIVE_LANE_MASK,
+
// llvm.clear_cache intrinsic
// Operands: Input Chain, Start Addres, End Address
// Outputs: Output Chain
diff --git a/llvm/include/llvm/Target/TargetSelectionDAG.td b/llvm/include/llvm/Target/TargetSelectionDAG.td
index 41fed692c7025..5b8e01951edf2 100644
--- a/llvm/include/llvm/Target/TargetSelectionDAG.td
+++ b/llvm/include/llvm/Target/TargetSelectionDAG.td
@@ -860,6 +860,12 @@ def find_last_active
: SDNode<"ISD::VECTOR_FIND_LAST_ACTIVE",
SDTypeProfile<1, 1, [SDTCisInt<0>, SDTCisVec<1>]>, []>;
+def get_active_lane_mask
+ : SDNode<
+ "ISD::GET_ACTIVE_LANE_MASK",
+ SDTypeProfile<1, 2, [SDTCisVec<0>, SDTCisInt<1>, SDTCisSameAs<1, 2>]>,
+ []>;
+
// Nodes for intrinsics, you should use the intrinsic itself and let tblgen use
// these internally. Don't reference these directly.
def intrinsic_void : SDNode<"ISD::INTRINSIC_VOID",
diff --git a/llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp b/llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp
index 25e74a2ae5b71..9ccd6a4d1684c 100644
--- a/llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp
@@ -160,6 +160,10 @@ void DAGTypeLegalizer::PromoteIntegerResult(SDNode *N, unsigned ResNo) {
Res = PromoteIntRes_VECTOR_FIND_LAST_ACTIVE(N);
break;
+ case ISD::GET_ACTIVE_LANE_MASK:
+ Res = PromoteIntRes_GET_ACTIVE_LANE_MASK(N);
+ break;
+
case ISD::PARTIAL_REDUCE_UMLA:
case ISD::PARTIAL_REDUCE_SMLA:
Res = PromoteIntRes_PARTIAL_REDUCE_MLA(N);
@@ -6222,6 +6226,12 @@ SDValue DAGTypeLegalizer::PromoteIntRes_VECTOR_FIND_LAST_ACTIVE(SDNode *N) {
return DAG.getNode(ISD::VECTOR_FIND_LAST_ACTIVE, SDLoc(N), NVT, N->ops());
}
+SDValue DAGTypeLegalizer::PromoteIntRes_GET_ACTIVE_LANE_MASK(SDNode *N) {
+ EVT VT = N->getValueType(0);
+ EVT NVT = TLI.getTypeToTransformTo(*DAG.getContext(), VT);
+ return DAG.getNode(ISD::GET_ACTIVE_LANE_MASK, SDLoc(N), NVT, N->ops());
+}
+
SDValue DAGTypeLegalizer::PromoteIntRes_PARTIAL_REDUCE_MLA(SDNode *N) {
SDLoc DL(N);
EVT VT = N->getValueType(0);
diff --git a/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h b/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h
index 720393158aa5e..edd54fc550f53 100644
--- a/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h
+++ b/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h
@@ -379,6 +379,7 @@ class LLVM_LIBRARY_VISIBILITY DAGTypeLegalizer {
SDValue PromoteIntRes_IS_FPCLASS(SDNode *N);
SDValue PromoteIntRes_PATCHPOINT(SDNode *N);
SDValue PromoteIntRes_VECTOR_FIND_LAST_ACTIVE(SDNode *N);
+ SDValue PromoteIntRes_GET_ACTIVE_LANE_MASK(SDNode *N);
SDValue PromoteIntRes_PARTIAL_REDUCE_MLA(SDNode *N);
// Integer Operand Promotion.
diff --git a/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp b/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
index 9d138d364bad7..3840f94a214e2 100644
--- a/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
@@ -7987,14 +7987,15 @@ void SelectionDAGBuilder::visitIntrinsicCall(const CallInst &I,
case Intrinsic::get_active_lane_mask: {
EVT CCVT = TLI.getValueType(DAG.getDataLayout(), I.getType());
SDValue Index = getValue(I.getOperand(0));
+ SDValue TripCount = getValue(I.getOperand(1));
EVT ElementVT = Index.getValueType();
if (!TLI.shouldExpandGetActiveLaneMask(CCVT, ElementVT)) {
- visitTargetIntrinsic(I, Intrinsic);
+ setValue(&I, DAG.getNode(ISD::GET_ACTIVE_LANE_MASK, sdl, CCVT, Index,
+ TripCount));
return;
}
- SDValue TripCount = getValue(I.getOperand(1));
EVT VecTy = EVT::getVectorVT(*DAG.getContext(), ElementVT,
CCVT.getVectorElementCount());
diff --git a/llvm/lib/CodeGen/SelectionDAG/SelectionDAGDumper.cpp b/llvm/lib/CodeGen/SelectionDAG/SelectionDAGDumper.cpp
index 8faf97271d99e..cf35327b4a103 100644
--- a/llvm/lib/CodeGen/SelectionDAG/SelectionDAGDumper.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/SelectionDAGDumper.cpp
@@ -576,6 +576,9 @@ std::string SDNode::getOperationName(const SelectionDAG *G) const {
case ISD::VECTOR_FIND_LAST_ACTIVE:
return "find_last_active";
+ case ISD::GET_ACTIVE_LANE_MASK:
+ return "get_active_lane_mask";
+
case ISD::PARTIAL_REDUCE_UMLA:
return "partial_reduce_umla";
case ISD::PARTIAL_REDUCE_SMLA:
diff --git a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
index 6e7f13c20db68..0b10882e45924 100644
--- a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
+++ b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
@@ -1178,6 +1178,8 @@ AArch64TargetLowering::AArch64TargetLowering(const TargetMachine &TM,
setTargetDAGCombine(ISD::CTLZ);
+ setTargetDAGCombine(ISD::GET_ACTIVE_LANE_MASK);
+
setTargetDAGCombine(ISD::VECREDUCE_AND);
setTargetDAGCombine(ISD::VECREDUCE_OR);
setTargetDAGCombine(ISD::VECREDUCE_XOR);
@@ -1493,8 +1495,13 @@ AArch64TargetLowering::AArch64TargetLowering(const TargetMachine &TM,
setOperationAction(ISD::VECTOR_DEINTERLEAVE, VT, Custom);
setOperationAction(ISD::VECTOR_INTERLEAVE, VT, Custom);
}
- for (auto VT : {MVT::nxv16i1, MVT::nxv8i1, MVT::nxv4i1, MVT::nxv2i1})
+ for (auto VT : {MVT::nxv16i1, MVT::nxv8i1, MVT::nxv4i1, MVT::nxv2i1}) {
setOperationAction(ISD::VECTOR_FIND_LAST_ACTIVE, VT, Legal);
+ setOperationAction(ISD::GET_ACTIVE_LANE_MASK, VT, Legal);
+ }
+
+ for (auto VT : {MVT::v16i8, MVT::v8i8, MVT::v4i16, MVT::v2i32})
+ setOperationAction(ISD::GET_ACTIVE_LANE_MASK, VT, Custom);
}
if (Subtarget->isSVEorStreamingSVEAvailable()) {
@@ -5731,21 +5738,24 @@ static inline SDValue getPTrue(SelectionDAG &DAG, SDLoc DL, EVT VT,
DAG.getTargetConstant(Pattern, DL, MVT::i32));
}
-static SDValue optimizeIncrementingWhile(SDValue Op, SelectionDAG &DAG,
+static SDValue optimizeIncrementingWhile(SDNode *N, SelectionDAG &DAG,
bool IsSigned, bool IsEqual) {
- if (!isa<ConstantSDNode>(Op.getOperand(1)) ||
- !isa<ConstantSDNode>(Op.getOperand(2)))
+ unsigned Op0 = N->getOpcode() == ISD::INTRINSIC_WO_CHAIN ? 1 : 0;
+ unsigned Op1 = N->getOpcode() == ISD::INTRINSIC_WO_CHAIN ? 2 : 1;
+
+ if (!isa<ConstantSDNode>(N->getOperand(Op0)) ||
+ !isa<ConstantSDNode>(N->getOperand(Op1)))
return SDValue();
- SDLoc dl(Op);
- APInt X = Op.getConstantOperandAPInt(1);
- APInt Y = Op.getConstantOperandAPInt(2);
+ SDLoc dl(N);
+ APInt X = N->getConstantOperandAPInt(Op0);
+ APInt Y = N->getConstantOperandAPInt(Op1);
// When the second operand is the maximum value, comparisons that include
// equality can never fail and thus we can return an all active predicate.
if (IsEqual)
if (IsSigned ? Y.isMaxSignedValue() : Y.isMaxValue())
- return DAG.getConstant(1, dl, Op.getValueType());
+ return DAG.getConstant(1, dl, N->getValueType(0));
bool Overflow;
APInt NumActiveElems =
@@ -5766,10 +5776,10 @@ static SDValue optimizeIncrementingWhile(SDValue Op, SelectionDAG &DAG,
getSVEPredPatternFromNumElements(NumActiveElems.getZExtValue());
unsigned MinSVEVectorSize = std::max(
DAG.getSubtarget<AArch64Subtarget>().getMinSVEVectorSizeInBits(), 128u);
- unsigned ElementSize = 128 / Op.getValueType().getVectorMinNumElements();
+ unsigned ElementSize = 128 / N->getValueType(0).getVectorMinNumElements();
if (PredPattern != std::nullopt &&
NumActiveElems.getZExtValue() <= (MinSVEVectorSize / ElementSize))
- return getPTrue(DAG, dl, Op.getValueType(), *PredPattern);
+ return getPTrue(DAG, dl, N->getValueType(0), *PredPattern);
return SDValue();
}
@@ -6222,16 +6232,16 @@ SDValue AArch64TargetLowering::LowerINTRINSIC_WO_CHAIN(SDValue Op,
AArch64ISD::URSHR_I, dl, Op.getOperand(1).getValueType(), Op.getOperand(1), Op.getOperand(2)));
return SDValue();
case Intrinsic::aarch64_sve_whilelo:
- return optimizeIncrementingWhile(Op, DAG, /*IsSigned=*/false,
+ return optimizeIncrementingWhile(Op.getNode(), DAG, /*IsSigned=*/false,
/*IsEqual=*/false);
case Intrinsic::aarch64_sve_whilelt:
- return optimizeIncrementingWhile(Op, DAG, /*IsSigned=*/true,
+ return optimizeIncrementingWhile(Op.getNode(), DAG, /*IsSigned=*/true,
/*IsEqual=*/false);
case Intrinsic::aarch64_sve_whilels:
- return optimizeIncrementingWhile(Op, DAG, /*IsSigned=*/false,
+ return optimizeIncrementingWhile(Op.getNode(), DAG, /*IsSigned=*/false,
/*IsEqual=*/true);
case Intrinsic::aarch64_sve_whilele:
- return optimizeIncrementingWhile(Op, DAG, /*IsSigned=*/true,
+ return optimizeIncrementingWhile(Op.getNode(), DAG, /*IsSigned=*/true,
/*IsEqual=*/true);
case Intrinsic::aarch64_sve_sunpkhi:
return DAG.getNode(AArch64ISD::SUNPKHI, dl, Op.getValueType(),
@@ -6532,28 +6542,6 @@ SDValue AArch64TargetLowering::LowerINTRINSIC_WO_CHAIN(SDValue Op,
return DAG.getNode(AArch64ISD::USDOT, dl, Op.getValueType(),
Op.getOperand(1), Op.getOperand(2), Op.getOperand(3));
}
- case Intrinsic::get_active_lane_mask: {
- SDValue ID =
- DAG.getTargetConstant(Intrinsic::aarch64_sve_whilelo, dl, MVT::i64);
-
- EVT VT = Op.getValueType();
- if (VT.isScalableVector())
- return DAG.getNode(ISD::INTRINSIC_WO_CHAIN, dl, VT, ID, Op.getOperand(1),
- Op.getOperand(2));
-
- // We can use the SVE whilelo instruction to lower this intrinsic by
- // creating the appropriate sequence of scalable vector operations and
- // then extracting a fixed-width subvector from the scalable vector.
-
- EVT ContainerVT = getContainerForFixedLengthVector(DAG, VT);
- EVT WhileVT = ContainerVT.changeElementType(MVT::i1);
-
- SDValue Mask = DAG.getNode(ISD::INTRINSIC_WO_CHAIN, dl, WhileVT, ID,
- Op.getOperand(1), Op.getOperand(2));
- SDValue MaskAsInt = DAG.getNode(ISD::SIGN_EXTEND, dl, ContainerVT, Mask);
- return DAG.getNode(ISD::EXTRACT_SUBVECTOR, dl, VT, MaskAsInt,
- DAG.getVectorIdxConstant(0, dl));
- }
case Intrinsic::aarch64_neon_saddlv:
case Intrinsic::aarch64_neon_uaddlv: {
EVT OpVT = Op.getOperand(1).getValueType();
@@ -7692,6 +7680,8 @@ SDValue AArch64TargetLowering::LowerOperation(SDValue Op,
return LowerVECTOR_DEINTERLEAVE(Op, DAG);
case ISD::VECTOR_INTERLEAVE:
return LowerVECTOR_INTERLEAVE(Op, DAG);
+ case ISD::GET_ACTIVE_LANE_MASK:
+ return LowerGET_ACTIVE_LANE_MASK(Op, DAG);
case ISD::LRINT:
case ISD::LLRINT:
if (Op.getValueType().isVector())
@@ -18142,6 +18132,70 @@ static SDValue performVecReduceAddCombineWithUADDLP(SDNode *N,
return DAG.getNode(ISD::VECREDUCE_ADD, DL, MVT::i32, UADDLP);
}
+static SDValue
+performActiveLaneMaskCombine(SDNode *N, TargetLowering::DAGCombinerInfo &DCI,
+ const AArch64Subtarget *ST) {
+ if (SDValue While = optimizeIncrementingWhile(N, DCI.DAG, /*IsSigned=*/false,
+ /*IsEqual=*/false))
+ return While;
+
+ if (DCI.isBeforeLegalize())
+ return SDValue();
+
+ if (!ST->hasSVE2p1())
+ return SDValue();
+
+ if (!N->hasNUsesOfValue(2, 0))
+ return SDValue();
+
+ const uint64_t HalfSize = N->getValueType(0).getVectorMinNumElements() / 2;
+ if (HalfSize < 2)
+ return SDValue();
+
+ auto It = N->user_begin();
+ SDNode *Lo = *It++;
+ SDNode *Hi = *It;
+
+ if (Lo->getOpcode() != ISD::EXTRACT_SUBVECTOR ||
+ Hi->getOpcode() != ISD::EXTRACT_SUBVECTOR)
+ return SDValue();
+
+ uint64_t OffLo = Lo->getConstantOperandVal(1);
+ uint64_t OffHi = Hi->getConstantOperandVal(1);
+
+ if (OffLo > OffHi) {
+ std::swap(Lo, Hi);
+ std::swap(OffLo, OffHi);
+ }
+
+ if (OffLo != 0 || OffHi != HalfSize)
+ return SDValue();
+
+ EVT HalfVec = Lo->getValueType(0);
+ if (HalfVec != Hi->getValueType(0) ||
+ HalfVec.getVectorElementCount() != ElementCount::getScalable(HalfSize))
+ return SDValue();
+
+ SelectionDAG &DAG = DCI.DAG;
+ SDLoc DL(N);
+ SDValue ID =
+ DAG.getTargetConstant(Intrinsic::aarch64_sve_whilelo_x2, DL, MVT::i64);
+ SDValue Idx = N->getOperand(0);
+ SDValue TC = N->getOperand(1);
+ if (Idx.getValueType() != MVT::i64) {
+ Idx = DAG.getZExtOrTrunc(Idx, DL, MVT::i64);
+ TC = DAG.getZExtOrTrunc(TC, DL, MVT::i64);
+ }
+ auto R =
+ DAG.getNode(ISD::INTRINSIC_WO_CHAIN, DL,
+ {Lo->getValueType(0), Hi->getValueType(0)}, {ID, Idx, TC});
+
+ DCI.CombineTo(Lo, R.getValue(0));
+ DCI.CombineTo(Hi, R.getValue(1));
+
+ return SDValue(N, 0);
+}
+
// Turn a v8i8/v16i8 extended vecreduce into a udot/sdot and vecreduce
// vecreduce.add(ext(A)) to vecreduce.add(DOT(zero, A, one))
// vecreduce.add(mul(ext(A), ext(B))) to vecreduce.add(DOT(zero, A, B))
@@ -19672,6 +19726,8 @@ static SDValue getPTest(SelectionDAG &DAG, EVT VT, SDValue Pg, SDValue Op,
static bool isPredicateCCSettingOp(SDValue N) {
if ((N.getOpcode() == ISD::SETCC) ||
+ // get_active_lane_mask is lowered to a whilelo instruction.
+ (N.getOpcode() == ISD::GET_ACTIVE_LANE_MASK) ||
(N.getOpcode() == ISD::INTRINSIC_WO_CHAIN &&
(N.getConstantOperandVal(0) == Intrinsic::aarch64_sve_whilege ||
N.getConstantOperandVal(0) == Intrinsic::aarch64_sve_whilegt ||
@@ -19680,9 +19736,7 @@ static bool isPredicateCCSettingOp(SDValue N) {
N.getConstantOperandVal(0) == Intrinsic::aarch64_sve_whilele ||
N.getConstantOperandVal(0) == Intrinsic::aarch64_sve_whilelo ||
N.getConstantOperandVal(0) == Intrinsic::aarch64_sve_whilels ||
- N.getConstantOperandVal(0) == Intrinsic::aarch64_sve_whilelt ||
- // get_active_lane_mask is lowered to a whilelo instruction.
- N.getConstantOperandVal(0) == Intrinsic::get_active_lane_mask)))
+ N.getConstantOperandVal(0) == Intrinsic::aarch64_sve_whilelt)))
return true;
return false;
@@ -21796,66 +21850,6 @@ static SDValue convertMergedOpToPredOp(SDNode *N, unsigned Opc,
return SDValue();
}
-static SDValue tryCombineWhileLo(SDNode *N,
- TargetLowering::DAGCombinerInfo &DCI,
- const AArch64Subtarget *Subtarget) {
- if (DCI.isBeforeLegalize())
- return SDValue();
-
- if (!Subtarget->hasSVE2p1())
- return SDValue();
-
- if (!N->hasNUsesOfValue(2, 0))
- return SDValue();
-
- const uint64_t HalfSize = N->getValueType(0).getVectorMinNumElements() / 2;
- if (HalfSize < 2)
- return SDValue();
-
- auto It = N->user_begin();
- SDNode *Lo = *It++;
- SDNode *Hi = *It;
-
- if (Lo->getOpcode() != ISD::EXTRACT_SUBVECTOR ||
- Hi->getOpcode() != ISD::EXTRACT_SUBVECTOR)
- return SDValue();
-
- uint64_t OffLo = Lo->getConstantOperandVal(1);
- uint64_t OffHi = Hi->getConstantOperandVal(1);
-
- if (OffLo > OffHi) {
- std::swap(Lo, Hi);
- std::swap(OffLo, OffHi);
- }
-
- if (OffLo != 0 || OffHi != HalfSize)
- return SDValue();
-
- EVT HalfVec = Lo->getValueType(0);
- if (HalfVec != Hi->getValueType(0) ||
- HalfVec.getVectorElementCount() != ElementCount::getScalable(HalfSize))
- return SDValue();
-
- SelectionDAG &DAG = DCI.DAG;
- SDLoc DL(N);
- SDValue ID =
- DAG.getTargetConstant(Intrinsic::aarch64_sve_whilelo_x2, DL, MVT::i64);
- SDValue Idx = N->getOperand(1);
- SDValue TC = N->getOperand(2);
- if (Idx.getValueType() != MVT::i64) {
- Idx = DAG.getZExtOrTrunc(Idx, DL, MVT::i64);
- TC = DAG.getZExtOrTrunc(TC, DL, MVT::i64);
- }
- auto R =
- DAG.getNode(ISD::INTRINSIC_WO_CHAIN, DL,
- {Lo->getValueType(0), Hi->getValueType(0)}, {ID, Idx, TC});
-
- DCI.CombineTo(Lo, R.getValue(0));
- DCI.CombineTo(Hi, R.getValue(1));
-
- return SDValue(N, 0);
-}
-
SDValue tryLowerPartialReductionToDot(SDNode *N,
const AArch64Subtarget *Subtarget,
SelectionDAG &DAG) {
@@ -22310,8 +22304,6 @@ static SDValue performIntrinsicCombine(SDNode *N,
case Intrinsic::aarch64_sve_ptest_last:
return getPTest(DAG, N->getValueType(0), N->getOperand(1), N->getOperand(2),
AArch64CC::LAST_ACTIVE);
- case Intrinsic::aarch64_sve_whilelo:
- return tryCombineWhileLo(N, DCI, Subtarget);
}
return SDValue();
}
@@ -26738,6 +26730,8 @@ SDValue AArch64TargetLowering::PerformDAGCombine(SDNode *N,
return performExtractVectorEltCombine(N, DCI, Subtarget);
case ISD::VECREDUCE_ADD:
return performVecReduceAddCombine(N, DCI.DAG, Subtarget);
+ case ISD::GET_ACTIVE_LANE_MASK:
+ return performActiveLaneMaskCombine(N, DCI, Subtarget);
case AArch64ISD::UADDV:
return performUADDVCombine(N, DAG);
case AArch64ISD::SMULL:
@@ -27720,8 +27714,7 @@ void AArch64TargetLowering::ReplaceNodeResults(
DAG.getNode(ISD::TRUNCATE, DL, MVT::i1, RuntimePStateSM));
return;
}
- case Intrinsic::experimental_vector_match:
- case Intrinsic::get_active_lane_mask: {
+ case Intrinsic::experimental_vector_match: {
if (!VT.isFixedLengthVector() || VT.getVectorElementType() != MVT::i1)
return;
@@ -29515,6 +29508,27 @@ AArch64TargetLowering::LowerPARTIAL_REDUCE_MLA(SDValue Op,
return DAG.getNode(ISD::ADD, DL, ResultVT, Acc, Extended);
}
+SDValue
+AArch64TargetLowering::LowerGET_ACTIVE_LANE_MASK(SDValue Op,
+ SelectionDAG &DAG) const {
+ EVT VT = Op.getValueType();
+ assert(VT.isFixedLengthVector() && "Expected fixed length vector type!");
+
+ // We can use the SVE whilelo instruction to lower this intrinsic by
+ // creating the appropriate sequence of scalable vector operations and
+ // then extracting a fixed-width subvector from the scalable vector.
+
+ SDLoc DL(Op);
+ EVT ContainerVT = getContainerForFixedLengthVector(DAG, VT);
+ EVT WhileVT = ContainerVT.changeElementType(MVT::i1);
+
+ SDValue Mask = DAG.getNode(ISD::GET_ACTIVE_LANE_MASK, DL, WhileVT,
+ Op.getOperand(0), Op.getOperand(1));
+ SDValue MaskAsInt = DAG.getNode(ISD::SIGN_EXTEND, DL, ContainerVT, Mask);
+ return DAG.getNode(ISD::EXTRACT_SUBVECTOR, DL, VT, MaskAsInt,
+ DAG.getVectorIdxConstant(0, DL));
+}
+
SDValue
AArch64TargetLowering::LowerFixedLengthFPToIntToSVE(SDValue Op,
SelectionDAG &DAG) const {
diff --git a/llvm/lib/Target/AArch64/AArch64ISelLowering.h b/llvm/lib/Target/AArch64/AArch64ISelLowering.h
index 9d8d1c22258be..ca93aea7c394f 100644
--- a/llvm/lib/Target/AArch64/AArch64ISelLowering.h
+++ b/llvm/lib/Target/AArch64/AArch64ISelLowering.h
@@ -1182,6 +1182,7 @@ class AArch64TargetLowering : public TargetLowering {
SDValue LowerVECTOR_INTERLEAVE(SDValue Op, SelectionDAG &DAG) const;
SDValue LowerVECTOR_HISTOGRAM(SDValue Op, SelectionDAG &DAG) const;
SDValue LowerPARTIAL_REDUCE_MLA(SDValue Op, SelectionDAG &DAG) const;
+ SDValue LowerGET_ACTIVE_LANE_MASK(SDValue Op, SelectionDAG &DAG) const;
SDValue LowerDIV(SDValue Op, SelectionDAG &DAG) const;
SDValue LowerMUL(SDValue Op, SelectionDAG &DAG) const;
SDValue LowerVectorSRA_SRL_SHL(SDValue Op, SelectionDAG &DAG) const;
diff --git a/llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td b/llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td
index bd394671881e8..b9dd599bdb0ce 100644
--- a/llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td
+++ b/llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td
@@ -3449,6 +3449,24 @@ let Predicates = [HasSVE_or_SME] in {
def : Pat<(i64(find_last_active nxv2i1:$P1)), (LASTB_RPZ_D $P1, (INDEX_II_D 0,
1))>;...
[truncated]
Given many of the changes are required to maintain existing code quality after losing the lowering to the …
…Combine - Remove the additional tablegen patterns matching get_active_lane_mask and reuse those for the whilelo intrinsic
// Creates a mask representing active and inactive
// vector lanes, active while base < trip count.
// Operands: Base, Trip Count
// Output: Mask
This could do with more details, for example, to reference that the internal addition cannot overflow, or perhaps to reference the LLVM intrinsic's documentation and then just call out any differences. For example, the ISD node supports different result types that follow the same rules as ISD::SETCC, specifically "If the result value type is not i1 then the high bits conform to getBooleanContents".
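A revised comment along those lines might read as follows; this is only a sketch of the reviewer's suggestion, not the wording that eventually landed:

// GET_ACTIVE_LANE_MASK - Corresponds to llvm.get.active.lane.mask; see the
// LLVM intrinsic documentation for full details. Lane I of the result is
// active iff Base + I is unsigned-less-than the trip count, where the
// internal addition cannot overflow. Unlike the intrinsic, the result is
// not restricted to i1 elements; as with ISD::SETCC, if the result element
// type is not i1 then the high bits conform to getBooleanContents.
// Operands: Base, Trip Count
// Output: Mask
GET_ACTIVE_LANE_MASK,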
// We can use the SVE whilelo instruction to lower this intrinsic by
// creating the appropriate sequence of scalable vector operations and
// then extracting a fixed-width subvector from the scalable vector.
This doesn't strictly match the code. Perhaps something simpler like "No dedicated fixed length instructions but we can use SVE when available."?

On which note, I think it's worth adding an assert for Subtarget->isSVEorStreamingSVEAvailable().
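Folding both suggestions into the new lowering function would look roughly like this. This is a sketch based on the patch as posted, assuming the suggested assert message; Subtarget is the member AArch64TargetLowering already uses elsewhere:

SDValue
AArch64TargetLowering::LowerGET_ACTIVE_LANE_MASK(SDValue Op,
                                                 SelectionDAG &DAG) const {
  EVT VT = Op.getValueType();
  assert(VT.isFixedLengthVector() && "Expected fixed length vector type!");
  assert(Subtarget->isSVEorStreamingSVEAvailable() &&
         "Lowering fixed length get_active_lane_mask requires SVE!");

  // No dedicated fixed-length instructions, but we can use SVE when
  // available: compute the mask as a scalable predicate, sign-extend it
  // to an integer container vector, then extract a fixed-width subvector.
  SDLoc DL(Op);
  EVT ContainerVT = getContainerForFixedLengthVector(DAG, VT);
  EVT WhileVT = ContainerVT.changeElementType(MVT::i1);

  SDValue Mask = DAG.getNode(ISD::GET_ACTIVE_LANE_MASK, DL, WhileVT,
                             Op.getOperand(0), Op.getOperand(1));
  SDValue MaskAsInt = DAG.getNode(ISD::SIGN_EXTEND, DL, ContainerVT, Mask);
  return DAG.getNode(ISD::EXTRACT_SUBVECTOR, DL, VT, MaskAsInt,
                     DAG.getVectorIdxConstant(0, DL));
}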
if (SDValue While = optimizeIncrementingWhile(N, DCI.DAG, /*IsSigned=*/false,
                                              /*IsEqual=*/false))
  return While;
Other than the call to optimizeIncrementingWhile, is this identical to the original tryCombineWhileLo? It looks like it to me but I just wanted to make sure.
@@ -5731,21 +5739,24 @@ static inline SDValue getPTrue(SelectionDAG &DAG, SDLoc DL, EVT VT,
DAG.getTargetConstant(Pattern, DL, MVT::i32));
}

-static SDValue optimizeIncrementingWhile(SDValue Op, SelectionDAG &DAG,
+static SDValue optimizeIncrementingWhile(SDNode *N, SelectionDAG &DAG,
Now that this function is enabled for GET_ACTIVE_LANE_MASK you'll need an isBeforeLegalize bailout, because otherwise it can erroneously trigger for fixed length vectors, which ISD::PTRUE does not support. You could restrict the combine to scalable vector predicate types, but I also think it's better to preserve the common node for as long as possible, and waiting until after legalisation before doing the conversion would tick that box as well.

FYI: When this PR lands I think I'll move the other uses into performIntrinsicCombine because they aren't really related to lowering.