[RISCV][ISel] Combine scalable vector add/sub/mul with zero/sign extension #72340
Conversation
@llvm/pr-subscribers-backend-risc-v

Author: Chia (sun-jacobi)

Changes

This PR mainly aims at resolving the below missed-optimization case, while it could also be considered as an extension of the previous patch https://reviews.llvm.org/D133739?id=

Missed-Optimization Case

Compiler Explorer: https://godbolt.org/z/GzWzP7Pfh

Source Code:
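The source IR from the Compiler Explorer link, reproduced from the re-land description (#76785) quoted later on this page:

```llvm
define <vscale x 2 x i16> @multiple_users(ptr %x, ptr %y, ptr %z) {
  %a = load <vscale x 2 x i8>, ptr %x
  %b = load <vscale x 2 x i8>, ptr %y
  %b2 = load <vscale x 2 x i8>, ptr %z
  %c = sext <vscale x 2 x i8> %a to <vscale x 2 x i16>
  %d = sext <vscale x 2 x i8> %b to <vscale x 2 x i16>
  %d2 = sext <vscale x 2 x i8> %b2 to <vscale x 2 x i16>
  %e = mul <vscale x 2 x i16> %c, %d
  %f = add <vscale x 2 x i16> %c, %d2
  %g = sub <vscale x 2 x i16> %c, %d2
  %h = or <vscale x 2 x i16> %e, %f
  %i = or <vscale x 2 x i16> %h, %g
  ret <vscale x 2 x i16> %i
}
```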
Before This Patch
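The codegen before the patch, from the same description (one mangled mnemonic there, `svf2`, is restored to `vsext.vf2` to match the surrounding lines):

```asm
# %bb.0:
        vsetvli a3, zero, e16, mf2, ta, ma
        vle8.v v8, (a0)
        vle8.v v9, (a1)
        vle8.v v10, (a2)
        vsext.vf2 v11, v8
        vsext.vf2 v8, v9
        vsext.vf2 v9, v10
        vmul.vv v8, v11, v8
        vadd.vv v10, v11, v9
        vsub.vv v9, v11, v9
        vor.vv v8, v8, v10
        vor.vv v8, v8, v9
        ret
```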
After This Patch
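And the codegen after the patch:

```asm
# %bb.0:
        vsetvli a3, zero, e8, mf4, ta, ma
        vle8.v v8, (a0)
        vle8.v v9, (a1)
        vle8.v v10, (a2)
        vwmul.vv v11, v8, v9
        vwadd.vv v9, v8, v10
        vwsub.vv v12, v8, v10
        vsetvli zero, zero, e16, mf2, ta, ma
        vor.vv v8, v11, v9
        vor.vv v8, v8, v12
        ret
```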
We can see Add/Sub/Mul are combined with the Sign Extension.

The patch D133739 introduced an optimization for folding `ADD_VL`/`SUB_VL`/`MUL_VL` with `VSEXT_VL`/`VZEXT_VL`; however, it did not consider the scalable vector case, so this PR could also be considered an extension of D133739. Furthermore, in the current `SelectionDAG`, we represent a scalable vector add (or any binary operator) as a normal `ADD` operation; it might be better to use an opcode like `ADD_VL`, which needs further conversation and decision.

Patch is 29.48 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/72340.diff

2 Files Affected:
diff --git a/llvm/lib/Target/RISCV/RISCVISelLowering.cpp b/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
index a4cd8327f45f82a..d777f9da1f9b816 100644
--- a/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
+++ b/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
@@ -1364,8 +1364,8 @@ RISCVTargetLowering::RISCVTargetLowering(const TargetMachine &TM,
setJumpIsExpensive();
setTargetDAGCombine({ISD::INTRINSIC_VOID, ISD::INTRINSIC_W_CHAIN,
- ISD::INTRINSIC_WO_CHAIN, ISD::ADD, ISD::SUB, ISD::AND,
- ISD::OR, ISD::XOR, ISD::SETCC, ISD::SELECT});
+ ISD::INTRINSIC_WO_CHAIN, ISD::ADD, ISD::SUB, ISD::MUL,
+ ISD::AND, ISD::OR, ISD::XOR, ISD::SETCC, ISD::SELECT});
if (Subtarget.is64Bit())
setTargetDAGCombine(ISD::SRA);
@@ -12538,9 +12538,9 @@ struct CombineResult;
/// Helper class for folding sign/zero extensions.
/// In particular, this class is used for the following combines:
-/// add_vl -> vwadd(u) | vwadd(u)_w
-/// sub_vl -> vwsub(u) | vwsub(u)_w
-/// mul_vl -> vwmul(u) | vwmul_su
+/// add | add_vl -> vwadd(u) | vwadd(u)_w
+/// sub | sub_vl -> vwsub(u) | vwsub(u)_w
+/// mul | mul_vl -> vwmul(u) | vwmul_su
///
/// An object of this class represents an operand of the operation we want to
/// combine.
@@ -12585,6 +12585,8 @@ struct NodeExtensionHelper {
/// E.g., for zext(a), this would return a.
SDValue getSource() const {
switch (OrigOperand.getOpcode()) {
+ case ISD::ZERO_EXTEND:
+ case ISD::SIGN_EXTEND:
case RISCVISD::VSEXT_VL:
case RISCVISD::VZEXT_VL:
return OrigOperand.getOperand(0);
@@ -12601,7 +12603,8 @@ struct NodeExtensionHelper {
/// Get or create a value that can feed \p Root with the given extension \p
/// SExt. If \p SExt is std::nullopt, this returns the source of this operand.
/// \see ::getSource().
- SDValue getOrCreateExtendedOp(const SDNode *Root, SelectionDAG &DAG,
+ SDValue getOrCreateExtendedOp(SDNode *Root, SelectionDAG &DAG,
+ const RISCVSubtarget &Subtarget,
std::optional<bool> SExt) const {
if (!SExt.has_value())
return OrigOperand;
@@ -12616,8 +12619,10 @@ struct NodeExtensionHelper {
// If we need an extension, we should be changing the type.
SDLoc DL(Root);
- auto [Mask, VL] = getMaskAndVL(Root);
+ auto [Mask, VL] = getMaskAndVL(Root, DAG, Subtarget);
switch (OrigOperand.getOpcode()) {
+ case ISD::ZERO_EXTEND:
+ case ISD::SIGN_EXTEND:
case RISCVISD::VSEXT_VL:
case RISCVISD::VZEXT_VL:
return DAG.getNode(ExtOpc, DL, NarrowVT, Source, Mask, VL);
@@ -12657,12 +12662,15 @@ struct NodeExtensionHelper {
/// \pre \p Opcode represents a supported root (\see ::isSupportedRoot()).
static unsigned getSameExtensionOpcode(unsigned Opcode, bool IsSExt) {
switch (Opcode) {
+ case ISD::ADD:
case RISCVISD::ADD_VL:
case RISCVISD::VWADD_W_VL:
case RISCVISD::VWADDU_W_VL:
return IsSExt ? RISCVISD::VWADD_VL : RISCVISD::VWADDU_VL;
+ case ISD::MUL:
case RISCVISD::MUL_VL:
return IsSExt ? RISCVISD::VWMUL_VL : RISCVISD::VWMULU_VL;
+ case ISD::SUB:
case RISCVISD::SUB_VL:
case RISCVISD::VWSUB_W_VL:
case RISCVISD::VWSUBU_W_VL:
@@ -12675,7 +12683,8 @@ struct NodeExtensionHelper {
/// Get the opcode to materialize \p Opcode(sext(a), zext(b)) ->
/// newOpcode(a, b).
static unsigned getSUOpcode(unsigned Opcode) {
- assert(Opcode == RISCVISD::MUL_VL && "SU is only supported for MUL");
+ assert((Opcode == RISCVISD::MUL_VL || Opcode == ISD::MUL) &&
+ "SU is only supported for MUL");
return RISCVISD::VWMULSU_VL;
}
@@ -12683,8 +12692,10 @@ struct NodeExtensionHelper {
/// newOpcode(a, b).
static unsigned getWOpcode(unsigned Opcode, bool IsSExt) {
switch (Opcode) {
+ case ISD::ADD:
case RISCVISD::ADD_VL:
return IsSExt ? RISCVISD::VWADD_W_VL : RISCVISD::VWADDU_W_VL;
+ case ISD::SUB:
case RISCVISD::SUB_VL:
return IsSExt ? RISCVISD::VWSUB_W_VL : RISCVISD::VWSUBU_W_VL;
default:
@@ -12694,19 +12705,45 @@ struct NodeExtensionHelper {
using CombineToTry = std::function<std::optional<CombineResult>(
SDNode * /*Root*/, const NodeExtensionHelper & /*LHS*/,
- const NodeExtensionHelper & /*RHS*/)>;
+ const NodeExtensionHelper & /*RHS*/, SelectionDAG &,
+ const RISCVSubtarget &)>;
/// Check if this node needs to be fully folded or extended for all users.
bool needToPromoteOtherUsers() const { return EnforceOneUse; }
/// Helper method to set the various fields of this struct based on the
/// type of \p Root.
- void fillUpExtensionSupport(SDNode *Root, SelectionDAG &DAG) {
+ void fillUpExtensionSupport(SDNode *Root, SelectionDAG &DAG,
+ const RISCVSubtarget &Subtarget) {
SupportsZExt = false;
SupportsSExt = false;
EnforceOneUse = true;
CheckMask = true;
switch (OrigOperand.getOpcode()) {
+ case ISD::ZERO_EXTEND: {
+ SupportsZExt = true;
+ SDLoc DL(Root);
+ MVT VT = Root->getSimpleValueType(0);
+ if (VT.isFixedLengthVector()) {
+ MVT ContainerVT = getContainerForFixedLengthVector(DAG, VT, Subtarget);
+ std::tie(Mask, VL) =
+ getDefaultVLOps(VT, ContainerVT, DL, DAG, Subtarget);
+ } else if (VT.isVector())
+ std::tie(Mask, VL) = getDefaultScalableVLOps(VT, DL, DAG, Subtarget);
+ break;
+ }
+ case ISD::SIGN_EXTEND: {
+ SupportsSExt = true;
+ SDLoc DL(Root);
+ MVT VT = Root->getSimpleValueType(0);
+ if (VT.isFixedLengthVector()) {
+ MVT ContainerVT = getContainerForFixedLengthVector(DAG, VT, Subtarget);
+ std::tie(Mask, VL) =
+ getDefaultVLOps(VT, ContainerVT, DL, DAG, Subtarget);
+ } else if (VT.isVector())
+ std::tie(Mask, VL) = getDefaultScalableVLOps(VT, DL, DAG, Subtarget);
+ break;
+ }
case RISCVISD::VZEXT_VL:
SupportsZExt = true;
Mask = OrigOperand.getOperand(1);
@@ -12764,6 +12801,15 @@ struct NodeExtensionHelper {
/// Check if \p Root supports any extension folding combines.
static bool isSupportedRoot(const SDNode *Root) {
switch (Root->getOpcode()) {
+ case ISD::ADD:
+ case ISD::SUB:
+ case ISD::MUL: {
+ EVT VT0 = Root->getOperand(0).getValueType();
+ EVT VT1 = Root->getOperand(1).getValueType();
+ if (VT0.isFixedLengthVector() || VT0.isFixedLengthVector())
+ return false;
+ return (VT0.isVector() || VT1.isVector());
+ }
case RISCVISD::ADD_VL:
case RISCVISD::MUL_VL:
case RISCVISD::VWADD_W_VL:
@@ -12778,7 +12824,8 @@ struct NodeExtensionHelper {
}
/// Build a NodeExtensionHelper for \p Root.getOperand(\p OperandIdx).
- NodeExtensionHelper(SDNode *Root, unsigned OperandIdx, SelectionDAG &DAG) {
+ NodeExtensionHelper(SDNode *Root, unsigned OperandIdx, SelectionDAG &DAG,
+ const RISCVSubtarget &Subtarget) {
assert(isSupportedRoot(Root) && "Trying to build an helper with an "
"unsupported root");
assert(OperandIdx < 2 && "Requesting something else than LHS or RHS");
@@ -12796,7 +12843,8 @@ struct NodeExtensionHelper {
SupportsZExt =
Opc == RISCVISD::VWADDU_W_VL || Opc == RISCVISD::VWSUBU_W_VL;
SupportsSExt = !SupportsZExt;
- std::tie(Mask, VL) = getMaskAndVL(Root);
+ Mask = Root->getOperand(3);
+ VL = Root->getOperand(4);
CheckMask = true;
// There's no existing extension here, so we don't have to worry about
// making sure it gets removed.
@@ -12805,7 +12853,7 @@ struct NodeExtensionHelper {
}
[[fallthrough]];
default:
- fillUpExtensionSupport(Root, DAG);
+ fillUpExtensionSupport(Root, DAG, Subtarget);
break;
}
}
@@ -12821,14 +12869,35 @@ struct NodeExtensionHelper {
}
/// Helper function to get the Mask and VL from \p Root.
- static std::pair<SDValue, SDValue> getMaskAndVL(const SDNode *Root) {
+ static std::pair<SDValue, SDValue>
+ getMaskAndVL(const SDNode *Root, SelectionDAG &DAG,
+ const RISCVSubtarget &Subtarget) {
assert(isSupportedRoot(Root) && "Unexpected root");
- return std::make_pair(Root->getOperand(3), Root->getOperand(4));
+ switch (Root->getOpcode()) {
+ case ISD::ADD:
+ case ISD::SUB:
+ case ISD::MUL: {
+ SDLoc DL(Root);
+ MVT VT = Root->getSimpleValueType(0);
+ SDValue Mask, VL;
+ if (VT.isFixedLengthVector()) {
+ MVT ContainerVT = getContainerForFixedLengthVector(DAG, VT, Subtarget);
+ std::tie(Mask, VL) =
+ getDefaultVLOps(VT, ContainerVT, DL, DAG, Subtarget);
+ } else
+ std::tie(Mask, VL) = getDefaultScalableVLOps(VT, DL, DAG, Subtarget);
+ return std::make_pair(Mask, VL);
+ }
+
+ default:
+ return std::make_pair(Root->getOperand(3), Root->getOperand(4));
+ }
}
/// Check if the Mask and VL of this operand are compatible with \p Root.
- bool areVLAndMaskCompatible(const SDNode *Root) const {
- auto [Mask, VL] = getMaskAndVL(Root);
+ bool areVLAndMaskCompatible(SDNode *Root, SelectionDAG &DAG,
+ const RISCVSubtarget &Subtarget) const {
+ auto [Mask, VL] = getMaskAndVL(Root, DAG, Subtarget);
return isMaskCompatible(Mask) && isVLCompatible(VL);
}
@@ -12836,11 +12905,14 @@ struct NodeExtensionHelper {
/// foldings that are supported by this class.
static bool isCommutative(const SDNode *N) {
switch (N->getOpcode()) {
+ case ISD::ADD:
+ case ISD::MUL:
case RISCVISD::ADD_VL:
case RISCVISD::MUL_VL:
case RISCVISD::VWADD_W_VL:
case RISCVISD::VWADDU_W_VL:
return true;
+ case ISD::SUB:
case RISCVISD::SUB_VL:
case RISCVISD::VWSUB_W_VL:
case RISCVISD::VWSUBU_W_VL:
@@ -12885,14 +12957,25 @@ struct CombineResult {
/// Return a value that uses TargetOpcode and that can be used to replace
/// Root.
/// The actual replacement is *not* done in that method.
- SDValue materialize(SelectionDAG &DAG) const {
+ SDValue materialize(SelectionDAG &DAG,
+ const RISCVSubtarget &Subtarget) const {
SDValue Mask, VL, Merge;
- std::tie(Mask, VL) = NodeExtensionHelper::getMaskAndVL(Root);
- Merge = Root->getOperand(2);
+ std::tie(Mask, VL) =
+ NodeExtensionHelper::getMaskAndVL(Root, DAG, Subtarget);
+ switch (Root->getOpcode()) {
+ default:
+ Merge = Root->getOperand(2);
+ break;
+ case ISD::ADD:
+ case ISD::SUB:
+ case ISD::MUL:
+ Merge = DAG.getUNDEF(Root->getValueType(0));
+ break;
+ }
return DAG.getNode(TargetOpcode, SDLoc(Root), Root->getValueType(0),
- LHS.getOrCreateExtendedOp(Root, DAG, SExtLHS),
- RHS.getOrCreateExtendedOp(Root, DAG, SExtRHS), Merge,
- Mask, VL);
+ LHS.getOrCreateExtendedOp(Root, DAG, Subtarget, SExtLHS),
+ RHS.getOrCreateExtendedOp(Root, DAG, Subtarget, SExtRHS),
+ Merge, Mask, VL);
}
};
@@ -12909,15 +12992,16 @@ struct CombineResult {
static std::optional<CombineResult>
canFoldToVWWithSameExtensionImpl(SDNode *Root, const NodeExtensionHelper &LHS,
const NodeExtensionHelper &RHS, bool AllowSExt,
- bool AllowZExt) {
+ bool AllowZExt, SelectionDAG &DAG,
+ const RISCVSubtarget &Subtarget) {
assert((AllowSExt || AllowZExt) && "Forgot to set what you want?");
- if (!LHS.areVLAndMaskCompatible(Root) || !RHS.areVLAndMaskCompatible(Root))
+ if (!LHS.areVLAndMaskCompatible(Root, DAG, Subtarget) ||
+ !RHS.areVLAndMaskCompatible(Root, DAG, Subtarget))
return std::nullopt;
if (AllowZExt && LHS.SupportsZExt && RHS.SupportsZExt)
return CombineResult(NodeExtensionHelper::getSameExtensionOpcode(
Root->getOpcode(), /*IsSExt=*/false),
- Root, LHS, /*SExtLHS=*/false, RHS,
- /*SExtRHS=*/false);
+ Root, LHS, /*SExtLHS=*/false, RHS, /*SExtRHS=*/false);
if (AllowSExt && LHS.SupportsSExt && RHS.SupportsSExt)
return CombineResult(NodeExtensionHelper::getSameExtensionOpcode(
Root->getOpcode(), /*IsSExt=*/true),
@@ -12934,9 +13018,10 @@ canFoldToVWWithSameExtensionImpl(SDNode *Root, const NodeExtensionHelper &LHS,
/// can be used to apply the pattern.
static std::optional<CombineResult>
canFoldToVWWithSameExtension(SDNode *Root, const NodeExtensionHelper &LHS,
- const NodeExtensionHelper &RHS) {
+ const NodeExtensionHelper &RHS, SelectionDAG &DAG,
+ const RISCVSubtarget &Subtarget) {
return canFoldToVWWithSameExtensionImpl(Root, LHS, RHS, /*AllowSExt=*/true,
- /*AllowZExt=*/true);
+ /*AllowZExt=*/true, DAG, Subtarget);
}
/// Check if \p Root follows a pattern Root(LHS, ext(RHS))
@@ -12945,8 +13030,9 @@ canFoldToVWWithSameExtension(SDNode *Root, const NodeExtensionHelper &LHS,
/// can be used to apply the pattern.
static std::optional<CombineResult>
canFoldToVW_W(SDNode *Root, const NodeExtensionHelper &LHS,
- const NodeExtensionHelper &RHS) {
- if (!RHS.areVLAndMaskCompatible(Root))
+ const NodeExtensionHelper &RHS, SelectionDAG &DAG,
+ const RISCVSubtarget &Subtarget) {
+ if (!RHS.areVLAndMaskCompatible(Root, DAG, Subtarget))
return std::nullopt;
// FIXME: Is it useful to form a vwadd.wx or vwsub.wx if it removes a scalar
@@ -12970,9 +13056,10 @@ canFoldToVW_W(SDNode *Root, const NodeExtensionHelper &LHS,
/// can be used to apply the pattern.
static std::optional<CombineResult>
canFoldToVWWithSEXT(SDNode *Root, const NodeExtensionHelper &LHS,
- const NodeExtensionHelper &RHS) {
+ const NodeExtensionHelper &RHS, SelectionDAG &DAG,
+ const RISCVSubtarget &Subtarget) {
return canFoldToVWWithSameExtensionImpl(Root, LHS, RHS, /*AllowSExt=*/true,
- /*AllowZExt=*/false);
+ /*AllowZExt=*/false, DAG, Subtarget);
}
/// Check if \p Root follows a pattern Root(zext(LHS), zext(RHS))
@@ -12981,9 +13068,10 @@ canFoldToVWWithSEXT(SDNode *Root, const NodeExtensionHelper &LHS,
/// can be used to apply the pattern.
static std::optional<CombineResult>
canFoldToVWWithZEXT(SDNode *Root, const NodeExtensionHelper &LHS,
- const NodeExtensionHelper &RHS) {
+ const NodeExtensionHelper &RHS, SelectionDAG &DAG,
+ const RISCVSubtarget &Subtarget) {
return canFoldToVWWithSameExtensionImpl(Root, LHS, RHS, /*AllowSExt=*/false,
- /*AllowZExt=*/true);
+ /*AllowZExt=*/true, DAG, Subtarget);
}
/// Check if \p Root follows a pattern Root(sext(LHS), zext(RHS))
@@ -12992,10 +13080,13 @@ canFoldToVWWithZEXT(SDNode *Root, const NodeExtensionHelper &LHS,
/// can be used to apply the pattern.
static std::optional<CombineResult>
canFoldToVW_SU(SDNode *Root, const NodeExtensionHelper &LHS,
- const NodeExtensionHelper &RHS) {
+ const NodeExtensionHelper &RHS, SelectionDAG &DAG,
+ const RISCVSubtarget &Subtarget) {
+
if (!LHS.SupportsSExt || !RHS.SupportsZExt)
return std::nullopt;
- if (!LHS.areVLAndMaskCompatible(Root) || !RHS.areVLAndMaskCompatible(Root))
+ if (!LHS.areVLAndMaskCompatible(Root, DAG, Subtarget) ||
+ !RHS.areVLAndMaskCompatible(Root, DAG, Subtarget))
return std::nullopt;
return CombineResult(NodeExtensionHelper::getSUOpcode(Root->getOpcode()),
Root, LHS, /*SExtLHS=*/true, RHS, /*SExtRHS=*/false);
@@ -13005,6 +13096,8 @@ SmallVector<NodeExtensionHelper::CombineToTry>
NodeExtensionHelper::getSupportedFoldings(const SDNode *Root) {
SmallVector<CombineToTry> Strategies;
switch (Root->getOpcode()) {
+ case ISD::ADD:
+ case ISD::SUB:
case RISCVISD::ADD_VL:
case RISCVISD::SUB_VL:
// add|sub -> vwadd(u)|vwsub(u)
@@ -13012,6 +13105,7 @@ NodeExtensionHelper::getSupportedFoldings(const SDNode *Root) {
// add|sub -> vwadd(u)_w|vwsub(u)_w
Strategies.push_back(canFoldToVW_W);
break;
+ case ISD::MUL:
case RISCVISD::MUL_VL:
// mul -> vwmul(u)
Strategies.push_back(canFoldToVWWithSameExtension);
@@ -13042,12 +13136,14 @@ NodeExtensionHelper::getSupportedFoldings(const SDNode *Root) {
/// mul_vl -> vwmul(u) | vwmul_su
/// vwadd_w(u) -> vwadd(u)
/// vwub_w(u) -> vwadd(u)
-static SDValue
-combineBinOp_VLToVWBinOp_VL(SDNode *N, TargetLowering::DAGCombinerInfo &DCI) {
+static SDValue combineBinOp_VLToVWBinOp_VL(SDNode *N,
+ TargetLowering::DAGCombinerInfo &DCI,
+ const RISCVSubtarget &Subtarget) {
SelectionDAG &DAG = DCI.DAG;
- assert(NodeExtensionHelper::isSupportedRoot(N) &&
- "Shouldn't have called this method");
+ if (!NodeExtensionHelper::isSupportedRoot(N))
+ return SDValue();
+
SmallVector<SDNode *> Worklist;
SmallSet<SDNode *, 8> Inserted;
Worklist.push_back(N);
@@ -13059,8 +13155,8 @@ combineBinOp_VLToVWBinOp_VL(SDNode *N, TargetLowering::DAGCombinerInfo &DCI) {
if (!NodeExtensionHelper::isSupportedRoot(Root))
return SDValue();
- NodeExtensionHelper LHS(N, 0, DAG);
- NodeExtensionHelper RHS(N, 1, DAG);
+ NodeExtensionHelper LHS(N, 0, DAG, Subtarget);
+ NodeExtensionHelper RHS(N, 1, DAG, Subtarget);
auto AppendUsersIfNeeded = [&Worklist,
&Inserted](const NodeExtensionHelper &Op) {
if (Op.needToPromoteOtherUsers()) {
@@ -13087,7 +13183,8 @@ combineBinOp_VLToVWBinOp_VL(SDNode *N, TargetLowering::DAGCombinerInfo &DCI) {
for (NodeExtensionHelper::CombineToTry FoldingStrategy :
FoldingStrategies) {
- std::optional<CombineResult> Res = FoldingStrategy(N, LHS, RHS);
+ std::optional<CombineResult> Res =
+ FoldingStrategy(N, LHS, RHS, DAG, Subtarget);
if (Res) {
Matched = true;
CombinesToApply.push_back(*Res);
@@ -13116,7 +13213,7 @@ combineBinOp_VLToVWBinOp_VL(SDNode *N, TargetLowering::DAGCombinerInfo &DCI) {
SmallVector<std::pair<SDValue, SDValue>> ValuesToReplace;
ValuesToReplace.reserve(CombinesToApply.size());
for (CombineResult Res : CombinesToApply) {
- SDValue NewValue = Res.materialize(DAG);
+ SDValue NewValue = Res.materialize(DAG, Subtarget);
if (!InputRootReplacement) {
assert(Res.Root == N &&
"First element is expected to be the current node");
@@ -14282,10 +14379,20 @@ static SDValue performCONCAT_VECTORSCombine(SDNode *N, SelectionDAG &DAG,
static SDValue combineToVWMACC(SDNode *N, SelectionDAG &DAG,
const RISCVSubtarget &Subtarget) {
- assert(N->getOpcode() == RISCVISD::ADD_VL);
+
+ assert(N->getOpcode() == RISCVISD::ADD_VL || N->getOpcode() == ISD::ADD);
+
+ if (N->getValueType(0).isFixedLengthVector())
+ return SDValue();
+
SDValue Addend = N->getOperand(0);
SDValue MulOp = N->getOperand(1);
- SDValue AddMergeOp = N->getOperand(2);
+ SDValue AddMergeOp = [](SDNode *N, SelectionDAG &DAG) {
+ if (N->getOpcode() == ISD::ADD)
+ return DAG.getUNDEF(N->getValueType(0));
+ else
+ return N->getOperand(2);
+ }(N, DAG);
if (!AddMergeOp.isUndef())
return SDValue();
@@ -14312,8 +14419,17 @@ static SDValue combineToVWMACC(SDNode *N, SelectionDAG &DAG,
if (!MulMergeOp.isUndef())
return SDValue();
- SDValue AddMask = N->getOperand(3);
- SDValue AddVL = N->getOperand(4);
+ auto [AddMask, AddVL] = [](SDNode *N, SelectionDAG &DAG,
+ const RISCVSubtarget &Subtarget) {
+ if (N->getOpcode() == ISD::ADD) {
+ SDLoc DL(N);
+ return getDefaultScalableVLOps(N->getSimpleValueType(0), DL, DAG,
+ Subtarget)...
[truncated]
```cpp
case ISD::MUL: {
  EVT VT0 = Root->getOperand(0).getValueType();
  EVT VT1 = Root->getOperand(1).getValueType();
  if (VT0.isFixedLengthVector() || VT0.isFixedLengthVector())
```

Typo: `VT0` => `VT1`
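With the typo fixed, the check reads:

```cpp
if (VT0.isFixedLengthVector() || VT1.isFixedLengthVector())
  return false;
```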
```
@@ -12796,7 +12843,8 @@ struct NodeExtensionHelper {
       SupportsZExt =
           Opc == RISCVISD::VWADDU_W_VL || Opc == RISCVISD::VWSUBU_W_VL;
       SupportsSExt = !SupportsZExt;
-      std::tie(Mask, VL) = getMaskAndVL(Root);
+      Mask = Root->getOperand(3);
+      VL = Root->getOperand(4);
```

Could you fix `getMaskAndVL` instead of expanding it here?
```cpp
        getDefaultVLOps(VT, ContainerVT, DL, DAG, Subtarget);
  } else
    std::tie(Mask, VL) = getDefaultScalableVLOps(VT, DL, DAG, Subtarget);
  return std::make_pair(Mask, VL);
```

This piece of code looks similar to what you have in `fillUpExtensionSupport`. Could you refactor the code to share it in the different places? A sketch follows this comment.
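A sketch of the shared helper being requested; the name and placement are assumptions, not the merged code. The three call sites (`fillUpExtensionSupport`'s two extend cases and `getMaskAndVL`) could all delegate to it:

```cpp
// Compute the default mask/VL pair for VT, handling both fixed-length and
// scalable vectors, so the duplicated logic lives in one place.
static std::pair<SDValue, SDValue>
getDefaultMaskAndVL(MVT VT, const SDLoc &DL, SelectionDAG &DAG,
                    const RISCVSubtarget &Subtarget) {
  if (VT.isFixedLengthVector()) {
    MVT ContainerVT = getContainerForFixedLengthVector(DAG, VT, Subtarget);
    return getDefaultVLOps(VT, ContainerVT, DL, DAG, Subtarget);
  }
  return getDefaultScalableVLOps(VT, DL, DAG, Subtarget);
}
```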
```cpp
SDValue AddMergeOp = [](SDNode *N, SelectionDAG &DAG) {
  if (N->getOpcode() == ISD::ADD)
    return DAG.getUNDEF(N->getValueType(0));
  else
```

No `else` after `return`.
```cpp
    SDLoc DL(N);
    return getDefaultScalableVLOps(N->getSimpleValueType(0), DL, DAG,
                                   Subtarget);
  } else {
```

Ditto: no `else` after `return`.
```cpp
case ISD::ADD:
case ISD::SUB:
case ISD::MUL: {
  EVT VT0 = Root->getOperand(0).getValueType();
```

It should be sufficient to check `return N->getValueType(0).isScalableVector()`. The operands of ISD::ADD/SUB/MUL must all have the same type as the result.

Do we need to check that the scalable type is legal?

You are right; it will be safer to do this. I will fix it soon.

Politely ping @qcolombet @topperc
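A minimal sketch of the simplified check discussed above, assuming the DAG is threaded through so legality can be queried; this is not quoted from the final commit:

```cpp
case ISD::ADD:
case ISD::SUB:
case ISD::MUL: {
  // Operands of ISD::ADD/SUB/MUL all share the result type, so checking
  // the result type is sufficient.
  EVT VT = Root->getValueType(0);
  // Also require the scalable type to be legal, per the review discussion.
  return VT.isScalableVector() &&
         DAG.getTargetLoweringInfo().isTypeLegal(VT);
}
```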
```
   SDValue Addend = N->getOperand(0);
   SDValue MulOp = N->getOperand(1);
-  SDValue AddMergeOp = N->getOperand(2);
+  SDValue AddMergeOp = [](SDNode *N, SelectionDAG &DAG) {
```

Does this really need to be a lambda? It looks like `AddMergeOp` is only needed for the next `if`. Could we write:

```cpp
if (N->getOpcode() == RISCVISD::ADD_VL) {
  SDValue AddMergeOp = N->getOperand(2);
  if (!AddMergeOp.isUndef())
    return SDValue();
}
```

LGTM
Sorry for the repeated ping. @qcolombet
Force-pushed from 07ec3eb to d678548
```
@@ -1231,16 +1231,17 @@ define <vscale x 1 x i64> @ctlz_nxv1i64(<vscale x 1 x i64> %va) {
 ;
 ; CHECK-F-LABEL: ctlz_nxv1i64:
 ; CHECK-F:       # %bb.0:
 ; CHECK-F-NEXT:    vsetvli a0, zero, e32, mf2, ta, ma
 ; CHECK-F-NEXT:    li a0, 190
```

What happens to this part? I think this is a regression.

Here `vrsub.vx` and `vzext.vf2` are folded into `vwsubu.wv`.

Thanks for the explanation! But I meant that we need a `vmv.v.x` to construct the immediate. The (micro) instructions may not be reduced, and we have more vtype toggles now for this `ctlz_nxv1i64` case?

Thanks for your advice! It might be possible to use `vwsubu.wx` here. I will send a patch on it soon :-)
```
; CHECK-F-NEXT:    li a1, 64
; CHECK-F-NEXT:    vminu.vx v8, v8, a1
; CHECK-F-NEXT:    vsetvli zero, zero, e64, m1, ta, ma
```

And here. There are more `vtype` toggles after this patch.

I hope vtype toggles are cheap on most implementations. Do you know of one where it is expensive?

Yes, I agree! I meant something else; please see the other comment. :-)
We're seeing a failure from this in our testing. We're creating a VZEXT_VL from an i1 vector, which isn't supported. I'll see if I can find a quick fix.
This test case and this test case both fail.
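The reduced reproducer, as quoted in the re-land description (#76785) further down this page:

```llvm
define <vscale x 1 x i32> @foo(<vscale x 1 x i1> %x, <vscale x 1 x i1> %y) {
  %a = zext <vscale x 1 x i1> %x to <vscale x 1 x i32>
  %b = zext <vscale x 1 x i1> %y to <vscale x 1 x i32>
  %c = add <vscale x 1 x i32> %a, %b
  ret <vscale x 1 x i32> %c
}
```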
@sun-jacobi I'm going to revert the patch and let you work on a fix.

Okay, I will fix it soon.

@topperc Should I create another PR?

Yes. There's no way to reopen a merged PR.

Yeah, you cannot reopen a merged PR.
…/sign extension. (llvm#72340)" This recreates llvm#72340, which was reverted by 4e347b4.
…/sign extension." (#76785) This patch was originally introduced in PR #72340, but was reverted due to a bug on invalid extension combine. Specifically, we resolve the case in the #72340 (comment) ``` define <vscale x 1 x i32> @foo(<vscale x 1 x i1> %x, <vscale x 1 x i2> %y) { %a = zext <vscale x 1 x i1> %x to <vscale x 1 x i32> %b = zext <vscale x 1 x i1> %y to <vscale x 1 x i32> %c = add <vscale x 1 x i32> %a, %b ret <vscale x 1 x i32> %c } ``` The previous patch didn't check if the semantic of `ISD::ZERO_EXTEND` and `ISD::ZERO_EXTEND` is equivalent to the `vsext.vf2` or `vzext.vf2` (not ensuring the SEW condition on widening Vector Arithmetic Instructions). Thanks for @topperc pointing out this bug. ## The original description This PR mainly aims at resolving the below missed-optimization case, while it could also be considered as an extension of the previous patch https://reviews.llvm.org/D133739?id= ### Missed-Optimization Case Compiler Explorer: https://godbolt.org/z/GzWzP7Pfh ### Source Code: ``` define <vscale x 2 x i16> @multiple_users(ptr %x, ptr %y, ptr %z) { %a = load <vscale x 2 x i8>, ptr %x %b = load <vscale x 2 x i8>, ptr %y %b2 = load <vscale x 2 x i8>, ptr %z %c = sext <vscale x 2 x i8> %a to <vscale x 2 x i16> %d = sext <vscale x 2 x i8> %b to <vscale x 2 x i16> %d2 = sext <vscale x 2 x i8> %b2 to <vscale x 2 x i16> %e = mul <vscale x 2 x i16> %c, %d %f = add <vscale x 2 x i16> %c, %d2 %g = sub <vscale x 2 x i16> %c, %d2 %h = or <vscale x 2 x i16> %e, %f %i = or <vscale x 2 x i16> %h, %g ret <vscale x 2 x i16> %i } ``` ### Before This Patch ``` # %bb.0: vsetvli a3, zero, e16, mf2, ta, ma vle8.v v8, (a0) vle8.v v9, (a1) vle8.v v10, (a2) svf2 v11, v8 vsext.vf2 v8, v9 vsext.vf2 v9, v10 vmul.vv v8, v11, v8 vadd.vv v10, v11, v9 vsub.vv v9, v11, v9 vor.vv v8, v8, v10 vor.vv v8, v8, v9 ret ``` ### After This Patch ``` # %bb.0: vsetvli a3, zero, e8, mf4, ta, ma vle8.v v8, (a0) vle8.v v9, (a1) vle8.v v10, (a2) vwmul.vv v11, v8, v9 vwadd.vv v9, v8, v10 vwsub.vv v12, v8, v10 vsetvli zero, zero, e16, mf2, ta, ma vor.vv v8, v11, v9 vor.vv v8, v8, v12 ret ``` We can see Add/Sub/Mul are combined with the Sign Extension. ### Relation to the Patch D133739 The patch D133739 introduced an optimization for folding `ADD_VL`/ `SUB_VL` / `MUL_V` with `VSEXT_VL` / `VZEXT_VL`. However, the patch did not consider the case of non-fixed length vector case, thus this PR could also be considered as an extension for the D133739.
…/sign extension." (llvm#76785) This patch was originally introduced in PR llvm#72340, but was reverted due to a bug on invalid extension combine. Specifically, we resolve the case in the llvm#72340 (comment) ``` define <vscale x 1 x i32> @foo(<vscale x 1 x i1> %x, <vscale x 1 x i2> %y) { %a = zext <vscale x 1 x i1> %x to <vscale x 1 x i32> %b = zext <vscale x 1 x i1> %y to <vscale x 1 x i32> %c = add <vscale x 1 x i32> %a, %b ret <vscale x 1 x i32> %c } ``` The previous patch didn't check if the semantic of `ISD::ZERO_EXTEND` and `ISD::ZERO_EXTEND` is equivalent to the `vsext.vf2` or `vzext.vf2` (not ensuring the SEW condition on widening Vector Arithmetic Instructions). Thanks for @topperc pointing out this bug. ## The original description This PR mainly aims at resolving the below missed-optimization case, while it could also be considered as an extension of the previous patch https://reviews.llvm.org/D133739?id= ### Missed-Optimization Case Compiler Explorer: https://godbolt.org/z/GzWzP7Pfh ### Source Code: ``` define <vscale x 2 x i16> @multiple_users(ptr %x, ptr %y, ptr %z) { %a = load <vscale x 2 x i8>, ptr %x %b = load <vscale x 2 x i8>, ptr %y %b2 = load <vscale x 2 x i8>, ptr %z %c = sext <vscale x 2 x i8> %a to <vscale x 2 x i16> %d = sext <vscale x 2 x i8> %b to <vscale x 2 x i16> %d2 = sext <vscale x 2 x i8> %b2 to <vscale x 2 x i16> %e = mul <vscale x 2 x i16> %c, %d %f = add <vscale x 2 x i16> %c, %d2 %g = sub <vscale x 2 x i16> %c, %d2 %h = or <vscale x 2 x i16> %e, %f %i = or <vscale x 2 x i16> %h, %g ret <vscale x 2 x i16> %i } ``` ### Before This Patch ``` # %bb.0: vsetvli a3, zero, e16, mf2, ta, ma vle8.v v8, (a0) vle8.v v9, (a1) vle8.v v10, (a2) svf2 v11, v8 vsext.vf2 v8, v9 vsext.vf2 v9, v10 vmul.vv v8, v11, v8 vadd.vv v10, v11, v9 vsub.vv v9, v11, v9 vor.vv v8, v8, v10 vor.vv v8, v8, v9 ret ``` ### After This Patch ``` # %bb.0: vsetvli a3, zero, e8, mf4, ta, ma vle8.v v8, (a0) vle8.v v9, (a1) vle8.v v10, (a2) vwmul.vv v11, v8, v9 vwadd.vv v9, v8, v10 vwsub.vv v12, v8, v10 vsetvli zero, zero, e16, mf2, ta, ma vor.vv v8, v11, v9 vor.vv v8, v8, v12 ret ``` We can see Add/Sub/Mul are combined with the Sign Extension. ### Relation to the Patch D133739 The patch D133739 introduced an optimization for folding `ADD_VL`/ `SUB_VL` / `MUL_V` with `VSEXT_VL` / `VZEXT_VL`. However, the patch did not consider the case of non-fixed length vector case, thus this PR could also be considered as an extension for the D133739.
…pes. NFCI (#84125)

I noticed this from a discrepancy in fillUpExtensionSupport between how we apparently need to check for legal types for ISD::{ZERO,SIGN}_EXTEND, but we don't need to for RISCVISD::V{Z,S}EXT_VL.

Prior to #72340, combineBinOp_VLToVWBinOp_VL only ran after type legalization because it only operated on _VL nodes. _VL nodes are only emitted during op legalization, which takes place **after** type legalization, which is presumably why the existing code didn't need to check for legal types.

After #72340 we now handle generic ops like ISD::ADD that exist before op legalization and thus **before** type legalization. This meant that we needed to add extra checks that the narrow type was legal in #76785.

I think the easiest thing to do here is to just maintain the invariant that the types are legal and only run the combine after type legalization.
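A minimal sketch of the invariant described above: bail out of the combine until types have been legalized. The exact predicate and placement are assumptions, not quoted from the commit:

```cpp
static SDValue combineBinOp_VLToVWBinOp_VL(SDNode *N,
                                           TargetLowering::DAGCombinerInfo &DCI,
                                           const RISCVSubtarget &Subtarget) {
  // Run only after type legalization: past that point every vector type we
  // see is legal, so the helpers need no per-type legality checks.
  if (DCI.isBeforeLegalize())
    return SDValue();
  // ... the existing worklist-driven combine follows here ...
  return SDValue();
}
```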
This PR mainly aims at resolving the below missed-optimization case, while it could also be considered as an extension of the previous patch https://reviews.llvm.org/D133739?id=
Missed-Optimization Case
Compiler Explorer: https://godbolt.org/z/GzWzP7Pfh
Source Code:
Before This Patch
After This Patch
We can see Add/Sub/Mul are combined with the Sign Extension.
Relation to the Patch D133739
The patch D133739 introduced an optimization for folding `ADD_VL`/`SUB_VL`/`MUL_VL` with `VSEXT_VL`/`VZEXT_VL`. However, the patch did not consider the non-fixed-length (scalable) vector case, thus this PR could also be considered as an extension of D133739.

Furthermore, in the current `SelectionDAG`, we represent scalable vector add (or any binary operator) as a normal `ADD` operation. It might be better to use an opcode like `ADD_VL`, which needs further conversation and decision.