[VP][RISCV] Add a vp.load.ff intrinsic for fault only first load. #128593

Open
wants to merge 15 commits into
base: main
86 changes: 86 additions & 0 deletions llvm/docs/LangRef.rst
Original file line number Diff line number Diff line change
Expand Up @@ -23943,6 +23943,92 @@ Examples:
%also.r = call <8 x i8> @llvm.masked.load.v8i8.p0(ptr %ptr, i32 2, <8 x i1> %mask, <8 x i8> poison)


.. _int_experimental_vp_ff_load:

'``llvm.experimental.vp.load.ff``' Intrinsic
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Syntax:
"""""""
This is an overloaded intrinsic.

::

declare {<4 x float>, i32} @llvm.experimental.vp.load.ff.v4f32.p0(ptr %ptr, <4 x i1> %mask, i32 %evl)
declare {<vscale x 2 x i16>, i32} @llvm.experimental.vp.load.ff.nxv2i16.p0(ptr %ptr, <vscale x 2 x i1> %mask, i32 %evl)
declare {<8 x float>, i32} @llvm.experimental.vp.load.ff.v8f32.p1(ptr addrspace(1) %ptr, <8 x i1> %mask, i32 %evl)
declare {<vscale x 1 x i64>, i32} @llvm.experimental.vp.load.ff.nxv1i64.p6(ptr addrspace(6) %ptr, <vscale x 1 x i1> %mask, i32 %evl)

Overview:
"""""""""

The '``llvm.experimental.vp.load.ff.*``' intrinsic is similar to
'``llvm.vp.load.*``', but does not trap if fewer than ``evl`` lanes are
readable at the pointer. '``ff``' stands for fault-first or fault-only-first.

Arguments:
""""""""""

The first argument is the base pointer for the load. The second argument is a
vector of boolean values with the same number of elements as the first return
type. The third is the explicit vector length of the operation. The first
return type and underlying type of the base pointer are the same vector types.

The :ref:`align <attr_align>` parameter attribute can be provided for the first
argument.

Semantics:
""""""""""

The '``llvm.experimental.vp.load.ff``' intrinsic is designed for reading vector
lanes in a single IR operation where the number of lanes that can be read is not
known and can only be determined by looking at the data. This is useful for
vectorizing ``strcmp``- or ``strlen``-like loops where the data contains a null
terminator. Some targets have a fault-only-first load instruction that this
intrinsic can be lowered to. Other targets may support this intrinsic
differently, for example by lowering to a single scalar load guarded by
``evl!=0`` and ``mask[0]==1`` and indicating that only 1 lane could be read.

Like '``llvm.vp.load``', this intrinsic reads memory based on a ``mask`` and an
``evl``. If ``evl`` is non-zero and the first lane is masked-on, then the
first lane of the vector needs to be inbounds of an allocation. The remaining
masked-on lanes with index less than ``evl`` do not need to be inbounds of
the same allocation or any allocation.

The second return value from the intrinsic indicates the index of the first
lane that could not be read for some reason or ``evl`` if all lanes could
be read. Lanes at this index or higher in the first return value are
:ref:`poison value <poisonvalues>`. If ``evl`` is non-zero, the result in the
second return value must be at least 1, even if the first lane is masked-off.

The second result is usually less than ``evl`` when an exception would occur
for reading that lane, but it can be reduced for any reason. This facilitates
emulating this intrinsic when the hardware only supports narrower vector
types natively or when the hardware does not support fault-only-first loads.

Masked-on lanes that are not inbounds of the allocation that contains the first
lane are :ref:`poison value <poisonvalues>`. There should be a marker in the
allocation that indicates where valid data stops such as a null terminator. The
terminator should be checked for after calling this intrinsic to prevent using
any lanes past the terminator. Even if the second return value is less than
``evl``, the terminator value may not have been read.

This intrinsic will typically be called in a loop until a terminator is
found. The second result indicates how many elements are valid to search
for the null terminator. If the terminator is not found, the pointer should
be advanced by the number of elements in the second result and the
intrinsic called again.

The default alignment is taken as the ABI alignment of the first return
type as specified by the :ref:`datalayout string<langref_datalayout>`.

Examples:
"""""""""

.. code-block:: text

%r = call {<8 x i8>, i32} @llvm.experimental.vp.load.ff.v8i8.p0(ptr align 2 %ptr, <8 x i1> %mask, i32 %evl)

.. _int_vp_store:

'``llvm.vp.store``' Intrinsic
Expand Down
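Editor's note: the semantics documented in the LangRef hunk above can be modelled in plain C++ without LLVM. The sketch below is only an illustration under stated assumptions, not the lowering used by this PR. `FFResult`, `loadFFNative`, `loadFFScalarFallback`, and `strlenFF` are hypothetical names, poison lanes are modelled as zero bytes, and the `readable` parameter stands in for what real hardware discovers by suppressing a fault.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical model of the {vector, i32} result pair.
struct FFResult {
  std::vector<uint8_t> lanes; // lanes at index >= count are poison (0 here)
  uint32_t count;             // index of the first lane not read, or evl
};

// Models a target with a native fault-only-first load. `readable` is the
// number of bytes readable at `ptr` (an assumption of this model).
FFResult loadFFNative(const uint8_t *ptr, size_t readable,
                      const std::vector<bool> &mask, uint32_t evl) {
  assert(evl <= mask.size());
  // A masked-on first lane must be readable; otherwise the load traps.
  assert(evl == 0 || !mask[0] || readable >= 1);
  FFResult r{std::vector<uint8_t>(mask.size(), 0), evl};
  for (uint32_t i = 0; i < evl; ++i) {
    if (!mask[i])
      continue;
    if (i >= readable) { // First masked-on lane that would fault.
      r.count = i;
      break;
    }
    r.lanes[i] = ptr[i];
  }
  return r;
}

// Models the fallback described in the text: one scalar load guarded by
// evl != 0 and mask[0], reporting that only 1 lane could be read.
FFResult loadFFScalarFallback(const uint8_t *ptr,
                              const std::vector<bool> &mask, uint32_t evl) {
  assert(evl <= mask.size());
  FFResult r{std::vector<uint8_t>(mask.size(), 0), 0};
  if (evl != 0) {
    if (mask[0])
      r.lanes[0] = ptr[0];
    r.count = 1; // At least 1 even when the first lane is masked-off.
  }
  return r;
}

// strlen-style loop: scan only the first `count` lanes for the terminator,
// then advance the pointer by `count` and call the load again.
template <typename LoadFn>
size_t strlenFF(const uint8_t *s, uint32_t vl, LoadFn load) {
  assert(vl > 0);
  const std::vector<bool> allOn(vl, true);
  size_t len = 0;
  for (;;) {
    FFResult r = load(s + len, allOn, vl);
    for (uint32_t i = 0; i < r.count; ++i)
      if (r.lanes[i] == 0)
        return len + i;
    len += r.count;
  }
}
```

Both models drive the same loop to the same answer, which is the point of allowing the second result to be reduced for any reason: the caller only relies on it being at least 1 when `evl` is non-zero.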
2 changes: 2 additions & 0 deletions llvm/include/llvm/CodeGen/SelectionDAG.h
Expand Up @@ -1572,6 +1572,8 @@ class SelectionDAG {
SDValue getMaskedHistogram(SDVTList VTs, EVT MemVT, const SDLoc &dl,
ArrayRef<SDValue> Ops, MachineMemOperand *MMO,
ISD::MemIndexType IndexType);
SDValue getLoadFFVP(EVT VT, const SDLoc &dl, SDValue Chain, SDValue Ptr,
SDValue Mask, SDValue EVL, MachineMemOperand *MMO);

SDValue getGetFPEnv(SDValue Chain, const SDLoc &dl, SDValue Ptr, EVT MemVT,
MachineMemOperand *MMO);
Expand Down
17 changes: 17 additions & 0 deletions llvm/include/llvm/CodeGen/SelectionDAGNodes.h
Expand Up @@ -3057,6 +3057,23 @@ class MaskedHistogramSDNode : public MaskedGatherScatterSDNode {
}
};

class VPLoadFFSDNode : public MemSDNode {
public:
friend class SelectionDAG;

VPLoadFFSDNode(unsigned Order, const DebugLoc &dl, SDVTList VTs, EVT MemVT,
MachineMemOperand *MMO)
: MemSDNode(ISD::VP_LOAD_FF, Order, dl, VTs, MemVT, MMO) {}

const SDValue &getBasePtr() const { return getOperand(1); }
const SDValue &getMask() const { return getOperand(2); }
const SDValue &getVectorLength() const { return getOperand(3); }

static bool classof(const SDNode *N) {
return N->getOpcode() == ISD::VP_LOAD_FF;
}
};

class FPStateAccessSDNode : public MemSDNode {
public:
friend class SelectionDAG;
Expand Down
6 changes: 6 additions & 0 deletions llvm/include/llvm/IR/Intrinsics.td
Expand Up @@ -1912,6 +1912,12 @@ def int_vp_load : DefaultAttrsIntrinsic<[ llvm_anyvector_ty],
llvm_i32_ty],
[ NoCapture<ArgIndex<0>>, IntrNoSync, IntrReadMem, IntrWillReturn, IntrArgMemOnly ]>;

def int_experimental_vp_load_ff : DefaultAttrsIntrinsic<[ llvm_anyvector_ty, llvm_i32_ty ],
[ llvm_anyptr_ty,
LLVMScalarOrSameVectorWidth<0, llvm_i1_ty>,
llvm_i32_ty],
[ NoCapture<ArgIndex<0>>, IntrNoSync, IntrReadMem, IntrWillReturn, IntrArgMemOnly ]>;

def int_vp_gather: DefaultAttrsIntrinsic<[ llvm_anyvector_ty],
[ LLVMVectorOfAnyPointersToElt<0>,
LLVMScalarOrSameVectorWidth<0, llvm_i1_ty>,
Expand Down
6 changes: 6 additions & 0 deletions llvm/include/llvm/IR/VPIntrinsics.def
Expand Up @@ -587,6 +587,12 @@ VP_PROPERTY_FUNCTIONAL_OPC(Load)
VP_PROPERTY_FUNCTIONAL_INTRINSIC(masked_load)
END_REGISTER_VP(vp_load, VP_LOAD)

BEGIN_REGISTER_VP_INTRINSIC(experimental_vp_load_ff, 1, 2)
// val,vl,chain = VP_LOAD_FF chain,base,mask,evl
Contributor

I was going to ask why this doesn't have an offset operand like the others so it could inherit from VPBaseLoadStoreSDNode, but then I realised we don't use the offset in RISCVISelLowering for regular vp.load/store.

SelectionDAGBuilder doesn't set it, I think it's always undef. Not for this PR but can we remove it?

Collaborator Author

> I was going to ask why this doesn't have an offset operand like the others so it could inherit from VPBaseLoadStoreSDNode, but then I realised we don't use the offset in RISCVISelLowering for regular vp.load/store.
>
> SelectionDAGBuilder doesn't set it, I think it's always undef. Not for this PR but can we remove it?

I think we can probably remove Offset and Addressing mode until a target comes along that needs them.

BEGIN_REGISTER_VP_SDNODE(VP_LOAD_FF, -1, experimental_vp_load_ff, 2, 3)
HELPER_MAP_VPID_TO_VPSD(experimental_vp_load_ff, VP_LOAD_FF)
VP_PROPERTY_NO_FUNCTIONAL
END_REGISTER_VP(experimental_vp_load_ff, VP_LOAD_FF)
// llvm.experimental.vp.strided.load(ptr,stride,mask,vlen)
BEGIN_REGISTER_VP_INTRINSIC(experimental_vp_strided_load, 2, 3)
// chain = EXPERIMENTAL_VP_STRIDED_LOAD chain,base,offset,stride,mask,evl
Expand Down
2 changes: 2 additions & 0 deletions llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h
Expand Up @@ -958,6 +958,7 @@ class LLVM_LIBRARY_VISIBILITY DAGTypeLegalizer {
void SplitVecRes_INSERT_VECTOR_ELT(SDNode *N, SDValue &Lo, SDValue &Hi);
void SplitVecRes_LOAD(LoadSDNode *LD, SDValue &Lo, SDValue &Hi);
void SplitVecRes_VP_LOAD(VPLoadSDNode *LD, SDValue &Lo, SDValue &Hi);
void SplitVecRes_VP_LOAD_FF(VPLoadFFSDNode *LD, SDValue &Lo, SDValue &Hi);
void SplitVecRes_VP_STRIDED_LOAD(VPStridedLoadSDNode *SLD, SDValue &Lo,
SDValue &Hi);
void SplitVecRes_MLOAD(MaskedLoadSDNode *MLD, SDValue &Lo, SDValue &Hi);
Expand Down Expand Up @@ -1060,6 +1061,7 @@ class LLVM_LIBRARY_VISIBILITY DAGTypeLegalizer {
SDValue WidenVecRes_INSERT_VECTOR_ELT(SDNode* N);
SDValue WidenVecRes_LOAD(SDNode* N);
SDValue WidenVecRes_VP_LOAD(VPLoadSDNode *N);
SDValue WidenVecRes_VP_LOAD_FF(VPLoadFFSDNode *N);
SDValue WidenVecRes_VP_STRIDED_LOAD(VPStridedLoadSDNode *N);
SDValue WidenVecRes_VECTOR_COMPRESS(SDNode *N);
SDValue WidenVecRes_MLOAD(MaskedLoadSDNode* N);
Expand Down
68 changes: 68 additions & 0 deletions llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
Expand Up @@ -1163,6 +1163,9 @@ void DAGTypeLegalizer::SplitVectorResult(SDNode *N, unsigned ResNo) {
case ISD::VP_LOAD:
SplitVecRes_VP_LOAD(cast<VPLoadSDNode>(N), Lo, Hi);
break;
case ISD::VP_LOAD_FF:
SplitVecRes_VP_LOAD_FF(cast<VPLoadFFSDNode>(N), Lo, Hi);
break;
case ISD::EXPERIMENTAL_VP_STRIDED_LOAD:
SplitVecRes_VP_STRIDED_LOAD(cast<VPStridedLoadSDNode>(N), Lo, Hi);
break;
Expand Down Expand Up @@ -2232,6 +2235,45 @@ void DAGTypeLegalizer::SplitVecRes_VP_LOAD(VPLoadSDNode *LD, SDValue &Lo,
ReplaceValueWith(SDValue(LD, 1), Ch);
}

void DAGTypeLegalizer::SplitVecRes_VP_LOAD_FF(VPLoadFFSDNode *LD, SDValue &Lo,
SDValue &Hi) {
SDLoc dl(LD);
auto [LoVT, HiVT] = DAG.GetSplitDestVTs(LD->getValueType(0));

SDValue Ch = LD->getChain();
SDValue Ptr = LD->getBasePtr();
Align Alignment = LD->getOriginalAlign();
SDValue Mask = LD->getMask();
SDValue EVL = LD->getVectorLength();

// Split Mask operand
SDValue MaskLo, MaskHi;
if (Mask.getOpcode() == ISD::SETCC) {
SplitVecRes_SETCC(Mask.getNode(), MaskLo, MaskHi);
} else {
if (getTypeAction(Mask.getValueType()) == TargetLowering::TypeSplitVector)
GetSplitVector(Mask, MaskLo, MaskHi);
else
std::tie(MaskLo, MaskHi) = DAG.SplitVector(Mask, dl);
}

// Split EVL operand
auto [EVLLo, EVLHi] = DAG.SplitEVL(EVL, LD->getValueType(0), dl);

MachineMemOperand *MMO = DAG.getMachineFunction().getMachineMemOperand(
LD->getPointerInfo(), MachineMemOperand::MOLoad,
LocationSize::beforeOrAfterPointer(), Alignment, LD->getAAInfo(),
LD->getRanges());

Lo = DAG.getLoadFFVP(LoVT, dl, Ch, Ptr, MaskLo, EVLLo, MMO);

// Fill the upper half with poison.
Hi = DAG.getUNDEF(HiVT);

ReplaceValueWith(SDValue(LD, 1), Lo.getValue(1));
ReplaceValueWith(SDValue(LD, 2), Lo.getValue(2));
}

void DAGTypeLegalizer::SplitVecRes_VP_STRIDED_LOAD(VPStridedLoadSDNode *SLD,
SDValue &Lo, SDValue &Hi) {
assert(SLD->isUnindexed() &&
Expand Down Expand Up @@ -4599,6 +4641,9 @@ void DAGTypeLegalizer::WidenVectorResult(SDNode *N, unsigned ResNo) {
case ISD::VP_LOAD:
Res = WidenVecRes_VP_LOAD(cast<VPLoadSDNode>(N));
break;
case ISD::VP_LOAD_FF:
Res = WidenVecRes_VP_LOAD_FF(cast<VPLoadFFSDNode>(N));
break;
case ISD::EXPERIMENTAL_VP_STRIDED_LOAD:
Res = WidenVecRes_VP_STRIDED_LOAD(cast<VPStridedLoadSDNode>(N));
break;
Expand Down Expand Up @@ -6063,6 +6108,29 @@ SDValue DAGTypeLegalizer::WidenVecRes_VP_LOAD(VPLoadSDNode *N) {
return Res;
}

SDValue DAGTypeLegalizer::WidenVecRes_VP_LOAD_FF(VPLoadFFSDNode *N) {
EVT WidenVT = TLI.getTypeToTransformTo(*DAG.getContext(), N->getValueType(0));
SDValue Mask = N->getMask();
SDValue EVL = N->getVectorLength();
SDLoc dl(N);

// The mask should be widened as well
assert(getTypeAction(Mask.getValueType()) ==
TargetLowering::TypeWidenVector &&
"Unable to widen binary VP op");
Mask = GetWidenedVector(Mask);
assert(Mask.getValueType().getVectorElementCount() ==
TLI.getTypeToTransformTo(*DAG.getContext(), Mask.getValueType())
.getVectorElementCount() &&
"Unable to widen vector load");

SDValue Res = DAG.getLoadFFVP(WidenVT, dl, N->getChain(), N->getBasePtr(),
Mask, EVL, N->getMemOperand());
ReplaceValueWith(SDValue(N, 1), Res.getValue(1));
ReplaceValueWith(SDValue(N, 2), Res.getValue(2));
return Res;
}

SDValue DAGTypeLegalizer::WidenVecRes_VP_STRIDED_LOAD(VPStridedLoadSDNode *N) {
SDLoc DL(N);

Expand Down
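Editor's note: `SplitVecRes_VP_LOAD_FF` in the hunk above issues a load only for the low half and leaves the high half undefined, so the reported lane count can never exceed the low half's EVL. The sketch below models that arithmetic for fixed-length vectors. It assumes `DAG.SplitEVL` yields `umin(evl, half)` for the low part and a saturating `evl - half` for the high part, which this diff does not show. `splitEVL` and `splitLoadFFCount` are hypothetical helper names, and the model assumes all lanes are masked-on.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <utility>

// Assumed behaviour of SplitEVL for a fixed-length vector of `numElts` lanes.
std::pair<uint32_t, uint32_t> splitEVL(uint32_t evl, uint32_t numElts) {
  assert(numElts % 2 == 0);
  const uint32_t half = numElts / 2;
  const uint32_t lo = std::min(evl, half);          // umin(evl, half)
  const uint32_t hi = evl > half ? evl - half : 0;  // usubsat(evl, half)
  return {lo, hi};
}

// Lane count reported after splitting, given how many lanes are readable.
// Only the low half is loaded, so the count is capped by the low EVL.
uint32_t splitLoadFFCount(uint32_t evl, uint32_t numElts,
                          uint32_t readableLanes) {
  const std::pair<uint32_t, uint32_t> parts = splitEVL(evl, numElts);
  return std::min(parts.first, readableLanes);
}
```

With `evl = 6` on an 8-lane type, this model reports at most 4 lanes even when all 6 are readable. That is consistent with the LangRef wording that the second result may be reduced for any reason, and the caller's loop picks up the remaining lanes on the next call.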
36 changes: 36 additions & 0 deletions llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
Expand Up @@ -846,6 +846,14 @@ static void AddNodeIDCustom(FoldingSetNodeID &ID, const SDNode *N) {
ID.AddInteger(ELD->getMemOperand()->getFlags());
break;
}
case ISD::VP_LOAD_FF: {
const VPLoadFFSDNode *LD = cast<VPLoadFFSDNode>(N);
ID.AddInteger(LD->getMemoryVT().getRawBits());
ID.AddInteger(LD->getRawSubclassData());
ID.AddInteger(LD->getPointerInfo().getAddrSpace());
ID.AddInteger(LD->getMemOperand()->getFlags());
break;
}
case ISD::VP_STORE: {
const VPStoreSDNode *EST = cast<VPStoreSDNode>(N);
ID.AddInteger(EST->getMemoryVT().getRawBits());
Expand Down Expand Up @@ -10123,6 +10131,34 @@ SDValue SelectionDAG::getMaskedHistogram(SDVTList VTs, EVT MemVT,
return V;
}

SDValue SelectionDAG::getLoadFFVP(EVT VT, const SDLoc &dl, SDValue Chain,
SDValue Ptr, SDValue Mask, SDValue EVL,
MachineMemOperand *MMO) {
SDVTList VTs = getVTList(VT, EVL.getValueType(), MVT::Other);
SDValue Ops[] = {Chain, Ptr, Mask, EVL};
FoldingSetNodeID ID;
AddNodeIDNode(ID, ISD::VP_LOAD_FF, VTs, Ops);
ID.AddInteger(VT.getRawBits());
ID.AddInteger(getSyntheticNodeSubclassData<VPLoadFFSDNode>(dl.getIROrder(),
VTs, VT, MMO));
ID.AddInteger(MMO->getPointerInfo().getAddrSpace());
ID.AddInteger(MMO->getFlags());
void *IP = nullptr;
if (SDNode *E = FindNodeOrInsertPos(ID, dl, IP)) {
cast<VPLoadFFSDNode>(E)->refineAlignment(MMO);
return SDValue(E, 0);
}
auto *N = newSDNode<VPLoadFFSDNode>(dl.getIROrder(), dl.getDebugLoc(), VTs,
VT, MMO);
createOperands(N, Ops);

CSEMap.InsertNode(N, IP);
InsertNode(N);
SDValue V(N, 0);
NewSDValueDbgMsg(V, "Creating new node: ", this);
return V;
}

SDValue SelectionDAG::getGetFPEnv(SDValue Chain, const SDLoc &dl, SDValue Ptr,
EVT MemVT, MachineMemOperand *MMO) {
assert(Chain.getValueType() == MVT::Other && "Invalid chain type");
Expand Down
31 changes: 31 additions & 0 deletions llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
Expand Up @@ -8473,6 +8473,34 @@ void SelectionDAGBuilder::visitVPLoad(
setValue(&VPIntrin, LD);
}

void SelectionDAGBuilder::visitVPLoadFF(
const VPIntrinsic &VPIntrin, EVT VT, EVT EVLVT,
const SmallVectorImpl<SDValue> &OpValues) {
assert(OpValues.size() == 3);
SDLoc DL = getCurSDLoc();
Value *PtrOperand = VPIntrin.getArgOperand(0);
MaybeAlign Alignment = VPIntrin.getPointerAlignment();
AAMDNodes AAInfo = VPIntrin.getAAMetadata();
const MDNode *Ranges = VPIntrin.getMetadata(LLVMContext::MD_range);
SDValue LD;
// Do not serialize variable-length loads of constant memory with
// anything.
if (!Alignment)
Alignment = DAG.getEVTAlign(VT);
MemoryLocation ML = MemoryLocation::getAfter(PtrOperand, AAInfo);
bool AddToChain = !BatchAA || !BatchAA->pointsToConstantMemory(ML);
SDValue InChain = AddToChain ? DAG.getRoot() : DAG.getEntryNode();
MachineMemOperand *MMO = DAG.getMachineFunction().getMachineMemOperand(
MachinePointerInfo(PtrOperand), MachineMemOperand::MOLoad,
LocationSize::beforeOrAfterPointer(), *Alignment, AAInfo, Ranges);
LD = DAG.getLoadFFVP(VT, DL, InChain, OpValues[0], OpValues[1], OpValues[2],
MMO);
SDValue Trunc = DAG.getNode(ISD::TRUNCATE, DL, EVLVT, LD.getValue(1));
if (AddToChain)
PendingLoads.push_back(LD.getValue(2));
setValue(&VPIntrin, DAG.getMergeValues({LD.getValue(0), Trunc}, DL));
}

void SelectionDAGBuilder::visitVPGather(
const VPIntrinsic &VPIntrin, EVT VT,
const SmallVectorImpl<SDValue> &OpValues) {
Expand Down Expand Up @@ -8706,6 +8734,9 @@ void SelectionDAGBuilder::visitVectorPredicationIntrinsic(
case ISD::VP_LOAD:
visitVPLoad(VPIntrin, ValueVTs[0], OpValues);
break;
case ISD::VP_LOAD_FF:
visitVPLoadFF(VPIntrin, ValueVTs[0], ValueVTs[1], OpValues);
break;
case ISD::VP_GATHER:
visitVPGather(VPIntrin, ValueVTs[0], OpValues);
break;
Expand Down
2 changes: 2 additions & 0 deletions llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.h
Expand Up @@ -632,6 +632,8 @@ class SelectionDAGBuilder {
void visitVectorExtractLastActive(const CallInst &I, unsigned Intrinsic);
void visitVPLoad(const VPIntrinsic &VPIntrin, EVT VT,
const SmallVectorImpl<SDValue> &OpValues);
void visitVPLoadFF(const VPIntrinsic &VPIntrin, EVT VT, EVT EVLVT,
const SmallVectorImpl<SDValue> &OpValues);
void visitVPStore(const VPIntrinsic &VPIntrin,
const SmallVectorImpl<SDValue> &OpValues);
void visitVPGather(const VPIntrinsic &VPIntrin, EVT VT,
Expand Down
5 changes: 5 additions & 0 deletions llvm/lib/IR/IntrinsicInst.cpp
Expand Up @@ -448,6 +448,7 @@ VPIntrinsic::getMemoryPointerParamPos(Intrinsic::ID VPID) {
case Intrinsic::experimental_vp_strided_store:
return 1;
case Intrinsic::vp_load:
case Intrinsic::experimental_vp_load_ff:
case Intrinsic::vp_gather:
case Intrinsic::experimental_vp_strided_load:
return 0;
Expand Down Expand Up @@ -671,6 +672,10 @@ Function *VPIntrinsic::getOrInsertDeclarationForParams(
VPFunc = Intrinsic::getOrInsertDeclaration(
M, VPID, {ReturnType, Params[0]->getType()});
break;
case Intrinsic::experimental_vp_load_ff:
VPFunc = Intrinsic::getOrInsertDeclaration(
M, VPID, {ReturnType->getStructElementType(0), Params[0]->getType()});
break;
case Intrinsic::experimental_vp_strided_load:
VPFunc = Intrinsic::getOrInsertDeclaration(
M, VPID, {ReturnType, Params[0]->getType(), Params[1]->getType()});
Expand Down