Skip to content

[IR] Initial introduction of memset_pattern #97583

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

asb
Copy link
Contributor

@asb asb commented Jul 3, 2024

Supersedes the draft PR #94992, taking a different approach following feedback:

  • Lower in PreISelIntrinsicLowering
  • Don't require that the number of bytes to set is a compile-time constant
  • Define llvm.memset_pattern rather than llvm.memset_pattern.inline

As discussed in the RFC thread, the intent is that the intrinsic will be lowered to loops, a sequence of stores, or libcalls depending on the expected cost and availability of libcalls on the target. Right now, there's just a single lowering path that aims to handle all cases. My intent would be to follow up with additional PRs that add additional optimisations when possible (e.g. when libcalls are available, when arguments are known to be constant etc).

@llvmbot
Copy link
Member

llvmbot commented Jul 3, 2024

@llvm/pr-subscribers-backend-powerpc
@llvm/pr-subscribers-llvm-ir

@llvm/pr-subscribers-llvm-transforms

Author: Alex Bradbury (asb)

Changes

Supersedes the draft PR #94992, taking a different approach following feedback:

  • Lower in PreISelIntrinsicLowering
  • Don't require that the number of bytes to set is a compile-time constant
  • Define llvm.memset_pattern rather than llvm.memset_pattern.inline

As discussed in the RFC thread, the intent is that the intrinsic will be lowered to loops, a sequence of stores, or libcalls depending on the expected cost and availability of libcalls on the target. Right now, there's just a single lowering path that aims to handle all cases (except hasn't been generalised beyond i128 (equivalent to memset_pattern16 yet)). My intent would be to clean this up, then follow up with additional PRs that add additional optimisations when possible (e.g. when libcalls are available, when arguments are known to be constant etc).


Patch is 42.45 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/97583.diff

12 Files Affected:

  • (modified) llvm/docs/LangRef.rst (+56)
  • (modified) llvm/include/llvm/IR/InstVisitor.h (+3)
  • (modified) llvm/include/llvm/IR/IntrinsicInst.h (+21-1)
  • (modified) llvm/include/llvm/IR/Intrinsics.td (+8)
  • (modified) llvm/lib/CodeGen/PreISelIntrinsicLowering.cpp (+8)
  • (modified) llvm/lib/IR/Verifier.cpp (+2-1)
  • (modified) llvm/lib/Transforms/Utils/LowerMemIntrinsics.cpp (+110)
  • (added) llvm/test/CodeGen/RISCV/memset-pattern.ll (+591)
  • (added) llvm/test/Transforms/PreISelIntrinsicLowering/RISCV/lit.local.cfg (+2)
  • (added) llvm/test/Transforms/PreISelIntrinsicLowering/RISCV/memset-pattern.ll (+162)
  • (modified) llvm/test/Verifier/intrinsic-immarg.ll (+9)
  • (added) llvm/test/Verifier/memset-pattern-inline.ll (+9)
diff --git a/llvm/docs/LangRef.rst b/llvm/docs/LangRef.rst
index c98332d3a24fc..2641dc1a45348 100644
--- a/llvm/docs/LangRef.rst
+++ b/llvm/docs/LangRef.rst
@@ -15231,6 +15231,62 @@ The behavior of '``llvm.memset.inline.*``' is equivalent to the behavior of
 '``llvm.memset.*``', but the generated code is guaranteed not to call any
 external functions.
 
+.. _int_memset_pattern:
+
+'``llvm.memset_pattern``' Intrinsic
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Syntax:
+"""""""
+
+This is an overloaded intrinsic. You can use ``llvm.memset_pattern`` on
+any integer bit width and for different address spaces. Not all targets
+support all bit widths however.
+
+::
+
+      declare void @llvm.memset_pattern.p0.i64.i128(ptr <dest>, i128 <val>,
+                                                    i64 <len>, i1 <isvolatile>)
+
+Overview:
+"""""""""
+
+The '``llvm.memset_pattern.*``' intrinsics fill a block of memory with
+a particular value. This may be expanded to an inline loop, a sequence of
+stores, or a libcall depending on what is available for the target and the
+expected performance and code size impact.
+
+Arguments:
+""""""""""
+
+The first argument is a pointer to the destination to fill, the second
+is the value with which to fill it, the third argument is an integer
+argument specifying the number of bytes to fill, and the fourth is a boolean
+indicating a volatile access.
+
+The :ref:`align <attr_align>` parameter attribute can be provided
+for the first argument.
+
+If the ``isvolatile`` parameter is ``true``, the
+``llvm.memset_pattern`` call is a :ref:`volatile operation <volatile>`. The
+detailed access behavior is not very cleanly specified and it is unwise to
+depend on it.
+
+Semantics:
+""""""""""
+
+The '``llvm.memset_pattern.*``' intrinsics fill "len" bytes of memory
+starting at the destination location. If the argument is known to be aligned
+to some boundary, this can be specified as an attribute on the argument.
+
+If ``<len>`` is not an integer multiple of the pattern width in bytes, then any
+remainder bytes will be copied from ``<val>``.
+If ``<len>`` is 0, it is no-op modulo the behavior of attributes attached to
+the arguments.
+If ``<len>`` is not a well-defined value, the behavior is undefined.
+If ``<len>`` is not zero, ``<dest>`` should be well-defined, otherwise the
+behavior is undefined.
+
 .. _int_sqrt:
 
 '``llvm.sqrt.*``' Intrinsic
diff --git a/llvm/include/llvm/IR/InstVisitor.h b/llvm/include/llvm/IR/InstVisitor.h
index 311e0ac47ddfa..aa4f0f36e4ed7 100644
--- a/llvm/include/llvm/IR/InstVisitor.h
+++ b/llvm/include/llvm/IR/InstVisitor.h
@@ -208,6 +208,7 @@ class InstVisitor {
   RetTy visitDbgInfoIntrinsic(DbgInfoIntrinsic &I){ DELEGATE(IntrinsicInst); }
   RetTy visitMemSetInst(MemSetInst &I)            { DELEGATE(MemIntrinsic); }
   RetTy visitMemSetInlineInst(MemSetInlineInst &I){ DELEGATE(MemSetInst); }
+  RetTy visitMemSetPatternInst(MemSetPatternInst &I) { DELEGATE(MemSetInst); }
   RetTy visitMemCpyInst(MemCpyInst &I)            { DELEGATE(MemTransferInst); }
   RetTy visitMemCpyInlineInst(MemCpyInlineInst &I){ DELEGATE(MemCpyInst); }
   RetTy visitMemMoveInst(MemMoveInst &I)          { DELEGATE(MemTransferInst); }
@@ -295,6 +296,8 @@ class InstVisitor {
       case Intrinsic::memset:      DELEGATE(MemSetInst);
       case Intrinsic::memset_inline:
         DELEGATE(MemSetInlineInst);
+      case Intrinsic::memset_pattern:
+        DELEGATE(MemSetPatternInst);
       case Intrinsic::vastart:     DELEGATE(VAStartInst);
       case Intrinsic::vaend:       DELEGATE(VAEndInst);
       case Intrinsic::vacopy:      DELEGATE(VACopyInst);
diff --git a/llvm/include/llvm/IR/IntrinsicInst.h b/llvm/include/llvm/IR/IntrinsicInst.h
index 3963a5c8ab8f9..72439ebc14a54 100644
--- a/llvm/include/llvm/IR/IntrinsicInst.h
+++ b/llvm/include/llvm/IR/IntrinsicInst.h
@@ -1171,6 +1171,7 @@ class MemIntrinsic : public MemIntrinsicBase<MemIntrinsic> {
     case Intrinsic::memmove:
     case Intrinsic::memset:
     case Intrinsic::memset_inline:
+    case Intrinsic::memset_pattern:
     case Intrinsic::memcpy_inline:
       return true;
     default:
@@ -1182,7 +1183,8 @@ class MemIntrinsic : public MemIntrinsicBase<MemIntrinsic> {
   }
 };
 
-/// This class wraps the llvm.memset and llvm.memset.inline intrinsics.
+/// This class wraps the llvm.memset, llvm.memset.inline, and
+/// llvm.memset_pattern intrinsics.
 class MemSetInst : public MemSetBase<MemIntrinsic> {
 public:
   // Methods for support type inquiry through isa, cast, and dyn_cast:
@@ -1190,6 +1192,7 @@ class MemSetInst : public MemSetBase<MemIntrinsic> {
     switch (I->getIntrinsicID()) {
     case Intrinsic::memset:
     case Intrinsic::memset_inline:
+    case Intrinsic::memset_pattern:
       return true;
     default:
       return false;
@@ -1215,6 +1218,21 @@ class MemSetInlineInst : public MemSetInst {
   }
 };
 
+/// This class wraps the llvm.memset.pattern intrinsic.
+class MemSetPatternInst : public MemSetInst {
+public:
+  ConstantInt *getLength() const {
+    return cast<ConstantInt>(MemSetInst::getLength());
+  }
+  // Methods for support type inquiry through isa, cast, and dyn_cast:
+  static bool classof(const IntrinsicInst *I) {
+    return I->getIntrinsicID() == Intrinsic::memset_pattern;
+  }
+  static bool classof(const Value *V) {
+    return isa<IntrinsicInst>(V) && classof(cast<IntrinsicInst>(V));
+  }
+};
+
 /// This class wraps the llvm.memcpy/memmove intrinsics.
 class MemTransferInst : public MemTransferBase<MemIntrinsic> {
 public:
@@ -1294,6 +1312,7 @@ class AnyMemIntrinsic : public MemIntrinsicBase<AnyMemIntrinsic> {
     case Intrinsic::memmove:
     case Intrinsic::memset:
     case Intrinsic::memset_inline:
+    case Intrinsic::memset_pattern:
     case Intrinsic::memcpy_element_unordered_atomic:
     case Intrinsic::memmove_element_unordered_atomic:
     case Intrinsic::memset_element_unordered_atomic:
@@ -1316,6 +1335,7 @@ class AnyMemSetInst : public MemSetBase<AnyMemIntrinsic> {
     switch (I->getIntrinsicID()) {
     case Intrinsic::memset:
     case Intrinsic::memset_inline:
+    case Intrinsic::memset_pattern:
     case Intrinsic::memset_element_unordered_atomic:
       return true;
     default:
diff --git a/llvm/include/llvm/IR/Intrinsics.td b/llvm/include/llvm/IR/Intrinsics.td
index c7d383a5d0c0c..09e759e0e25b3 100644
--- a/llvm/include/llvm/IR/Intrinsics.td
+++ b/llvm/include/llvm/IR/Intrinsics.td
@@ -1003,6 +1003,14 @@ def int_memset_inline
        NoCapture<ArgIndex<0>>, WriteOnly<ArgIndex<0>>,
        ImmArg<ArgIndex<2>>, ImmArg<ArgIndex<3>>]>;
 
+// Memset variant that writes a given pattern.
+def int_memset_pattern
+    : Intrinsic<[],
+      [llvm_anyptr_ty, llvm_anyint_ty, llvm_anyint_ty, llvm_i1_ty],
+      [IntrWriteMem, IntrArgMemOnly, IntrWillReturn, IntrNoFree, IntrNoCallback,
+       NoCapture<ArgIndex<0>>, WriteOnly<ArgIndex<0>>,
+       ImmArg<ArgIndex<3>>], "llvm.memset_pattern">;
+
 // FIXME: Add version of these floating point intrinsics which allow non-default
 // rounding modes and FP exception handling.
 
diff --git a/llvm/lib/CodeGen/PreISelIntrinsicLowering.cpp b/llvm/lib/CodeGen/PreISelIntrinsicLowering.cpp
index 0777acf633187..59e06dd92be14 100644
--- a/llvm/lib/CodeGen/PreISelIntrinsicLowering.cpp
+++ b/llvm/lib/CodeGen/PreISelIntrinsicLowering.cpp
@@ -263,6 +263,13 @@ bool PreISelIntrinsicLowering::expandMemIntrinsicUses(Function &F) const {
 
       break;
     }
+    case Intrinsic::memset_pattern: {
+      auto *Memset = cast<MemSetPatternInst>(Inst);
+      expandMemSetAsLoop(Memset);
+      Changed = true;
+      Memset->eraseFromParent();
+      break;
+    }
     default:
       llvm_unreachable("unhandled intrinsic");
     }
@@ -280,6 +287,7 @@ bool PreISelIntrinsicLowering::lowerIntrinsics(Module &M) const {
     case Intrinsic::memcpy:
     case Intrinsic::memmove:
     case Intrinsic::memset:
+    case Intrinsic::memset_pattern:
       Changed |= expandMemIntrinsicUses(F);
       break;
     case Intrinsic::load_relative:
diff --git a/llvm/lib/IR/Verifier.cpp b/llvm/lib/IR/Verifier.cpp
index c98f61d555140..a0bef394ecee3 100644
--- a/llvm/lib/IR/Verifier.cpp
+++ b/llvm/lib/IR/Verifier.cpp
@@ -5446,7 +5446,8 @@ void Verifier::visitIntrinsicCall(Intrinsic::ID ID, CallBase &Call) {
   case Intrinsic::memcpy_inline:
   case Intrinsic::memmove:
   case Intrinsic::memset:
-  case Intrinsic::memset_inline: {
+  case Intrinsic::memset_inline:
+  case Intrinsic::memset_pattern: {
     break;
   }
   case Intrinsic::memcpy_element_unordered_atomic:
diff --git a/llvm/lib/Transforms/Utils/LowerMemIntrinsics.cpp b/llvm/lib/Transforms/Utils/LowerMemIntrinsics.cpp
index d84e9f094e03a..3adcc51a81ed5 100644
--- a/llvm/lib/Transforms/Utils/LowerMemIntrinsics.cpp
+++ b/llvm/lib/Transforms/Utils/LowerMemIntrinsics.cpp
@@ -455,6 +455,106 @@ static void createMemMoveLoop(Instruction *InsertBefore, Value *SrcAddr,
   ElseTerm->eraseFromParent();
 }
 
+static void createMemSetPatternLoop(Instruction *InsertBefore, Value *DstAddr,
+                                    Value *CopyLen, Value *SetValue,
+                                    Align DstAlign, bool IsVolatile) {
+
+  // To start with, let's assume SetValue is an i128 and bail out if it's not.
+  if (SetValue->getType()->getScalarSizeInBits() != 128) {
+    report_fatal_error("Only 128-bit variant supported for now");
+  }
+  unsigned PatternSize = SetValue->getType()->getScalarSizeInBits() / 8;
+
+  Type *TypeOfCopyLen = CopyLen->getType();
+  BasicBlock *OrigBB = InsertBefore->getParent();
+  Function *F = OrigBB->getParent();
+  const DataLayout &DL = F->getDataLayout();
+
+  BasicBlock *NewBB = OrigBB->splitBasicBlock(InsertBefore, "split");
+  BasicBlock *LoopBB =
+      BasicBlock::Create(F->getContext(), "storeloop", F, NewBB);
+  BasicBlock *RemCheckBB =
+      BasicBlock::Create(F->getContext(), "remcheck", F, NewBB);
+  BasicBlock *RemainderLoopBB =
+      BasicBlock::Create(F->getContext(), "remainderloop", F, NewBB);
+  IRBuilder<> Builder(OrigBB->getTerminator());
+
+  ConstantInt *CILoopOpSize =
+      ConstantInt::get(dyn_cast<IntegerType>(TypeOfCopyLen), PatternSize);
+  Value *RuntimeLoopCount =
+      getRuntimeLoopCount(DL, Builder, CopyLen, CILoopOpSize, PatternSize);
+  Value *RuntimeRemainder =
+      getRuntimeLoopRemainder(DL, Builder, CopyLen, CILoopOpSize, PatternSize);
+
+  Builder.CreateCondBr(Builder.CreateICmpEQ(ConstantInt::get(TypeOfCopyLen, 0),
+                                            RuntimeLoopCount),
+                       RemCheckBB, LoopBB);
+  OrigBB->getTerminator()->eraseFromParent();
+
+  IRBuilder<> LoopBuilder(LoopBB);
+  PHINode *CurrentDst = LoopBuilder.CreatePHI(DstAddr->getType(), 0);
+  CurrentDst->addIncoming(DstAddr, OrigBB);
+  PHINode *LoopCount = LoopBuilder.CreatePHI(TypeOfCopyLen, 0);
+  LoopCount->addIncoming(RuntimeLoopCount, OrigBB);
+
+  // Create the store instruction for the pattern
+  LoopBuilder.CreateAlignedStore(SetValue, CurrentDst, DstAlign, IsVolatile);
+
+  Value *NextDst = LoopBuilder.CreateInBoundsGEP(
+      SetValue->getType(), CurrentDst,
+      ConstantInt::get(TypeOfCopyLen, PatternSize));
+  CurrentDst->addIncoming(NextDst, LoopBB);
+
+  Value *NewLoopCount =
+      LoopBuilder.CreateSub(LoopCount, ConstantInt::get(TypeOfCopyLen, 1));
+  LoopCount->addIncoming(NewLoopCount, LoopBB);
+
+  LoopBuilder.CreateCondBr(
+      LoopBuilder.CreateICmpNE(NewLoopCount,
+                               ConstantInt::get(TypeOfCopyLen, 0)),
+      LoopBB, RemCheckBB);
+
+  IRBuilder<> RemCheckBuilder(RemCheckBB, RemCheckBB->begin());
+  // Branch to the end if there are no remainder bytes.
+  PHINode *RemainderDstPHI = RemCheckBuilder.CreatePHI(NextDst->getType(), 0);
+  RemainderDstPHI->addIncoming(DstAddr, OrigBB);
+  RemainderDstPHI->addIncoming(NextDst, LoopBB);
+  RemCheckBuilder.CreateCondBr(
+      RemCheckBuilder.CreateICmpEQ(RuntimeRemainder,
+                                   ConstantInt::get(TypeOfCopyLen, 0)),
+      NewBB, RemainderLoopBB);
+
+  // Remainder loop
+  IRBuilder<> RemainderLoopBuilder(RemainderLoopBB);
+  PHINode *ByteIndex = RemainderLoopBuilder.CreatePHI(TypeOfCopyLen, 0);
+  ByteIndex->addIncoming(ConstantInt::get(TypeOfCopyLen, 0), RemCheckBB);
+  Type *TypeOfSetValue = SetValue->getType();
+  PHINode *ShiftedValue = RemainderLoopBuilder.CreatePHI(TypeOfSetValue, 0);
+  ShiftedValue->addIncoming(SetValue, RemCheckBB);
+
+  // Get the byte to store
+  Value *ByteToStore = RemainderLoopBuilder.CreateTrunc(
+      ShiftedValue, RemainderLoopBuilder.getInt8Ty());
+
+  // Store the byte
+  RemainderLoopBuilder.CreateStore(
+      ByteToStore,
+      RemainderLoopBuilder.CreateInBoundsGEP(RemainderLoopBuilder.getInt8Ty(),
+                                             RemainderDstPHI, ByteIndex),
+      IsVolatile);
+
+  Value *NewByteIndex = RemainderLoopBuilder.CreateAdd(
+      ByteIndex, ConstantInt::get(TypeOfCopyLen, 1));
+  ByteIndex->addIncoming(NewByteIndex, RemainderLoopBB);
+  Value *NewShiftedValue = RemainderLoopBuilder.CreateLShr(
+      ShiftedValue, ConstantInt::get(TypeOfSetValue, 8));
+  ShiftedValue->addIncoming(NewShiftedValue, RemainderLoopBB);
+
+  RemainderLoopBuilder.CreateCondBr(
+      RemainderLoopBuilder.CreateICmpULT(NewByteIndex, RuntimeRemainder),
+      RemainderLoopBB, NewBB);
+}
+
 static void createMemSetLoop(Instruction *InsertBefore, Value *DstAddr,
                              Value *CopyLen, Value *SetValue, Align DstAlign,
                              bool IsVolatile) {
@@ -590,6 +690,16 @@ bool llvm::expandMemMoveAsLoop(MemMoveInst *Memmove,
 }
 
 void llvm::expandMemSetAsLoop(MemSetInst *Memset) {
+  if (isa<MemSetPatternInst>(Memset)) {
+    return createMemSetPatternLoop(
+        /* InsertBefore */ Memset,
+        /* DstAddr */ Memset->getRawDest(),
+        /* CopyLen */ Memset->getLength(),
+        /* SetValue */ Memset->getValue(),
+        /* Alignment */ Memset->getDestAlign().valueOrOne(),
+        Memset->isVolatile());
+  }
+
   createMemSetLoop(/* InsertBefore */ Memset,
                    /* DstAddr */ Memset->getRawDest(),
                    /* CopyLen */ Memset->getLength(),
diff --git a/llvm/test/CodeGen/RISCV/memset-pattern.ll b/llvm/test/CodeGen/RISCV/memset-pattern.ll
new file mode 100644
index 0000000000000..ea50ae0b56e40
--- /dev/null
+++ b/llvm/test/CodeGen/RISCV/memset-pattern.ll
@@ -0,0 +1,591 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5
+; RUN: llc < %s -mtriple=riscv32 -mattr=+m \
+; RUN:   | FileCheck %s --check-prefixes=RV32-BOTH,RV32
+; RUN: llc < %s -mtriple=riscv64 -mattr=+m \
+; RUN:   | FileCheck %s --check-prefixes=RV64-BOTH,RV64
+; RUN: llc < %s -mtriple=riscv32 -mattr=+m,+unaligned-scalar-mem \
+; RUN:   | FileCheck %s --check-prefixes=RV32-BOTH,RV32-FAST
+; RUN: llc < %s -mtriple=riscv64 -mattr=+m,+unaligned-scalar-mem \
+; RUN:   | FileCheck %s --check-prefixes=RV64-BOTH,RV64-FAST
+
+define void @memset_1(ptr %a, i128 %value) nounwind {
+; RV32-BOTH-LABEL: memset_1:
+; RV32-BOTH:       # %bb.0:
+; RV32-BOTH-NEXT:    lw a1, 0(a1)
+; RV32-BOTH-NEXT:    sb a1, 0(a0)
+; RV32-BOTH-NEXT:    ret
+;
+; RV64-BOTH-LABEL: memset_1:
+; RV64-BOTH:       # %bb.0:
+; RV64-BOTH-NEXT:    sb a1, 0(a0)
+; RV64-BOTH-NEXT:    ret
+  tail call void @llvm.memset_pattern.p0.i64.i128(ptr %a, i128 %value, i64 1, i1 0)
+  ret void
+}
+
+define void @memset_2(ptr %a, i128 %value) nounwind {
+; RV32-LABEL: memset_2:
+; RV32:       # %bb.0:
+; RV32-NEXT:    lw a1, 0(a1)
+; RV32-NEXT:    sb a1, 0(a0)
+; RV32-NEXT:    srli a1, a1, 8
+; RV32-NEXT:    sb a1, 1(a0)
+; RV32-NEXT:    ret
+;
+; RV64-LABEL: memset_2:
+; RV64:       # %bb.0:
+; RV64-NEXT:    sb a1, 0(a0)
+; RV64-NEXT:    srli a1, a1, 8
+; RV64-NEXT:    sb a1, 1(a0)
+; RV64-NEXT:    ret
+;
+; RV32-FAST-LABEL: memset_2:
+; RV32-FAST:       # %bb.0:
+; RV32-FAST-NEXT:    lw a1, 0(a1)
+; RV32-FAST-NEXT:    sh a1, 0(a0)
+; RV32-FAST-NEXT:    ret
+;
+; RV64-FAST-LABEL: memset_2:
+; RV64-FAST:       # %bb.0:
+; RV64-FAST-NEXT:    sh a1, 0(a0)
+; RV64-FAST-NEXT:    ret
+  tail call void @llvm.memset_pattern.p0.i64.i128(ptr %a, i128 %value, i64 2, i1 0)
+  ret void
+}
+
+define void @memset_3(ptr %a, i128 %value) nounwind {
+; RV32-LABEL: memset_3:
+; RV32:       # %bb.0:
+; RV32-NEXT:    lw a1, 0(a1)
+; RV32-NEXT:    sb a1, 0(a0)
+; RV32-NEXT:    srli a2, a1, 8
+; RV32-NEXT:    sb a2, 1(a0)
+; RV32-NEXT:    srli a1, a1, 16
+; RV32-NEXT:    sb a1, 2(a0)
+; RV32-NEXT:    ret
+;
+; RV64-LABEL: memset_3:
+; RV64:       # %bb.0:
+; RV64-NEXT:    sb a1, 0(a0)
+; RV64-NEXT:    srli a2, a1, 8
+; RV64-NEXT:    sb a2, 1(a0)
+; RV64-NEXT:    srli a1, a1, 16
+; RV64-NEXT:    sb a1, 2(a0)
+; RV64-NEXT:    ret
+;
+; RV32-FAST-LABEL: memset_3:
+; RV32-FAST:       # %bb.0:
+; RV32-FAST-NEXT:    lw a1, 0(a1)
+; RV32-FAST-NEXT:    sh a1, 0(a0)
+; RV32-FAST-NEXT:    srli a1, a1, 16
+; RV32-FAST-NEXT:    sb a1, 2(a0)
+; RV32-FAST-NEXT:    ret
+;
+; RV64-FAST-LABEL: memset_3:
+; RV64-FAST:       # %bb.0:
+; RV64-FAST-NEXT:    sh a1, 0(a0)
+; RV64-FAST-NEXT:    srli a1, a1, 16
+; RV64-FAST-NEXT:    sb a1, 2(a0)
+; RV64-FAST-NEXT:    ret
+  tail call void @llvm.memset_pattern.p0.i64.i128(ptr %a, i128 %value, i64 3, i1 0)
+  ret void
+}
+
+define void @memset_4(ptr %a, i128 %value) nounwind {
+; RV32-LABEL: memset_4:
+; RV32:       # %bb.0:
+; RV32-NEXT:    lw a1, 0(a1)
+; RV32-NEXT:    sb a1, 0(a0)
+; RV32-NEXT:    srli a2, a1, 24
+; RV32-NEXT:    sb a2, 3(a0)
+; RV32-NEXT:    srli a2, a1, 16
+; RV32-NEXT:    sb a2, 2(a0)
+; RV32-NEXT:    srli a1, a1, 8
+; RV32-NEXT:    sb a1, 1(a0)
+; RV32-NEXT:    ret
+;
+; RV64-LABEL: memset_4:
+; RV64:       # %bb.0:
+; RV64-NEXT:    sb a1, 0(a0)
+; RV64-NEXT:    srli a2, a1, 24
+; RV64-NEXT:    sb a2, 3(a0)
+; RV64-NEXT:    srli a2, a1, 16
+; RV64-NEXT:    sb a2, 2(a0)
+; RV64-NEXT:    srli a1, a1, 8
+; RV64-NEXT:    sb a1, 1(a0)
+; RV64-NEXT:    ret
+;
+; RV32-FAST-LABEL: memset_4:
+; RV32-FAST:       # %bb.0:
+; RV32-FAST-NEXT:    lw a1, 0(a1)
+; RV32-FAST-NEXT:    sw a1, 0(a0)
+; RV32-FAST-NEXT:    ret
+;
+; RV64-FAST-LABEL: memset_4:
+; RV64-FAST:       # %bb.0:
+; RV64-FAST-NEXT:    sw a1, 0(a0)
+; RV64-FAST-NEXT:    ret
+  tail call void @llvm.memset_pattern.p0.i64.i128(ptr %a, i128 %value, i64 4, i1 0)
+  ret void
+}
+
+define void @memset_5(ptr %a, i128 %value) nounwind {
+; RV32-LABEL: memset_5:
+; RV32:       # %bb.0:
+; RV32-NEXT:    lw a2, 0(a1)
+; RV32-NEXT:    lw a1, 4(a1)
+; RV32-NEXT:    sb a2, 0(a0)
+; RV32-NEXT:    sb a1, 4(a0)
+; RV32-NEXT:    srli a1, a2, 24
+; RV32-NEXT:    sb a1, 3(a0)
+; RV32-NEXT:    srli a1, a2, 16
+; RV32-NEXT:    sb a1, 2(a0)
+; RV32-NEXT:    srli a2, a2, 8
+; RV32-NEXT:    sb a2, 1(a0)
+; RV32-NEXT:    ret
+;
+; RV64-LABEL: memset_5:
+; RV64:       # %bb.0:
+; RV64-NEXT:    sb a1, 0(a0)
+; RV64-NEXT:    srli a2, a1, 24
+; RV64-NEXT:    sb a2, 3(a0)
+; RV64-NEXT:    srli a2, a1, 16
+; RV64-NEXT:    sb a2, 2(a0)
+; RV64-NEXT:    srli a2, a1, 8
+; RV64-NEXT:    sb a2, 1(a0)
+; RV64-NEXT:    srli a1, a1, 32
+; RV64-NEXT:    sb a1, 4(a0)
+; RV64-NEXT:    ret
+;
+; RV32-FAST-LABEL: memset_5:
+; RV32-FAST:       # %bb.0:
+; RV32-FAST-NEXT:    lw a2, 4(a1)
+; RV32-FAST-NEXT:    lw a1, 0(a1)
+; RV32-FAST-NEXT:    sb a2, 4(a0)
+; RV32-FAST-NEXT:    sw a1, 0(a0)
+; RV32-FAST-NEXT:    ret
+;
+; RV64-FAST-LABEL: memset_5:
+; RV64-FAST:       # %bb.0:
+; RV64-FAST-NEXT:    sw a1, 0(a0)
+; RV64-FAST-NEXT:    srli a1, a1, 32
+; RV64-FAST-NEXT:    sb a1, 4(a0)
+; RV64-FAST-NEXT:    ret
+  tail call void @llvm.memset_pattern.p0.i64.i128(ptr %a, i128 %value, i64 5, i1 0)
+  ret void
+}
+
+define void @memset_6(ptr %a, i128 %value) nounwind {
+; RV32-LABEL: memset_6:
+; RV32:       # %bb.0:
+; RV32-NEXT:    lw a2, 4(a1)
+; RV32-NEXT:    lw a1, 0(a1)
+; RV32-NEXT:    sb a2, 4(a0)
+; RV32-NEXT:    sb a1, 0(a0)
+; RV32-NEXT:    srli a2, a2, 8
+; RV32-NEXT:    sb a2, 5(a0)
+; RV32-NEXT:    srli a2, a1, 24
+; RV32-NEXT:    sb a2, 3(a0)
+; RV32-NEXT:    srli a2, a1, 16
+; RV32-NEXT:    sb a2, 2(a0)
+; RV32-NEXT:    srli a1, a1, 8
+; RV32-NEXT:    sb a1, 1(a0)
+; RV32-NEXT:    ret
+;
+; RV64-LABEL: memset_6:
+; RV64:       # %bb.0:
+; RV64-NEXT:    sb a1, 0(a0)
+; RV64-NEXT:    srli a2, a1, 40
+; RV64-NEXT:    sb a2, 5(a0)
+; RV64-NEXT:    srli a2, a1, 32
+; RV64-NEXT:    sb a2, 4(a0)
+; RV64-NEXT:    srli a2, a1, 24
+; RV64-NEXT:    sb a2, 3(a0)
+; RV64-NEXT:    srli a2, a1, 16
+; RV64-NEXT:    sb a2, 2(...
[truncated]

@asb asb marked this pull request as draft July 3, 2024 14:33
@topperc
Copy link
Collaborator

topperc commented Jul 3, 2024

Does this need to care about big endian vs little endian? I notice LoopIdiomRecognize.cpp only handles little endian targets.

asb added a commit to asb/llvm-project that referenced this pull request Jul 10, 2024
…disabled by default)

When not disabled, produce the memset_pattern intrinsic that was
introduced in llvm#97583.
@asb asb force-pushed the 2024q2-memset-pattern-inline-preiselintrinsiclowering branch from 6560a5c to 627f1ef Compare July 10, 2024 12:37
@asb asb marked this pull request as ready for review July 10, 2024 12:38
@asb
Copy link
Contributor Author

asb commented Jul 10, 2024

Does this need to care about big endian vs little endian? I notice LoopIdiomRecognize.cpp only handles little endian targets.

You're right, I've adjusted the current implementation to bail out if targeting a big endian system.

With additional tests added and some cleanup now done, I think this is ready for review as is.

@asb asb requested review from arsenm, nikic and caojoshua July 17, 2024 11:37
@asb
Copy link
Contributor Author

asb commented Jul 17, 2024

Ping - review comments greatly appreciated.

@asb
Copy link
Contributor Author

asb commented Jul 24, 2024

Loked at this again now the 19.x branch has taken place, and all comments are now addressed I believe.

Copy link
Contributor

@nikic nikic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something I'm still not completely sold on is that the size is specified in bytes. I think the only advantage of doing that is that it directly maps onto the apple libcalls. On the flip side, it has the disadvantage that we may end up generating an unnecessary tail loop.

However, the larger conceptual problem I see is that it may prevent future generalization (though I kind of wonder whether that should not be supported right away) to non-integer types. memset.pattern is a good opportunity to perform stores with the correct types. E.g. consider filling memory with non-integral pointers -- in that case you can't just perform the memset after a ptrtoint. And a store of only part of a non-integral pointer would be pretty ill-defined. To a lesser degree this problem also exists for integral pointers, due to provenance considerations.

I'd rather not end up having to need another separate intrinsic to satisfy that case.

asb added 3 commits July 31, 2024 10:30
Also simplify/remove tests that are no longer relevant given we don't
need to worry about writes of a partial pattern.
@asb
Copy link
Contributor Author

asb commented Jul 31, 2024

Something I'm still not completely sold on is that the size is specified in bytes. I think the only advantage of doing that is that it directly maps onto the apple libcalls. On the flip side, it has the disadvantage that we may end up generating an unnecessary tail loop.

However, the larger conceptual problem I see is that it may prevent future generalization (though I kind of wonder whether that should not be supported right away) to non-integer types. memset.pattern is a good opportunity to perform stores with the correct types. E.g. consider filling memory with non-integral pointers -- in that case you can't just perform the memset after a ptrtoint. And a store of only part of a non-integral pointer would be pretty ill-defined. To a lesser degree this problem also exists for integral pointers, due to provenance considerations.

I'd rather not end up having to need another separate intrinsic to satisfy that case.

I think this is a good call - thank you. I've gone ahead and made that change.

One awkwardness I want to flag for review is that MemSetPatternInst still inherits from MemSetInst which inherits from MemSetBase. The awkwardness is that getLength returns the length in bytes for the other mem* intrinsics, but it's the number of pattern sets for memset.pattern. I think in practice this is ok but would appreciate a second opinion.

@@ -1003,6 +1003,14 @@ def int_memset_inline
NoCapture<ArgIndex<0>>, WriteOnly<ArgIndex<0>>,
ImmArg<ArgIndex<3>>]>;

// Memset variant that writes a given pattern.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment what the operands are

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added with inline comments (there's not a totally consistent pattern for describing args - many intrinsics have no description at all, but I see some other examples in the file using inline comments for the args)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really we ought to have tablegen mandatory doc strings

def int_memset_pattern
: Intrinsic<[],
[llvm_anyptr_ty, llvm_anyint_ty, llvm_anyint_ty, llvm_i1_ty],
[IntrWriteMem, IntrArgMemOnly, IntrWillReturn, IntrNoFree, IntrNoCallback,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DefaultAttrIntrinsic would hide most of these

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I dug through this and unfortunately I can't make use of DefaultAttrIntrinsic because nosync isn't necessarily true. https://reviews.llvm.org/D86021 switched over memset to using DefaultAttrIntrinsic but this was later backed out in a888e49 due to nosync not applying unconditionally.

Copy link
Collaborator

@preames preames left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM


This is an overloaded intrinsic. You can use
``llvm.experimental.memset.pattern`` on any integer bit width that is an
integral number of bytes and for different address spaces. Not all targets
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need a Verifier check to check that the size is integral number of bytes?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've now added this check to the verifier.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually that was my bad. There was some discussion about this before and in the end non-integral bitwidths should work and are tested in llvm/test/Transforms/PreISelIntrinsicLowering/RISCV/memset-pattern.ll. I've now pushed a fix to LangRef to remove the stated restriction.

@asb
Copy link
Contributor Author

asb commented Nov 11, 2024

I think I'm ok to merge with Philip's LGTM and I believe all outstanding comments have been addressed, but I'll give this another day or so as it's been open so long. Please speak up if there's anything you feel wasn't fully reflected in the current code.

the target: i.e. on little endian targets, the least significant byte of the
pattern is first in memory, while the most significant byte is first in memory
for big endian targets. The memory address is incremented by the allocation
size of the type.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the size is not an integral number of bytes, what happens with the padding bits?
Do they get unspecified / undefined values or do they preserve their values?
Should this be clarified in LangRef?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The behavior is the same as whatever stores of the type would do.

Possibly we should just explicit refer to store semantics here and reduce the description to something like this:

The 'llvm.experimental.memset.pattern*' intrinsic stores the provided value <count>> times, incrementing by the allocation size of the type each time. The stores follows the usual semantics of store instructions, including regarding endianness and padding.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, that's much clearer - I've adopted that wording.

Copy link
Contributor

@nikic nikic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

the target: i.e. on little endian targets, the least significant byte of the
pattern is first in memory, while the most significant byte is first in memory
for big endian targets. The memory address is incremented by the allocation
size of the type.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The behavior is the same as whatever stores of the type would do.

Possibly we should just explicit refer to store semantics here and reduce the description to something like this:

The 'llvm.experimental.memset.pattern*' intrinsic stores the provided value <count>> times, incrementing by the allocation size of the type each time. The stores follows the usual semantics of store instructions, including regarding endianness and padding.

static bool classof(const Value *V) {
return isa<IntrinsicInst>(V) && classof(cast<IntrinsicInst>(V));
}
};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need both MemSetPatternIntrinsic and MemSetPatternInst?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm mirroring the pattern followed by other mem intrinsics, and although it's not super pretty, having both classes like this (as for the standard MemSet intrinsics) seems the way that reduces copy and paste of accessor code.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm okay, I thought this might only be needed for cases where we have both atomic and non-atomic variants.

@asb asb merged commit 7ff3a9a into llvm:main Nov 15, 2024
6 of 9 checks passed
@llvm-ci
Copy link
Collaborator

llvm-ci commented Nov 15, 2024

LLVM Buildbot has detected a new failure on builder ml-opt-rel-x86-64 running on ml-opt-rel-x86-64-b1 while building llvm at step 6 "test-build-unified-tree-check-all".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/185/builds/8572

Here is the relevant piece of the build log for the reference
Step 6 (test-build-unified-tree-check-all) failure: test (failure)
******************** TEST 'LLVM :: CodeGen/RISCV/memset-pattern.ll' FAILED ********************
Exit Code: 1

Command Output (stderr):
--
RUN: at line 2: /b/ml-opt-rel-x86-64-b1/build/bin/llc < /b/ml-opt-rel-x86-64-b1/llvm-project/llvm/test/CodeGen/RISCV/memset-pattern.ll -mtriple=riscv32 -mattr=+m    | /b/ml-opt-rel-x86-64-b1/build/bin/FileCheck /b/ml-opt-rel-x86-64-b1/llvm-project/llvm/test/CodeGen/RISCV/memset-pattern.ll --check-prefixes=RV32-BOTH,RV32
+ /b/ml-opt-rel-x86-64-b1/build/bin/llc -mtriple=riscv32 -mattr=+m
+ /b/ml-opt-rel-x86-64-b1/build/bin/FileCheck /b/ml-opt-rel-x86-64-b1/llvm-project/llvm/test/CodeGen/RISCV/memset-pattern.ll --check-prefixes=RV32-BOTH,RV32
/b/ml-opt-rel-x86-64-b1/llvm-project/llvm/test/CodeGen/RISCV/memset-pattern.ll:17:19: error: RV32-BOTH-NEXT: expected string not found in input
; RV32-BOTH-NEXT: lw a2, 0(a1)
                  ^
<stdin>:9:36: note: scanning from here
# %bb.0: # %loadstoreloop.preheader
                                   ^
<stdin>:11:2: note: possible intended match here
 lw a3, 0(a1)
 ^
/b/ml-opt-rel-x86-64-b1/llvm-project/llvm/test/CodeGen/RISCV/memset-pattern.ll:85:14: error: RV32-NEXT: is not on the line after the previous match
; RV32-NEXT: add s4, a0, s4
             ^
<stdin>:68:2: note: 'next' match was here
 add s4, a0, s4
 ^
<stdin>:66:16: note: previous match ended here
 slli s4, a2, 4
               ^
<stdin>:67:1: note: non-matching line after previous match is here
 addi a2, a2, 1
^
/b/ml-opt-rel-x86-64-b1/llvm-project/llvm/test/CodeGen/RISCV/memset-pattern.ll:207:19: error: RV32-BOTH-NEXT: expected string not found in input
; RV32-BOTH-NEXT: lw a2, 0(a1)
                  ^
<stdin>:105:36: note: scanning from here
# %bb.0: # %loadstoreloop.preheader
                                   ^
<stdin>:107:2: note: possible intended match here
 lw a3, 0(a1)
 ^
/b/ml-opt-rel-x86-64-b1/llvm-project/llvm/test/CodeGen/RISCV/memset-pattern.ll:252:19: error: RV32-BOTH-NEXT: expected string not found in input
; RV32-BOTH-NEXT: lw a4, 0(a1)
                  ^
<stdin>:139:36: note: scanning from here
# %bb.1: # %loadstoreloop.preheader
                                   ^
<stdin>:141:2: note: possible intended match here
 lw a5, 0(a1)
 ^

Input file: <stdin>
Check file: /b/ml-opt-rel-x86-64-b1/llvm-project/llvm/test/CodeGen/RISCV/memset-pattern.ll
...

@llvm-ci
Copy link
Collaborator

llvm-ci commented Nov 15, 2024

LLVM Buildbot has detected a new failure on builder ml-opt-devrel-x86-64 running on ml-opt-devrel-x86-64-b1 while building llvm at step 6 "test-build-unified-tree-check-all".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/175/builds/8577

Here is the relevant piece of the build log for the reference
Step 6 (test-build-unified-tree-check-all) failure: test (failure)
******************** TEST 'LLVM :: CodeGen/RISCV/memset-pattern.ll' FAILED ********************
Exit Code: 1

Command Output (stderr):
--
RUN: at line 2: /b/ml-opt-devrel-x86-64-b1/build/bin/llc < /b/ml-opt-devrel-x86-64-b1/llvm-project/llvm/test/CodeGen/RISCV/memset-pattern.ll -mtriple=riscv32 -mattr=+m    | /b/ml-opt-devrel-x86-64-b1/build/bin/FileCheck /b/ml-opt-devrel-x86-64-b1/llvm-project/llvm/test/CodeGen/RISCV/memset-pattern.ll --check-prefixes=RV32-BOTH,RV32
+ /b/ml-opt-devrel-x86-64-b1/build/bin/FileCheck /b/ml-opt-devrel-x86-64-b1/llvm-project/llvm/test/CodeGen/RISCV/memset-pattern.ll --check-prefixes=RV32-BOTH,RV32
+ /b/ml-opt-devrel-x86-64-b1/build/bin/llc -mtriple=riscv32 -mattr=+m
/b/ml-opt-devrel-x86-64-b1/llvm-project/llvm/test/CodeGen/RISCV/memset-pattern.ll:17:19: error: RV32-BOTH-NEXT: expected string not found in input
; RV32-BOTH-NEXT: lw a2, 0(a1)
                  ^
<stdin>:9:36: note: scanning from here
# %bb.0: # %loadstoreloop.preheader
                                   ^
<stdin>:11:2: note: possible intended match here
 lw a3, 0(a1)
 ^
/b/ml-opt-devrel-x86-64-b1/llvm-project/llvm/test/CodeGen/RISCV/memset-pattern.ll:85:14: error: RV32-NEXT: is not on the line after the previous match
; RV32-NEXT: add s4, a0, s4
             ^
<stdin>:68:2: note: 'next' match was here
 add s4, a0, s4
 ^
<stdin>:66:16: note: previous match ended here
 slli s4, a2, 4
               ^
<stdin>:67:1: note: non-matching line after previous match is here
 addi a2, a2, 1
^
/b/ml-opt-devrel-x86-64-b1/llvm-project/llvm/test/CodeGen/RISCV/memset-pattern.ll:207:19: error: RV32-BOTH-NEXT: expected string not found in input
; RV32-BOTH-NEXT: lw a2, 0(a1)
                  ^
<stdin>:105:36: note: scanning from here
# %bb.0: # %loadstoreloop.preheader
                                   ^
<stdin>:107:2: note: possible intended match here
 lw a3, 0(a1)
 ^
/b/ml-opt-devrel-x86-64-b1/llvm-project/llvm/test/CodeGen/RISCV/memset-pattern.ll:252:19: error: RV32-BOTH-NEXT: expected string not found in input
; RV32-BOTH-NEXT: lw a4, 0(a1)
                  ^
<stdin>:139:36: note: scanning from here
# %bb.1: # %loadstoreloop.preheader
                                   ^
<stdin>:141:2: note: possible intended match here
 lw a5, 0(a1)
 ^

Input file: <stdin>
Check file: /b/ml-opt-devrel-x86-64-b1/llvm-project/llvm/test/CodeGen/RISCV/memset-pattern.ll
...

@llvm-ci
Copy link
Collaborator

llvm-ci commented Nov 15, 2024

LLVM Buildbot has detected a new failure on builder ml-opt-dev-x86-64 running on ml-opt-dev-x86-64-b1 while building llvm at step 6 "test-build-unified-tree-check-all".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/137/builds/8684

Here is the relevant piece of the build log for the reference
Step 6 (test-build-unified-tree-check-all) failure: test (failure)
******************** TEST 'LLVM :: CodeGen/RISCV/memset-pattern.ll' FAILED ********************
Exit Code: 1

Command Output (stderr):
--
RUN: at line 2: /b/ml-opt-dev-x86-64-b1/build/bin/llc < /b/ml-opt-dev-x86-64-b1/llvm-project/llvm/test/CodeGen/RISCV/memset-pattern.ll -mtriple=riscv32 -mattr=+m    | /b/ml-opt-dev-x86-64-b1/build/bin/FileCheck /b/ml-opt-dev-x86-64-b1/llvm-project/llvm/test/CodeGen/RISCV/memset-pattern.ll --check-prefixes=RV32-BOTH,RV32
+ /b/ml-opt-dev-x86-64-b1/build/bin/llc -mtriple=riscv32 -mattr=+m
+ /b/ml-opt-dev-x86-64-b1/build/bin/FileCheck /b/ml-opt-dev-x86-64-b1/llvm-project/llvm/test/CodeGen/RISCV/memset-pattern.ll --check-prefixes=RV32-BOTH,RV32
/b/ml-opt-dev-x86-64-b1/llvm-project/llvm/test/CodeGen/RISCV/memset-pattern.ll:17:19: error: RV32-BOTH-NEXT: expected string not found in input
; RV32-BOTH-NEXT: lw a2, 0(a1)
                  ^
<stdin>:9:36: note: scanning from here
# %bb.0: # %loadstoreloop.preheader
                                   ^
<stdin>:11:2: note: possible intended match here
 lw a3, 0(a1)
 ^
/b/ml-opt-dev-x86-64-b1/llvm-project/llvm/test/CodeGen/RISCV/memset-pattern.ll:85:14: error: RV32-NEXT: is not on the line after the previous match
; RV32-NEXT: add s4, a0, s4
             ^
<stdin>:68:2: note: 'next' match was here
 add s4, a0, s4
 ^
<stdin>:66:16: note: previous match ended here
 slli s4, a2, 4
               ^
<stdin>:67:1: note: non-matching line after previous match is here
 addi a2, a2, 1
^
/b/ml-opt-dev-x86-64-b1/llvm-project/llvm/test/CodeGen/RISCV/memset-pattern.ll:207:19: error: RV32-BOTH-NEXT: expected string not found in input
; RV32-BOTH-NEXT: lw a2, 0(a1)
                  ^
<stdin>:105:36: note: scanning from here
# %bb.0: # %loadstoreloop.preheader
                                   ^
<stdin>:107:2: note: possible intended match here
 lw a3, 0(a1)
 ^
/b/ml-opt-dev-x86-64-b1/llvm-project/llvm/test/CodeGen/RISCV/memset-pattern.ll:252:19: error: RV32-BOTH-NEXT: expected string not found in input
; RV32-BOTH-NEXT: lw a4, 0(a1)
                  ^
<stdin>:139:36: note: scanning from here
# %bb.1: # %loadstoreloop.preheader
                                   ^
<stdin>:141:2: note: possible intended match here
 lw a5, 0(a1)
 ^

Input file: <stdin>
Check file: /b/ml-opt-dev-x86-64-b1/llvm-project/llvm/test/CodeGen/RISCV/memset-pattern.ll
...

asb added a commit that referenced this pull request Nov 15, 2024
…#97583)"

This reverts commit 7ff3a9a.

Recent scheduling changes means tests need to be re-generated. Reverting
to green while I do that.
asb added a commit that referenced this pull request Nov 15, 2024
…#97583)

Relands 7ff3a9a after regenerating the
test case.

Supersedes the draft PR #94992, taking a different approach following
feedback:
* Lower in PreISelIntrinsicLowering
* Don't require that the number of bytes to set is a compile-time
constant
* Define llvm.memset_pattern rather than llvm.memset_pattern.inline

As discussed in the [RFC
thread](https://discourse.llvm.org/t/rfc-introducing-an-llvm-memset-pattern-inline-intrinsic/79496),
the intent is that the intrinsic will be lowered to loops, a sequence of
stores, or libcalls depending on the expected cost and availability of
libcalls on the target. Right now, there's just a single lowering path
that aims to handle all cases. My intent would be to follow up with
additional PRs that add additional optimisations when possible (e.g.
when libcalls are available, when arguments are known to be constant
etc).
@llvm-ci
Copy link
Collaborator

llvm-ci commented Nov 15, 2024

LLVM Buildbot has detected a new failure on builder premerge-monolithic-linux running on premerge-linux-1 while building llvm at step 7 "test-build-unified-tree-check-all".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/153/builds/14776

Here is the relevant piece of the build log for the reference
Step 7 (test-build-unified-tree-check-all) failure: test (failure)
******************** TEST 'LLVM :: CodeGen/RISCV/memset-pattern.ll' FAILED ********************
Exit Code: 1

Command Output (stderr):
--
RUN: at line 2: /build/buildbot/premerge-monolithic-linux/build/bin/llc < /build/buildbot/premerge-monolithic-linux/llvm-project/llvm/test/CodeGen/RISCV/memset-pattern.ll -mtriple=riscv32 -mattr=+m    | /build/buildbot/premerge-monolithic-linux/build/bin/FileCheck /build/buildbot/premerge-monolithic-linux/llvm-project/llvm/test/CodeGen/RISCV/memset-pattern.ll --check-prefixes=RV32-BOTH,RV32
+ /build/buildbot/premerge-monolithic-linux/build/bin/llc -mtriple=riscv32 -mattr=+m
+ /build/buildbot/premerge-monolithic-linux/build/bin/FileCheck /build/buildbot/premerge-monolithic-linux/llvm-project/llvm/test/CodeGen/RISCV/memset-pattern.ll --check-prefixes=RV32-BOTH,RV32
/build/buildbot/premerge-monolithic-linux/llvm-project/llvm/test/CodeGen/RISCV/memset-pattern.ll:17:19: error: RV32-BOTH-NEXT: expected string not found in input
; RV32-BOTH-NEXT: lw a2, 0(a1)
                  ^
<stdin>:9:36: note: scanning from here
# %bb.0: # %loadstoreloop.preheader
                                   ^
<stdin>:11:2: note: possible intended match here
 lw a3, 0(a1)
 ^
/build/buildbot/premerge-monolithic-linux/llvm-project/llvm/test/CodeGen/RISCV/memset-pattern.ll:85:14: error: RV32-NEXT: is not on the line after the previous match
; RV32-NEXT: add s4, a0, s4
             ^
<stdin>:68:2: note: 'next' match was here
 add s4, a0, s4
 ^
<stdin>:66:16: note: previous match ended here
 slli s4, a2, 4
               ^
<stdin>:67:1: note: non-matching line after previous match is here
 addi a2, a2, 1
^
/build/buildbot/premerge-monolithic-linux/llvm-project/llvm/test/CodeGen/RISCV/memset-pattern.ll:207:19: error: RV32-BOTH-NEXT: expected string not found in input
; RV32-BOTH-NEXT: lw a2, 0(a1)
                  ^
<stdin>:105:36: note: scanning from here
# %bb.0: # %loadstoreloop.preheader
                                   ^
<stdin>:107:2: note: possible intended match here
 lw a3, 0(a1)
 ^
/build/buildbot/premerge-monolithic-linux/llvm-project/llvm/test/CodeGen/RISCV/memset-pattern.ll:252:19: error: RV32-BOTH-NEXT: expected string not found in input
; RV32-BOTH-NEXT: lw a4, 0(a1)
                  ^
<stdin>:139:36: note: scanning from here
# %bb.1: # %loadstoreloop.preheader
                                   ^
<stdin>:141:2: note: possible intended match here
 lw a5, 0(a1)
 ^

Input file: <stdin>
Check file: /build/buildbot/premerge-monolithic-linux/llvm-project/llvm/test/CodeGen/RISCV/memset-pattern.ll
...

@llvm-ci
Copy link
Collaborator

llvm-ci commented Nov 15, 2024

LLVM Buildbot has detected a new failure on builder lld-x86_64-ubuntu-fast running on as-builder-4 while building llvm at step 6 "test-build-unified-tree-check-all".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/33/builds/6533

Here is the relevant piece of the build log for the reference
Step 6 (test-build-unified-tree-check-all) failure: test (failure)
******************** TEST 'LLVM :: CodeGen/RISCV/memset-pattern.ll' FAILED ********************
Exit Code: 1

Command Output (stderr):
--
RUN: at line 2: /home/buildbot/worker/as-builder-4/ramdisk/lld-x86_64/build/bin/llc < /home/buildbot/worker/as-builder-4/ramdisk/lld-x86_64/llvm-project/llvm/test/CodeGen/RISCV/memset-pattern.ll -mtriple=riscv32 -mattr=+m    | /home/buildbot/worker/as-builder-4/ramdisk/lld-x86_64/build/bin/FileCheck /home/buildbot/worker/as-builder-4/ramdisk/lld-x86_64/llvm-project/llvm/test/CodeGen/RISCV/memset-pattern.ll --check-prefixes=RV32-BOTH,RV32
+ /home/buildbot/worker/as-builder-4/ramdisk/lld-x86_64/build/bin/llc -mtriple=riscv32 -mattr=+m
+ /home/buildbot/worker/as-builder-4/ramdisk/lld-x86_64/build/bin/FileCheck /home/buildbot/worker/as-builder-4/ramdisk/lld-x86_64/llvm-project/llvm/test/CodeGen/RISCV/memset-pattern.ll --check-prefixes=RV32-BOTH,RV32
/home/buildbot/worker/as-builder-4/ramdisk/lld-x86_64/llvm-project/llvm/test/CodeGen/RISCV/memset-pattern.ll:17:19: error: RV32-BOTH-NEXT: expected string not found in input
; RV32-BOTH-NEXT: lw a2, 0(a1)
                  ^
<stdin>:9:36: note: scanning from here
# %bb.0: # %loadstoreloop.preheader
                                   ^
<stdin>:11:2: note: possible intended match here
 lw a3, 0(a1)
 ^
/home/buildbot/worker/as-builder-4/ramdisk/lld-x86_64/llvm-project/llvm/test/CodeGen/RISCV/memset-pattern.ll:85:14: error: RV32-NEXT: is not on the line after the previous match
; RV32-NEXT: add s4, a0, s4
             ^
<stdin>:68:2: note: 'next' match was here
 add s4, a0, s4
 ^
<stdin>:66:16: note: previous match ended here
 slli s4, a2, 4
               ^
<stdin>:67:1: note: non-matching line after previous match is here
 addi a2, a2, 1
^
/home/buildbot/worker/as-builder-4/ramdisk/lld-x86_64/llvm-project/llvm/test/CodeGen/RISCV/memset-pattern.ll:207:19: error: RV32-BOTH-NEXT: expected string not found in input
; RV32-BOTH-NEXT: lw a2, 0(a1)
                  ^
<stdin>:105:36: note: scanning from here
# %bb.0: # %loadstoreloop.preheader
                                   ^
<stdin>:107:2: note: possible intended match here
 lw a3, 0(a1)
 ^
/home/buildbot/worker/as-builder-4/ramdisk/lld-x86_64/llvm-project/llvm/test/CodeGen/RISCV/memset-pattern.ll:252:19: error: RV32-BOTH-NEXT: expected string not found in input
; RV32-BOTH-NEXT: lw a4, 0(a1)
                  ^
<stdin>:139:36: note: scanning from here
# %bb.1: # %loadstoreloop.preheader
                                   ^
<stdin>:141:2: note: possible intended match here
 lw a5, 0(a1)
 ^

Input file: <stdin>
Check file: /home/buildbot/worker/as-builder-4/ramdisk/lld-x86_64/llvm-project/llvm/test/CodeGen/RISCV/memset-pattern.ll
...

@llvm-ci
Copy link
Collaborator

llvm-ci commented Nov 15, 2024

LLVM Buildbot has detected a new failure on builder llvm-clang-x86_64-expensive-checks-debian running on gribozavr4 while building llvm at step 6 "test-build-unified-tree-check-all".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/16/builds/8935

Here is the relevant piece of the build log for the reference
Step 6 (test-build-unified-tree-check-all) failure: test (failure)
******************** TEST 'LLVM :: CodeGen/RISCV/memset-pattern.ll' FAILED ********************
Exit Code: 1

Command Output (stderr):
--
RUN: at line 2: /b/1/llvm-clang-x86_64-expensive-checks-debian/build/bin/llc < /b/1/llvm-clang-x86_64-expensive-checks-debian/llvm-project/llvm/test/CodeGen/RISCV/memset-pattern.ll -mtriple=riscv32 -mattr=+m    | /b/1/llvm-clang-x86_64-expensive-checks-debian/build/bin/FileCheck /b/1/llvm-clang-x86_64-expensive-checks-debian/llvm-project/llvm/test/CodeGen/RISCV/memset-pattern.ll --check-prefixes=RV32-BOTH,RV32
+ /b/1/llvm-clang-x86_64-expensive-checks-debian/build/bin/FileCheck /b/1/llvm-clang-x86_64-expensive-checks-debian/llvm-project/llvm/test/CodeGen/RISCV/memset-pattern.ll --check-prefixes=RV32-BOTH,RV32
+ /b/1/llvm-clang-x86_64-expensive-checks-debian/build/bin/llc -mtriple=riscv32 -mattr=+m
/b/1/llvm-clang-x86_64-expensive-checks-debian/llvm-project/llvm/test/CodeGen/RISCV/memset-pattern.ll:17:19: error: RV32-BOTH-NEXT: expected string not found in input
; RV32-BOTH-NEXT: lw a2, 0(a1)
                  ^
<stdin>:9:36: note: scanning from here
# %bb.0: # %loadstoreloop.preheader
                                   ^
<stdin>:11:2: note: possible intended match here
 lw a3, 0(a1)
 ^
/b/1/llvm-clang-x86_64-expensive-checks-debian/llvm-project/llvm/test/CodeGen/RISCV/memset-pattern.ll:85:14: error: RV32-NEXT: is not on the line after the previous match
; RV32-NEXT: add s4, a0, s4
             ^
<stdin>:68:2: note: 'next' match was here
 add s4, a0, s4
 ^
<stdin>:66:16: note: previous match ended here
 slli s4, a2, 4
               ^
<stdin>:67:1: note: non-matching line after previous match is here
 addi a2, a2, 1
^
/b/1/llvm-clang-x86_64-expensive-checks-debian/llvm-project/llvm/test/CodeGen/RISCV/memset-pattern.ll:207:19: error: RV32-BOTH-NEXT: expected string not found in input
; RV32-BOTH-NEXT: lw a2, 0(a1)
                  ^
<stdin>:105:36: note: scanning from here
# %bb.0: # %loadstoreloop.preheader
                                   ^
<stdin>:107:2: note: possible intended match here
 lw a3, 0(a1)
 ^
/b/1/llvm-clang-x86_64-expensive-checks-debian/llvm-project/llvm/test/CodeGen/RISCV/memset-pattern.ll:252:19: error: RV32-BOTH-NEXT: expected string not found in input
; RV32-BOTH-NEXT: lw a4, 0(a1)
                  ^
<stdin>:139:36: note: scanning from here
# %bb.1: # %loadstoreloop.preheader
                                   ^
<stdin>:141:2: note: possible intended match here
 lw a5, 0(a1)
 ^

Input file: <stdin>
Check file: /b/1/llvm-clang-x86_64-expensive-checks-debian/llvm-project/llvm/test/CodeGen/RISCV/memset-pattern.ll
...

@llvm-ci
Copy link
Collaborator

llvm-ci commented Nov 15, 2024

LLVM Buildbot has detected a new failure on builder clang-x86_64-debian-fast running on gribozavr4 while building llvm at step 6 "test-build-unified-tree-check-all".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/56/builds/12314

Here is the relevant piece of the build log for the reference
Step 6 (test-build-unified-tree-check-all) failure: test (failure)
******************** TEST 'LLVM :: CodeGen/RISCV/memset-pattern.ll' FAILED ********************
Exit Code: 1

Command Output (stderr):
--
RUN: at line 2: /b/1/clang-x86_64-debian-fast/llvm.obj/bin/llc < /b/1/clang-x86_64-debian-fast/llvm.src/llvm/test/CodeGen/RISCV/memset-pattern.ll -mtriple=riscv32 -mattr=+m    | /b/1/clang-x86_64-debian-fast/llvm.obj/bin/FileCheck /b/1/clang-x86_64-debian-fast/llvm.src/llvm/test/CodeGen/RISCV/memset-pattern.ll --check-prefixes=RV32-BOTH,RV32
+ /b/1/clang-x86_64-debian-fast/llvm.obj/bin/llc -mtriple=riscv32 -mattr=+m
+ /b/1/clang-x86_64-debian-fast/llvm.obj/bin/FileCheck /b/1/clang-x86_64-debian-fast/llvm.src/llvm/test/CodeGen/RISCV/memset-pattern.ll --check-prefixes=RV32-BOTH,RV32
/b/1/clang-x86_64-debian-fast/llvm.src/llvm/test/CodeGen/RISCV/memset-pattern.ll:17:19: error: RV32-BOTH-NEXT: expected string not found in input
; RV32-BOTH-NEXT: lw a2, 0(a1)
                  ^
<stdin>:9:36: note: scanning from here
# %bb.0: # %loadstoreloop.preheader
                                   ^
<stdin>:11:2: note: possible intended match here
 lw a3, 0(a1)
 ^
/b/1/clang-x86_64-debian-fast/llvm.src/llvm/test/CodeGen/RISCV/memset-pattern.ll:85:14: error: RV32-NEXT: is not on the line after the previous match
; RV32-NEXT: add s4, a0, s4
             ^
<stdin>:68:2: note: 'next' match was here
 add s4, a0, s4
 ^
<stdin>:66:16: note: previous match ended here
 slli s4, a2, 4
               ^
<stdin>:67:1: note: non-matching line after previous match is here
 addi a2, a2, 1
^
/b/1/clang-x86_64-debian-fast/llvm.src/llvm/test/CodeGen/RISCV/memset-pattern.ll:207:19: error: RV32-BOTH-NEXT: expected string not found in input
; RV32-BOTH-NEXT: lw a2, 0(a1)
                  ^
<stdin>:105:36: note: scanning from here
# %bb.0: # %loadstoreloop.preheader
                                   ^
<stdin>:107:2: note: possible intended match here
 lw a3, 0(a1)
 ^
/b/1/clang-x86_64-debian-fast/llvm.src/llvm/test/CodeGen/RISCV/memset-pattern.ll:252:19: error: RV32-BOTH-NEXT: expected string not found in input
; RV32-BOTH-NEXT: lw a4, 0(a1)
                  ^
<stdin>:139:36: note: scanning from here
# %bb.1: # %loadstoreloop.preheader
                                   ^
<stdin>:141:2: note: possible intended match here
 lw a5, 0(a1)
 ^

Input file: <stdin>
Check file: /b/1/clang-x86_64-debian-fast/llvm.src/llvm/test/CodeGen/RISCV/memset-pattern.ll
...

@llvm-ci
Copy link
Collaborator

llvm-ci commented Nov 15, 2024

LLVM Buildbot has detected a new failure on builder llvm-x86_64-debian-dylib running on gribozavr4 while building llvm at step 7 "test-build-unified-tree-check-llvm".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/60/builds/12817

Here is the relevant piece of the build log for the reference
Step 7 (test-build-unified-tree-check-llvm) failure: test (failure)
******************** TEST 'LLVM :: CodeGen/RISCV/memset-pattern.ll' FAILED ********************
Exit Code: 1

Command Output (stderr):
--
RUN: at line 2: /b/1/llvm-x86_64-debian-dylib/build/bin/llc < /b/1/llvm-x86_64-debian-dylib/llvm-project/llvm/test/CodeGen/RISCV/memset-pattern.ll -mtriple=riscv32 -mattr=+m    | /b/1/llvm-x86_64-debian-dylib/build/bin/FileCheck /b/1/llvm-x86_64-debian-dylib/llvm-project/llvm/test/CodeGen/RISCV/memset-pattern.ll --check-prefixes=RV32-BOTH,RV32
+ /b/1/llvm-x86_64-debian-dylib/build/bin/FileCheck /b/1/llvm-x86_64-debian-dylib/llvm-project/llvm/test/CodeGen/RISCV/memset-pattern.ll --check-prefixes=RV32-BOTH,RV32
+ /b/1/llvm-x86_64-debian-dylib/build/bin/llc -mtriple=riscv32 -mattr=+m
/b/1/llvm-x86_64-debian-dylib/llvm-project/llvm/test/CodeGen/RISCV/memset-pattern.ll:17:19: error: RV32-BOTH-NEXT: expected string not found in input
; RV32-BOTH-NEXT: lw a2, 0(a1)
                  ^
<stdin>:9:36: note: scanning from here
# %bb.0: # %loadstoreloop.preheader
                                   ^
<stdin>:11:2: note: possible intended match here
 lw a3, 0(a1)
 ^
/b/1/llvm-x86_64-debian-dylib/llvm-project/llvm/test/CodeGen/RISCV/memset-pattern.ll:85:14: error: RV32-NEXT: is not on the line after the previous match
; RV32-NEXT: add s4, a0, s4
             ^
<stdin>:68:2: note: 'next' match was here
 add s4, a0, s4
 ^
<stdin>:66:16: note: previous match ended here
 slli s4, a2, 4
               ^
<stdin>:67:1: note: non-matching line after previous match is here
 addi a2, a2, 1
^
/b/1/llvm-x86_64-debian-dylib/llvm-project/llvm/test/CodeGen/RISCV/memset-pattern.ll:207:19: error: RV32-BOTH-NEXT: expected string not found in input
; RV32-BOTH-NEXT: lw a2, 0(a1)
                  ^
<stdin>:105:36: note: scanning from here
# %bb.0: # %loadstoreloop.preheader
                                   ^
<stdin>:107:2: note: possible intended match here
 lw a3, 0(a1)
 ^
/b/1/llvm-x86_64-debian-dylib/llvm-project/llvm/test/CodeGen/RISCV/memset-pattern.ll:252:19: error: RV32-BOTH-NEXT: expected string not found in input
; RV32-BOTH-NEXT: lw a4, 0(a1)
                  ^
<stdin>:139:36: note: scanning from here
# %bb.1: # %loadstoreloop.preheader
                                   ^
<stdin>:141:2: note: possible intended match here
 lw a5, 0(a1)
 ^

Input file: <stdin>
Check file: /b/1/llvm-x86_64-debian-dylib/llvm-project/llvm/test/CodeGen/RISCV/memset-pattern.ll
...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants