[mlir][spirv] Add GpuToLLVM cconv suited to Vulkan, migrate last tests #123384

Merged

8 changes: 4 additions & 4 deletions mlir/include/mlir/Conversion/GPUCommon/GPUCommonPass.h
@@ -62,10 +62,10 @@ struct FunctionCallBuilder {

/// Collect a set of patterns to convert from the GPU dialect to LLVM and
/// populate converter for gpu types.
void populateGpuToLLVMConversionPatterns(LLVMTypeConverter &converter,
RewritePatternSet &patterns,
bool kernelBarePtrCallConv = false,
bool typeCheckKernelArgs = false);
void populateGpuToLLVMConversionPatterns(
LLVMTypeConverter &converter, RewritePatternSet &patterns,
bool kernelBarePtrCallConv = false,
Contributor: Maybe better to make it an enum instead of two bools?

Contributor (Author): That might not be a bad idea, but I'm really unsure whether it's possible for an option to be an enum, or what I'd have to do if I wanted that.

Contributor (Author): (Specifically, the Option defined in Passes.td. I know it somehow leverages the cl infrastructure, so enums "should" be possible, but I haven't managed to figure it out from staring at the headers and tablegen definitions; there are too many layers.)
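
For context, a minimal sketch of the plain cl::opt machinery that the tablegen Option layer builds on, suggesting an enum-valued pass option is plausible in principle. All names here (KernelCallConv and its values) are hypothetical and not part of the patch:

    #include "llvm/Support/CommandLine.h"

    // Hypothetical enum replacing the two bools discussed above.
    enum class KernelCallConv { Default, BarePtr, BarePtrWithSizes };

    // A plain cl::opt over an enum; tablegen Options wrap this same cl
    // infrastructure, so an enum option "should" be expressible.
    static llvm::cl::opt<KernelCallConv> kernelCallConv(
        "kernel-call-conv",
        llvm::cl::desc("Calling convention for kernel launches"),
        llvm::cl::init(KernelCallConv::Default),
        llvm::cl::values(
            clEnumValN(KernelCallConv::Default, "default",
                       "Standard memref descriptor ABI"),
            clEnumValN(KernelCallConv::BarePtr, "bare-ptr",
                       "Bare pointers for memref arguments"),
            clEnumValN(KernelCallConv::BarePtrWithSizes, "bare-ptr-with-sizes",
                       "Bare pointers interleaved with static buffer sizes")));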

bool kernelIntersperseSizeCallConv = false);

/// A function that maps a MemorySpace enum to a target-specific integer value.
using MemorySpaceMapping = std::function<unsigned(gpu::AddressSpace)>;
10 changes: 6 additions & 4 deletions mlir/include/mlir/Conversion/Passes.td
@@ -518,11 +518,13 @@ def GpuToLLVMConversionPass : Pass<"gpu-to-llvm", "ModuleOp"> {
"Use bare pointers to pass memref arguments to kernels. "
"The kernel must use the same setting for this option."
>,
Option<"typeCheckKernelArgs", "type-check-kernel-args", "bool",
Option<"kernelIntersperseSizeCallConv", "intersperse-sizes-for-kernels", "bool",
/*default=*/"false",
"Require all kernel arguments to be memrefs of rank 1 and with a "
"32-bit element size. This is a temporary option that will be "
"removed; TODO(https://github.com/llvm/llvm-project/issues/73457)."
"Inserts a size_t argument following each memref argument, "
"containing the static size in bytes of the buffer. Incompatible "
"arguments are rejected. This is intended for use by the Vulkan "
"runtime with the kernel bare pointer calling convention, to enable "
"dynamic binding of buffers as arguments without static type info."
>
];

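On the command line, the new option composes with the existing bare-pointer option, as exercised by the new test further down: --gpu-to-llvm="use-bare-pointers-for-kernels=1 intersperse-sizes-for-kernels=1".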
90 changes: 57 additions & 33 deletions mlir/lib/Conversion/GPUCommon/GPUToLLVMConversion.cpp
@@ -428,18 +428,18 @@ class LegalizeLaunchFuncOpPattern
public:
LegalizeLaunchFuncOpPattern(const LLVMTypeConverter &typeConverter,
bool kernelBarePtrCallConv,
bool typeCheckKernelArgs)
bool kernelIntersperseSizeCallConv)
: ConvertOpToGpuRuntimeCallPattern<gpu::LaunchFuncOp>(typeConverter),
kernelBarePtrCallConv(kernelBarePtrCallConv),
typeCheckKernelArgs(typeCheckKernelArgs) {}
kernelIntersperseSizeCallConv(kernelIntersperseSizeCallConv) {}

private:
LogicalResult
matchAndRewrite(gpu::LaunchFuncOp launchOp, OpAdaptor adaptor,
ConversionPatternRewriter &rewriter) const override;

bool kernelBarePtrCallConv;
bool typeCheckKernelArgs;
bool kernelIntersperseSizeCallConv;
};

/// A rewrite pattern to convert gpu.memcpy operations into a GPU runtime
@@ -566,8 +566,9 @@ void GpuToLLVMConversionPass::runOnOperation() {
populateFinalizeMemRefToLLVMConversionPatterns(converter, patterns);
populateAsyncStructuralTypeConversionsAndLegality(converter, patterns,
target);
populateGpuToLLVMConversionPatterns(
converter, patterns, kernelBarePtrCallConv, typeCheckKernelArgs);
populateGpuToLLVMConversionPatterns(converter, patterns,
kernelBarePtrCallConv,
kernelIntersperseSizeCallConv);

if (failed(
applyPartialConversion(getOperation(), target, std::move(patterns))))
@@ -970,33 +971,55 @@ LogicalResult LegalizeLaunchFuncOpPattern::matchAndRewrite(
else if (launchOp.getAsyncToken())
stream = streamCreateCallBuilder.create(loc, rewriter, {}).getResult();

if (typeCheckKernelArgs) {
// The current non-bare-pointer ABI is a bad fit for `mgpuLaunchKernel`,
// which takes an untyped list of arguments. The type check here prevents
// accidentally violating the assumption made in vulkan-runtime-wrappers.cpp
// and creating an unchecked runtime ABI mismatch.
// TODO(https://github.com/llvm/llvm-project/issues/73457): Change the ABI
// here to remove the need for this type check.
for (Value arg : launchOp.getKernelOperands()) {
if (auto memrefTy = dyn_cast<MemRefType>(arg.getType())) {
if (memrefTy.getRank() != 1 ||
memrefTy.getElementTypeBitWidth() != 32) {
return rewriter.notifyMatchFailure(
launchOp, "Operand to launch op is not a rank-1 memref with "
"32-bit element type.");
}
} else {
// Lower the kernel operands to match kernel parameters.
// Note: If `useBarePtrCallConv` is set in the type converter's options,
// the value of `kernelBarePtrCallConv` will be ignored.
OperandRange origArguments = launchOp.getKernelOperands();
SmallVector<Value, 8> llvmArguments = getTypeConverter()->promoteOperands(
loc, origArguments, adaptor.getKernelOperands(), rewriter,
/*useBarePtrCallConv=*/kernelBarePtrCallConv);
SmallVector<Value, 8> llvmArgumentsWithSizes;
Contributor: nit: no need to explicitly specify 8 unless this is some special case (0/1).

Contributor (Author): I prefer the explicit size in this case because promoteOperands returns a SmallVector<Value, 4>, which isn't going to be large enough once the sizes are added (three operands become six). But I guess it doesn't matter too much...


// Intersperse size information if requested.
if (kernelIntersperseSizeCallConv) {
if (origArguments.size() != llvmArguments.size()) {
// This shouldn't happen if the bare-pointer calling convention is used.
return rewriter.notifyMatchFailure(
launchOp,
"Cannot add sizes to arguments with one-to-many LLVM IR expansion.");
}

llvmArgumentsWithSizes.reserve(llvmArguments.size() * 2);
for (auto [llvmArg, origArg] : zip_equal(llvmArguments, origArguments)) {
auto memrefTy = dyn_cast<MemRefType>(origArg.getType());
if (!memrefTy) {
return rewriter.notifyMatchFailure(
launchOp, "Operand to launch op is not a memref.");
}

if (!memrefTy.hasStaticShape() ||
!memrefTy.getElementType().isIntOrFloat()) {
return rewriter.notifyMatchFailure(
launchOp, "Operand to launch op is not a memref with a static "
"shape and an integer or float element type.");
}

unsigned bitwidth = memrefTy.getElementTypeBitWidth();
if (bitwidth % 8 != 0) {
return rewriter.notifyMatchFailure(
launchOp, "Operand to launch op is not a memref with a "
"byte-aligned element type.");
}
Comment on lines +1000 to +1012

Contributor (Author): I feel like I can't be the first person to have wanted to do this, and I'm not sure I'm checking all the necessary conditions. Is there some handy function for this that I'm missing?

Member: I don't recall seeing a helper for this either. Most code uses llvm::ceilDiv for storage size, though, instead of rejecting element sizes that are not multiples of 8 bits.

Contributor (Author, andfau-amd, Jan 20, 2025): Is there some rule somewhere that guarantees elements smaller than 8 bits get padded to 8? I doubt the Vulkan target can actually do anything useful with non-multiple-of-8-bit types, though, so maybe this simple check is fine.

Member: I was only able to find this:

    // Non 1-bit dense elements are padded to 8-bits.
    size_t storageSize = llvm::divideCeil(elementWidth, CHAR_BIT);
    assert(((data.size() / storageSize) == numElements) &&
           "data does not hold expected number of elements");

but it doesn't give us a definite answer. It could be that memref data layout is not universally specified on purpose. cc: @Hardcode84

I think it's fine to reject such memrefs. If someone wants to support them in the future, we can work with them on adding that.
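
For reference, a minimal sketch of the divideCeil-based alternative mentioned above, assuming one wanted to pad sub-byte element widths up to whole bytes rather than reject them (which is not what the patch does); the helper name is hypothetical:

    #include <cstdint>
    #include "llvm/Support/MathExtras.h"

    // Hypothetical alternative, not adopted by this patch: pad sub-byte
    // element widths up to whole bytes instead of rejecting them.
    uint64_t paddedStorageBytes(uint64_t elementBitwidth, uint64_t numElements) {
      return llvm::divideCeil(elementBitwidth, 8) * numElements;
    }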


uint64_t staticSize = static_cast<uint64_t>(bitwidth / 8) *
static_cast<uint64_t>(memrefTy.getNumElements());

Value sizeArg = rewriter.create<LLVM::ConstantOp>(
loc, getIndexType(), rewriter.getIndexAttr(staticSize));
llvmArgumentsWithSizes.push_back(llvmArg); // Presumably a bare pointer.
llvmArgumentsWithSizes.push_back(sizeArg);
}
}
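
As a quick worked example of the size computation above: for a memref<8x8xi32>, bitwidth is 32 and getNumElements() returns 64, so staticSize is (32 / 8) * 64 = 256 bytes, which matches the 256-byte constant checked in the new test file below.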
// Lower the kernel operands to match kernel parameters.
// Note: If `useBarePtrCallConv` is set in the type converter's options,
// the value of `kernelBarePtrCallConv` will be ignored.
SmallVector<Value, 4> arguments = getTypeConverter()->promoteOperands(
loc, launchOp.getKernelOperands(), adaptor.getKernelOperands(), rewriter,
/*useBarePtrCallConv=*/kernelBarePtrCallConv);

std::optional<gpu::KernelDim3> clusterSize = std::nullopt;
if (launchOp.hasClusterSize()) {
@@ -1010,7 +1033,9 @@ LogicalResult LegalizeLaunchFuncOpPattern::matchAndRewrite(
adaptor.getGridSizeZ()},
gpu::KernelDim3{adaptor.getBlockSizeX(), adaptor.getBlockSizeY(),
adaptor.getBlockSizeZ()},
adaptor.getDynamicSharedMemorySize(), arguments, stream, clusterSize);
adaptor.getDynamicSharedMemorySize(),
llvmArgumentsWithSizes.empty() ? llvmArguments : llvmArgumentsWithSizes,
stream, clusterSize);
if (launchOp.getAsyncToken())
rewriter.replaceOp(launchOp, {stream});
else
@@ -1760,10 +1785,9 @@ LogicalResult ConvertCreateBsrOpToGpuRuntimeCallPattern::matchAndRewrite(
return success();
}

void mlir::populateGpuToLLVMConversionPatterns(LLVMTypeConverter &converter,
RewritePatternSet &patterns,
bool kernelBarePtrCallConv,
bool typeCheckKernelArgs) {
void mlir::populateGpuToLLVMConversionPatterns(
LLVMTypeConverter &converter, RewritePatternSet &patterns,
bool kernelBarePtrCallConv, bool kernelIntersperseSizeCallConv) {
addOpaquePointerConversion<gpu::AsyncTokenType>(converter);
addOpaquePointerConversion<gpu::SparseDnTensorHandleType>(converter);
addOpaquePointerConversion<gpu::SparseSpMatHandleType>(converter);
Expand Down Expand Up @@ -1801,7 +1825,7 @@ void mlir::populateGpuToLLVMConversionPatterns(LLVMTypeConverter &converter,
ConvertSpMatGetSizeOpToGpuRuntimeCallPattern,
ConvertSetCsrPointersOpToGpuRuntimeCallPattern>(converter);
patterns.add<LegalizeLaunchFuncOpPattern>(converter, kernelBarePtrCallConv,
typeCheckKernelArgs);
kernelIntersperseSizeCallConv);
}

//===----------------------------------------------------------------------===//
@@ -0,0 +1,25 @@
// RUN: mlir-opt %s --gpu-to-llvm="use-bare-pointers-for-kernels=1 intersperse-sizes-for-kernels=1" -split-input-file | FileCheck %s

module attributes {gpu.container_module, spirv.target_env = #spirv.target_env<#spirv.vce<v1.0, [Shader], [SPV_KHR_storage_buffer_storage_class]>, #spirv.resource_limits<>>} {
llvm.func @malloc(i64) -> !llvm.ptr
gpu.binary @kernels [#gpu.object<#spirv.target_env<#spirv.vce<v1.0, [Shader], [SPV_KHR_storage_buffer_storage_class]>, #spirv.resource_limits<>>, "">]
func.func @main() attributes {llvm.emit_c_interface} {
// CHECK: [[RANK1UMD:%.*]] = llvm.mlir.undef : !llvm.struct<(ptr, ptr, i64, array<1 x i64>, array<1 x i64>)>
%rank1UndefMemrefDescriptor = llvm.mlir.undef : !llvm.struct<(ptr, ptr, i64, array<1 x i64>, array<1 x i64>)>
// CHECK: [[RANK2UMD:%.*]] = llvm.mlir.undef : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
%rank2UndefMemrefDescriptor = llvm.mlir.undef : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
%c1 = arith.constant 1 : index
// CHECK: [[PTR1:%.*]] = llvm.extractvalue [[RANK1UMD]][1]
// CHECK: [[PTR2:%.*]] = llvm.extractvalue [[RANK2UMD]][1]
// CHECK: [[PTR3:%.*]] = llvm.extractvalue [[RANK2UMD]][1]
// CHECK: [[SIZE1:%.*]] = llvm.mlir.constant(32 : index) : i64
// CHECK: [[SIZE2:%.*]] = llvm.mlir.constant(256 : index) : i64
// CHECK: [[SIZE3:%.*]] = llvm.mlir.constant(48 : index) : i64
%6 = builtin.unrealized_conversion_cast %rank1UndefMemrefDescriptor : !llvm.struct<(ptr, ptr, i64, array<1 x i64>, array<1 x i64>)> to memref<8xf32>
%10 = builtin.unrealized_conversion_cast %rank2UndefMemrefDescriptor : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> to memref<8x8xi32>
%14 = builtin.unrealized_conversion_cast %rank2UndefMemrefDescriptor : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> to memref<4x12xi8>
// CHECK: gpu.launch_func @kernels::@kernel_add blocks in ({{.*}}) threads in ({{.*}}) : i64 args([[PTR1]] : !llvm.ptr, [[SIZE1]] : i64, [[PTR2]] : !llvm.ptr, [[SIZE2]] : i64, [[PTR3]] : !llvm.ptr, [[SIZE3]] : i64)
gpu.launch_func @kernels::@kernel_add blocks in (%c1, %c1, %c1) threads in (%c1, %c1, %c1) args(%6 : memref<8xf32>, %10 : memref<8x8xi32>, %14 : memref<4x12xi8>)
return
}
}
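
For reference, the constants checked above are the static buffer sizes in bytes: 8 * 4 = 32 for memref<8xf32>, 8 * 8 * 4 = 256 for memref<8x8xi32>, and 4 * 12 * 1 = 48 for memref<4x12xi8>.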
7 changes: 3 additions & 4 deletions mlir/test/lib/Pass/TestVulkanRunnerPipeline.cpp
@@ -68,12 +68,11 @@ void buildTestVulkanRunnerPipeline(OpPassManager &passManager,
passManager.addPass(createFinalizeMemRefToLLVMConversionPass());
passManager.nest<func::FuncOp>().addPass(
LLVM::createRequestCWrappersPass());
// vulkan-runtime-wrappers.cpp uses the non-bare-pointer calling convention,
// and the type check is needed to prevent accidental ABI mismatches.
// vulkan-runtime-wrappers.cpp requires these calling convention options.
GpuToLLVMConversionPassOptions opt;
opt.hostBarePtrCallConv = false;
opt.kernelBarePtrCallConv = false;
opt.typeCheckKernelArgs = true;
opt.kernelBarePtrCallConv = true;
opt.kernelIntersperseSizeCallConv = true;
passManager.addPass(createGpuToLLVMConversionPass(opt));
}
}
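
The two options work as a pair: the bare-pointer convention reduces each memref argument to its buffer pointer, and the interspersed sizes restore the length information the Vulkan runtime needs to bind each buffer.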
4 changes: 2 additions & 2 deletions mlir/test/mlir-vulkan-runner/addi.mlir
@@ -1,5 +1,5 @@
// RUN: mlir-opt %s -test-vulkan-runner-pipeline \
// RUN: | mlir-vulkan-runner - --shared-libs=%vulkan-runtime-wrappers,%mlir_runner_utils --entry-point-result=void | FileCheck %s
// RUN: mlir-opt %s -test-vulkan-runner-pipeline=to-llvm \
// RUN: | mlir-cpu-runner - --shared-libs=%vulkan-runtime-wrappers,%mlir_runner_utils --entry-point-result=void | FileCheck %s

// CHECK-COUNT-64: [3, 3, 3, 3, 3, 3, 3, 3]
module attributes {
4 changes: 2 additions & 2 deletions mlir/test/mlir-vulkan-runner/addi8.mlir
@@ -1,5 +1,5 @@
// RUN: mlir-opt %s -test-vulkan-runner-pipeline \
// RUN: | mlir-vulkan-runner - --shared-libs=%vulkan-runtime-wrappers,%mlir_runner_utils --entry-point-result=void | FileCheck %s
// RUN: mlir-opt %s -test-vulkan-runner-pipeline=to-llvm \
// RUN: | mlir-cpu-runner - --shared-libs=%vulkan-runtime-wrappers,%mlir_runner_utils --entry-point-result=void | FileCheck %s

// CHECK-COUNT-64: [3, 3, 3, 3, 3, 3, 3, 3]
module attributes {
4 changes: 2 additions & 2 deletions mlir/test/mlir-vulkan-runner/mulf.mlir
@@ -1,5 +1,5 @@
// RUN: mlir-opt %s -test-vulkan-runner-pipeline \
// RUN: | mlir-vulkan-runner - --shared-libs=%vulkan-runtime-wrappers,%mlir_runner_utils --entry-point-result=void | FileCheck %s
// RUN: mlir-opt %s -test-vulkan-runner-pipeline=to-llvm \
// RUN: | mlir-cpu-runner - --shared-libs=%vulkan-runtime-wrappers,%mlir_runner_utils --entry-point-result=void | FileCheck %s

// CHECK-COUNT-4: [6, 6, 6, 6]
module attributes {
4 changes: 2 additions & 2 deletions mlir/test/mlir-vulkan-runner/subf.mlir
@@ -1,5 +1,5 @@
// RUN: mlir-opt %s -test-vulkan-runner-pipeline \
// RUN: | mlir-vulkan-runner - --shared-libs=%vulkan-runtime-wrappers,%mlir_runner_utils --entry-point-result=void | FileCheck %s
// RUN: mlir-opt %s -test-vulkan-runner-pipeline=to-llvm \
// RUN: | mlir-cpu-runner - --shared-libs=%vulkan-runtime-wrappers,%mlir_runner_utils --entry-point-result=void | FileCheck %s

// CHECK-COUNT-32: [2.2, 2.2, 2.2, 2.2]
module attributes {
25 changes: 10 additions & 15 deletions mlir/tools/mlir-vulkan-runner/vulkan-runtime-wrappers.cpp
@@ -169,26 +169,21 @@ mgpuLaunchKernel(void *vkKernel, size_t gridX, size_t gridY, size_t gridZ,
void ** /*extra*/, size_t paramsCount) {
auto manager = static_cast<VulkanRuntimeManager *>(vkRuntimeManager);

// The non-bare-pointer memref ABI interacts badly with mgpuLaunchKernel's
// signature:
// - The memref descriptor struct gets split into several elements, each
// passed as their own "param".
// - No metadata is provided as to the rank or element type/size of a memref.
// Here we assume that all MemRefs have rank 1 and an element size of
// 4 bytes. This means each descriptor struct will have five members.
// TODO(https://github.com/llvm/llvm-project/issues/73457): Refactor the
// ABI/API of mgpuLaunchKernel to use a different ABI for memrefs, so
// that other memref types can also be used. This will allow migrating
// the remaining tests and removal of mlir-vulkan-runner.
const size_t paramsPerMemRef = 5;
// GpuToLLVMConversionPass with the kernelBarePtrCallConv and
// kernelIntersperseSizeCallConv options will set up the params array like:
// { &memref_ptr0, &memref_size0, &memref_ptr1, &memref_size1, ... }
const size_t paramsPerMemRef = 2;
if (paramsCount % paramsPerMemRef != 0) {
abort();
abort(); // This would indicate a serious calling convention mismatch.
}
const DescriptorSetIndex setIndex = 0;
BindingIndex bindIndex = 0;
for (size_t i = 0; i < paramsCount; i += paramsPerMemRef) {
auto memref = static_cast<MemRefDescriptor<uint32_t, 1> *>(params[i]);
bindMemRef<uint32_t, 1>(manager, setIndex, bindIndex, memref);
void *memrefBufferBasePtr = *static_cast<void **>(params[i + 0]);
size_t memrefBufferSize = *static_cast<size_t *>(params[i + 1]);
VulkanHostMemoryBuffer memBuffer{memrefBufferBasePtr,
static_cast<uint32_t>(memrefBufferSize)};
manager->setResourceData(setIndex, bindIndex, memBuffer);
++bindIndex;
}
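
To make the double indirection above concrete, here is a hypothetical caller-side sketch (not code from the patch) of the params array for two memref arguments under this convention; each entry points at the storage of one argument:

    // Hypothetical example: two memrefs lowered to pointer/size pairs.
    void *buf0 = nullptr; // device buffer for memref 0 (placeholder)
    size_t size0 = 32;    // static size in bytes of memref 0
    void *buf1 = nullptr; // device buffer for memref 1 (placeholder)
    size_t size1 = 256;   // static size in bytes of memref 1
    void *params[] = {&buf0, &size0, &buf1, &size1}; // paramsCount == 4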
