[mlir][gpu][NVPTX] Enable NVIDIA GPU JIT compilation path #66220
Conversation
This patch adds an NVPTX compilation path that enables JIT compilation on NVIDIA targets. The following modifications were performed:

1. Adding a format field to the GPU object attribute, allowing the translation attribute to use the correct runtime function to load the module. Likewise, a dictionary attribute was added to add any possible extra options.
2. Adding the `createObject` method to `GPUTargetAttrInterface`; this method returns a GPU object from a binary string.
3. Adding the function `mgpuModuleLoadJIT`, which is only available for NVIDIA GPUs, as there is no equivalent for AMD. A sketch of such a wrapper is shown below.
4. Adding the CMake flag `MLIR_GPU_COMPILATION_TEST_FORMAT` to specify the format to use during testing.
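For item 3, here is a minimal sketch of what such a JIT-loading wrapper can look like using the CUDA driver API. This is an illustration, not the exact code from the patch; the 4 KiB error-buffer size and the particular option set are assumptions.

```c++
#include "cuda.h"

#include <cstdint>
#include <cstdio>

// Hands a PTX string to the CUDA driver, which JITs it for whatever GPU is
// present at run time. `optLevel` is forwarded as the driver's JIT
// optimization level; the error buffer captures JIT diagnostics.
extern "C" CUmodule mgpuModuleLoadJIT(void *data, int optLevel) {
  CUmodule module = nullptr;
  char jitErrorBuffer[4096] = {0};
  CUjit_option jitOptions[] = {CU_JIT_ERROR_LOG_BUFFER,
                               CU_JIT_ERROR_LOG_BUFFER_SIZE_BYTES,
                               CU_JIT_OPTIMIZATION_LEVEL};
  void *jitOptionsVals[] = {
      jitErrorBuffer,
      reinterpret_cast<void *>(sizeof(jitErrorBuffer)),
      reinterpret_cast<void *>(static_cast<uintptr_t>(optLevel))};
  CUresult result =
      cuModuleLoadDataEx(&module, data, 3, jitOptions, jitOptionsVals);
  if (result != CUDA_SUCCESS)
    fprintf(stderr, "JIT compilation failed: %s\n", jitErrorBuffer);
  return module;
}
```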
@llvm/pr-subscribers-mlir-sparse @llvm/pr-subscribers-mlir
Patch is 50.36 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/66220.diff

33 Files Affected:
```diff
diff --git a/mlir/include/mlir/Dialect/GPU/IR/CompilationAttrs.td b/mlir/include/mlir/Dialect/GPU/IR/CompilationAttrs.td
+def GPU_ObjectOffload : I32EnumAttrCase<"Offload", 1, "offload">;
 def GPU_ObjectAttr : GPU_Attr<"Object", "object"> {
diff --git a/mlir/include/mlir/Dialect/GPU/IR/CompilationInterfaces.h b/mlir/include/mlir/Dialect/GPU/IR/CompilationInterfaces.h
 protected:
 LogicalResult ObjectAttr::verify(function_ref<InFlightDiagnostic()> emitError,
+namespace {
+void printObject(AsmPrinter &odsParser, CompilationTarget format,
 //===----------------------------------------------------------------------===//
+CompilationTarget TargetOptions::getCompilationTarget() const {
+CompilationTarget TargetOptions::getDefaultCompilationTarget() {
 std::pair<llvm::BumpPtrAllocator, SmallVector<const char *>>
-TargetOptions::CompilationTarget TargetOptions::getCompilationTarget() const {
 MLIR_DEFINE_EXPLICIT_TYPE_ID(::mlir::gpu::TargetOptions)
 #include "mlir/Dialect/GPU/IR/GPUOpInterfaces.cpp.inc"
 void GpuModuleToBinaryPass::runOnOperation() {
+extern "C" MLIR_CUDA_WRAPPERS_EXPORT CUmodule mgpuModuleLoadJIT(void *data,
 extern "C" MLIR_CUDA_WRAPPERS_EXPORT void mgpuModuleUnload(CUmodule module) {
+extern "C" hipModule_t mgpuModuleLoadJIT(void *data, int optLevel) {
 extern "C" void mgpuModuleUnload(hipModule_t module) {
```
Ad "An option needs to be added to the SparseCompiler to support the format option, however I didn't know if there's any preference." Since I don't see changes in the sparse code, I assume you want some feedback, but I need a bit more context on what you had in mind. In general, we have a lot of "knobs" in the sparse pipeline setup, so generally I am not opposed to adding one more ;-) |
With this patch we have 3 ways to compile code: `isa` (PTX that the driver JITs at load time), `bin` (a CUBIN compiled for a specific architecture), and `fatbin` (a fat binary embedding both the CUBIN and the PTX).
JIT will make the test work, but fatbin is preferable for runtime performance as it can be used for AOT.
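To make the trade-off concrete, here is a hedged sketch of how a loader could dispatch on these formats. The `ObjectFormat` enum and the `loadObject` helper are made up for this example; only the two driver calls are real API.

```c++
#include "cuda.h"

#include <cstdint>

// Hypothetical format tag mirroring the three options above.
enum class ObjectFormat { ISA, Bin, Fatbin };

CUmodule loadObject(ObjectFormat format, void *data, int optLevel) {
  CUmodule module = nullptr;
  if (format == ObjectFormat::ISA) {
    // PTX: ask the driver to JIT it for the current device.
    CUjit_option opt = CU_JIT_OPTIMIZATION_LEVEL;
    void *val = reinterpret_cast<void *>(static_cast<uintptr_t>(optLevel));
    cuModuleLoadDataEx(&module, data, 1, &opt, &val);
  } else {
    // CUBIN or fatbin: loaded directly; for a fatbin the driver picks a
    // matching CUBIN and only falls back to JITting the embedded PTX when
    // no compatible CUBIN is found.
    cuModuleLoadData(&module, data);
  }
  return module;
}
```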
Ad "Is it okay to add another option to the sparse compiler to specify which format to use?" Yes, more than okay! Ass "Is there a preference to which option to use by default?" If JIT make the test work again, let's make that the default. But please describe the three options in detail with performance implications (possibly indirectly by referring to where you add this as comment) |
I'll do that. Btw, did you have the chance to try the fix I posted in #65857 yesterday?
I haven't looked at the code carefully; I will do that tomorrow, but adding JIT sounds great.
Would it be okay if we didn't do this for default behaviour? Nvidia's state-of-the-art compiler is …
The default behavior remains `fatbin`. Another option is setting a different default just for the sparse compiler. I'm inclined to keep `fatbin` as the general default.
Perfectly okay with another default for the sparse compiler for consistency. I was merely suggesting this so the tests would pass without changes, but explicitly marking them as `isa` works too. I don't feel strongly either way, so please pick whatever feels best.
```tablegen
  GPU_ObjectISA,
  GPU_ObjectBinary,
  GPU_ObjectFatbin
]>;
```
This deserves some doc.
(I'm not totally sure right now what "offload" does in this list actually)
I added the docs in the `ObjectAttr` docs. The `offload` format is meant to be a generic format; for NVPTX and AMDGPU it generates LLVM bitcode. Execution from this format is not enabled in trunk; however, downstream users could use it (see the sketch below).
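As a hedged illustration of that downstream use case, one could parse the LLVM bitcode stored in an `offload` object back into a module and keep compiling it in a custom pipeline. The `loadOffloadObject` helper and the buffer name are assumptions, not part of this patch.

```c++
#include "llvm/ADT/StringRef.h"
#include "llvm/Bitcode/BitcodeReader.h"
#include "llvm/IR/LLVMContext.h"
#include "llvm/IR/Module.h"
#include "llvm/Support/Error.h"
#include "llvm/Support/MemoryBufferRef.h"

// Turns the bitcode payload of an offload-format GPU object back into an
// llvm::Module that a downstream pipeline can lower to PTX or AMDGPU ISA.
std::unique_ptr<llvm::Module> loadOffloadObject(llvm::StringRef bitcode,
                                                llvm::LLVMContext &ctx) {
  llvm::Expected<std::unique_ptr<llvm::Module>> module =
      llvm::parseBitcodeFile(
          llvm::MemoryBufferRef(bitcode, "gpu.offload.object"), ctx);
  if (!module) {
    llvm::consumeError(module.takeError());
    return nullptr;
  }
  return std::move(*module);
}
```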
````tablegen
@@ -32,8 +44,17 @@ def GPU_ObjectAttr : GPU_Attr<"Object", "object"> {
    #gpu.object<#nvvm.target, "...">
    ```
  }];
````
Update the doc please
```c++
  binOrFatbin =
      binary |
      fatbinary, /// The process should produce a binary or fatbinary. It's up
                 /// to the target to decide which.
```
(this is the doc that may have been lost moving to ODS, cf the other comment above)
```c++
@@ -144,6 +144,22 @@ struct SparseCompilerOptions
      desc("GPU target architecture")};
  PassOptions::Option<std::string> gpuFeatures{*this, "gpu-features",
      desc("GPU target features")};
  /// For NVIDIA GPUs there are 3 compilation format options:
  /// 1. `isa`: the compiler generates PTX and the runtime JITs the PTX.
```
```diff
-  /// 1. `isa`: the compiler generates PTX and the runtime JITs the PTX.
+  /// 1. `isa`: the compiler generates PTX and the driver JITs the PTX.
```
```c++
  /// GPU running the program.
  /// Option 3 is the best compromise between options 1 & 2 as it can JIT in
  /// case of an arch mismatch, however, it's only possible to JIT to a higher
  /// CC than `gpuChip`.
```
What is the CC target when using 1?
To some extent there shouldn't be any difference between 1 and 3?
It's never specified; that's why `gpu-to-cubin` always worked, it was always JITted to the running arch.
If there's an arch mismatch then 1 and 3 have the same performance hit; however, if the compiled arch matches the running arch, then 3 behaves like 2 and there's no performance hit. A sketch of how to query the running arch is shown below.
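For completeness, the architecture the driver JITs to can be inspected at run time with the driver API. This is a standalone example, not part of the patch.

```c++
#include "cuda.h"

#include <cstdio>

// Prints the compute capability of device 0 (e.g. sm_80), i.e. the
// architecture the driver JIT compiles PTX for on this machine.
int main() {
  cuInit(0);
  CUdevice device;
  cuDeviceGet(&device, /*ordinal=*/0);
  int major = 0, minor = 0;
  cuDeviceGetAttribute(&major, CU_DEVICE_ATTRIBUTE_COMPUTE_CAPABILITY_MAJOR,
                       device);
  cuDeviceGetAttribute(&minor, CU_DEVICE_ATTRIBUTE_COMPUTE_CAPABILITY_MINOR,
                       device);
  printf("sm_%d%d\n", major, minor);
  return 0;
}
```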
The last commit updated the docs, migrated all tests to use …
```c++
  /// 3. `fatbin`: generates a fat binary with a CUBIN object for `gpuChip` and
  ///    also embeds the PTX in the fat binary.
  /// Notes:
  /// Option 1 adds a significant runtime performance hit, however, tests are
```
Thank you for adding this detailed explanation.
Thank you for this change! I have a few nits, but good to go once addressed, so I am approving this (for the sparse changes part).
```c++
  /// Option 1 adds a significant runtime performance hit, however, tests are
  ///    more likely to pass with this option.
  /// Option 2 is better for execution time as there is no JIT; however, the
  /// program will fail if there's an arch mismatch between `gpuChip` and the
```
nit: can you please spell out "architecture" (unless this is NVidia convention to write it that way)
```diff
@@ -1,2 +1,4 @@
 if not config.enable_cuda_runner or not config.mlir_run_cuda_sm80_tests:
   config.unsupported = True
 
+config.substitutions.append(("%format", config.gpu_compilation_format))
```
can we please use a slightly more specific name for this (`format` is very generic, how about at least `gpu_format` or so)
Just a random note, @fabianmcg, that I really appreciate your refactoring. The GPU "pipeline" for the sparse compiler still had a few rough edges and you really smoothed these out! So, thanks!
Thank you, happy to help! Also thanks for all the feedback and testing!
NOTE:

- The compilation format used during testing is selected with `MLIR_GPU_COMPILATION_TEST_FORMAT`.
- An option needs to be added to the `SparseCompiler` to support the format option, however I didn't know if there's any preference.
- `mgpuModuleLoadJIT` relies on the assumption there's a JIT cache. Another option is to implement the cache itself in MLIR.