-
Notifications
You must be signed in to change notification settings - Fork 13.6k
[mlir] Target Description and Cost Model in MLIR #85141
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Thank you for submitting a Pull Request (PR) to the LLVM Project! This PR will be automatically labeled and the relevant teams will be If you wish to, you can add reviewers by using the "Reviewers" section on this page. If this is not working for you, it is probably because you do not have write If you have received no comments on your PR for a week, you can request a review If you have further questions, they may be answered by the LLVM GitHub User Guide. You can also ask questions in a comment on this PR, on the LLVM Discord or on the forums. |
@llvm/pr-subscribers-mlir @llvm/pr-subscribers-mlir-core Author: Niranjan Hasabnis (nhasabni) ChangesThis pull request (PR) demonstrates one example of a target description and cost model in MLIR. See RFC for the context. It is not a complete work by any means, and the main purpose of this PR is to initiate a discussion around this idea. At a high-level, the PR develops a mechanism to read the system description/cost model from a config file specified on the command line (The PR uses JSON format as an example). In case the user does not specify the description file, the cost model is populated with the default values (which could be known properties of the devices (e.g., L1 cache size) or heuristics used in various compiler passes (e.g., tile size for matrix multiplication)). The system description consists of a number of device descriptions for different devices in the system. The PR also demonstrates modifications to a couple of passes to access information from the device-specific cost models. In terms of implementation, Patch is 30.95 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/85141.diff 9 Files Affected:
diff --git a/mlir/include/mlir/IR/MLIRContext.h b/mlir/include/mlir/IR/MLIRContext.h
index 11e5329f43e681..6007f19ed503a3 100644
--- a/mlir/include/mlir/IR/MLIRContext.h
+++ b/mlir/include/mlir/IR/MLIRContext.h
@@ -34,6 +34,7 @@ class MLIRContextImpl;
class RegisteredOperationName;
class StorageUniquer;
class IRUnit;
+class SystemDesc;
/// MLIRContext is the top-level object for a collection of MLIR operations. It
/// holds immortal uniqued objects like types, and the tables used to unique
@@ -240,6 +241,9 @@ class MLIRContext {
/// (attributes, operations, types, etc.).
llvm::hash_code getRegistryHash();
+ /// Get context-specific system description
+ SystemDesc &getSystemDesc();
+
//===--------------------------------------------------------------------===//
// Action API
//===--------------------------------------------------------------------===//
diff --git a/mlir/include/mlir/Support/SystemDesc.h b/mlir/include/mlir/Support/SystemDesc.h
new file mode 100644
index 00000000000000..0784a5183d2061
--- /dev/null
+++ b/mlir/include/mlir/Support/SystemDesc.h
@@ -0,0 +1,514 @@
+//===- SYSTEMDESC.h - class to represent hardware configuration --*- C++
+//-*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+//
+// Hardware configuration provides commonly used hardware information to
+// different users, such as optimization passes.
+//
+//===----------------------------------------------------------------------===//
+
+#ifndef MLIR_SUPPORT_SYSTEMDESC_H
+#define MLIR_SUPPORT_SYSTEMDESC_H
+
+#include <map>
+#include <memory>
+#include <vector>
+
+#include "mlir/IR/MLIRContext.h"
+#include "mlir/Support/FileUtilities.h"
+#include "mlir/Support/LogicalResult.h"
+#include "llvm/ADT/StringRef.h"
+#include "llvm/Support/Error.h"
+#include "llvm/Support/JSON.h"
+
+/// Sytem description file contains a list of device descriptions that
+/// each describe a device (e.g., CPU, GPU, ASIC, etc.) in the system.
+/// Example:
+/// [
+/// {
+/// "ID": 1,
+/// "TYPE": "CPU",
+/// "DESCRIPTION": "Intel Xeon 8480",
+/// "L1_CACHE_SIZE_IN_BYTES": 8192,
+/// ...
+/// },
+/// {
+///
+/// },
+/// ...
+/// ]
+namespace mlir {
+
+/// Describes the individual device from the system description
+class DeviceDesc {
+public:
+ /// Some typedefs
+ using DeviceID = uint32_t;
+ using DevicePropertyName = std::string;
+ struct DevicePropertyValue {
+ enum Tag { INT, DOUBLE, INT_VECTOR, DOUBLE_VECTOR } tag;
+ struct Data {
+ int64_t iValue;
+ double dValue;
+ std::vector<int64_t> ivValue;
+ std::vector<double> dvValue;
+
+ Data() : iValue(0), dValue(0.0), ivValue({0}), dvValue({0.0}) {}
+ ~Data() {}
+ } data;
+
+ DevicePropertyValue() = default;
+ DevicePropertyValue(const mlir::DeviceDesc::DevicePropertyValue &rhs) {
+ this->tag = rhs.tag;
+ if (this->tag == INT)
+ this->data.iValue = rhs.data.iValue;
+ else if (this->tag == DOUBLE)
+ this->data.dValue = rhs.data.dValue;
+ else if (this->tag == INT_VECTOR)
+ this->data.ivValue = rhs.data.ivValue;
+ else
+ this->data.dvValue = rhs.data.dvValue;
+ }
+ bool operator==(const mlir::DeviceDesc::DevicePropertyValue &rhs) const {
+ return tag == rhs.tag &&
+ ((tag == INT && data.iValue == rhs.data.iValue) ||
+ (tag == DOUBLE && data.dValue == rhs.data.dValue) ||
+ (tag == INT_VECTOR && data.ivValue == rhs.data.ivValue) ||
+ (tag == DOUBLE_VECTOR && data.dvValue == rhs.data.dvValue));
+ }
+ bool operator!=(const mlir::DeviceDesc::DevicePropertyValue &rhs) const {
+ return !(*this == rhs);
+ }
+ };
+ using DevicePropertiesMapTy =
+ std::map<DevicePropertyName, DevicePropertyValue>;
+
+ typedef enum { CPU, GPU, SPECIAL } DeviceType;
+
+ /// Basic constructor
+ DeviceDesc() = delete;
+ DeviceDesc(DeviceID id, DeviceType type) : ID(id), type(type) {}
+ bool operator==(const mlir::DeviceDesc &rhs) const {
+ return ID == rhs.getID() && type == rhs.getType() &&
+ deviceProperties == rhs.getProperties();
+ }
+ bool operator!=(const mlir::DeviceDesc &rhs) const { return !(*this == rhs); }
+
+ /// Type converters
+ static DeviceID strToDeviceID(const std::string &id_str) {
+ llvm::Expected<int64_t> id = llvm::json::parse<int64_t>(id_str);
+ if (!id)
+ llvm::report_fatal_error("Value of \"ID\" is not int");
+ return static_cast<DeviceID>(id.get());
+ }
+ static DeviceType strToDeviceType(const std::string &type_str) {
+ if (type_str == "CPU")
+ return DeviceType::CPU;
+ else if (type_str == "GPU")
+ return DeviceType::GPU;
+ else if (type_str == "SPECIAL")
+ return DeviceType::SPECIAL;
+ llvm::report_fatal_error("Value of \"Type\" is not CPU, GPU, or SPECIAL");
+ }
+
+ /// Set description
+ DeviceDesc &setDescription(std::string desc) {
+ description = desc;
+ return *this;
+ }
+ /// Set property
+ DeviceDesc &setProperty(llvm::StringRef name, int64_t iv) {
+ DevicePropertyValue value;
+ value.tag = DevicePropertyValue::Tag::INT;
+ value.data.iValue = iv;
+ auto inserted =
+ deviceProperties.insert(std::make_pair(std::string(name), value));
+ if (!inserted.second && inserted.first->second != value) {
+ llvm::report_fatal_error("Duplicate device property name found:" + name);
+ }
+ return *this;
+ }
+ DeviceDesc &setProperty(llvm::StringRef name, double dv) {
+ DevicePropertyValue value;
+ value.tag = DevicePropertyValue::Tag::DOUBLE;
+ value.data.dValue = dv;
+ auto inserted =
+ deviceProperties.insert(std::make_pair(std::string(name), value));
+ if (!inserted.second && inserted.first->second != value) {
+ llvm::report_fatal_error("Duplicate device property name found:" + name);
+ }
+ return *this;
+ }
+ DeviceDesc &setProperty(llvm::StringRef name,
+ const std::vector<int64_t> &ivv) {
+ DevicePropertyValue value;
+ value.tag = DevicePropertyValue::Tag::INT_VECTOR;
+ value.data.ivValue = ivv;
+ auto inserted =
+ deviceProperties.insert(std::make_pair(std::string(name), value));
+ if (!inserted.second && inserted.first->second != value) {
+ llvm::report_fatal_error("Duplicate device property name found:" + name);
+ }
+ return *this;
+ }
+ DeviceDesc &setProperty(llvm::StringRef name,
+ const std::vector<double> &idv) {
+ DevicePropertyValue value;
+ value.tag = DevicePropertyValue::Tag::DOUBLE_VECTOR;
+ value.data.dvValue = idv;
+ auto inserted =
+ deviceProperties.insert(std::make_pair(std::string(name), value));
+ if (!inserted.second && inserted.first->second != value) {
+ llvm::report_fatal_error("Duplicate device property name found:" + name);
+ }
+ return *this;
+ }
+ // We provide convenience interface to handle int/float value as string
+ DeviceDesc &setProperty(llvm::StringRef name, const std::string &json_value) {
+ if (json_value.length() > 0 && json_value[0] == '[') {
+ // Parse as an array
+ llvm::Expected<std::vector<int64_t>> ivv =
+ llvm::json::parse<std::vector<int64_t>>(json_value);
+ if (ivv) {
+ *this = this->setProperty(name, ivv.get());
+ return *this;
+ }
+
+ llvm::Expected<std::vector<double>> idv =
+ llvm::json::parse<std::vector<double>>(json_value);
+ if (idv) {
+ *this = this->setProperty(name, idv.get());
+ return *this;
+ }
+ } else {
+ // int64_t because llvm::json has int64_t support (not int)
+ llvm::Expected<int64_t> iv = llvm::json::parse<int64_t>(json_value);
+ if (iv) {
+ *this = this->setProperty(name, iv.get());
+ return *this;
+ }
+
+ // Int type failed, try float now.
+ // double because llvm::json has double support (not float)
+ llvm::Expected<double> dv = llvm::json::parse<double>(json_value);
+ if (dv) {
+ *this = this->setProperty(name, dv.get());
+ return *this;
+ }
+ }
+
+ llvm::report_fatal_error(
+ "Neither int/float/vector value in Device Description: key " + name);
+ }
+
+ /// Get ID
+ DeviceID getID() const { return ID; }
+ /// Get device type
+ DeviceType getType() const { return type; }
+ /// Get device description
+ std::string getDescription() const { return description; }
+ /// Get all of device properties
+ const DevicePropertiesMapTy &getProperties() const {
+ return deviceProperties;
+ }
+ /// Get property value: returns the value of the property with given name, if
+ /// it exists. Otherwise throws exception (TODO)
+ std::optional<int64_t> getPropertyValueAsInt(llvm::StringRef name) const {
+ // check that property with the given name exists
+ auto iter = deviceProperties.find(std::string(name));
+ if (iter == deviceProperties.end()) {
+ return std::nullopt;
+ }
+ // TODO: we can do a tag check here.
+ return iter->second.data.iValue;
+ }
+ std::optional<double> getPropertyValueAsFloat(llvm::StringRef name) const {
+ // check that property with the given name exists
+ auto iter = deviceProperties.find(std::string(name));
+ if (iter == deviceProperties.end()) {
+ return std::nullopt;
+ }
+ // TODO: we can do a tag check here.
+ return iter->second.data.dValue;
+ }
+
+ /// Special functions
+ auto getAllDevicePropertyNames() const {
+ return llvm::map_range(
+ deviceProperties,
+ [](const DevicePropertiesMapTy::value_type &item) -> llvm::StringRef {
+ return item.first;
+ });
+ }
+
+ /// We use a list of key-value pairs to represent a system description in
+ /// JSON.
+ using DeviceDescJSONTy = std::map<std::string, std::string>;
+ static DeviceDesc
+ parseDeviceDescFromJSON(const DeviceDescJSONTy &device_desc);
+
+ // -----------------------------------------------------------------------
+ // CPU specific methods
+ // -----------------------------------------------------------------------
+ static constexpr llvm::StringRef getCPUL1CacheSizeInBytesKeyName() {
+ return "L1_CACHE_SIZE_IN_BYTES";
+ }
+ static constexpr llvm::StringRef getConvAndMatMulBlockingFactorKeyName() {
+ return "CONV_AND_MATMUL_BLOCKING_FACTOR";
+ }
+ static constexpr llvm::StringRef getMatMulTileSizeInBytesKeyName() {
+ return "MATMUL_TILE_SIZE_IN_BYTES";
+ }
+ static constexpr llvm::StringRef getCanonicalizerMaxIterationsKeyName() {
+ return "CANONICALIZER_MAX_ITERS";
+ }
+ static constexpr llvm::StringRef getCanonicalizerMaxNumRewritesKeyName() {
+ return "CANONICALIZER_MAX_NUM_REWRITES";
+ }
+ static constexpr llvm::StringRef getMaxVectorWidthKeyName() {
+ return "MAX_VECTOR_WIDTH";
+ }
+
+ std::optional<int64_t> getL1CacheSizeInBytes() const {
+ if (std::optional<int64_t> v = this->getPropertyValueAsInt(
+ DeviceDesc::getCPUL1CacheSizeInBytesKeyName())) {
+ return v;
+ }
+ return std::nullopt;
+ }
+ void setL1CacheSizeInBytes(int64_t value) {
+ // Temporarily use int override until we support size_t
+ this->setProperty(DeviceDesc::getCPUL1CacheSizeInBytesKeyName(), value);
+ }
+ std::optional<int64_t> getConvAndMatMulBlockingFactor() const {
+ if (std::optional<int64_t> v = this->getPropertyValueAsInt(
+ DeviceDesc::getConvAndMatMulBlockingFactorKeyName())) {
+ return v;
+ }
+ return std::nullopt;
+ }
+ void setConvAndMatMulBlockingFactor(int64_t value) {
+ // Temporarily use int override until we support size_t
+ this->setProperty(DeviceDesc::getConvAndMatMulBlockingFactorKeyName(),
+ value);
+ }
+ std::optional<int64_t> getMatMulTileSizeInBytes() const {
+ if (std::optional<int64_t> v = this->getPropertyValueAsInt(
+ DeviceDesc::getMatMulTileSizeInBytesKeyName())) {
+ return v;
+ }
+ return std::nullopt;
+ }
+ void setMatMulTileSizeInBytes(int64_t value) {
+ // Temporarily use int override until we support size_t
+ this->setProperty(DeviceDesc::getMatMulTileSizeInBytesKeyName(), value);
+ }
+ std::optional<int64_t> getCanonicalizerMaxNumRewrites() const {
+ if (std::optional<int64_t> v = this->getPropertyValueAsInt(
+ DeviceDesc::getCanonicalizerMaxNumRewritesKeyName())) {
+ return v;
+ }
+ return std::nullopt;
+ }
+ void setCanonicalizerMaxNumRewrites(int64_t value) {
+ this->setProperty(DeviceDesc::getCanonicalizerMaxNumRewritesKeyName(),
+ value);
+ }
+ std::optional<int64_t> getCanonicalizerMaxIterations() const {
+ if (std::optional<int64_t> v = this->getPropertyValueAsInt(
+ DeviceDesc::getCanonicalizerMaxIterationsKeyName())) {
+ return v;
+ }
+ return std::nullopt;
+ }
+ void setCanonicalizerMaxIterations(int64_t value) {
+ this->setProperty(DeviceDesc::getCanonicalizerMaxIterationsKeyName(),
+ value);
+ }
+ std::optional<int64_t> getMaxVectorWidth() const {
+ if (std::optional<int64_t> v = this->getPropertyValueAsInt(
+ DeviceDesc::getMaxVectorWidthKeyName())) {
+ return v;
+ }
+ return std::nullopt;
+ }
+ void setMaxVectorWidth(uint32_t value) {
+ this->setProperty(DeviceDesc::getMaxVectorWidthKeyName(),
+ static_cast<int64_t>(value));
+ }
+
+private:
+ /// Unique device ID for every device
+ DeviceID ID;
+
+ /// Type of device
+ DeviceType type;
+
+ /// Some description of the device
+ std::string description;
+
+ /// Dictionary to store rest of the properties
+ DevicePropertiesMapTy deviceProperties;
+};
+
+class SystemDesc {
+public:
+ SystemDesc() = default;
+
+ /// Read and parse system description from JSON file
+ LogicalResult readSystemDescFromJSONFile(llvm::StringRef filename);
+ void writeSystemDescToJSONFile(llvm::StringRef filename);
+
+ /// Insert a new device description
+ SystemDesc &addDeviceDesc(const DeviceDesc &desc) {
+ auto inserted = deviceDescs.insert(std::make_pair(desc.getID(), desc));
+ if (!inserted.second || inserted.first->second != desc) {
+ llvm::report_fatal_error("Duplicate device description for ID:" +
+ llvm::StringRef(std::to_string(desc.getID())));
+ }
+ return *this;
+ }
+ /// Get a device description
+ const DeviceDesc &getDeviceDesc(DeviceDesc::DeviceID deviceID) {
+ auto iter = deviceDescs.find(deviceID);
+ if (iter != deviceDescs.end()) {
+ return iter->second;
+ }
+ llvm::report_fatal_error("Device description with ID not found:" +
+ llvm::StringRef(std::to_string(deviceID)));
+ }
+
+ /// Types
+ using DeviceDescsMapTy = std::map<DeviceDesc::DeviceID, DeviceDesc>;
+
+ // Generic functions: TODO
+ /// Get number of CPU devices in the system
+ static uint32_t getNumCPUDevices() { return 0; }
+ static uint32_t getNumGPUDevices() { return 0; }
+
+private:
+ SystemDesc(const SystemDesc &) = delete;
+ void operator=(const SystemDesc &) = delete;
+
+private:
+ /// Map to store all the device descriptions
+ DeviceDescsMapTy deviceDescs;
+};
+
+// An abstract class that represent device description for an abstract base
+// device
+//
+// This class specifies a set of device properties that could be specified by
+// the default device descriptor that will be used in case a user does not
+// specify its own properties for the device.
+class DefaultBaseDeviceDesc {
+public:
+ virtual ~DefaultBaseDeviceDesc() {}
+ virtual void registerDeviceDesc(MLIRContext *context) const = 0;
+
+ /// -----------------------------------------------------------------------
+ /// Device-agnostic parameters of system description
+ /// -----------------------------------------------------------------------
+ /// Set of common questions asked by various passes
+ // Blocking factor and tile size are typically used by tile/block passes.
+ virtual void setConvAndMatMulBlockingFactor(){};
+ virtual void setMatMulTileSize(){};
+
+ virtual void setCanonicalizerMaxIterations(){};
+ virtual void setCanonicalizerMaxNumRewrites(){};
+
+ /// -----------------------------------------------------------------------
+ /// CPU-specific parameters of system description
+ /// -----------------------------------------------------------------------
+ virtual void setL1CacheSizeInBytes(){};
+
+ /// -----------------------------------------------------------------------
+ /// GPU-specific parameters of system description
+ /// -----------------------------------------------------------------------
+ // Used by Conversion/AMDGPUToROCDL/AMDGPUToROCDL.cpp#L52
+ virtual void setMaxVectorWidth(){};
+};
+
+// Class that represent device description for a typical CPU device
+class DefaultCPUDeviceDesc : public DefaultBaseDeviceDesc {
+public:
+ // We use default ID of 0 because we are expecting to have only one device so
+ // far. Not heterogeneous setup.
+ DefaultCPUDeviceDesc()
+ : cpu_device_desc(DeviceDesc(/* id */ 0, DeviceDesc::CPU)) {
+ // Register all system properties
+ this->setL1CacheSizeInBytes();
+ this->setConvAndMatMulBlockingFactor();
+ this->setMatMulTileSize();
+ this->setCanonicalizerMaxNumRewrites();
+ this->setCanonicalizerMaxIterations();
+ }
+
+ ~DefaultCPUDeviceDesc() {}
+
+ void registerDeviceDesc(MLIRContext *context) const override {
+ context->getSystemDesc().addDeviceDesc(cpu_device_desc);
+ }
+
+ // -------------------------------------------------------------------------
+ // CPU-specific properties
+ // -------------------------------------------------------------------------
+
+ void setL1CacheSizeInBytes() override {
+ cpu_device_desc.setL1CacheSizeInBytes(8192);
+ }
+ void setConvAndMatMulBlockingFactor() override {
+ cpu_device_desc.setConvAndMatMulBlockingFactor(32);
+ }
+ void setMatMulTileSize() override {
+ cpu_device_desc.setMatMulTileSizeInBytes(32);
+ }
+ void setCanonicalizerMaxNumRewrites() override {
+ // taken from include/mlir/Transforms/Passes.td
+ cpu_device_desc.setCanonicalizerMaxNumRewrites(-1);
+ }
+ void setCanonicalizerMaxIterations() override {
+ // taken from include/mlir/Transforms/Passes.td
+ cpu_device_desc.setCanonicalizerMaxIterations(10);
+ }
+
+private:
+ DeviceDesc cpu_device_desc;
+};
+
+class DefaultGPUDeviceDesc : public DefaultBaseDeviceDesc {
+public:
+ // We use default ID of 0 because we are expecting to have only one device so
+ // far. Not heterogeneous setup.
+ DefaultGPUDeviceDesc()
+ : gpu_device_desc(DeviceDesc(/* id */ 1, DeviceDesc::GPU)) {
+ // GPU device supports default value for MaxVectorWidth so far.
+ this->setMaxVectorWidth();
+ }
+
+ ~DefaultGPUDeviceDesc() {}
+
+ void registerDeviceDesc(MLIRContext *context) const override {
+ context->getSystemDesc().addDeviceDesc(gpu_device_desc);
+ }
+
+ // -------------------------------------------------------------------------
+ // GPU-specific properties
+ // -------------------------------------------------------------------------
+
+ void setMaxVectorWidth() override {
+ // Conversion/AMDGPUToROCDL/AMDGPUToROCDL.cpp#L52
+ gpu_device_desc.setMaxVectorWidth(128);
+ }
+
+private:
+ DeviceDesc gpu_device_desc;
+};
+
+} // namespace mlir
+#endif // MLIR_SUPPORT_SYSTEMDESC_H
diff --git a/mlir/include/mlir/Tools/mlir-opt/MlirOptMain.h b/mlir/include/mlir/Tools/mlir-opt/MlirOptMain.h
index 6e90fad1618d21..5022567a5f256b 100644
--- a/mlir/include/mlir/Tools/mlir-opt/MlirOptMain.h
+++ b/mlir/include/mlir/Tools/mlir-opt/MlirOptMain.h
@@ -176,6 +176,11 @@ class MlirOptMainConfig {
/// Reproducer file generation (no crash required).
StringRef getReproducerFilename() const { return generateReproducerFileFlag; }
+ /// System description file
+ StringRef getSystemDescriptionFileName() const {
+ return systemDescriptionFileFlag;
+ }
+
protected:
/// Allow operation with no registered dialects.
/// This option is for convenience during testing only and discouraged in
@@ -234,6 +239,9 @@ class MlirOptMainConfig {
/// The rep...
[truncated]
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll chime in out of the blue here because I spend a lot of time in my day job dealing with various forms of system and device descriptions (Devicetree, IP-XACT, etc.) . I won't weigh in on how this could ultimately fit into MLIR, just some thoughts in a vacuum from someone who is interested in this kind of stuff.
IMHO the trick here is picking a point in the general <-> specific spectrum and doing it well. This seems to take a stance middle-left of that spectrum where the DeviceDesc has a handful of required constructs (id, type, description), and a general set of key value pairs (property). It sort of reminds me of Devicetree.
I've seen and used setups similar to this, and it can work well, if you can do interesting things in general with the required constructs, and you have flexibility in the optional properties. You can layer more domain specific constructs on top, which take away optionality and "stringly" typed field, like the DefaultBaseDeviceDesc. But I think getting agreement on such definitions is challenging. Even for "CPU" there are wildly different architectures and microarchitectures. I think in this regard, the current proposal is tending a little towards the specific end of the spectrum, and I think there are challenges here to get a specific model everyone can agree on.
In the CIRCT project, we ended up going as far as we could towards the general side. We built an IR for representing a simple "object oriented" language with classes, fields, objects, and references, an "interpreter" to elaborate an object graph, and APIs to query an elaborated object graph. It is up to the frontend and backend tooling to provide APIs to end users which emit the correct IR, elaborate the object graph, and query the elaborated model. Some more info here and a quick talk here if you're curious.
Separately from the above, some other random thoughts...
Have you thought about references between devices? E.g., a cache hierarchy where L1 has a reference to L2, etc.
Have you looked at MDL? https://discourse.llvm.org/t/rfc-landing-mdl-in-llvm-codegen/76507. As someone who is just curious about these things, I also vaguely follow that effort. It is clearly the product of a lot of thought and experience.
In general, thanks for pushing on this, I think it's a really interesting topic.
Thanks @joker-eph for the comments. I will update my PR soon. Thanks @mikeurbach for your detailed thoughts/comments as well. I will go through the links that you shared. I am sure there are several points to consider here in terms of appropriate set of APIs. We are currently using this PoC in our TPP-MLIR project to replace device properties/pass-specific heuristics used for optimizations (link). As we will support more devices, we will need to evolve this PoC as well. Our thought was to get early feedback on this effort and evolve as needed. |
You can test this locally with the following command:git-clang-format --diff ece2903ce730392e5236d27f1f387fa8067fcb1b 34f7612a2e1e0a3ba19ff71b54059fa4975bd861 -- mlir/include/mlir/Support/SystemDesc.h mlir/lib/Support/SystemDesc.cpp mlir/include/mlir/IR/MLIRContext.h mlir/include/mlir/Tools/mlir-opt/MlirOptMain.h mlir/lib/Conversion/AMDGPUToROCDL/AMDGPUToROCDL.cpp mlir/lib/IR/MLIRContext.cpp mlir/lib/Tools/mlir-opt/MlirOptMain.cpp mlir/lib/Transforms/Canonicalizer.cpp View the diff from clang-format here.diff --git a/mlir/include/mlir/Support/SystemDesc.h b/mlir/include/mlir/Support/SystemDesc.h
index 2c03edb118..5ff9b6421a 100644
--- a/mlir/include/mlir/Support/SystemDesc.h
+++ b/mlir/include/mlir/Support/SystemDesc.h
@@ -382,22 +382,22 @@ public:
/// -----------------------------------------------------------------------
/// Set of common questions asked by various passes
// Blocking factor and tile size are typically used by tile/block passes.
- virtual void setConvAndMatMulBlockingFactor(MLIRContext *context){};
- virtual void setMatMulTileSize(MLIRContext *context){};
+ virtual void setConvAndMatMulBlockingFactor(MLIRContext *context) {};
+ virtual void setMatMulTileSize(MLIRContext *context) {};
- virtual void setCanonicalizerMaxIterations(MLIRContext *context){};
- virtual void setCanonicalizerMaxNumRewrites(MLIRContext *context){};
+ virtual void setCanonicalizerMaxIterations(MLIRContext *context) {};
+ virtual void setCanonicalizerMaxNumRewrites(MLIRContext *context) {};
/// -----------------------------------------------------------------------
/// CPU-specific parameters of system description
/// -----------------------------------------------------------------------
- virtual void setL1CacheSizeInBytes(MLIRContext *context){};
+ virtual void setL1CacheSizeInBytes(MLIRContext *context) {};
/// -----------------------------------------------------------------------
/// GPU-specific parameters of system description
/// -----------------------------------------------------------------------
// Used by Conversion/AMDGPUToROCDL/AMDGPUToROCDL.cpp#L52
- virtual void setMaxVectorWidth(MLIRContext *context){};
+ virtual void setMaxVectorWidth(MLIRContext *context) {};
};
// Class that represent device description for a typical CPU device
diff --git a/mlir/lib/Support/SystemDesc.cpp b/mlir/lib/Support/SystemDesc.cpp
index 0527490fd0..c5e55493e6 100644
--- a/mlir/lib/Support/SystemDesc.cpp
+++ b/mlir/lib/Support/SystemDesc.cpp
@@ -26,16 +26,14 @@ impl::SystemDescJSONConfigParser::buildDeviceDescFromConfigFile(
// ID and Type are mandatory fields.
auto iter = device_desc_in_json.find("ID");
if (iter == device_desc_in_json.end()) {
- llvm::errs() << "\"ID\" key missing in Device Description"
- << "\n";
+ llvm::errs() << "\"ID\" key missing in Device Description" << "\n";
return std::nullopt;
}
DeviceDesc::DeviceID id = DeviceDesc::strToDeviceID(iter->second);
iter = device_desc_in_json.find("Type");
if (iter == device_desc_in_json.end()) {
- llvm::errs() << "\"Type\" key missing in Device Description"
- << "\n";
+ llvm::errs() << "\"Type\" key missing in Device Description" << "\n";
return std::nullopt;
}
DeviceDesc::DeviceType type = DeviceDesc::strToDeviceType(iter->second);
@@ -77,8 +75,7 @@ impl::SystemDescJSONConfigParser::buildSystemDescFromConfigFile(
using SystemDescJSONTy = std::vector<DeviceDescJSONTy>;
SystemDescJSONTy system_desc_in_json;
if (!json::fromJSON(*parsed, system_desc_in_json, NullRoot)) {
- llvm::errs() << "Invalid System Description in JSON"
- << "\n";
+ llvm::errs() << "Invalid System Description in JSON" << "\n";
return std::nullopt;
}
diff --git a/mlir/lib/Tools/mlir-opt/MlirOptMain.cpp b/mlir/lib/Tools/mlir-opt/MlirOptMain.cpp
index f71ea8d456..99dc5eb910 100644
--- a/mlir/lib/Tools/mlir-opt/MlirOptMain.cpp
+++ b/mlir/lib/Tools/mlir-opt/MlirOptMain.cpp
@@ -31,8 +31,8 @@
#include "mlir/Pass/Pass.h"
#include "mlir/Pass/PassManager.h"
#include "mlir/Support/FileUtilities.h"
-#include "mlir/Support/SystemDesc.h"
#include "mlir/Support/LogicalResult.h"
+#include "mlir/Support/SystemDesc.h"
#include "mlir/Support/Timing.h"
#include "mlir/Support/ToolUtilities.h"
#include "mlir/Tools/ParseUtilities.h"
|
Very excited to see the progress of adding target description and cost model interface into MLIR. I have below questions and comments, most of them relating to CPU devices:
|
Thanks @ZhennanQin for the questions. Please see my answers below.
I think the initialization of a target description looks orthogonal to this PR. But we do envision that there could be multiple approaches for initialization: 1) pre-existing target descriptions that are written by system designers or could be auto-generated using system details (e.g., ark.intel.com, cpuid could be useful here), or 2) as the number of possible targets are considerably more than the uarchs (an approach followed by LLVM TTI may not be feasible), we can provide reasonable defaults to common target parameters and then provide ways to override them using target-specific values. Let us know if you have other thoughts.
Yes, this PR is laying out very basic infrastructure to get things going and provide a couple of examples of system descriptions that provide parameters used by existing MLIR passes. We will definitely need further details such as arch/ISA in addition to CPU as a device so that passes have the appropriate context to consume the target description information. That being said, we do not want to duplicate information provided by LLVM TTI already.
Yes, I think compile-time configurations will not be part of the target descriptions. As you mentioned, target description describes the underlying system.
Agree. |
… VarDecls (llvm#90948) With the commit d530894, we now preserve the initializer for invalid decls with the recovery-expr. However there is a chance that the original init expr is a typo-expr, we should not preserve it in the final AST, as typo-expr is an internal AST node. We should use the one after the typo correction. This is spotted by a clangd hover crash on the testcase.
Cycle is associated with construct-names and not labels. Change name of a few variables to reflect this. Also add appropriate comment to describe the else case of error checking.
Only VRs should use $noreg, this GPR was accidentally changed in d392520
- Instead of comparing the identity of the `PointerValue`s, compare the underlying `StorageLocation`s. - If the `StorageLocation`s are the same, return a definite "true" as the result of the comparison. Before, if the `PointerValue`s were different, we would return an atom, even if the storage locations themselves were the same. - If the `StorageLocation`s are different, return an atom (as before). Pointers that have different storage locations may still alias, so we can't return a definite "false" in this case. The application-level gains from this are relatively modest. For the Crubit nullability check running on an internal codebase, this change reduces the number of functions on which the SAT solver times out from 223 to 221; the number of "pointer expression not modeled" errors reduces from 3815 to 3778. Still, it seems that the gain in precision is generally worthwhile. @Xazax-hun inspired me to think about this with his [comments](llvm#73860 (review)) on a different PR.
llvm#90474) [llvm-mca] Abort on parse error without -skip-unsupported-instructions Prior to this patch, llvm-mca would continue executing after parse errors. These errors can lead to some confusion since some analysis results are printed on the standard output, and they're printed after the errors, which could otherwise be easy to miss. However it is still useful to be able to continue analysis after errors; so extend the recently added -skip-unsupported-instructions to support this. Two tests which have parse errors for some of the 'RUN' branches are updated to use -skip-unsupported-instructions so they can remain as-is. Add a description of -skip-unsupported-instructions to the llvm-mca command guide, and add it to the llvm-mca --help output: ``` --skip-unsupported-instructions=<value> - Force analysis to continue in the presence of unsupported instructions =none - Exit with an error when an instruction is unsupported for any reason (default) =lack-sched - Skip instructions on input which lack scheduling information =parse-failure - Skip lines on the input which fail to parse for any reason =any - Skip instructions or lines on input which are unsupported for any reason ``` Tests within this patch are intended to cover each of the cases. Reason | Flag | Comment --------------|------|------- none | none | Usual case, existing test suite lack-sched | none | Advises user to use -skip-unsupported-instructions=lack-sched, tested in llvm/test/tools/llvm-mca/X86/BtVer2/unsupported-instruction.s parse-failure | none | Advises user to use -skip-unsupported-instructions=parse-failure, tested in llvm/test/tools/llvm-mca/bad-input.s any | none | (N/A, covered above) lack-sched | any | Continues, prints warnings, tested in llvm/test/tools/llvm-mca/X86/BtVer2/unsupported-instruction.s parse-failure | any | Continues, prints errors, tested in llvm/test/tools/llvm-mca/bad-input.s lack-sched | parse-failure | Advises user to use -skip-unsupported-instructions=lack-sched, tested in llvm/test/tools/llvm-mca/X86/BtVer2/unsupported-instruction.s parse-failure | lack-sched | Advises user to use -skip-unsupported-instructions=parse-failure, tested in llvm/test/tools/llvm-mca/bad-input.s none | * | This would be any test case with skip-unsupported-instructions, coverage added in llvm/test/tools/llvm-mca/X86/BtVer2/simple-test.s any | * | (Logically covered by the other cases)
This patch updates the unittests that can be changed to the new format after llvm#89799 (which changes the default format everywhere) to avoid a loss in coverage for the (new) default debug info format.
When using a hard-float ABI for a target without FP registers, it's not possible to correctly generate code for functions with arguments which must be passed in floating-point registers. This is diagnosed in CodeGen instead of Sema, to more closely match GCC's behaviour around inline functions, which is relied on by the Linux kernel. Previously, this only checked function signatures as they were code-generated, but this missed some cases: * Calls to functions not defined in this translation unit. * Calls through function pointers. * Calls to variadic functions, where the variadic arguments have a floating-point type. This adds checks to function calls, as well as definitions, so that these cases are correctly diagnosed.
…vm#88845) Fixes llvm#47549 `lldb-server`'s platform mode seems to have an issue with its `--min-gdbserver-port` `--max-gdbserver-port` flags (and probably the `--gdbserver-port` flag, but I didn't test it). How the platform code seems to work is that it listens on a port, and whenever there's an incoming connection, it forks the process to handle the connection. To handle the port flags, the main process uses an instance of the helper class `GDBRemoteCommunicationServerPlatform::PortMap`, that can be configured and track usages of ports. The child process handling the platform connection, can then use the port map to allocate a port for the gdb-server connection it will make (this is another process it spawns). However, in the current code, this works only once. After the first connection is handled by forking a child process, the main platform listener code loops around, and then 'forgets' about the port map. This is because this code: ```cpp GDBRemoteCommunicationServerPlatform platform( acceptor_up->GetSocketProtocol(), acceptor_up->GetSocketScheme()); if (!gdbserver_portmap.empty()) { platform.SetPortMap(std::move(gdbserver_portmap)); } ``` is within the connection listening loop. This results in the `gdbserver_portmap` being moved into the platform object at the beginning of the first iteration of the loop, but on the second iteration, after the first fork, the next instance of the platform object will not have its platform port mapped. The result of this bug is that subsequent connections to the platform, when spawning the gdb-remote connection, will be supplied a random port - which isn't bounded by the `--min-gdbserver-port` and `--max-gdbserver--port` parameters passed in by the user. This PR fixes this issue by having the port map be maintained by the parent platform listener process. On connection, the listener allocates a single available port from the port map, associates the child process pid with the port, and lets the connection handling child use that single port number. Additionally, when cleaning up child processes, the main listener process tracks the child that exited to deallocate the previously associated port, so it can be reused for a new connection.
…lies (llvm#90635) This patch limits the icmp_i16(x,c) -> icmp_i32(ext(x),ext(c)) fold to CPUs that aren't known to have fast handling for length-changing prefixes for imm16 operands. We are always assuming that 66/67h length-changing prefixes cause severe stalls and we should always extend imm16 operands and use a i32 icmp instead, the only exception being Intel Bonnell CPUs. Agner makes this clear (see microarchitecture.pdf) that there are no stalls for any of the Intel Atom family (at least as far as Tremont - not sure about Gracemont or later). This is also true for AMD Bobcat/Jaguar and Ryzen families. Recent performance Intel CPUs are trickier - Core2/Nehalem and earlier could have a 6-11cy stall, while SandyBridge onwards this is reduced to 3cy or less. I'm not sure if we should accept this as fast or not, we only use this flag for the icmp_i16 case, so that might be acceptable? If so, we should add this to x86-64-v3/v4 tuning as well. Part of llvm#90355 + llvm#62952
Vectorized ADDRSPACECASTs were not supported by the type legalizer. This patch adds the support for: - splitting the vector result: <2 x ptr> => 2 x <1 x ptr> - scalarization: <1 x ptr> => ptr - widening: <3 x ptr> => <4 x ptr> This is all exercised by the added NVPTX tests.
As detailed here: https://github.com/InstLatx64/InstLatX64_Demo/blob/master/GFNI_Demo.h We can use the gf2p8affine instruction to lower byte shifts/rotates as well as the existing bitreverse case. Based off the original patch here: https://reviews.llvm.org/D137026
…sic IDs (llvm#90854) I've refactored the code to genericise the implementation to better allow for target specific constrained fp intrinsics.
This reverts commit 0bbada9. The update of Clang in the CI broke due to the new LLVM version naming. It needs to look for the clang 18.1 package instead of 18. Since it couldn't find 18 it used 17 as fallback. This gives ODR violations which caused the output for the module test to be wrong. (It didn't crash which would be a lot more obvious to debug.) The clang-tidy selection was fixed by llvm#81362. That patch also makes clang-19 work with clang-tidy-19 out of the box. The time-out have not been addressed; that is a CI issue and not an issue with this test. They run in 200-ish s so they are slow but far below the 1500s threshold. (Due to a dependency in this test it can't be split in multiple tests.)
…ctor (llvm#76615) This patch adds constexpr support for __builtin_shufflevector and __builtin_convertvector. NB: the changes went in under 2903df0 , this commit is just github PR bookkeepping.
Fixes "error: private field 'FileMgr' is not used" introduced by 11a6799
This pull request (PR) demonstrates one example of a target description and cost model in MLIR. See [RFC](https://discourse.llvm.org/t/rfc-target-description-and-cost-model-in-mlir/76990) for the context. **It is not a complete work by any means, and the main purpose of this PR is to initiate a discussion around this idea.** At a high-level, the PR develops a mechanism to read the system description/cost model from a config file specified on the command line (The PR uses JSON format as an example). In case the user does not specify the description file, the cost model is populated with the default values (which could be known properties of the devices (e.g., L1 cache size) or heuristics used in various compiler passes (e.g., tile size for matrix multiplication)). The system description consists of a number of device descriptions for different devices in the system. The PR also demonstrates modifications to a couple of passes to access information from the device-specific cost models. In terms of implementation, `SystemDesc` class implements methods to parse a system description file (and build a set of device descriptions) and to add/get device descriptions. `DeviceDesc` class implements methods to set/get fields of a device description (which is defined as a set of key-value pairs) and a set of APIs to get values of commonly-used device properties (such as vector width) or second-order info derived by the MLIR passes from those device properties (e.g., tile size). **Currently, these APIs are added in an ad-hoc manner and better design is possible.** The PR also provides an extendable mechanism (in the form of `DefaultBaseDeviceDesc`) to specify default values of various device properties as well as to derive second-order info from the device properties.
…dAttrsList; moving parsers out of core class This PR makes following changes: 1. Replaces tagged union from DeviceDesc by NamedAttrsList 2. Moves config file parsers out of core SystemDesc and DeviceDesc classes Not addressing system description in context comment yet as it it under discussion.
This reverts commit 0bbada9. The update of Clang in the CI broke due to the new LLVM version naming. It needs to look for the clang 18.1 package instead of 18. Since it couldn't find 18 it used 17 as fallback. This gives ODR violations which caused the output for the module test to be wrong. (It didn't crash which would be a lot more obvious to debug.) The clang-tidy selection was fixed by llvm#81362. That patch also makes clang-19 work with clang-tidy-19 out of the box. The time-out have not been addressed; that is a CI issue and not an issue with this test. They run in 200-ish s so they are slow but far below the 1500s threshold. (Due to a dependency in this test it can't be split in multiple tests.)
…ctor (llvm#76615) This patch adds constexpr support for __builtin_shufflevector and __builtin_convertvector. NB: the changes went in under 2903df0 , this commit is just github PR bookkeepping.
Fixes "error: private field 'FileMgr' is not used" introduced by 11a6799
…aces As an example of attributes supported by this commit: ``` module attributes { dlti.tsd_spec = #dlti.tsd_spec< #dlti.tdd_spec<#dlti.dl_entry<"dlti.device_id", 0: ui32>, #dlti.dl_entry<"dlti.device_type", "CPU">, #dlti.dl_entry<"dlti.canonicalizer_max_iterations", 100 : i32>, #dlti.dl_entry<"dlti.canonicalizer_max_num_rewrites", -5 : i32>>, #dlti.tdd_spec< #dlti.dl_entry<"dlti.device_id", 1: ui32>, #dlti.dl_entry<"dlti.device_type", "GPU">, #dlti.dl_entry<"dlti.max_vector_op_width", 64 : ui32>>, #dlti.tdd_spec< #dlti.dl_entry<"dlti.device_id", 2: ui32>, #dlti.dl_entry<"dlti.device_type", "XPU">>> } ```
Rebase seems to have messed up with my branch. Sorry for the trouble - working on fixing it. Thanks. |
Closing this in favour of its updated version: #91670 |
and Interfaces. This is a newer implementation of PR #85141 and [RFC](https://discourse.llvm.org/t/rfc-target-description-and-cost-model-in-mlir/76990) by considering reviews and comments on the original PR. As an example of attributes supported by this commit: ``` module attributes { dlti.target_system_spec = #dlti.target_device_spec< #dlti.dl_entry<"dlti.device_id", 0: ui32>, #dlti.dl_entry<"dlti.device_type", "CPU">, #dlti.dl_entry<"dlti.L1_cache_size_in_bytes", 8192 : ui32>>, #dlti.target_device_spec < #dlti.dl_entry<"dlti.device_id", 1: ui32>, #dlti.dl_entry<"dlti.device_type", "GPU">, #dlti.dl_entry<"dlti.max_vector_op_width", 64 : ui32>>, #dlti.target_device_spec < #dlti.dl_entry<"dlti.device_id", 2: ui32>, #dlti.dl_entry<"dlti.device_type", "XPU">>> } ```
…92138) and Interfaces. This is a newer implementation of PR llvm#85141 and [RFC](https://discourse.llvm.org/t/rfc-target-description-and-cost-model-in-mlir/76990) by considering reviews and comments on the original PR. As an example of attributes supported by this commit: ``` module attributes { dlti.target_system_spec = #dlti.target_device_spec< #dlti.dl_entry<"dlti.device_id", 0: ui32>, #dlti.dl_entry<"dlti.device_type", "CPU">, #dlti.dl_entry<"dlti.L1_cache_size_in_bytes", 8192 : ui32>>, #dlti.target_device_spec < #dlti.dl_entry<"dlti.device_id", 1: ui32>, #dlti.dl_entry<"dlti.device_type", "GPU">, #dlti.dl_entry<"dlti.max_vector_op_width", 64 : ui32>>, #dlti.target_device_spec < #dlti.dl_entry<"dlti.device_id", 2: ui32>, #dlti.dl_entry<"dlti.device_type", "XPU">>> } ```
This pull request (PR) demonstrates one example of a target description and cost model in MLIR. See RFC for the context. It is not a complete work by any means, and the main purpose of this PR is to initiate a discussion around this idea.
At a high-level, the PR develops a mechanism to read the system description/cost model from a config file specified on the command line (The PR uses JSON format as an example). In case the user does not specify the description file, the cost model is populated with the default values (which could be known properties of the devices (e.g., L1 cache size) or heuristics used in various compiler passes (e.g., tile size for matrix multiplication)). The system description consists of a number of device descriptions for different devices in the system. The PR also demonstrates modifications to a couple of passes to access information from the device-specific cost models.
In terms of implementation,
SystemDesc
class implements methods to parse a system description file (and build a set of device descriptions) and to add/get device descriptions.DeviceDesc
class implements methods to set/get fields of a device description (which is defined as a set of key-value pairs) and a set of APIs to get values of commonly-used device properties (such as vector width) or second-order info derived by the MLIR passes from those device properties (e.g., tile size). Currently, these APIs are added in an ad-hoc manner and better design is possible. The PR also provides an extendable mechanism (in the form ofDefaultBaseDeviceDesc
) to specify default values of various device properties as well as to derive second-order info from the device properties.