[Offload] Add MPI Proxy Plugin #114574

Open: wants to merge 1 commit into base: main
6 changes: 5 additions & 1 deletion offload/CMakeLists.txt
@@ -145,7 +145,7 @@ if(DEFINED LIBOMPTARGET_BUILD_CUDA_PLUGIN OR
message(WARNING "Option removed, use 'LIBOMPTARGET_PLUGINS_TO_BUILD' instead")
endif()

-set(LIBOMPTARGET_ALL_PLUGIN_TARGETS amdgpu cuda host)
+set(LIBOMPTARGET_ALL_PLUGIN_TARGETS mpi amdgpu cuda host)
set(LIBOMPTARGET_PLUGINS_TO_BUILD "all" CACHE STRING
"Semicolon-separated list of plugins to use: cuda, amdgpu, host or \"all\".")

@@ -200,8 +200,10 @@ set (LIBOMPTARGET_ALL_TARGETS "${LIBOMPTARGET_ALL_TARGETS} powerpc64-ibm-linux-g
set (LIBOMPTARGET_ALL_TARGETS "${LIBOMPTARGET_ALL_TARGETS} powerpc64-ibm-linux-gnu-LTO")
set (LIBOMPTARGET_ALL_TARGETS "${LIBOMPTARGET_ALL_TARGETS} x86_64-unknown-linux-gnu")
set (LIBOMPTARGET_ALL_TARGETS "${LIBOMPTARGET_ALL_TARGETS} x86_64-unknown-linux-gnu-LTO")
set (LIBOMPTARGET_ALL_TARGETS "${LIBOMPTARGET_ALL_TARGETS} x86_64-unknown-linux-gnu-mpi")
Contributor

Why do we need to know both architectures for this? I figured that would be opaque since it's just a separate plugin.

Contributor

I'm still curious why the MPI plugin needs to target a GPU triple.

We're not using this as a compilation triple; it's only used to add new tests that invoke mpirun in LIT, and only if the MPI plugin was built. For reference, see this line in lit.cfg. Do you have any suggestion for a cleaner or more appropriate way to handle this?

Contributor

I'm confused about what this is testing. Do the target regions get lowered to MPI calls? Where does a GPU come in? The tests basically just make sure that the target region executes properly, so we can make them agnostic to the underlying device they actually execute on. Honestly, a lot of this stuff should be reworked, but that's time-consuming.

MPI is only used for communication between the host and the devices. The target region can be lowered either to x86_64 and executed on remote CPUs, or to nvptx64 and executed on remote GPUs.

So we are testing the x86 and CUDA plugins on remote CPUs/GPUs, with MPI as the communication layer.
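
To make the point about device-agnostic tests concrete, here is a minimal sketch of the kind of check being discussed. The function name `sumOnTarget` is hypothetical and does not come from the PR or its test suite. The same source produces the same result whether the target region runs on the host, on a remote CPU, or on a remote GPU; only the offload toolchain configuration would differ.

```cpp
// Hypothetical device-agnostic check (not taken from the PR's tests).
// The target region computes 0 + 1 + ... + (N - 1). Which device actually
// runs it is decided by the offload runtime, not by this source.
int sumOnTarget(int N) {
  int Sum = 0;
  // Sum is copied to the device and back; N is passed by value by default.
#pragma omp target map(tofrom : Sum)
  for (int I = 0; I < N; ++I)
    Sum += I;
  return Sum;
}
```

If the compiler is invoked without OpenMP support, the pragma is ignored and the loop simply runs on the host, which is also what happens when no offload device is available.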

set (LIBOMPTARGET_ALL_TARGETS "${LIBOMPTARGET_ALL_TARGETS} nvptx64-nvidia-cuda")
set (LIBOMPTARGET_ALL_TARGETS "${LIBOMPTARGET_ALL_TARGETS} nvptx64-nvidia-cuda-LTO")
set (LIBOMPTARGET_ALL_TARGETS "${LIBOMPTARGET_ALL_TARGETS} nvptx64-nvidia-cuda-mpi")
set (LIBOMPTARGET_ALL_TARGETS "${LIBOMPTARGET_ALL_TARGETS} nvptx64-nvidia-cuda-JIT-LTO")
set (LIBOMPTARGET_ALL_TARGETS "${LIBOMPTARGET_ALL_TARGETS} s390x-ibm-linux-gnu")
set (LIBOMPTARGET_ALL_TARGETS "${LIBOMPTARGET_ALL_TARGETS} s390x-ibm-linux-gnu-LTO")
@@ -363,6 +365,8 @@ set(LIBOMPTARGET_LLVM_LIBRARY_DIR "${LLVM_LIBRARY_DIR}" CACHE STRING
set(LIBOMPTARGET_LLVM_LIBRARY_INTDIR "${LIBOMPTARGET_INTDIR}" CACHE STRING
"Path to folder where intermediate libraries will be output")

set(LIBOMPTARGET_SRC_DIR ${CMAKE_CURRENT_SOURCE_DIR}/libomptarget)

add_subdirectory(tools/offload-tblgen)

# Build offloading plugins and device RTLs if they are available.
75 changes: 74 additions & 1 deletion offload/include/omptarget.h
@@ -109,7 +109,77 @@ inline KernelArgsTy CTorDTorKernelArgs = {1, 0, nullptr, nullptr,
nullptr, nullptr, nullptr, nullptr,
0, {0,0,0}, {1, 0, 0}, {1, 0, 0}, 0};

using llvm::SmallVector;
struct DeviceTy;
class AsyncInfoTy;

/// A class that manages private arguments in a target region.
class PrivateArgumentManagerTy {
  /// A data structure for the information of first-private arguments. We can
  /// use this information to optimize data transfer by packing all
  /// first-private arguments and transferring them all at once.
  struct FirstPrivateArgInfoTy {
    /// Host pointer begin
    char *HstPtrBegin;
    /// Host pointer end
    char *HstPtrEnd;
    /// The index of the element in \p TgtArgs corresponding to the argument
    int Index;
    /// Alignment of the entry (base of the entry, not after the entry).
    uint32_t Alignment;
    /// Size (without alignment, see padding)
    uint32_t Size;
    /// Padding used to align this argument entry, if necessary.
    uint32_t Padding;
    /// Host pointer name
    map_var_info_t HstPtrName = nullptr;

    FirstPrivateArgInfoTy(int Index, void *HstPtr, uint32_t Size,
                          uint32_t Alignment, uint32_t Padding,
                          map_var_info_t HstPtrName = nullptr)
        : HstPtrBegin(reinterpret_cast<char *>(HstPtr)),
          HstPtrEnd(HstPtrBegin + Size), Index(Index), Alignment(Alignment),
          Size(Size), Padding(Padding), HstPtrName(HstPtrName) {}
  };

  /// A vector of target pointers for all private arguments
  SmallVector<void *> TgtPtrs;

  /// A vector of information of all first-private arguments to be packed
  SmallVector<FirstPrivateArgInfoTy> FirstPrivateArgInfo;
  /// Host buffer for all arguments to be packed
  SmallVector<char> FirstPrivateArgBuffer;
  /// The total size of all arguments to be packed
  int64_t FirstPrivateArgSize = 0;

  /// A reference to the \p DeviceTy object
  DeviceTy &Device;
  /// A reference to the \p AsyncInfoTy object
  AsyncInfoTy &AsyncInfo;

  // TODO: What would be the best value here? Should we make it configurable?
  // If the size is larger than this threshold, we will allocate and transfer it
  // immediately instead of packing it.
  static constexpr const int64_t FirstPrivateArgSizeThreshold = 1024;

public:
  /// Constructor
  PrivateArgumentManagerTy(DeviceTy &Dev, AsyncInfoTy &AsyncInfo)
      : Device(Dev), AsyncInfo(AsyncInfo) {}

  /// Add a private argument
  int addArg(void *HstPtr, int64_t ArgSize, int64_t ArgOffset,
             bool IsFirstPrivate, void *&TgtPtr, int TgtArgsIndex,
             map_var_info_t HstPtrName = nullptr,
             const bool AllocImmediately = false);

  /// Pack first-private arguments, replace placeholder pointers in \p TgtArgs,
  /// and start the transfer.
  int packAndTransfer(SmallVector<void *> &TgtArgs);

  /// Free all target memory allocated for private arguments
  int free();
};

/// The libomptarget wrapper around a __tgt_async_info object directly
/// associated with a libomptarget layer device. RAII semantics to avoid
@@ -136,8 +206,11 @@ class AsyncInfoTy {
  /// Synchronization method to be used.
  SyncTy SyncType;

  PrivateArgumentManagerTy PrivateArgumentManager;

  AsyncInfoTy(DeviceTy &Device, SyncTy SyncType = SyncTy::BLOCKING)
-      : Device(Device), SyncType(SyncType) {}
+      : Device(Device), SyncType(SyncType),
+        PrivateArgumentManager(Device, *this) {}
  ~AsyncInfoTy() { synchronize(); }

/// Implicit conversion to the __tgt_async_info which is used in the
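
The comments on `PrivateArgumentManagerTy` describe packing small first-private arguments into one host buffer, with padding for alignment, and handling arguments above `FirstPrivateArgSizeThreshold` separately. The body of `packAndTransfer` is not part of this diff, so the following is only a sketch of that idea under stated assumptions: `ArgSketch`, `PackedSlot`, `PackThreshold`, and `packArgs` are hypothetical names, `std::vector` stands in for `SmallVector`, and offsets are aligned relative to the start of the buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for one first-private argument (not the PR's type).
struct ArgSketch {
  const void *HstPtr; // host address of the argument
  uint32_t Size;      // size in bytes, without padding
  uint32_t Alignment; // required alignment in bytes, assumed to be >= 1
};

// Records where one packed argument landed inside the buffer.
struct PackedSlot {
  size_t ArgIndex;  // index of the argument in the input list
  uint32_t Padding; // zero bytes inserted before the argument
  size_t Offset;    // offset of the argument from the buffer start
};

// Assumed to mirror FirstPrivateArgSizeThreshold = 1024 from the diff.
constexpr uint32_t PackThreshold = 1024;

// Appends every argument that is not larger than the threshold to Buffer,
// padding each one so its offset is a multiple of its alignment. Larger
// arguments are skipped here; per the comment in the diff they would be
// allocated and transferred immediately instead of being packed.
std::vector<PackedSlot> packArgs(const std::vector<ArgSketch> &Args,
                                 std::vector<char> &Buffer) {
  std::vector<PackedSlot> Slots;
  for (size_t Idx = 0; Idx < Args.size(); ++Idx) {
    const ArgSketch &A = Args[Idx];
    assert(A.Alignment != 0 && "alignment must be at least 1");
    if (A.Size > PackThreshold)
      continue;
    size_t Cur = Buffer.size();
    uint32_t Padding = static_cast<uint32_t>(
        (A.Alignment - Cur % A.Alignment) % A.Alignment);
    Buffer.insert(Buffer.end(), Padding, '\0');
    const char *Begin = static_cast<const char *>(A.HstPtr);
    Buffer.insert(Buffer.end(), Begin, Begin + A.Size);
    Slots.push_back({Idx, Padding, Cur + Padding});
  }
  return Slots;
}
```

With this layout the whole buffer can be sent to the device in a single transfer, and each argument's device address is the buffer's device base address plus its recorded offset.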
7 changes: 7 additions & 0 deletions offload/libomptarget/PluginManager.cpp
@@ -13,6 +13,7 @@
#include "PluginManager.h"
#include "OffloadPolicy.h"
#include "Shared/Debug.h"
#include "Shared/EnvironmentVar.h"
#include "Shared/Profile.h"
#include "device.h"

@@ -71,6 +72,12 @@ bool PluginManager::initializePlugin(GenericPluginTy &Plugin) {
  if (Plugin.is_initialized())
    return true;

  // Skip initializing the host plugin (named "x86_64") when
  // OMPTARGET_DISABLE_HOST_PLUGIN is set to a nonzero value.
  IntEnvar DisableHostPlugin("OMPTARGET_DISABLE_HOST_PLUGIN", 0);
  if (DisableHostPlugin.get() && !strcmp(Plugin.getName(), "x86_64")) {
    return false;
  }

  if (auto Err = Plugin.init()) {
    [[maybe_unused]] std::string InfoMsg = toString(std::move(Err));
    DP("Failed to init plugin: %s\n", InfoMsg.c_str());
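
The gate added to `initializePlugin` can be illustrated in isolation. The sketch below does not use `IntEnvar`, whose parsing rules are not shown in this diff; `readIntEnv` and `shouldSkipPlugin` are hypothetical helpers that assume the variable is read as a base-10 integer and that a missing or non-numeric value falls back to the default.

```cpp
#include <cstdlib>
#include <cstring>

// Hypothetical helper: returns the integer value of an environment variable,
// or Default when the variable is unset, empty, or does not start with a
// number. This is an assumption about behavior, not IntEnvar's implementation.
inline long readIntEnv(const char *Name, long Default) {
  const char *Value = std::getenv(Name);
  if (!Value || *Value == '\0')
    return Default;
  char *End = nullptr;
  long Parsed = std::strtol(Value, &End, 10);
  return End == Value ? Default : Parsed;
}

// Mirrors the condition in the diff: a plugin is skipped only when the flag
// is nonzero and the plugin's name is exactly "x86_64".
inline bool shouldSkipPlugin(const char *PluginName, long DisableHostFlag) {
  return DisableHostFlag != 0 && std::strcmp(PluginName, "x86_64") == 0;
}
```

A caller would combine the two as `shouldSkipPlugin(Name, readIntEnv("OMPTARGET_DISABLE_HOST_PLUGIN", 0))`. Because the comparison is against the literal name `"x86_64"`, a host plugin reporting any other name would not be affected by the flag.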