Skip to content
Merged
Show file tree
Hide file tree
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
3 changes: 2 additions & 1 deletion llvm/include/llvm/Frontend/OpenMP/OMPDeviceConstants.h
Original file line number Diff line number Diff line change
Expand Up @@ -23,7 +23,8 @@ enum OMPTgtExecModeFlags : unsigned char {
OMP_TGT_EXEC_MODE_GENERIC = 1 << 0,
OMP_TGT_EXEC_MODE_SPMD = 1 << 1,
OMP_TGT_EXEC_MODE_GENERIC_SPMD =
OMP_TGT_EXEC_MODE_GENERIC | OMP_TGT_EXEC_MODE_SPMD
OMP_TGT_EXEC_MODE_GENERIC | OMP_TGT_EXEC_MODE_SPMD,
OMP_TGT_EXEC_MODE_SPMD_NO_LOOP = 1 << 2 | OMP_TGT_EXEC_MODE_SPMD
};

} // end namespace omp
Expand Down
12 changes: 11 additions & 1 deletion offload/plugins-nextgen/common/include/PluginInterface.h
Original file line number Diff line number Diff line change
Expand Up @@ -431,6 +431,8 @@ struct GenericKernelTy {
return "Generic";
case OMP_TGT_EXEC_MODE_GENERIC_SPMD:
return "Generic-SPMD";
case OMP_TGT_EXEC_MODE_SPMD_NO_LOOP:
return "SPMD-No-Loop";
}
llvm_unreachable("Unknown execution mode!");
}
Expand Down Expand Up @@ -468,7 +470,8 @@ struct GenericKernelTy {
uint32_t BlockLimitClause[3], uint64_t LoopTripCount,
uint32_t &NumThreads, bool IsNumThreadsFromUser) const;

/// Indicate if the kernel works in Generic SPMD, Generic or SPMD mode.
/// Indicate if the kernel works in Generic SPMD, Generic, No-Loop
/// or SPMD mode.
bool isGenericSPMDMode() const {
return KernelEnvironment.Configuration.ExecMode ==
OMP_TGT_EXEC_MODE_GENERIC_SPMD;
Expand All @@ -483,6 +486,10 @@ struct GenericKernelTy {
bool isBareMode() const {
return KernelEnvironment.Configuration.ExecMode == OMP_TGT_EXEC_MODE_BARE;
}
bool isNoLoopMode() const {
return KernelEnvironment.Configuration.ExecMode ==
OMP_TGT_EXEC_MODE_SPMD_NO_LOOP;
}

/// The kernel name.
std::string Name;
Expand Down Expand Up @@ -1152,6 +1159,9 @@ struct GenericDeviceTy : public DeviceAllocatorTy {
/// deallocated by the allocator.
llvm::SmallVector<DeviceImageTy *> LoadedImages;

/// Return value of OMP_TEAMS_THREAD_LIMIT environment variable
int32_t getOMPTeamsThreadLimit() const { return OMP_TeamsThreadLimit; }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not used anymore.


private:
/// Get and set the stack size and heap size for the device. If not used, the
/// plugin can implement the setters as no-op and setting the output
Expand Down
22 changes: 22 additions & 0 deletions offload/plugins-nextgen/common/src/PluginInterface.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -640,6 +640,18 @@ uint32_t GenericKernelTy::getNumThreads(GenericDeviceTy &GenericDevice,
if (ThreadLimitClause[0] > 0 && isGenericMode())
ThreadLimitClause[0] += GenericDevice.getWarpSize();

// Honor OMP_TEAMS_THREAD_LIMIT environment variable and
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Taking a step back, what is the reason for special-casing getNumThreads handling for No-Loop kernels? I believe the existing logic (involving MaxNumThreads and PreferredNumThreads) handles both thread-related envars and OpenMP clauses. Is there a test case for No-Loop that does not work with the existing logic?

The primary change required for No-Loop kernels is making sure that the grid size is appropriate and that is ensured by the change in getNumBlocks. I am wondering whether this special handling in getNumThreads can be removed altogether.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a test case for No-Loop that does not work with the existing logic?
I haven't found and I removed unnecessary code.

// num_threads/thread_limit clause for NoLoop kernel types.
int32_t TeamsThreadLimitEnvVar = GenericDevice.getOMPTeamsThreadLimit();
uint16_t ConstWGSize = GenericDevice.getDefaultNumThreads();
if (isNoLoopMode()) {
if (TeamsThreadLimitEnvVar > 0)
return std::min(static_cast<int32_t>(ConstWGSize),
TeamsThreadLimitEnvVar);
if ((ThreadLimitClause[0] > 0) && (ThreadLimitClause[0] != (uint32_t)-1))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if ((ThreadLimitClause[0] > 0) && (ThreadLimitClause[0] != (uint32_t)-1))
if ((ThreadLimitClause[0] > 0) && (ThreadLimitClause[0] != ~0U))

return std::min(static_cast<uint32_t>(ConstWGSize), ThreadLimitClause[0]);
return ConstWGSize;
}
return std::min(MaxNumThreads, (ThreadLimitClause[0] > 0)
? ThreadLimitClause[0]
: PreferredNumThreads);
Expand All @@ -662,6 +674,16 @@ uint32_t GenericKernelTy::getNumBlocks(GenericDeviceTy &GenericDevice,
return std::min(NumTeamsClause[0], GenericDevice.getBlockLimit());
}

const auto getNumGroupsFromThreadsAndTripCount =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the point of doing a lambda here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed lambda

[](const uint64_t TripCount, const uint32_t NumThreads) {
return ((TripCount - 1) / NumThreads) + 1;
};
if (isNoLoopMode()) {
return LoopTripCount > 0
? getNumGroupsFromThreadsAndTripCount(LoopTripCount, NumThreads)
: 1;
}

uint64_t DefaultNumBlocks = GenericDevice.getDefaultNumBlocks();
uint64_t TripCountNumBlocks = std::numeric_limits<uint64_t>::max();
if (LoopTripCount > 0) {
Expand Down
Loading