[AMDGPU] Document amdgpu-as in AMDGPUUsage #94335
Add a section about fence & address spaces that covers amdgpu-as.
@llvm/pr-subscribers-backend-amdgpu
Author: Pierre van Houtryve (Pierre-vh)
Changes
Add a section about fence & address spaces that covers amdgpu-as.
Patch is 46.14 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/94335.diff
1 Files Affected:
diff --git a/llvm/docs/AMDGPUUsage.rst b/llvm/docs/AMDGPUUsage.rst
index bb6751038fc9c..7510c4ae644c6 100644
--- a/llvm/docs/AMDGPUUsage.rst
+++ b/llvm/docs/AMDGPUUsage.rst
@@ -5969,6 +5969,31 @@ following sections:
* :ref:`amdgpu-amdhsa-memory-model-gfx942`
* :ref:`amdgpu-amdhsa-memory-model-gfx10-gfx11`
+.. _amdgpu-fence-as:
+
+Fence and Address Spaces
+++++++++++++++++++++++++++++++
+
+LLVM fences do not carry address space information, so fence
+codegen usually needs to be conservative and fence all address spaces.
+
+In the case of OpenCL, where synchronization can only happen in the
+same address space, this can result in extra unnecessary waits.
+For instance, a fence that is supposed to only target local memory will
+also have to wait on all global memory operations, which is unnecessary.
+
+:doc:`Memory Model Relaxation Annotations <MemoryModelRelaxationAnnotations>` can
+be used as an optimization hint for fences to solve this problem.
+The AMDGPU backend handles the following tags on fences:
+
+- ``amdgpu-as:local`` - fence only the local address space
+- ``amdgpu-as:global`` - fence only the global address space
+
+This can avoid unnecessary waiting in many cases. However, those annotations are
+attached using metadata, which can always be dropped by the optimizer when it
+inhibits optimizations, and the cost of not performing that optimization is
+greater than the cost of dropping the metadata.
+
.. _amdgpu-amdhsa-memory-model-gfx6-gfx9:
Memory Model GFX6-GFX9
@@ -6306,21 +6331,9 @@ in table :ref:`amdgpu-amdhsa-memory-model-code-sequences-gfx6-gfx9-table`.
- If OpenCL and
address space is
not generic, omit.
- - However, since LLVM
- currently has no
- address space on
- the fence need to
- conservatively
- always generate. If
- fence had an
- address space then
- set to address
- space of OpenCL
- fence flag, or to
- generic if both
- local and global
- flags are
- specified.
+ - See :ref:`amdgpu-fence-as` for
+ more details on fencing specific
+ address spaces.
- Must happen after
any preceding
local/generic load
@@ -6352,14 +6365,9 @@ in table :ref:`amdgpu-amdhsa-memory-model-code-sequences-gfx6-gfx9-table`.
address space is
not generic, omit
lgkmcnt(0).
- - However, since LLVM
- currently has no
- address space on
- the fence need to
- conservatively
- always generate
- (see comment for
- previous fence).
+ - See :ref:`amdgpu-fence-as` for
+ more details on fencing specific
+ address spaces.
- Could be split into
separate s_waitcnt
vmcnt(0) and
@@ -6562,21 +6570,9 @@ in table :ref:`amdgpu-amdhsa-memory-model-code-sequences-gfx6-gfx9-table`.
- If OpenCL and
address space is
not generic, omit.
- - However, since LLVM
- currently has no
- address space on
- the fence need to
- conservatively
- always generate. If
- fence had an
- address space then
- set to address
- space of OpenCL
- fence flag, or to
- generic if both
- local and global
- flags are
- specified.
+ - See :ref:`amdgpu-fence-as` for
+ more details on fencing specific
+ address spaces.
- Must happen after
any preceding
local/generic
@@ -6612,21 +6608,9 @@ in table :ref:`amdgpu-amdhsa-memory-model-code-sequences-gfx6-gfx9-table`.
address space is
local, omit
vmcnt(0).
- - However, since LLVM
- currently has no
- address space on
- the fence need to
- conservatively
- always generate. If
- fence had an
- address space then
- set to address
- space of OpenCL
- fence flag, or to
- generic if both
- local and global
- flags are
- specified.
+ - See :ref:`amdgpu-fence-as` for
+ more details on fencing specific
+ address spaces.
- Could be split into
separate s_waitcnt
vmcnt(0) and
@@ -6956,14 +6940,9 @@ in table :ref:`amdgpu-amdhsa-memory-model-code-sequences-gfx6-gfx9-table`.
address space is
not generic, omit
lgkmcnt(0).
- - However, since LLVM
- currently has no
- address space on
- the fence need to
- conservatively
- always generate
- (see comment for
- previous fence).
+ - See :ref:`amdgpu-fence-as` for
+ more details on fencing specific
+ address spaces.
- Could be split into
separate s_waitcnt
vmcnt(0) and
@@ -7904,21 +7883,9 @@ in table :ref:`amdgpu-amdhsa-memory-model-code-sequences-gfx90a-table`.
address space is
local, omit
vmcnt(0).
- - However, since LLVM
- currently has no
- address space on
- the fence need to
- conservatively
- always generate. If
- fence had an
- address space then
- set to address
- space of OpenCL
- fence flag, or to
- generic if both
- local and global
- flags are
- specified.
+ - See :ref:`amdgpu-fence-as` for
+ more details on fencing specific
+ address spaces.
- s_waitcnt vmcnt(0)
must happen after
any preceding
@@ -7977,14 +7944,9 @@ in table :ref:`amdgpu-amdhsa-memory-model-code-sequences-gfx90a-table`.
address space is
not generic, omit
lgkmcnt(0).
- - However, since LLVM
- currently has no
- address space on
- the fence need to
- conservatively
- always generate
- (see comment for
- previous fence).
+ - See :ref:`amdgpu-fence-as` for
+ more details on fencing specific
+ address spaces.
- Could be split into
separate s_waitcnt
vmcnt(0) and
@@ -8055,14 +8017,9 @@ in table :ref:`amdgpu-amdhsa-memory-model-code-sequences-gfx90a-table`.
address space is
not generic, omit
lgkmcnt(0).
- - However, since LLVM
- currently has no
- address space on
- the fence need to
- conservatively
- always generate
- (see comment for
- previous fence).
+ - See :ref:`amdgpu-fence-as` for
+ more details on fencing specific
+ address spaces.
- Could be split into
separate s_waitcnt
vmcnt(0) and
@@ -8430,21 +8387,9 @@ in table :ref:`amdgpu-amdhsa-memory-model-code-sequences-gfx90a-table`.
address space is
local, omit
vmcnt(0).
- - However, since LLVM
- currently has no
- address space on
- the fence need to
- conservatively
- always generate. If
- fence had an
- address space then
- set to address
- space of OpenCL
- fence flag, or to
- generic if both
- local and global
- flags are
- specified.
+ - See :ref:`amdgpu-fence-as` for
+ more details on fencing specific
+ address spaces.
- s_waitcnt vmcnt(0)
must happen after
any preceding
@@ -8490,21 +8435,9 @@ in table :ref:`amdgpu-amdhsa-memory-model-code-sequences-gfx90a-table`.
address space is
local, omit
vmcnt(0).
- - However, since LLVM
- currently has no
- address space on
- the fence need to
- conservatively
- always generate. If
- fence had an
- address space then
- set to address
- space of OpenCL
- fence flag, or to
- generic if both
- local and global
- flags are
- specified.
+ - See :ref:`amdgpu-fence-as` for
+ more details on fencing specific
+ address spaces.
- Could be split into
separate s_waitcnt
vmcnt(0) and
@@ -8572,21 +8505,9 @@ in table :ref:`amdgpu-amdhsa-memory-model-code-sequences-gfx90a-table`.
address space is
local, omit
vmcnt(0).
- - However, since LLVM
- currently has no
- address space on
- the fence need to
- conservatively
- always generate. If
- fence had an
- address space then
- set to address
- space of OpenCL...
[truncated]
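For illustration, here is a minimal sketch of how one of the tags from the new section might be attached to a fence in LLVM IR. It assumes the usual `!mmra` metadata syntax described in the Memory Model Relaxation Annotations documentation. The kernel name and the choice of sync scope and ordering are hypothetical, and neither is taken from this patch.

```llvm
; Hypothetical kernel, for illustration only.
define amdgpu_kernel void @local_release_fence() {
  ; Hint that this release fence only needs to order local memory accesses.
  ; Without the tag, the backend conservatively fences all address spaces.
  fence syncscope("workgroup") release, !mmra !0
  ret void
}

; An MMRA tag is a (prefix, suffix) pair of metadata strings.
!0 = !{!"amdgpu-as", !"local"}
```

Because the tag is metadata, the optimizer may drop it, in which case the fence falls back to the conservative behaviour. Correctness should therefore never depend on the tag being honoured.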
llvm/docs/AMDGPUUsage.rst (outdated)
This can avoid unnecessary waiting in many cases. However, those annotations are
attached using metadata, which can always be dropped by the optimizer when it
inhibits optimizations, and the cost of not performing that optimization is
What is this paragraph trying to say to the reader? Is there something actionable for the reader to take care of?
I'm trying to make it clear that the tags aren't a promise, just an optimization hint.
I changed the wording, is that better?
Thanks. I think this looks good.
As noted offline, I do think we should also accept amdgpu-as on atomic operations for consistency, but that's something for a different change.
LGTM. But please do a quick spelling/grammar check.