diff --git a/llvm/docs/AMDGPUUsage.rst b/llvm/docs/AMDGPUUsage.rst index bb6751038fc9c..eb9a362f6f0cb 100644 --- a/llvm/docs/AMDGPUUsage.rst +++ b/llvm/docs/AMDGPUUsage.rst @@ -5969,6 +5969,33 @@ following sections: * :ref:`amdgpu-amdhsa-memory-model-gfx942` * :ref:`amdgpu-amdhsa-memory-model-gfx10-gfx11` +.. _amdgpu-fence-as: + +Fence and Address Spaces +++++++++++++++++++++++++++++++ + +LLVM fences do not carry address space information, so fence +codegen usually needs to conservatively synchronize all address spaces. + +In the case of OpenCL, where fences only need to synchronize +user-specified address spaces, this can result in unnecessary waits. +For instance, a fence that only needs to synchronize local memory will +also have to wait on all outstanding global memory operations. + +:doc:`Memory Model Relaxation Annotations ` can +be used as an optimization hint for fences to solve this problem. +The AMDGPU backend recognizes the following tags on fences: + +- ``amdgpu-as:local`` - fence only the local address space +- ``amdgpu-as:global`` - fence only the global address space + +.. note:: + + As an optimization hint, these tags are not guaranteed to survive until + code generation. Optimizations are free to drop the tags to allow for + better code optimization, at the cost of synchronizing additional address + spaces. + .. _amdgpu-amdhsa-memory-model-gfx6-gfx9: Memory Model GFX6-GFX9 @@ -6306,21 +6333,9 @@ in table :ref:`amdgpu-amdhsa-memory-model-code-sequences-gfx6-gfx9-table`. - If OpenCL and address space is not generic, omit. - - However, since LLVM - currently has no - address space on - the fence need to - conservatively - always generate. If - fence had an - address space then - set to address - space of OpenCL - fence flag, or to - generic if both - local and global - flags are - specified. + - See :ref:`amdgpu-fence-as` for + more details on fencing specific + address spaces. 
- Must happen after any preceding local/generic load @@ -6352,14 +6367,9 @@ in table :ref:`amdgpu-amdhsa-memory-model-code-sequences-gfx6-gfx9-table`. address space is not generic, omit lgkmcnt(0). - - However, since LLVM - currently has no - address space on - the fence need to - conservatively - always generate - (see comment for - previous fence). + - See :ref:`amdgpu-fence-as` for + more details on fencing specific + address spaces. - Could be split into separate s_waitcnt vmcnt(0) and @@ -6562,21 +6572,9 @@ in table :ref:`amdgpu-amdhsa-memory-model-code-sequences-gfx6-gfx9-table`. - If OpenCL and address space is not generic, omit. - - However, since LLVM - currently has no - address space on - the fence need to - conservatively - always generate. If - fence had an - address space then - set to address - space of OpenCL - fence flag, or to - generic if both - local and global - flags are - specified. + - See :ref:`amdgpu-fence-as` for + more details on fencing specific + address spaces. - Must happen after any preceding local/generic @@ -6612,21 +6610,9 @@ in table :ref:`amdgpu-amdhsa-memory-model-code-sequences-gfx6-gfx9-table`. address space is local, omit vmcnt(0). - - However, since LLVM - currently has no - address space on - the fence need to - conservatively - always generate. If - fence had an - address space then - set to address - space of OpenCL - fence flag, or to - generic if both - local and global - flags are - specified. + - See :ref:`amdgpu-fence-as` for + more details on fencing specific + address spaces. - Could be split into separate s_waitcnt vmcnt(0) and @@ -6956,14 +6942,9 @@ in table :ref:`amdgpu-amdhsa-memory-model-code-sequences-gfx6-gfx9-table`. address space is not generic, omit lgkmcnt(0). - - However, since LLVM - currently has no - address space on - the fence need to - conservatively - always generate - (see comment for - previous fence). + - See :ref:`amdgpu-fence-as` for + more details on fencing specific + address spaces. 
- Could be split into separate s_waitcnt vmcnt(0) and @@ -7904,21 +7885,9 @@ in table :ref:`amdgpu-amdhsa-memory-model-code-sequences-gfx90a-table`. address space is local, omit vmcnt(0). - - However, since LLVM - currently has no - address space on - the fence need to - conservatively - always generate. If - fence had an - address space then - set to address - space of OpenCL - fence flag, or to - generic if both - local and global - flags are - specified. + - See :ref:`amdgpu-fence-as` for + more details on fencing specific + address spaces. - s_waitcnt vmcnt(0) must happen after any preceding @@ -7977,14 +7946,9 @@ in table :ref:`amdgpu-amdhsa-memory-model-code-sequences-gfx90a-table`. address space is not generic, omit lgkmcnt(0). - - However, since LLVM - currently has no - address space on - the fence need to - conservatively - always generate - (see comment for - previous fence). + - See :ref:`amdgpu-fence-as` for + more details on fencing specific + address spaces. - Could be split into separate s_waitcnt vmcnt(0) and @@ -8055,14 +8019,9 @@ in table :ref:`amdgpu-amdhsa-memory-model-code-sequences-gfx90a-table`. address space is not generic, omit lgkmcnt(0). - - However, since LLVM - currently has no - address space on - the fence need to - conservatively - always generate - (see comment for - previous fence). + - See :ref:`amdgpu-fence-as` for + more details on fencing specific + address spaces. - Could be split into separate s_waitcnt vmcnt(0) and @@ -8430,21 +8389,9 @@ in table :ref:`amdgpu-amdhsa-memory-model-code-sequences-gfx90a-table`. address space is local, omit vmcnt(0). - - However, since LLVM - currently has no - address space on - the fence need to - conservatively - always generate. If - fence had an - address space then - set to address - space of OpenCL - fence flag, or to - generic if both - local and global - flags are - specified. + - See :ref:`amdgpu-fence-as` for + more details on fencing specific + address spaces. 
- s_waitcnt vmcnt(0) must happen after any preceding @@ -8490,21 +8437,9 @@ in table :ref:`amdgpu-amdhsa-memory-model-code-sequences-gfx90a-table`. address space is local, omit vmcnt(0). - - However, since LLVM - currently has no - address space on - the fence need to - conservatively - always generate. If - fence had an - address space then - set to address - space of OpenCL - fence flag, or to - generic if both - local and global - flags are - specified. + - See :ref:`amdgpu-fence-as` for + more details on fencing specific + address spaces. - Could be split into separate s_waitcnt vmcnt(0) and @@ -8572,21 +8507,9 @@ in table :ref:`amdgpu-amdhsa-memory-model-code-sequences-gfx90a-table`. address space is local, omit vmcnt(0). - - However, since LLVM - currently has no - address space on - the fence need to - conservatively - always generate. If - fence had an - address space then - set to address - space of OpenCL - fence flag, or to - generic if both - local and global - flags are - specified. + - See :ref:`amdgpu-fence-as` for + more details on fencing specific + address spaces. - Could be split into separate s_waitcnt vmcnt(0) and @@ -9207,14 +9130,9 @@ in table :ref:`amdgpu-amdhsa-memory-model-code-sequences-gfx90a-table`. address space is not generic, omit lgkmcnt(0). - - However, since LLVM - currently has no - address space on - the fence need to - conservatively - always generate - (see comment for - previous fence). + - See :ref:`amdgpu-fence-as` for + more details on fencing specific + address spaces. - Could be split into separate s_waitcnt vmcnt(0) and @@ -9316,14 +9234,9 @@ in table :ref:`amdgpu-amdhsa-memory-model-code-sequences-gfx90a-table`. address space is not generic, omit lgkmcnt(0). - - However, since LLVM - currently has no - address space on - the fence need to - conservatively - always generate - (see comment for - previous fence). + - See :ref:`amdgpu-fence-as` for + more details on fencing specific + address spaces. 
- Could be split into separate s_waitcnt vmcnt(0) and @@ -10279,21 +10192,9 @@ are defined in table :ref:`amdgpu-amdhsa-memory-model-code-sequences-gfx940-gfx9 address space is local, omit vmcnt(0). - - However, since LLVM - currently has no - address space on - the fence need to - conservatively - always generate. If - fence had an - address space then - set to address - space of OpenCL - fence flag, or to - generic if both - local and global - flags are - specified. + - See :ref:`amdgpu-fence-as` for + more details on fencing specific + address spaces. - s_waitcnt vmcnt(0) must happen after any preceding @@ -10352,14 +10253,9 @@ are defined in table :ref:`amdgpu-amdhsa-memory-model-code-sequences-gfx940-gfx9 address space is not generic, omit lgkmcnt(0). - - However, since LLVM - currently has no - address space on - the fence need to - conservatively - always generate - (see comment for - previous fence). + - See :ref:`amdgpu-fence-as` for + more details on fencing specific + address spaces. - Could be split into separate s_waitcnt vmcnt(0) and @@ -10430,14 +10326,9 @@ are defined in table :ref:`amdgpu-amdhsa-memory-model-code-sequences-gfx940-gfx9 address space is not generic, omit lgkmcnt(0). - - However, since LLVM - currently has no - address space on - the fence need to - conservatively - always generate - (see comment for - previous fence). + - See :ref:`amdgpu-fence-as` for + more details on fencing specific + address spaces. - Could be split into separate s_waitcnt vmcnt(0) and @@ -10836,21 +10727,9 @@ are defined in table :ref:`amdgpu-amdhsa-memory-model-code-sequences-gfx940-gfx9 address space is local, omit vmcnt(0). - - However, since LLVM - currently has no - address space on - the fence need to - conservatively - always generate. If - fence had an - address space then - set to address - space of OpenCL - fence flag, or to - generic if both - local and global - flags are - specified. 
+ - See :ref:`amdgpu-fence-as` for + more details on fencing specific + address spaces. - s_waitcnt vmcnt(0) must happen after any preceding @@ -10909,21 +10788,9 @@ are defined in table :ref:`amdgpu-amdhsa-memory-model-code-sequences-gfx940-gfx9 address space is local, omit vmcnt(0). - - However, since LLVM - currently has no - address space on - the fence need to - conservatively - always generate. If - fence had an - address space then - set to address - space of OpenCL - fence flag, or to - generic if both - local and global - flags are - specified. + - See :ref:`amdgpu-fence-as` for + more details on fencing specific + address spaces. - Could be split into separate s_waitcnt vmcnt(0) and @@ -10988,21 +10855,9 @@ are defined in table :ref:`amdgpu-amdhsa-memory-model-code-sequences-gfx940-gfx9 address space is local, omit vmcnt(0). - - However, since LLVM - currently has no - address space on - the fence need to - conservatively - always generate. If - fence had an - address space then - set to address - space of OpenCL - fence flag, or to - generic if both - local and global - flags are - specified. + - See :ref:`amdgpu-fence-as` for + more details on fencing specific + address spaces. - Could be split into separate s_waitcnt vmcnt(0) and @@ -11651,14 +11506,9 @@ are defined in table :ref:`amdgpu-amdhsa-memory-model-code-sequences-gfx940-gfx9 address space is not generic, omit lgkmcnt(0). - - However, since LLVM - currently has no - address space on - the fence need to - conservatively - always generate - (see comment for - previous fence). + - See :ref:`amdgpu-fence-as` for + more details on fencing specific + address spaces. - Could be split into separate s_waitcnt vmcnt(0) and @@ -11760,14 +11610,9 @@ are defined in table :ref:`amdgpu-amdhsa-memory-model-code-sequences-gfx940-gfx9 address space is not generic, omit lgkmcnt(0). 
- - However, since LLVM - currently has no - address space on - the fence need to - conservatively - always generate - (see comment for - previous fence). + - See :ref:`amdgpu-fence-as` for + more details on fencing specific + address spaces. - Could be split into separate s_waitcnt vmcnt(0) and @@ -12613,21 +12458,9 @@ table :ref:`amdgpu-amdhsa-memory-model-code-sequences-gfx10-gfx11-table`. address space is local, omit vmcnt(0) and vscnt(0). - - However, since LLVM - currently has no - address space on - the fence need to - conservatively - always generate. If - fence had an - address space then - set to address - space of OpenCL - fence flag, or to - generic if both - local and global - flags are - specified. + - See :ref:`amdgpu-fence-as` for + more details on fencing specific + address spaces. - Could be split into separate s_waitcnt vmcnt(0), s_waitcnt @@ -12710,14 +12543,9 @@ table :ref:`amdgpu-amdhsa-memory-model-code-sequences-gfx10-gfx11-table`. address space is local, omit vmcnt(0) and vscnt(0). - - However, since LLVM - currently has no - address space on - the fence need to - conservatively - always generate - (see comment for - previous fence). + - See :ref:`amdgpu-fence-as` for + more details on fencing specific + address spaces. - Could be split into separate s_waitcnt vmcnt(0), s_waitcnt @@ -13081,21 +12909,9 @@ table :ref:`amdgpu-amdhsa-memory-model-code-sequences-gfx10-gfx11-table`. address space is local, omit vmcnt(0) and vscnt(0). - - However, since LLVM - currently has no - address space on - the fence need to - conservatively - always generate. If - fence had an - address space then - set to address - space of OpenCL - fence flag, or to - generic if both - local and global - flags are - specified. + - See :ref:`amdgpu-fence-as` for + more details on fencing specific + address spaces. - Could be split into separate s_waitcnt vmcnt(0), s_waitcnt @@ -13154,21 +12970,9 @@ table :ref:`amdgpu-amdhsa-memory-model-code-sequences-gfx10-gfx11-table`. 
address space is local, omit vmcnt(0) and vscnt(0). - - However, since LLVM - currently has no - address space on - the fence need to - conservatively - always generate. If - fence had an - address space then - set to address - space of OpenCL - fence flag, or to - generic if both - local and global - flags are - specified. + - See :ref:`amdgpu-fence-as` for + more details on fencing specific + address spaces. - Could be split into separate s_waitcnt vmcnt(0), s_waitcnt @@ -13720,14 +13524,9 @@ table :ref:`amdgpu-amdhsa-memory-model-code-sequences-gfx10-gfx11-table`. address space is local, omit vmcnt(0) and vscnt(0). - - However, since LLVM - currently has no - address space on - the fence need to - conservatively - always generate - (see comment for - previous fence). + - See :ref:`amdgpu-fence-as` for + more details on fencing specific + address spaces. - Could be split into separate s_waitcnt vmcnt(0), s_waitcnt