[AMDGPU] Only emit SCOPE_SYS global_wb #110636

Pierre-vh · 2024-10-01T08:03:54Z

global_wb with scopes lower than SCOPE_SYS is unnecessary for correctness.

I was initially optimistic that they would be very cheap no-ops, but they can actually be quite expensive, so let's avoid them.
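To make the change concrete, here is a minimal C++ sketch of the old and new decision for a single release. Everything in it is hypothetical: `SyncScope`, `hwScope`, `globalWBBefore` and `globalWBAfter` are illustrative names, not `SIMemoryLegalizer` APIs. The scope mapping is an assumption based on the GFX12 documentation changed in this patch: workgroup maps to `SCOPE_SE` (or `SCOPE_CU` in CU wavefront mode), agent to `SCOPE_DEV`, system to `SCOPE_SYS`, and a release that stays at `SCOPE_CU` never needed a `global_wb`.

```cpp
#include <optional>
#include <string>

// Hypothetical model of the GFX12 release decision discussed in this PR.
// These names are illustrative only; they are not LLVM's API.
enum class SyncScope { SingleThread, Wavefront, Workgroup, Agent, System };

// Assumed mapping from LLVM sync scope to the GFX12 hardware scope.
// Workgroup collapses to SCOPE_CU in CU wavefront execution mode.
std::string hwScope(SyncScope S, bool CUMode) {
  switch (S) {
  case SyncScope::SingleThread:
  case SyncScope::Wavefront:
    return "SCOPE_CU";
  case SyncScope::Workgroup:
    return CUMode ? "SCOPE_CU" : "SCOPE_SE";
  case SyncScope::Agent:
    return "SCOPE_DEV";
  case SyncScope::System:
    return "SCOPE_SYS";
  }
  return "SCOPE_CU"; // Unreachable for valid enum values.
}

// Before the patch: any release wider than SCOPE_CU emitted a global_wb.
std::optional<std::string> globalWBBefore(SyncScope S, bool CUMode) {
  std::string Scope = hwScope(S, CUMode);
  if (Scope == "SCOPE_CU")
    return std::nullopt;
  return "global_wb scope:" + Scope;
}

// After the patch: only system-scope releases emit a global_wb.
std::optional<std::string> globalWBAfter(SyncScope S, bool CUMode) {
  std::string Scope = hwScope(S, CUMode);
  if (Scope != "SCOPE_SYS")
    return std::nullopt;
  return "global_wb scope:" + Scope;
}
```

For example, under this model `globalWBBefore(SyncScope::Agent, false)` yields `"global_wb scope:SCOPE_DEV"`, while `globalWBAfter` returns `std::nullopt` for every scope except `System`.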


Pierre-vh · 2024-10-02T11:26:56Z

LLVMBot hasn't commented on this one for some reason.
@llvm/pr-subscribers-backend-amdgpu

llvmbot · 2024-10-02T11:32:30Z

@llvm/pr-subscribers-backend-amdgpu

Author: Pierre van Houtryve (Pierre-vh)

Changes

global_wb with scopes lower than SCOPE_SYS is unnecessary for correctness.

I was initially optimistic that they would be very cheap no-ops, but they can actually be quite expensive, so let's avoid them.

Patch is 687.25 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/110636.diff

38 Files Affected:

(modified) llvm/docs/AMDGPUUsage.rst (+126-208)
(modified) llvm/lib/Target/AMDGPU/SIMemoryLegalizer.cpp (+7-29)
(modified) llvm/test/CodeGen/AMDGPU/GlobalISel/atomicrmw_fmax.ll (+3-18)
(modified) llvm/test/CodeGen/AMDGPU/GlobalISel/atomicrmw_fmin.ll (+3-18)
(modified) llvm/test/CodeGen/AMDGPU/GlobalISel/mubuf-global.ll (-10)
(modified) llvm/test/CodeGen/AMDGPU/atomic_optimizations_global_pointer.ll (-32)
(modified) llvm/test/CodeGen/AMDGPU/atomicrmw-expand.ll (-3)
(modified) llvm/test/CodeGen/AMDGPU/buffer-fat-pointer-atomicrmw-fadd.ll (+29-55)
(modified) llvm/test/CodeGen/AMDGPU/buffer-fat-pointer-atomicrmw-fmax.ll (+46-58)
(modified) llvm/test/CodeGen/AMDGPU/buffer-fat-pointer-atomicrmw-fmin.ll (+46-58)
(modified) llvm/test/CodeGen/AMDGPU/flat-atomicrmw-fadd.ll (-66)
(modified) llvm/test/CodeGen/AMDGPU/flat-atomicrmw-fmax.ll (-50)
(modified) llvm/test/CodeGen/AMDGPU/flat-atomicrmw-fmin.ll (-50)
(modified) llvm/test/CodeGen/AMDGPU/flat-atomicrmw-fsub.ll (-46)
(modified) llvm/test/CodeGen/AMDGPU/flat_atomics_i64.ll (-107)
(modified) llvm/test/CodeGen/AMDGPU/fp-atomics-gfx940.ll (-4)
(modified) llvm/test/CodeGen/AMDGPU/global-atomicrmw-fadd.ll (-80)
(modified) llvm/test/CodeGen/AMDGPU/global-atomicrmw-fmax.ll (-50)
(modified) llvm/test/CodeGen/AMDGPU/global-atomicrmw-fmin.ll (-50)
(modified) llvm/test/CodeGen/AMDGPU/global-atomicrmw-fsub.ll (-46)
(modified) llvm/test/CodeGen/AMDGPU/global_atomics_i64.ll (-97)
(modified) llvm/test/CodeGen/AMDGPU/insert_waitcnt_for_precise_memory.ll (-3)
(modified) llvm/test/CodeGen/AMDGPU/local-atomicrmw-fadd.ll (-33)
(modified) llvm/test/CodeGen/AMDGPU/local-atomicrmw-fmax.ll (-30)
(modified) llvm/test/CodeGen/AMDGPU/local-atomicrmw-fmin.ll (-30)
(modified) llvm/test/CodeGen/AMDGPU/local-atomicrmw-fsub.ll (-30)
(modified) llvm/test/CodeGen/AMDGPU/memory-legalizer-fence-mmra-global.ll (-18)
(modified) llvm/test/CodeGen/AMDGPU/memory-legalizer-fence.ll (-18)
(modified) llvm/test/CodeGen/AMDGPU/memory-legalizer-flat-agent.ll (-116)
(modified) llvm/test/CodeGen/AMDGPU/memory-legalizer-flat-volatile.ll (-1)
(modified) llvm/test/CodeGen/AMDGPU/memory-legalizer-flat-workgroup.ll (-54)
(modified) llvm/test/CodeGen/AMDGPU/memory-legalizer-global-agent.ll (-114)
(modified) llvm/test/CodeGen/AMDGPU/memory-legalizer-global-volatile.ll (-1)
(modified) llvm/test/CodeGen/AMDGPU/memory-legalizer-global-workgroup.ll (-58)
(modified) llvm/test/CodeGen/AMDGPU/memory-legalizer-local-agent.ll (-29)
(modified) llvm/test/CodeGen/AMDGPU/memory-legalizer-local-system.ll (-29)
(modified) llvm/test/CodeGen/AMDGPU/memory-legalizer-local-volatile.ll (-1)
(modified) llvm/test/CodeGen/AMDGPU/memory-legalizer-local-workgroup.ll (-29)

diff --git a/llvm/docs/AMDGPUUsage.rst b/llvm/docs/AMDGPUUsage.rst
index 9e11b13c101d47..bfac4738732631 100644
--- a/llvm/docs/AMDGPUUsage.rst
+++ b/llvm/docs/AMDGPUUsage.rst
@@ -14182,8 +14182,13 @@ For GFX12:
 * ``global_inv`` invalidates caches whose scope is strictly smaller than the
   instruction's. The invalidation requests cannot be reordered with pending or
   upcoming memory operations.
-* ``global_wb`` additionally ensures that previous memory operation done at
-  a lower scope level have reached the ``SCOPE:`` of the ``global_wb``.
+* ``global_wb`` is a writeback operation that additionally ensures previous
+  memory operation done at a lower scope level have reached the ``SCOPE:``
+  of the ``global_wb``.
+
+  * ``global_wb`` can be omitted for scopes other than ``SCOPE_SYS`` in
+    gfx120x.
+
 * The vector memory operations access a vector L0 cache. There is a single L0
   cache per CU. Each SIMD of a CU accesses the same L0 cache. Therefore, no
   special action is required for coherence between the lanes of a single
@@ -14890,19 +14895,7 @@ the instruction in the code sequence that references the table.
      store atomic release      - singlethread - global   1. buffer/global/ds/flat_store
                                - wavefront    - local
                                               - generic
-     store atomic release      - workgroup    - global   1. ``global_wb scope:SCOPE_SE``
-
-                                                           - If CU wavefront execution
-                                                             mode, omit.
-                                                           - In combination with the waits
-                                                             below, ensures that all
-                                                             memory operations
-                                                             have completed at workgroup
-                                                             scope before performing the
-                                                             store that is being
-                                                             released.
-
-                                                         2. | ``s_wait_bvhcnt 0x0``
+     store atomic release      - workgroup    - global   1. | ``s_wait_bvhcnt 0x0``
                                                             | ``s_wait_samplecnt 0x0``
                                                             | ``s_wait_storecnt 0x0``
                                                             | ``s_wait_loadcnt 0x0``
@@ -14925,7 +14918,11 @@ the instruction in the code sequence that references the table.
                                                              atomicrmw-with-return-value.
                                                            - ``s_wait_storecnt 0x0``
                                                              must happen after
-                                                             ``global_wb``.
+                                                             any preceding
+                                                             global/generic
+                                                             store/store
+                                                             atomic/
+                                                             atomicrmw-no-return-value.
                                                            - ``s_wait_dscnt 0x0``
                                                              must happen after
                                                              any preceding
@@ -14945,19 +14942,7 @@ the instruction in the code sequence that references the table.
 
                                                            - Apply :ref:`amdgpu-amdhsa-memory-model-code-sequences-gfx12-scopes-table`.
 
-     store atomic release      - workgroup    - local    1. ``global_wb scope:SCOPE_SE``
-
-                                                           - If CU wavefront execution
-                                                             mode or OpenCL, omit.
-                                                           - In combination with the waits
-                                                             below, ensures that all
-                                                             memory operations
-                                                             have completed at workgroup
-                                                             scope before performing the
-                                                             store that is being
-                                                             released.
-
-                                                         2. | ``s_wait_bvhcnt 0x0``
+     store atomic release      - workgroup    - local    1. | ``s_wait_bvhcnt 0x0``
                                                             | ``s_wait_samplecnt 0x0``
                                                             | ``s_wait_storecnt 0x0``
                                                             | ``s_wait_loadcnt 0x0``
@@ -14980,7 +14965,11 @@ the instruction in the code sequence that references the table.
                                                              atomicrmw-with-return-value.
                                                            - ``s_wait_storecnt 0x0``
                                                              must happen after
-                                                             ``global_wb``.
+                                                             any preceding
+                                                             global/generic
+                                                             store/store
+                                                             atomic/
+                                                             atomicrmw-no-return-value.
                                                            - Must happen before the
                                                              following store.
                                                            - Ensures that all
@@ -14992,16 +14981,9 @@ the instruction in the code sequence that references the table.
                                                              released.
 
                                                          3. ds_store
-     store atomic release      - agent        - global   1. ``global_wb``
+     store atomic release      - agent        - global   1. ``global_wb scope:SCOPE_SYS``
                                - system       - generic
-                                                              - Apply :ref:`amdgpu-amdhsa-memory-model-code-sequences-gfx12-scopes-table`.
-                                                              - In combination with the waits
-                                                                below, ensures that all
-                                                                memory operations
-                                                                have completed at agent or system
-                                                                scope before performing the
-                                                                store that is being
-                                                                released.
+                                                            - If agent scope, omit.
 
                                                          2. | ``s_wait_bvhcnt 0x0``
                                                             | ``s_wait_samplecnt 0x0``
@@ -15025,7 +15007,12 @@ the instruction in the code sequence that references the table.
                                                              atomicrmw-with-return-value.
                                                            - ``s_wait_storecnt 0x0``
                                                              must happen after
-                                                             ``global_wb``.
+                                                             ``global_wb`` if present, or
+                                                             any preceding
+                                                             global/generic
+                                                             store/store
+                                                             atomic/
+                                                             atomicrmw-no-return-value.
                                                            - ``s_wait_dscnt 0x0``
                                                              must happen after
                                                              any preceding
@@ -15050,20 +15037,8 @@ the instruction in the code sequence that references the table.
      atomicrmw    release      - singlethread - global   1. buffer/global/ds/flat_atomic
                                - wavefront    - local
                                               - generic
-     atomicrmw    release      - workgroup    - global   1. ``global_wb scope:SCOPE_SE``
-                                              - generic
-                                                            - If CU wavefront execution
-                                                              mode, omit.
-                                                            - In combination with the waits
-                                                              below, ensures that all
-                                                              memory operations
-                                                              have completed at workgroup
-                                                              scope before performing the
-                                                              store that is being
-                                                              released.
-
-                                                         2. | ``s_wait_bvhcnt 0x0``
-                                                            | ``s_wait_samplecnt 0x0``
+     atomicrmw    release      - workgroup    - global   1. | ``s_wait_bvhcnt 0x0``
+                                              - generic     | ``s_wait_samplecnt 0x0``
                                                             | ``s_wait_storecnt 0x0``
                                                             | ``s_wait_loadcnt 0x0``
                                                             | ``s_wait_dscnt 0x0``
@@ -15086,15 +15061,19 @@ the instruction in the code sequence that references the table.
                                                              atomic/
                                                              atomicrmw-with-return-value.
                                                            - ``s_wait_storecnt 0x0``
-                                                              must happen after
-                                                              ``global_wb``.
+                                                             must happen after
+                                                             any preceding
+                                                             global/generic
+                                                             store/store
+                                                             atomic/
+                                                             atomicrmw-no-return-value.
                                                            - ``s_wait_dscnt 0x0``
-                                                              must happen after
-                                                              any preceding
-                                                              local/generic
-                                                              load/store/load
-                                                              atomic/store
-                                                              atomic/atomicrmw.
+                                                             must happen after
+                                                             any preceding
+                                                             local/generic
+                                                             load/store/load
+                                                             atomic/store
+                                                             atomic/atomicrmw.
                                                            - Must happen before the
                                                              following atomic.
                                                            - Ensures that all
@@ -15105,23 +15084,11 @@ the instruction in the code sequence that references the table.
                                                              atomicrmw that is
                                                              being released.
 
-                                                         3. buffer/global/flat_atomic
+                                                         2. buffer/global/flat_atomic
 
                                                            - Apply :ref:`amdgpu-amdhsa-memory-model-code-sequences-gfx12-scopes-table`.
 
-     atomicrmw    release      - workgroup    - local    1. ``global_wb scope:SCOPE_SE``
-
-                                                           - If CU wavefront execution
-                                                             mode or OpenCL, omit.
-                                                           - In combination with the waits
-                                                             below, ensures that all
-                                                             memory operations
-                                                             have completed at workgroup
-                                                             scope before performing the
-                                                             store that is being
-                                                             released.
-
-                                                         2. | ``s_wait_bvhcnt 0x0``
+     atomicrmw    release      - workgroup    - local    1. | ``s_wait_bvhcnt 0x0``
                                                             | ``s_wait_samplecnt 0x0``
                                                             | ``s_wait_storecnt 0x0``
                                                             | ``s_wait_loadcnt 0x0``
@@ -15144,7 +15111,11 @@ the instruction in the code sequence that references the table.
                                                              atomicrmw-with-return-value.
                                                            - ``s_wait_storecnt 0x0``
                                                              must happen after
-                                                             ``global_wb``.
+                                                             any preceding
+                                                             global/generic
+                                                             store/store
+                                                             atomic/
+                                                             atomicrmw-no-return-value.
                                                            - Must happen before the
                                                              following atomic.
                                                            - Ensures that all
@@ -15155,17 +15126,10 @@ the instruction in the code sequence that references the table.
                                                              store that is being
                                                              released.
 
-                                                         3. ds_atomic
-     atomicrmw    release      - agent        - global   1. ``global_wb scope:``
+                                                         2. ds_atomic
+     atomicrmw    release      - agent        - global   1. ``global_wb scope:SCOPE_SYS``
                                - system       - generic
-                                                           - Apply :ref:`amdgpu-amdhsa-memory-model-code-sequences-gfx12-scopes-table`.
-                                                           - In combination with the waits
-                                                             below, ensures that all
-                                                             memory operations
-                                                             have completed at agent or system
-                                                             scope before performing the
-                                                             store that is being
-                                                             released.
+                                                           - If agent scope, omit.
 
                                                          2. | ``s_wait_bvhcnt 0x0``
                                                             | ``s_wait_samplecnt 0x0``
@@ -15188,7 +15152,12 @@ the instruction in the code sequence that references the table.
                                                              atomicrmw-with-return-value.
                                                            - ``s_wait_storecnt 0x0``
                                                              must happen after
-                                                             ``global_wb``
+                                                             ``global_wb`` if present, or
+                                                             any preceding
+                                                             global/generic
+                                                             store/store
+                                                             atomic/
+                                                             atomicrmw-no-return-value.
                                                            - ``s_wait_dscnt 0x0``
                                                              must happen after
                                                              any preceding
@@ -15212,19 +15181,7 @@ the instruction in the code sequence that references the table.
 
      fence        release      - singlethread *none*     *none*
                                - wavefront
-     fence        release      - workgroup    *none*     1. ``global_wb scope:SCOPE_SE``
-
-                                                            - If CU wavefront execution
-                                                              mode, omit.
-                                                            - In combination with the waits
-                                                              below, ensures that all
-                                                              memory operations
-                                                              have completed at workgroup
-                                                              scope before performing the
-                                                              store that is being
-                                                              released.
-
-                                                         2. | ``s_wait_bvhcnt 0x0``
+     fence        release      - workgroup    *none*     1. | ``s_wait_bvhcnt 0x0``
                                                             | ``s_wait_samplecnt 0x0``
                                                             | ``s_wait_storecnt 0x0``
                                                             | ``s_wait_loadcnt 0x0``
@@ -15254,7 +15211,11 @@ the instruction in the code sequence that references the table.
                                                              atomicrmw-with-return-value.
                                                            - ``s_wait_storecnt 0x0``
                                                              must happen after
-                                                             ``global_wb``
+                              ...
[truncated]
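As a rough illustration of the updated `store atomic release` rows for global memory, the following C++ sketch builds the instruction sequence as a list of strings. It is a simplification, not the table itself: `ReleaseScope` and `storeAtomicReleaseSeq` are hypothetical names, the per-row notes (CU wavefront mode, OpenCL exceptions, the scopes table applied to the store) are not modeled, and the final store is a placeholder mnemonic.

```cpp
#include <initializer_list>
#include <string>
#include <vector>

// Hypothetical, simplified model of the updated GFX12 "store atomic release"
// rows for global/generic memory. Names are illustrative only.
enum class ReleaseScope { Workgroup, Agent, System };

std::vector<std::string> storeAtomicReleaseSeq(ReleaseScope S) {
  std::vector<std::string> Seq;
  // Agent/system row, step 1: the writeback is now emitted only at system
  // scope ("If agent scope, omit."). Workgroup no longer has this step.
  if (S == ReleaseScope::System)
    Seq.push_back("global_wb scope:SCOPE_SYS");
  // Waits that ensure earlier memory operations have completed before the
  // released store; s_wait_storecnt follows the global_wb when one is present.
  for (const char *Wait : {"s_wait_bvhcnt 0x0", "s_wait_samplecnt 0x0",
                           "s_wait_storecnt 0x0", "s_wait_loadcnt 0x0",
                           "s_wait_dscnt 0x0"})
    Seq.push_back(Wait);
  // Final step: the store being released (placeholder mnemonic).
  Seq.push_back("buffer/global/flat_store");
  return Seq;
}
```

In this simplified model the workgroup and agent sequences coincide; the system sequence differs only by the leading `global_wb scope:SCOPE_SYS`.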

dstutt

Based on the internal discussion for this, I think this LGTM.
Maybe get approval from one of the tagged reviewers, though.

perlfu

LGTM

llvm-ci · 2024-10-07T06:53:25Z

LLVM Buildbot has detected a new failure on builder clang-aarch64-sve-vla-2stage running on linaro-g3-03 while building llvm at step 11 "build stage 2".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/41/builds/2586

Here is the relevant piece of the build log for the reference

Step 11 (build stage 2) failure: 'ninja' (failure)
...
[7837/8698] Building CXX object tools/flang/lib/Parser/CMakeFiles/FortranParser.dir/instrumented-parser.cpp.o
[7838/8698] Building CXX object tools/flang/lib/Parser/CMakeFiles/FortranParser.dir/token-sequence.cpp.o
[7839/8698] Building CXX object tools/flang/lib/Parser/CMakeFiles/FortranParser.dir/debug-parser.cpp.o
[7840/8698] Building CXX object tools/flang/lib/Parser/CMakeFiles/FortranParser.dir/io-parsers.cpp.o
[7841/8698] Linking CXX executable bin/dexp
clang++: warning: argument unused during compilation: '-mllvm -scalable-vectorization=preferred' [-Wunused-command-line-argument]
clang++: warning: argument unused during compilation: '-mllvm -treat-scalable-fixed-error-as-warning=false' [-Wunused-command-line-argument]
[7842/8698] Building CXX object tools/flang/lib/Parser/CMakeFiles/FortranParser.dir/parse-tree.cpp.o
[7843/8698] Building CXX object tools/flang/lib/Parser/CMakeFiles/FortranParser.dir/parsing.cpp.o
[7844/8698] Building CXX object tools/flang/lib/Lower/CMakeFiles/FortranLower.dir/OpenMP/Decomposer.cpp.o
FAILED: tools/flang/lib/Lower/CMakeFiles/FortranLower.dir/OpenMP/Decomposer.cpp.o 
/home/tcwg-buildbot/worker/clang-aarch64-sve-vla-2stage/stage1.install/bin/clang++ -DFLANG_INCLUDE_TESTS=1 -DFLANG_LITTLE_ENDIAN=1 -DGTEST_HAS_RTTI=0 -D_DEBUG -D_GLIBCXX_ASSERTIONS -D_GNU_SOURCE -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -I/home/tcwg-buildbot/worker/clang-aarch64-sve-vla-2stage/stage2/tools/flang/lib/Lower -I/home/tcwg-buildbot/worker/clang-aarch64-sve-vla-2stage/llvm/flang/lib/Lower -I/home/tcwg-buildbot/worker/clang-aarch64-sve-vla-2stage/llvm/flang/include -I/home/tcwg-buildbot/worker/clang-aarch64-sve-vla-2stage/stage2/tools/flang/include -I/home/tcwg-buildbot/worker/clang-aarch64-sve-vla-2stage/stage2/include -I/home/tcwg-buildbot/worker/clang-aarch64-sve-vla-2stage/llvm/llvm/include -isystem /home/tcwg-buildbot/worker/clang-aarch64-sve-vla-2stage/llvm/llvm/../mlir/include -isystem /home/tcwg-buildbot/worker/clang-aarch64-sve-vla-2stage/stage2/tools/mlir/include -isystem /home/tcwg-buildbot/worker/clang-aarch64-sve-vla-2stage/stage2/tools/clang/include -isystem /home/tcwg-buildbot/worker/clang-aarch64-sve-vla-2stage/llvm/llvm/../clang/include -mcpu=neoverse-512tvb -mllvm -scalable-vectorization=preferred -mllvm -treat-scalable-fixed-error-as-warning=false -fPIC -fno-semantic-interposition -fvisibility-inlines-hidden -Werror=date-time -Werror=unguarded-availability-new -Wall -Wextra -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wmissing-field-initializers -pedantic -Wno-long-long -Wc++98-compat-extra-semi -Wimplicit-fallthrough -Wcovered-switch-default -Wno-noexcept-type -Wnon-virtual-dtor -Wdelete-non-virtual-dtor -Wsuggest-override -Wno-comment -Wstring-conversion -Wmisleading-indentation -Wctad-maybe-unsupported -fdiagnostics-color -ffunction-sections -fdata-sections -Wno-deprecated-copy -Wno-string-conversion -Wno-ctad-maybe-unsupported -Wno-unused-command-line-argument -Wstring-conversion           -Wcovered-switch-default -Wno-nested-anon-types -O3 -DNDEBUG -std=c++17  -fno-exceptions -funwind-tables -fno-rtti -UNDEBUG -MD -MT tools/flang/lib/Lower/CMakeFiles/FortranLower.dir/OpenMP/Decomposer.cpp.o -MF tools/flang/lib/Lower/CMakeFiles/FortranLower.dir/OpenMP/Decomposer.cpp.o.d -o tools/flang/lib/Lower/CMakeFiles/FortranLower.dir/OpenMP/Decomposer.cpp.o -c /home/tcwg-buildbot/worker/clang-aarch64-sve-vla-2stage/llvm/flang/lib/Lower/OpenMP/Decomposer.cpp
Killed
[7845/8698] Building CXX object tools/flang/lib/Parser/CMakeFiles/FortranParser.dir/program-parsers.cpp.o
[7846/8698] Building CXX object tools/flang/lib/Parser/CMakeFiles/FortranParser.dir/expr-parsers.cpp.o
[7847/8698] Building CXX object tools/flang/lib/Parser/CMakeFiles/FortranParser.dir/tools.cpp.o
[7848/8698] Building CXX object tools/flang/lib/Parser/CMakeFiles/FortranParser.dir/openacc-parsers.cpp.o
[7849/8698] Linking CXX executable bin/clangd-indexer
clang++: warning: argument unused during compilation: '-mllvm -scalable-vectorization=preferred' [-Wunused-command-line-argument]
clang++: warning: argument unused during compilation: '-mllvm -treat-scalable-fixed-error-as-warning=false' [-Wunused-command-line-argument]
[7850/8698] Linking CXX shared library lib/libclang.so.20.0.0git
clang++: warning: argument unused during compilation: '-mllvm -scalable-vectorization=preferred' [-Wunused-command-line-argument]
clang++: warning: argument unused during compilation: '-mllvm -treat-scalable-fixed-error-as-warning=false' [-Wunused-command-line-argument]
[7851/8698] Linking CXX executable bin/clangd-fuzzer
clang++: warning: argument unused during compilation: '-mllvm -scalable-vectorization=preferred' [-Wunused-command-line-argument]
clang++: warning: argument unused during compilation: '-mllvm -treat-scalable-fixed-error-as-warning=false' [-Wunused-command-line-argument]
[7852/8698] Linking CXX executable bin/clangd
clang++: warning: argument unused during compilation: '-mllvm -scalable-vectorization=preferred' [-Wunused-command-line-argument]
clang++: warning: argument unused during compilation: '-mllvm -treat-scalable-fixed-error-as-warning=false' [-Wunused-command-line-argument]
[7853/8698] Building CXX object tools/flang/lib/Parser/CMakeFiles/FortranParser.dir/executable-parsers.cpp.o
[7854/8698] Building CXX object tools/flang/lib/Parser/CMakeFiles/FortranParser.dir/openmp-parsers.cpp.o
[7855/8698] Building CXX object tools/flang/lib/Lower/CMakeFiles/FortranLower.dir/ConvertType.cpp.o
[7856/8698] Building CXX object tools/flang/lib/Lower/CMakeFiles/FortranLower.dir/OpenACC.cpp.o
[7857/8698] Building CXX object tools/flang/lib/Lower/CMakeFiles/FortranLower.dir/IO.cpp.o
[7858/8698] Building CXX object tools/flang/lib/Lower/CMakeFiles/FortranLower.dir/Allocatable.cpp.o
[7859/8698] Building CXX object tools/flang/lib/Lower/CMakeFiles/FortranLower.dir/HostAssociations.cpp.o
[7860/8698] Building CXX object tools/flang/lib/Lower/CMakeFiles/FortranLower.dir/ConvertCall.cpp.o
[7861/8698] Building CXX object tools/flang/lib/Lower/CMakeFiles/FortranLower.dir/ConvertVariable.cpp.o
[7862/8698] Building CXX object tools/flang/lib/Lower/CMakeFiles/FortranLower.dir/ConvertConstant.cpp.o
[7863/8698] Building CXX object tools/flang/lib/Lower/CMakeFiles/FortranLower.dir/IterationSpace.cpp.o
[7864/8698] Building CXX object tools/flang/lib/Lower/CMakeFiles/FortranLower.dir/ConvertArrayConstructor.cpp.o
[7865/8698] Building CXX object tools/flang/lib/Lower/CMakeFiles/FortranLower.dir/CallInterface.cpp.o
[7866/8698] Building CXX object tools/flang/lib/Lower/CMakeFiles/FortranLower.dir/ConvertProcedureDesignator.cpp.o
[7867/8698] Building CXX object tools/flang/lib/Lower/CMakeFiles/FortranLower.dir/ConvertExpr.cpp.o
[7868/8698] Building CXX object tools/flang/lib/Lower/CMakeFiles/FortranLower.dir/ConvertExprToHLFIR.cpp.o
[7869/8698] Building CXX object tools/flang/lib/Lower/CMakeFiles/FortranLower.dir/Bridge.cpp.o
[7870/8698] Building CXX object tools/flang/lib/Lower/CMakeFiles/FortranLower.dir/OpenMP/DataSharingProcessor.cpp.o
[7871/8698] Building CXX object tools/flang/lib/Lower/CMakeFiles/FortranLower.dir/OpenMP/ClauseProcessor.cpp.o
[7872/8698] Building CXX object tools/flang/lib/Lower/CMakeFiles/FortranLower.dir/PFTBuilder.cpp.o

[AMDGPU] Only emit SCOPE_SYS global_wb

0a97840

global_wb with scopes lower than SCOPE_SYS is unnecessary for correctness. I was initially optimistic that they would be very cheap no-ops, but they can actually be quite expensive, so let's avoid them.
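The policy in this commit comes down to a single scope check. Below is a minimal C++ sketch. It assumes a hypothetical `CacheScope` enum that mirrors the four GFX12 cache-policy scopes (SCOPE_CU, SCOPE_SE, SCOPE_DEV, SCOPE_SYS). `needsGlobalWB` and `globalWBFor` are illustrative names, not functions from `SIMemoryLegalizer.cpp`, and the returned string only models the emitted instruction.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>

// Hypothetical model of the GFX12 cache-policy scopes a global_wb can carry.
// Illustrative only; this is not LLVM's SIMemoryLegalizer code.
enum class CacheScope : std::uint8_t { CU, SE, DEV, SYS };

// Policy described in this commit: a write-back is only needed at system scope.
constexpr bool needsGlobalWB(CacheScope scope) {
  return scope == CacheScope::SYS;
}

// Returns the (modeled) instruction to emit before a release, if any.
std::optional<std::string> globalWBFor(CacheScope scope) {
  if (!needsGlobalWB(scope))
    return std::nullopt;  // narrower scopes: no write-back is emitted at all
  return std::string("global_wb scope:SCOPE_SYS");
}

// Compile-time checks of the policy.
static_assert(needsGlobalWB(CacheScope::SYS), "SYS scope keeps the write-back");
static_assert(!needsGlobalWB(CacheScope::DEV), "DEV scope no longer emits one");
```

The design point is that narrower scopes produce no instruction at all, instead of a write-back that turned out not to be cheap. Per the commit message, correctness at those scopes does not depend on it.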

Pierre-vh requested review from jayfoad, nhaehnle and t-tye October 1, 2024 08:03

arsenm added the backend:AMDGPU label Oct 2, 2024

dstutt approved these changes Oct 3, 2024

jayfoad requested a review from perlfu October 3, 2024 10:40

perlfu approved these changes Oct 4, 2024

nhaehnle approved these changes Oct 4, 2024

Pierre-vh merged commit 924a64a into llvm:main Oct 7, 2024
11 checks passed

Pierre-vh deleted the speedup-gfx12-mm branch October 7, 2024 05:35

llvm-ci commented Oct 7, 2024
