Skip to content

[SYCL][HIP] sycl::atomic_ref::fetch_sub does not work with USM #7252

Closed
@krasznaa

Description

@krasznaa

Describe the bug

While testing our project (https://github.com/acts-project/vecmem) with the HIP backend of the compiler, I found a failure in the unit tests with atomic operations.

[bash][pcadp04]:build > ./bin/vecmem_test_sycl
Running main() from /afs/cern.ch/user/k/krasznaa/work/projects/vecmem/build/_deps/googletest-src/googletest/src/gtest_main.cc
[==========] Running 19 tests from 4 test suites.
[----------] Global test environment set-up.
[----------] 5 tests from sycl_containers_test
[ RUN      ] sycl_containers_test.shared_memory
[       OK ] sycl_containers_test.shared_memory (8 ms)
[ RUN      ] sycl_containers_test.device_memory
[       OK ] sycl_containers_test.device_memory (19 ms)
[ RUN      ] sycl_containers_test.atomic_memory
/afs/cern.ch/user/k/krasznaa/work/projects/vecmem/vecmem/tests/sycl/test_sycl_containers.sycl:210: Failure
Expected equality of these values:
  value
    Which is: 800
  ITERATIONS * 4
    Which is: 400
...
/afs/cern.ch/user/k/krasznaa/work/projects/vecmem/vecmem/tests/sycl/test_sycl_containers.sycl:210: Failure
Expected equality of these values:
  value
    Which is: 800
  ITERATIONS * 4
    Which is: 400
[  FAILED  ] sycl_containers_test.atomic_memory (7 ms)
[ RUN      ] sycl_containers_test.extendable_memory
[       OK ] sycl_containers_test.extendable_memory (13 ms)
[ RUN      ] sycl_containers_test.array_memory
[       OK ] sycl_containers_test.array_memory (4 ms)
[----------] 5 tests from sycl_containers_test (54 ms total)

[----------] 5 tests from sycl_jagged_containers_test
[ RUN      ] sycl_jagged_containers_test.mutate_in_kernel
[       OK ] sycl_jagged_containers_test.mutate_in_kernel (9 ms)
[ RUN      ] sycl_jagged_containers_test.set_in_kernel
[       OK ] sycl_jagged_containers_test.set_in_kernel (14 ms)
[ RUN      ] sycl_jagged_containers_test.set_in_contiguous_kernel
[       OK ] sycl_jagged_containers_test.set_in_contiguous_kernel (13 ms)
[ RUN      ] sycl_jagged_containers_test.filter
[       OK ] sycl_jagged_containers_test.filter (15 ms)
[ RUN      ] sycl_jagged_containers_test.zero_capacity
[       OK ] sycl_jagged_containers_test.zero_capacity (20 ms)
[----------] 5 tests from sycl_jagged_containers_test (73 ms total)

[----------] 3 tests from sycl_memory_resource_tests/memory_resource_test_basic
[ RUN      ] sycl_memory_resource_tests/memory_resource_test_basic.allocations/device_resource
[       OK ] sycl_memory_resource_tests/memory_resource_test_basic.allocations/device_resource (0 ms)
[ RUN      ] sycl_memory_resource_tests/memory_resource_test_basic.allocations/host_resource
[       OK ] sycl_memory_resource_tests/memory_resource_test_basic.allocations/host_resource (5 ms)
[ RUN      ] sycl_memory_resource_tests/memory_resource_test_basic.allocations/shared_resource
[       OK ] sycl_memory_resource_tests/memory_resource_test_basic.allocations/shared_resource (5 ms)
[----------] 3 tests from sycl_memory_resource_tests/memory_resource_test_basic (10 ms total)

[----------] 6 tests from sycl_host_accessible_memory_resource_tests/memory_resource_test_host_accessible
[ RUN      ] sycl_host_accessible_memory_resource_tests/memory_resource_test_host_accessible.int_value/host_resource
[       OK ] sycl_host_accessible_memory_resource_tests/memory_resource_test_host_accessible.int_value/host_resource (0 ms)
[ RUN      ] sycl_host_accessible_memory_resource_tests/memory_resource_test_host_accessible.int_value/shared_resource
[       OK ] sycl_host_accessible_memory_resource_tests/memory_resource_test_host_accessible.int_value/shared_resource (0 ms)
[ RUN      ] sycl_host_accessible_memory_resource_tests/memory_resource_test_host_accessible.double_value/host_resource
[       OK ] sycl_host_accessible_memory_resource_tests/memory_resource_test_host_accessible.double_value/host_resource (0 ms)
[ RUN      ] sycl_host_accessible_memory_resource_tests/memory_resource_test_host_accessible.double_value/shared_resource
[       OK ] sycl_host_accessible_memory_resource_tests/memory_resource_test_host_accessible.double_value/shared_resource (0 ms)
[ RUN      ] sycl_host_accessible_memory_resource_tests/memory_resource_test_host_accessible.custom_value/host_resource
[       OK ] sycl_host_accessible_memory_resource_tests/memory_resource_test_host_accessible.custom_value/host_resource (0 ms)
[ RUN      ] sycl_host_accessible_memory_resource_tests/memory_resource_test_host_accessible.custom_value/shared_resource
[       OK ] sycl_host_accessible_memory_resource_tests/memory_resource_test_host_accessible.custom_value/shared_resource (0 ms)
[----------] 6 tests from sycl_host_accessible_memory_resource_tests/memory_resource_test_host_accessible (0 ms total)

[----------] Global test environment tear-down
[==========] 19 tests from 4 test suites ran. (139 ms total)
[  PASSED  ] 18 tests.
[  FAILED  ] 1 test, listed below:
[  FAILED  ] sycl_containers_test.atomic_memory

 1 FAILED TEST

The reason I wanted to show these results is that it seems that this is the only failure with this backend by now. 😄

To Reproduce

I made the following example to reproduce the issue in a standalone way:

// SYCL include(s).
#include <CL/sycl.hpp>

// System include(s).
#include <iostream>
#include <memory>

int main() {

   // Create a queue.
   sycl::queue queue;
   std::cout << "Using device: "
             << queue.get_device().get_info<sycl::info::device::name>()
             << std::endl;

   // Allocate a small shared memory buffer.
   static constexpr std::size_t BUFFER_SIZE=10;
   int* buffer = static_cast<int*>(sycl::malloc_shared(BUFFER_SIZE*sizeof(int), queue));

   // Set all elements of it to some well defined value.
   for (std::size_t i = 0; i < BUFFER_SIZE; ++i) {
      buffer[i] = 100;
   }

   // The type of atomic reference to use.
   using atomic_ref = sycl::atomic_ref<int, sycl::memory_order::relaxed,
                                       sycl::memory_scope::device,
                                       sycl::access::address_space::global_space>;

   // Run a kernel that would decrement each value atomically.
   queue.submit([buffer](sycl::handler& h) {
                   h.parallel_for<class atomic_test>(BUFFER_SIZE,
                                                     [buffer](sycl::item<1> id) {
                                                        atomic_ref a(buffer[id.get_linear_id()]);
                                                        a.fetch_sub(5);
                                                     });
                }).wait_and_throw();

   // Check whether the decrement succeeded.
   for (std::size_t i = 0; i < BUFFER_SIZE; ++i) {
      if (buffer[i] != 95) {
         std::cerr << "buffer[" << i << "] = " << buffer[i] << std::endl;
      }
   }

   // Free the buffer.
   sycl::free(buffer, queue);

   // Return gracefully.
   return 0;
}

With this source, I see:

[bash][pcadp04]:sycl-atomics > clang++ -fsycl -fsycl-targets=amdgcn-amd-amdhsa -Xsycl-target-backend --offload-arch=gfx1031 sycl-atomics.cpp
warning: linking module '/mnt/hdd1/software/intel/clang-2022-09/x86_64-centos9-gcc11-opt/lib/clang/16.0.0/../../clc/remangled-l64-signed_char.libspirv-amdgcn--amdhsa.bc': Linking two modules of different target triples: '/mnt/hdd1/software/intel/clang-2022-09/x86_64-centos9-gcc11-opt/lib/clang/16.0.0/../../clc/remangled-l64-signed_char.libspirv-amdgcn--amdhsa.bc' is 'amdgcn-unknown-amdhsa' whereas 'sycl-atomics.cpp' is 'amdgcn-amd-amdhsa'
 [-Wlinker-warnings]
1 warning generated.
[bash][pcadp04]:sycl-atomics > ./a.out 
Using device: AMD Radeon RX 6700 XT
buffer[0] = 100
buffer[1] = 100
buffer[2] = 100
buffer[3] = 100
buffer[4] = 100
buffer[5] = 100
buffer[6] = 100
buffer[7] = 100
buffer[8] = 100
buffer[9] = 100
[bash][pcadp04]:sycl-atomics >

While other backends produce the correct results. Like:

[bash][pcadp04]:sycl-atomics > clang++ -fsycl -fsycl-targets=nvptx64-nvidia-cuda sycl-atomics.cpp
clang-16: warning: CUDA version is newer than the latest supported version 11.5 [-Wunknown-cuda-version]
warning: linking module '/mnt/hdd1/software/intel/clang-2022-09/x86_64-centos9-gcc11-opt/lib/clang/16.0.0/../../clc/remangled-l64-signed_char.libspirv-nvptx64--nvidiacl.bc': Linking two modules of different target triples: '/mnt/hdd1/software/intel/clang-2022-09/x86_64-centos9-gcc11-opt/lib/clang/16.0.0/../../clc/remangled-l64-signed_char.libspirv-nvptx64--nvidiacl.bc' is 'nvptx64-unknown-nvidiacl' whereas 'sycl-atomics.cpp' is 'nvptx64-nvidia-cuda'
 [-Wlinker-warnings]
1 warning generated.
[bash][pcadp04]:sycl-atomics > ./a.out 
Using device: NVIDIA RTX A5000
[bash][pcadp04]:sycl-atomics >

A couple of things to note:

  • When using sycl::buffer for memory management, the issue did not show up. It seems to be limited to using USM.
    • I only tried it with "shared" USM, did not try what happens with "device" USM.
  • I didn't test every atomic operation possible, but for instance fetch_add(...) does work, even with USM. It's fetch_sub(...) that notably doesn't work.

Environment (please complete the following information):

  • OS: CentOS Stream 9, although I've seen the same on Ubuntu 20.04 before as well. (I don't think the exact Linux version matters.)
  • Target device and vendor: AMD GPU
  • DPC++ version: 2022-09
  • Dependencies version: In this latest test I used ROCm-5.3.0, but previously I've seen the same behaviour with ROCm-4.2.0 as well.

Pinging @ivorobts.

Metadata

Metadata

Assignees

Labels

bugSomething isn't workingconfirmedhipIssues related to execution on HIP backend.

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions