
[SYCL] Change NativePrograms.insert to [] access #14873

Merged: 1 commit merged on Aug 2, 2024

Conversation

RossBrunton
Contributor

@RossBrunton RossBrunton commented Jul 31, 2024

`map.insert` doesn't insert a value if the map already contains an entry for that key. This can happen when UR/PI happens to reuse the same program pointer that it used for a previous program.
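
To illustrate the behaviour described above, here is a minimal sketch using a plain `std::map`. The `Handle` alias, the string values, and the function names are hypothetical stand-ins chosen for the example; they are not the actual types or code in the SYCL runtime.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-ins: in the runtime the key would be a native program
// handle and the value a runtime-side program object.
using Handle = int;
using Cache = std::map<Handle, std::string>;

// std::map::insert leaves an existing entry untouched and reports whether
// an insertion actually happened through the bool in the returned pair.
bool record_with_insert(Cache &cache, Handle h, const std::string &owner) {
  return cache.insert({h, owner}).second;
}

// operator[] creates the entry if it is missing, then the assignment
// overwrites whatever value is stored under that key.
void record_with_subscript(Cache &cache, Handle h, const std::string &owner) {
  cache[h] = owner;
}

// Simulates a handle value being reused for a second program and returns
// the value the cache holds for that handle afterwards.
std::string reuse_scenario(bool use_subscript) {
  Cache cache;
  record_with_insert(cache, 42, "first");
  if (use_subscript) {
    record_with_subscript(cache, 42, "second");
  } else {
    bool inserted = record_with_insert(cache, 42, "second");
    assert(!inserted); // the key was already present, so nothing was stored
  }
  return cache.at(42);
}
```

With `insert`, the stale value for the reused handle stays in place; with `[]`, it is replaced. Checking the `bool` returned by `insert` is one way to notice the collision instead of silently ignoring it.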

--

This was causing some tests in the PI 2 UR conversion to randomly fail, including at least #14765.

Fixes #14819.

@RossBrunton RossBrunton requested a review from a team as a code owner July 31, 2024 16:14
@RossBrunton RossBrunton requested a review from bso-intel July 31, 2024 16:14
@RossBrunton RossBrunton changed the title Change NativePrograms.insert to [] access [SYCL] Change NativePrograms.insert to [] access Jul 31, 2024
@KornevNikita KornevNikita linked an issue Jul 31, 2024 that may be closed by this pull request
@AlexeySachkov
Contributor

Strictly speaking, this is a functional change without a test. I understand that there is probably an out-of-tree test that exposes this, but I also assume that it should be possible to write a unit-test for this so we catch that earlier if someone breaks this again

@RossBrunton
Contributor Author

Strictly speaking, this is a functional change without a test. I understand that there is probably an out-of-tree test that exposes this, but I also assume that it should be possible to write a unit-test for this so we catch that earlier if someone breaks this again

Test added. I also removed the comments - hopefully now that there's a proper test, there's no need to explicitly write out the footgun every time.

@sarnex
Contributor

sarnex commented Aug 2, 2024

Merging now to attempt to unbreak CI. CUDA passed on the self-hosted runner, and I highly doubt there is any HIP-specific issue; the runner is totally slammed.

@sarnex sarnex merged commit 4f86ab7 into intel:sycl Aug 2, 2024
12 of 14 checks passed
@sarnex
Contributor

sarnex commented Aug 2, 2024

@RossBrunton Seeing some postcommit failures (although build passes now), can you please investigate?

 FAIL: SYCL :: DeviceDependencies/objects.cpp (436 of 2135)
******************** TEST 'SYCL :: DeviceDependencies/objects.cpp' FAILED ********************
Exit Code: -6

Command Output (stdout):
--
# RUN: at line 5
/__w/llvm/llvm/toolchain/bin//clang++ -Werror  -fsycl /__w/llvm/llvm/llvm/sycl/test-e2e/DeviceDependencies/Inputs/a.cpp -I /__w/llvm/llvm/llvm/sycl/test-e2e/DeviceDependencies/Inputs -c -o /__w/llvm/llvm/build-e2e/DeviceDependencies/Output/objects.cpp.tmp_a.o
# executed command: /__w/llvm/llvm/toolchain/bin//clang++ -Werror -fsycl /__w/llvm/llvm/llvm/sycl/test-e2e/DeviceDependencies/Inputs/a.cpp -I /__w/llvm/llvm/llvm/sycl/test-e2e/DeviceDependencies/Inputs -c -o /__w/llvm/llvm/build-e2e/DeviceDependencies/Output/objects.cpp.tmp_a.o
# note: command had no output on stdout or stderr
# RUN: at line 6
/__w/llvm/llvm/toolchain/bin//clang++ -Werror  -fsycl /__w/llvm/llvm/llvm/sycl/test-e2e/DeviceDependencies/Inputs/b.cpp -I /__w/llvm/llvm/llvm/sycl/test-e2e/DeviceDependencies/Inputs -c -o /__w/llvm/llvm/build-e2e/DeviceDependencies/Output/objects.cpp.tmp_b.o
# executed command: /__w/llvm/llvm/toolchain/bin//clang++ -Werror -fsycl /__w/llvm/llvm/llvm/sycl/test-e2e/DeviceDependencies/Inputs/b.cpp -I /__w/llvm/llvm/llvm/sycl/test-e2e/DeviceDependencies/Inputs -c -o /__w/llvm/llvm/build-e2e/DeviceDependencies/Output/objects.cpp.tmp_b.o
# note: command had no output on stdout or stderr
# RUN: at line 7
/__w/llvm/llvm/toolchain/bin//clang++ -Werror  -fsycl /__w/llvm/llvm/llvm/sycl/test-e2e/DeviceDependencies/Inputs/c.cpp -I /__w/llvm/llvm/llvm/sycl/test-e2e/DeviceDependencies/Inputs -c -o /__w/llvm/llvm/build-e2e/DeviceDependencies/Output/objects.cpp.tmp_c.o
# executed command: /__w/llvm/llvm/toolchain/bin//clang++ -Werror -fsycl /__w/llvm/llvm/llvm/sycl/test-e2e/DeviceDependencies/Inputs/c.cpp -I /__w/llvm/llvm/llvm/sycl/test-e2e/DeviceDependencies/Inputs -c -o /__w/llvm/llvm/build-e2e/DeviceDependencies/Output/objects.cpp.tmp_c.o
# note: command had no output on stdout or stderr
# RUN: at line 8
/__w/llvm/llvm/toolchain/bin//clang++ -Werror  -fsycl /__w/llvm/llvm/llvm/sycl/test-e2e/DeviceDependencies/Inputs/d.cpp -I /__w/llvm/llvm/llvm/sycl/test-e2e/DeviceDependencies/Inputs -c -o /__w/llvm/llvm/build-e2e/DeviceDependencies/Output/objects.cpp.tmp_d.o
# executed command: /__w/llvm/llvm/toolchain/bin//clang++ -Werror -fsycl /__w/llvm/llvm/llvm/sycl/test-e2e/DeviceDependencies/Inputs/d.cpp -I /__w/llvm/llvm/llvm/sycl/test-e2e/DeviceDependencies/Inputs -c -o /__w/llvm/llvm/build-e2e/DeviceDependencies/Output/objects.cpp.tmp_d.o
# note: command had no output on stdout or stderr
# RUN: at line 9
/__w/llvm/llvm/toolchain/bin//clang++ -Werror  -fsycl -fsycl-targets=spir64  /__w/llvm/llvm/llvm/sycl/test-e2e/DeviceDependencies/objects.cpp -fsycl-allow-device-dependencies /__w/llvm/llvm/build-e2e/DeviceDependencies/Output/objects.cpp.tmp_a.o /__w/llvm/llvm/build-e2e/DeviceDependencies/Output/objects.cpp.tmp_b.o /__w/llvm/llvm/build-e2e/DeviceDependencies/Output/objects.cpp.tmp_c.o /__w/llvm/llvm/build-e2e/DeviceDependencies/Output/objects.cpp.tmp_d.o -I /__w/llvm/llvm/llvm/sycl/test-e2e/DeviceDependencies/Inputs -o /__w/llvm/llvm/build-e2e/DeviceDependencies/Output/objects.cpp.tmp.out
# executed command: /__w/llvm/llvm/toolchain/bin//clang++ -Werror -fsycl -fsycl-targets=spir64 /__w/llvm/llvm/llvm/sycl/test-e2e/DeviceDependencies/objects.cpp -fsycl-allow-device-dependencies /__w/llvm/llvm/build-e2e/DeviceDependencies/Output/objects.cpp.tmp_a.o /__w/llvm/llvm/build-e2e/DeviceDependencies/Output/objects.cpp.tmp_b.o /__w/llvm/llvm/build-e2e/DeviceDependencies/Output/objects.cpp.tmp_c.o /__w/llvm/llvm/build-e2e/DeviceDependencies/Output/objects.cpp.tmp_d.o -I /__w/llvm/llvm/llvm/sycl/test-e2e/DeviceDependencies/Inputs -o /__w/llvm/llvm/build-e2e/DeviceDependencies/Output/objects.cpp.tmp.out
# note: command had no output on stdout or stderr
# RUN: at line 10
env ONEAPI_DEVICE_SELECTOR=level_zero:gpu  /__w/llvm/llvm/build-e2e/DeviceDependencies/Output/objects.cpp.tmp.out
# executed command: env ONEAPI_DEVICE_SELECTOR=level_zero:gpu /__w/llvm/llvm/build-e2e/DeviceDependencies/Output/objects.cpp.tmp.out
# .---command stderr------------
# | terminate called after throwing an instance of 'sycl::_V1::exception'
# |   what():  Native API failed. Native API returns: 45 (UR_RESULT_ERROR_INVALID_ARGUMENT)
# `-----------------------------
# error: command failed with exit status: -6

--

********************
FAIL: SYCL :: DeviceDependencies/singleDynamicLibrary.cpp (438 of 2135)
******************** TEST 'SYCL :: DeviceDependencies/singleDynamicLibrary.cpp' FAILED ********************
Exit Code: -6

Command Output (stdout):
--
# RUN: at line 6
/__w/llvm/llvm/toolchain/bin//clang++ -Werror  -fsycl -fPIC -shared -fsycl-allow-device-dependencies -I /__w/llvm/llvm/llvm/sycl/test-e2e/DeviceDependencies/Inputs     /__w/llvm/llvm/llvm/sycl/test-e2e/DeviceDependencies/Inputs/a.cpp                                                                  /__w/llvm/llvm/llvm/sycl/test-e2e/DeviceDependencies/Inputs/b.cpp                                                                  /__w/llvm/llvm/llvm/sycl/test-e2e/DeviceDependencies/Inputs/c.cpp                                                                  /__w/llvm/llvm/llvm/sycl/test-e2e/DeviceDependencies/Inputs/d.cpp                                                                  /__w/llvm/llvm/llvm/sycl/test-e2e/DeviceDependencies/Inputs/wrapper.cpp                                                            -o /__w/llvm/llvm/build-e2e/DeviceDependencies/Output/libdevice_single.so
# executed command: /__w/llvm/llvm/toolchain/bin//clang++ -Werror -fsycl -fPIC -shared -fsycl-allow-device-dependencies -I /__w/llvm/llvm/llvm/sycl/test-e2e/DeviceDependencies/Inputs /__w/llvm/llvm/llvm/sycl/test-e2e/DeviceDependencies/Inputs/a.cpp /__w/llvm/llvm/llvm/sycl/test-e2e/DeviceDependencies/Inputs/b.cpp /__w/llvm/llvm/llvm/sycl/test-e2e/DeviceDependencies/Inputs/c.cpp /__w/llvm/llvm/llvm/sycl/test-e2e/DeviceDependencies/Inputs/d.cpp /__w/llvm/llvm/llvm/sycl/test-e2e/DeviceDependencies/Inputs/wrapper.cpp -o /__w/llvm/llvm/build-e2e/DeviceDependencies/Output/libdevice_single.so
# note: command had no output on stdout or stderr
# RUN: at line 14
/__w/llvm/llvm/toolchain/bin//clang++ -Werror  -fsycl -fsycl-targets=spir64  /__w/llvm/llvm/llvm/sycl/test-e2e/DeviceDependencies/singleDynamicLibrary.cpp -I/__w/llvm/llvm/llvm/sycl/test-e2e/DeviceDependencies/Inputs -o /__w/llvm/llvm/build-e2e/DeviceDependencies/Output/singleDynamicLibrary.cpp.tmp.out            -L/__w/llvm/llvm/build-e2e/DeviceDependencies/Output -ldevice_single -Wl,-rpath=/__w/llvm/llvm/build-e2e/DeviceDependencies/Output
# executed command: /__w/llvm/llvm/toolchain/bin//clang++ -Werror -fsycl -fsycl-targets=spir64 /__w/llvm/llvm/llvm/sycl/test-e2e/DeviceDependencies/singleDynamicLibrary.cpp -I/__w/llvm/llvm/llvm/sycl/test-e2e/DeviceDependencies/Inputs -o /__w/llvm/llvm/build-e2e/DeviceDependencies/Output/singleDynamicLibrary.cpp.tmp.out -L/__w/llvm/llvm/build-e2e/DeviceDependencies/Output -ldevice_single -Wl,-rpath=/__w/llvm/llvm/build-e2e/DeviceDependencies/Output
# note: command had no output on stdout or stderr
# RUN: at line 20
env ONEAPI_DEVICE_SELECTOR=level_zero:gpu  /__w/llvm/llvm/build-e2e/DeviceDependencies/Output/singleDynamicLibrary.cpp.tmp.out
# executed command: env ONEAPI_DEVICE_SELECTOR=level_zero:gpu /__w/llvm/llvm/build-e2e/DeviceDependencies/Output/singleDynamicLibrary.cpp.tmp.out
# .---command stderr------------
# | terminate called after throwing an instance of 'sycl::_V1::exception'
# |   what():  Native API failed. Native API returns: 45 (UR_RESULT_ERROR_INVALID_ARGUMENT)
# `-----------------------------
# error: command failed with exit status: -6

--

********************
FAIL: SYCL :: DeviceDependencies/dynamic.cpp (443 of 2135)
******************** TEST 'SYCL :: DeviceDependencies/dynamic.cpp' FAILED ********************
Exit Code: -6

Command Output (stdout):
--
# RUN: at line 6
/__w/llvm/llvm/toolchain/bin//clang++ -Werror  -fsycl -fPIC -shared -fsycl-allow-device-dependencies /__w/llvm/llvm/llvm/sycl/test-e2e/DeviceDependencies/Inputs/a.cpp -I /__w/llvm/llvm/llvm/sycl/test-e2e/DeviceDependencies/Inputs -o /__w/llvm/llvm/build-e2e/DeviceDependencies/Output/libdevice_a.so
# executed command: /__w/llvm/llvm/toolchain/bin//clang++ -Werror -fsycl -fPIC -shared -fsycl-allow-device-dependencies /__w/llvm/llvm/llvm/sycl/test-e2e/DeviceDependencies/Inputs/a.cpp -I /__w/llvm/llvm/llvm/sycl/test-e2e/DeviceDependencies/Inputs -o /__w/llvm/llvm/build-e2e/DeviceDependencies/Output/libdevice_a.so
# note: command had no output on stdout or stderr
# RUN: at line 7
/__w/llvm/llvm/toolchain/bin//clang++ -Werror  -fsycl -fPIC -shared -fsycl-allow-device-dependencies /__w/llvm/llvm/llvm/sycl/test-e2e/DeviceDependencies/Inputs/b.cpp -I /__w/llvm/llvm/llvm/sycl/test-e2e/DeviceDependencies/Inputs -o /__w/llvm/llvm/build-e2e/DeviceDependencies/Output/libdevice_b.so
# executed command: /__w/llvm/llvm/toolchain/bin//clang++ -Werror -fsycl -fPIC -shared -fsycl-allow-device-dependencies /__w/llvm/llvm/llvm/sycl/test-e2e/DeviceDependencies/Inputs/b.cpp -I /__w/llvm/llvm/llvm/sycl/test-e2e/DeviceDependencies/Inputs -o /__w/llvm/llvm/build-e2e/DeviceDependencies/Output/libdevice_b.so
# note: command had no output on stdout or stderr
# RUN: at line 8
/__w/llvm/llvm/toolchain/bin//clang++ -Werror  -fsycl -fPIC -shared -fsycl-allow-device-dependencies /__w/llvm/llvm/llvm/sycl/test-e2e/DeviceDependencies/Inputs/c.cpp -I /__w/llvm/llvm/llvm/sycl/test-e2e/DeviceDependencies/Inputs -o /__w/llvm/llvm/build-e2e/DeviceDependencies/Output/libdevice_c.so
# executed command: /__w/llvm/llvm/toolchain/bin//clang++ -Werror -fsycl -fPIC -shared -fsycl-allow-device-dependencies /__w/llvm/llvm/llvm/sycl/test-e2e/DeviceDependencies/Inputs/c.cpp -I /__w/llvm/llvm/llvm/sycl/test-e2e/DeviceDependencies/Inputs -o /__w/llvm/llvm/build-e2e/DeviceDependencies/Output/libdevice_c.so
# note: command had no output on stdout or stderr
# RUN: at line 9
/__w/llvm/llvm/toolchain/bin//clang++ -Werror  -fsycl -fPIC -shared -fsycl-allow-device-dependencies /__w/llvm/llvm/llvm/sycl/test-e2e/DeviceDependencies/Inputs/d.cpp -I /__w/llvm/llvm/llvm/sycl/test-e2e/DeviceDependencies/Inputs -o /__w/llvm/llvm/build-e2e/DeviceDependencies/Output/libdevice_d.so
# executed command: /__w/llvm/llvm/toolchain/bin//clang++ -Werror -fsycl -fPIC -shared -fsycl-allow-device-dependencies /__w/llvm/llvm/llvm/sycl/test-e2e/DeviceDependencies/Inputs/d.cpp -I /__w/llvm/llvm/llvm/sycl/test-e2e/DeviceDependencies/Inputs -o /__w/llvm/llvm/build-e2e/DeviceDependencies/Output/libdevice_d.so
# note: command had no output on stdout or stderr
# RUN: at line 10
/__w/llvm/llvm/toolchain/bin//clang++ -Werror  -fsycl -fsycl-targets=spir64  /__w/llvm/llvm/llvm/sycl/test-e2e/DeviceDependencies/dynamic.cpp -fsycl-allow-device-dependencies -L/__w/llvm/llvm/build-e2e/DeviceDependencies/Output -ldevice_a -ldevice_b -ldevice_c -ldevice_d -I /__w/llvm/llvm/llvm/sycl/test-e2e/DeviceDependencies/Inputs -o /__w/llvm/llvm/build-e2e/DeviceDependencies/Output/dynamic.cpp.tmp.out -Wl,-rpath=/__w/llvm/llvm/build-e2e/DeviceDependencies/Output
# executed command: /__w/llvm/llvm/toolchain/bin//clang++ -Werror -fsycl -fsycl-targets=spir64 /__w/llvm/llvm/llvm/sycl/test-e2e/DeviceDependencies/dynamic.cpp -fsycl-allow-device-dependencies -L/__w/llvm/llvm/build-e2e/DeviceDependencies/Output -ldevice_a -ldevice_b -ldevice_c -ldevice_d -I /__w/llvm/llvm/llvm/sycl/test-e2e/DeviceDependencies/Inputs -o /__w/llvm/llvm/build-e2e/DeviceDependencies/Output/dynamic.cpp.tmp.out -Wl,-rpath=/__w/llvm/llvm/build-e2e/DeviceDependencies/Output
# note: command had no output on stdout or stderr
# RUN: at line 11
env ONEAPI_DEVICE_SELECTOR=level_zero:gpu  /__w/llvm/llvm/build-e2e/DeviceDependencies/Output/dynamic.cpp.tmp.out
# executed command: env ONEAPI_DEVICE_SELECTOR=level_zero:gpu /__w/llvm/llvm/build-e2e/DeviceDependencies/Output/dynamic.cpp.tmp.out
# .---command stderr------------
# | terminate called after throwing an instance of 'sycl::_V1::exception'
# |   what():  Native API failed. Native API returns: 45 (UR_RESULT_ERROR_INVALID_ARGUMENT)
# `-----------------------------
# error: command failed with exit status: -6

--

@omarahmed1111
Contributor

omarahmed1111 commented Aug 2, 2024

Hey @sarnex, I took a quick look at this and reverted the problematic part here. I can confirm this should fix these failures, but I couldn't confirm that it won't reintroduce the ProgramAndKernel unit test failure, as that did not reproduce locally. We will have a proper fix next week; for now, I just tried to unblock the CI.

@sarnex
Contributor

sarnex commented Aug 2, 2024

Thanks, will take a look

sarnex pushed a commit that referenced this pull request Aug 2, 2024
@KornevNikita
Contributor

KornevNikita commented Aug 5, 2024

@omarahmed1111 @RossBrunton Hi, are you going to re-introduce this patch?
UPD: okay, I thought there was a full revert in #14936

@AlexeySachkov
Contributor

I've just realized that this PR could be an incorrect change. 4bf1fe3 made NativePrograms a multimap, but it was accidentally reverted as part of PI removal.

I think that we should re-instate that multimap first and then review its uses to see if any changes are needed.

@RossBrunton
Contributor Author

Thanks for looking into this, everyone. I've tested the map->multimap change, and it seems to fix things locally for me. So I'll make an MR reverting this and fixing the map issue.

AlexeySachkov pushed a commit that referenced this pull request Aug 6, 2024
4f86ab replaced `insert` with `[]` in order to fix a pointer re-use
issue, however that was not the correct fix. Instead, a multimap was
incorrectly converted to a regular map during the PI->UR conversion.

This change reverts the rest of 4f86ab (besides the test, which is still
valid), and converts NativePrograms back to a multimap.

---

Thanks to @AlexeySachkov for finding the real issue in
#14873 (comment)
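
As a rough illustration of why a multimap sidesteps the pointer re-use problem, the sketch below stores two values under the same key. The `Handle` alias, the string values, and the function names are hypothetical and only stand in for whatever `NativePrograms` actually holds; this is not the runtime's code.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-ins for a native program handle and the object
// associated with it.
using Handle = int;
using MultiCache = std::multimap<Handle, std::string>;

// std::multimap::insert always adds a new entry, even when the key is
// already present, so a reused handle does not hide or replace older entries.
void record_program(MultiCache &cache, Handle h, const std::string &owner) {
  cache.insert({h, owner});
}

// Collects every value stored under one handle using equal_range.
std::vector<std::string> owners_of(const MultiCache &cache, Handle h) {
  std::vector<std::string> out;
  auto range = cache.equal_range(h);
  for (auto it = range.first; it != range.second; ++it) {
    out.push_back(it->second);
  }
  return out;
}

// Simulates the same handle value being handed out for two programs.
std::vector<std::string> multimap_reuse_scenario() {
  MultiCache cache;
  record_program(cache, 42, "first");
  record_program(cache, 42, "second");
  return owners_of(cache, 42);
}
```

Both entries survive, and since C++11 entries with equivalent keys are kept in insertion order. The caller then has to decide which of the entries under a key it wants, which is presumably why the uses of the container needed reviewing.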
AlexeySachkov pushed a commit to AlexeySachkov/llvm that referenced this pull request Nov 26, 2024
@RossBrunton RossBrunton deleted the ross/insertfixsycl branch February 19, 2025 15:05
Successfully merging this pull request may close these issues.

Investigate SYCL CTS regressions
7 participants