Skip to content

[SYCL] Switch NativePrograms back to using multimap #14951

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Aug 6, 2024

Conversation

RossBrunton
Copy link
Contributor

4f86ab replaced insert with [] in order to fix a pointer re-use issue, however that was not the correct fix. Instead, a multimap was incorrectly converted to a regular map during the PI->UR conversion.

This change reverts the rest of 4f86ab (besides the test, which is still valid), and converts NativePrograms back to a multimap.


Thanks to @AlexeySachkov for finding the real issue in #14873 (comment)

4f86ab replaced `insert` with `[]` in order to fix a pointer re-use
issue, however that was not the correct fix. Instead, a multimap was
incorrectly converted to a regular map during the PI->UR conversion.

This change reverts the rest of 4f86ab (besides the test, which is still
valid), and converts NativePrograms back to a multimap.
@RossBrunton RossBrunton requested a review from a team as a code owner August 5, 2024 13:33
Copy link
Contributor

@sergey-semenov sergey-semenov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With this approach, NativePrograms could contain invalidated entries if UR program handle happens to be reused for a new program. I think we should at least clear all the existing entries for the UR program handle before inserting new ones (since we know this is a new program whenever we're filling out the NativePrograms map).

@RossBrunton
Copy link
Contributor Author

As far as I can tell, the map is allowed to have multiple programs with the same handle - the code that uses this map will check each program for the correct kernel name. Does the test added in https://github.com/intel/llvm/pull/14873/files cover what you were thinking about, @sergey-semenov ?

I'll be honest, I'm not too familiar with how the NativePrograms map works. However, if we include this change and the previous two changes, the only material change is fixing a failed merge conflict resolution and addition of a test, so in theory the behaviour should be identical to what it was before the PI->UR port.

Shall I create an issue to investigate if this is causing issues, and someone more familiar with the program manager could take a look?

@sergey-semenov
Copy link
Contributor

sergey-semenov commented Aug 5, 2024

As far as I can tell, the map is allowed to have multiple programs with the same handle - the code that uses this map will check each program for the correct kernel name. Does the test added in https://github.com/intel/llvm/pull/14873/files cover what you were thinking about, @sergey-semenov ?

To be more precise, it allows adding multiple device binaries for a single program. If that program gets destroyed, and a new one gets the same UR handle, with this patch we'll keep the old device binaries in addition to new ones. So we could end up fetching information from the wrong device binary.

A test I would expect to fail with this change is if we had two different device binaries with the same kernels but different properties, a program is created from the first binary then destroyed, then another one is created from the second binary and ends up having the same UR handle.

Just like the original issue with the non-multimap NativePrograms, this is something we've had before PI removal, the switch to a multimap just changed how this bug manifests. The multimap's counterpart for the "insert -> []" switch would be clearing existing entries before inserting ones for a new program. I don't mind if it got addressed separately though since the unintentional multimap revert is a bigger issue.

@AlexeySachkov
Copy link
Contributor

A test I would expect to fail with this change is if we had two different device binaries with the same kernels but different properties, a program is created from the first binary then destroyed, then another one is created from the second binary and ends up having the same UR handle.

Just like the original issue with the non-multimap NativePrograms, this is something we've had before PI removal, the switch to a multimap just changed how this bug manifests. The multimap's counterpart for the "insert -> []" switch would be clearing existing entries before inserting ones for a new program. I don't mind if it got addressed separately though since the unintentional multimap revert is a bigger issue.

@sergey-semenov, could you please create a tracker so we don't forget to add the test? I will proceed with merge meanwhile

@AlexeySachkov AlexeySachkov merged commit 81c87e5 into intel:sycl Aug 6, 2024
14 checks passed
AlexeySachkov pushed a commit to AlexeySachkov/llvm that referenced this pull request Nov 26, 2024
4f86ab replaced `insert` with `[]` in order to fix a pointer re-use
issue, however that was not the correct fix. Instead, a multimap was
incorrectly converted to a regular map during the PI->UR conversion.

This change reverts the rest of 4f86ab (besides the test, which is still
valid), and converts NativePrograms back to a multimap.

---

Thanks to @AlexeySachkov for finding the real issue in
intel#14873 (comment)
@RossBrunton RossBrunton deleted the ross/insertunfix branch February 19, 2025 15:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants