Skip to content

Conversation

@0x12CC
Copy link
Contributor

@0x12CC 0x12CC commented Sep 28, 2023

#10334 causes a performance regression since HostPtr can't be reused when it's read-only. This PR fixes the regression by deferring the copy operation to the creation of a writable accessor. It includes following the changes:

  • A new SYCLMemObjT::MCreateShadowCopy to defer allocation. When the HostPtr cannot be reused since it's read-only, SYCLMemObjT::handleHostData sets this member to a function that will allocate the shadow copy.
  • A new SYCLMemObjT::handleWriteAccessorCreation member function. This function calls SYCLMemObjT::MCreateShadowCopy and updates any existing MAllocaCommands if MUserPtr changed.
  • Whenever a writable host or device accessor is created, handleWriteAccessorCreation gets called to ensure that any required memory allocation occurs.

With this change, the allocation and copying overhead occurs during the creation of the first writable accessor. There's no overhead if all of the relevant accessors use sycl::access_mode::read.

intel#10334 causes a performance regression since `HostPtr` cannot be reused
when it is read-only. This change fixes the regression by deferring the
copy operation to the creation of a writable accessor.

Signed-off-by: Michael Aziz <[email protected]>
@aelovikov-intel
Copy link
Contributor

Could there be any race condition if we'd have two concurrent threads with different read/write properties? IIUC, there will be no UB as long as the actual accesses are all read-only.

Signed-off-by: Michael Aziz <[email protected]>
@0x12CC 0x12CC temporarily deployed to WindowsCILock September 28, 2023 19:44 — with GitHub Actions Inactive
@0x12CC 0x12CC temporarily deployed to WindowsCILock September 28, 2023 20:11 — with GitHub Actions Inactive
@0x12CC
Copy link
Contributor Author

0x12CC commented Sep 29, 2023

Could there be any race condition if we'd have two concurrent threads with different read/write properties? IIUC, there will be no UB as long as the actual accesses are all read-only.

That's a good question. I'm not certain if my original implementation could cause race conditions so I've updated it. I moved the shadow copy creation to the Scheduler::GraphBuilder so that it runs within a write lock. This change should prevent any races between different accessors.

@0x12CC 0x12CC temporarily deployed to WindowsCILock September 29, 2023 14:57 — with GitHub Actions Inactive
@0x12CC 0x12CC temporarily deployed to WindowsCILock September 29, 2023 15:16 — with GitHub Actions Inactive
@0x12CC 0x12CC temporarily deployed to WindowsCILock September 29, 2023 16:23 — with GitHub Actions Inactive
Copy link
Contributor

@steffenlarsen steffenlarsen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! @aelovikov-intel - Would you like another look?

Comment on lines +230 to +236
if (MRecord != nullptr && MUserPtr != InitialUserPtr) {
for (auto &it : MRecord->MAllocaCommands) {
if (it->MMemAllocation == InitialUserPtr) {
it->MMemAllocation = MUserPtr;
}
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tiny nit to avoid a bit of indentation. If you prefer the current version I am also okay with sticking with it.

Suggested change
if (MRecord != nullptr && MUserPtr != InitialUserPtr) {
for (auto &it : MRecord->MAllocaCommands) {
if (it->MMemAllocation == InitialUserPtr) {
it->MMemAllocation = MUserPtr;
}
}
}
if (!MRecord || MUserPtr == InitialUserPtr)
return;
for (auto &it : MRecord->MAllocaCommands)
if (it->MMemAllocation == InitialUserPtr)
it->MMemAllocation = MUserPtr;

@aelovikov-intel
Copy link
Contributor

LGTM! @aelovikov-intel - Would you like another look?

I'm good with your review.

@0x12CC , this needs a merge with conflicts resolution.

@0x12CC 0x12CC temporarily deployed to WindowsCILock October 10, 2023 14:09 — with GitHub Actions Inactive
@0x12CC 0x12CC temporarily deployed to WindowsCILock October 10, 2023 14:39 — with GitHub Actions Inactive
@steffenlarsen steffenlarsen merged commit e9b9d45 into intel:sycl Oct 11, 2023
@0x12CC 0x12CC deleted the buffer_defer_copy branch October 11, 2023 14:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants