[SYCL][ESIMD] Add compile time properties overload of USM block store #11641

sarnex · 2023-10-24T14:54:13Z

This change adds the groundwork for adding overloads of the block_store APIs accepting compile time properties. We have 8 overloads total, with various combinations of offset, predicate and simd_view.

Signed-off-by: Sarnie, Nick <[email protected]>

sarnex · 2023-10-26T20:19:31Z

sycl/test-e2e/ESIMD/unified_memory_api/Inputs/block_store.hpp

+      testUSM<T, 33, !CheckMask, CheckProperties>(Q, 2, 4, AlignOnlyProps);
+  // TODO: Enable after failure fixed
+  // Passed &=
+  //    testUSM<T, 67, !CheckMask, CheckProperties>(Q, 1, 4, AlignOnlyProps);


This test case fails even with the old API; I reproduced it in a standalone test case. I wanted to see what happened before we moved to intrinsics, but the old path asserts that the size is a multiple of 16, so this case could not be run the old way. I opened an internal tracker for this. It should be unrelated to this PR, which merely exposed the failure. Maybe the test is wrong and only this case exposes it, but I don't see where.

My first guess is that GPU BE lowers LLVM IR store <T x 67> incorrectly.

That was my guess too but I did not have enough courage to say it :)

We still need to analyse/check it on our side first.

Of course, I have an internal tracker assigned to me for this.
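To illustrate why the N = 67 case could not be exercised through the old path, here is a minimal sketch of the multiple-of-16 check mentioned above. It assumes "size" means the total store size in bytes, N * sizeof(T); the helper name is_oword_multiple is hypothetical and is not taken from memory.hpp.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper (not from the ESIMD headers): true when a store of
// N elements of type T covers a whole number of 16-byte OWORDs.
// Assumption: the old assert checked the total size in bytes.
template <typename T, int N>
constexpr bool is_oword_multiple() {
  return (static_cast<std::size_t>(N) * sizeof(T)) % 16 == 0;
}

// 67 elements never cover a whole number of OWORDs for 1-, 4- or 8-byte types.
static_assert(!is_oword_multiple<std::uint8_t, 67>(), "67 bytes");
static_assert(!is_oword_multiple<std::uint32_t, 67>(), "268 bytes");
static_assert(!is_oword_multiple<std::uint64_t, 67>(), "536 bytes");

// A power-of-two element count such as 64 does.
static_assert(is_oword_multiple<std::uint32_t, 64>(), "256 bytes");
```

Under that assumption, no element type of 1, 2, 4 or 8 bytes makes a 67-element store a multiple of 16 bytes, which matches the observation that the old path would hit the assert before reaching the hardware.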

v-klochkov

Looks really good. I have several comments - all of them are minor.

sycl/include/sycl/ext/intel/esimd/detail/memory_intrin.hpp

sycl/include/sycl/ext/intel/experimental/esimd/memory.hpp

v-klochkov · 2023-10-26T21:52:14Z

sycl/test-e2e/ESIMD/unified_memory_api/Inputs/block_store.hpp

+      testUSM<T, 33, !CheckMask, CheckProperties>(Q, 2, 4, AlignOnlyProps);
+  // TODO: Enable after failure fixed
+  // Passed &=
+  //    testUSM<T, 67, !CheckMask, CheckProperties>(Q, 1, 4, AlignOnlyProps);


My first guess is that GPU BE lowers LLVM IR store <T x 67> incorrectly.

sycl/include/sycl/ext/intel/esimd/memory.hpp

v-klochkov · 2023-10-27T01:21:57Z

sycl/include/sycl/ext/intel/esimd/memory.hpp

+/// Alignment: If \p props does not specify the 'alignment' property, then
+/// the default assumed alignment is the minimally required element-size
+/// alignment. Note that additional/temporary restrictions may apply
+/// (see Restrictions below).


Major concern here: if we apply the statements here, the new block_store(usm_ptr, value) call may run slower than the old one, because on Gen12 a store requires 16-byte alignment to produce a block store operation. Otherwise, a scatter will be generated.

Assuming 4-byte alignment for a store of simd<int, N> is valid, but it will be a regression.
I think we need to raise the expected alignment to 16 bytes if cache hints are not passed and a predicate is not used. That will be consistent with the old block_store, which had the default alignment = overaligned_tag<detail::OperandSize::OWORD>

I believe I implemented this in my latest commit and updated comments as necessary, but please review carefully.
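The default-alignment rule proposed in this thread can be sketched as a compile-time helper. This is only an illustration of the rule as stated by the reviewer, not the code that landed in memory.hpp; the names kOWordBytes and default_store_alignment are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// 16 bytes, the OWORD alignment quoted in the review.
constexpr std::size_t kOWordBytes = 16;

// Hypothetical helper: the alignment assumed when the caller passes no
// explicit alignment property.
//  - no cache hints and no predicate: keep the old 16-byte default so the
//    store can still be emitted as a block store on Gen12;
//  - otherwise: fall back to the minimally required element-size alignment.
template <typename T>
constexpr std::size_t default_store_alignment(bool has_cache_hints,
                                              bool has_predicate) {
  return (!has_cache_hints && !has_predicate) ? kOWordBytes : sizeof(T);
}

static_assert(default_store_alignment<std::int32_t>(false, false) == 16,
              "plain store keeps the OWORD default");
static_assert(default_store_alignment<std::int32_t>(true, false) == 4,
              "cache hints: element-size alignment");
static_assert(default_store_alignment<std::int32_t>(false, true) == 4,
              "predicate: element-size alignment");
```

In the real header the two conditions would presumably be derived from the property list and the overload signature at compile time rather than passed as bool arguments; that part is an assumption.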

v-klochkov · 2023-10-27T16:52:35Z

sycl/include/sycl/ext/intel/esimd/memory.hpp

@@ -1536,14 +1539,25 @@ block_store(T *ptr, simd<T, N> vals, simd_mask<1> pred,
  static_assert(!PropertyListT::template has_property<cache_hint_L3_key>(),
                "L3 cache hint is reserved. The old/experimental L3 LSC cache "
                "hint is cache_level::L2 now.");
+  bool ShouldUseOWordDefaultAlign =


I believe we should not have this dynamic check/jump, which generates 2 different versions of block-load for 1 user's block_load() call.
Let's say that because this variant of the function (accepting a predicate) is called, we use the DG2/PVC restrictions.

yeah good feedback, I wasn't sure which direction to go, thanks

addressed in latest commit hopefully
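A minimal sketch of the direction agreed on here: the choice between the two code paths is made by which overload the user calls, so no runtime branch is emitted. All names below (StorePath, Pred, choose_store_path) are hypothetical stand-ins and do not appear in the ESIMD headers.

```cpp
#include <cassert>

// Hypothetical tags for the two lowering strategies discussed in the thread.
enum class StorePath {
  OWordBlock,  // old-style block store, 16-byte default alignment
  LscBlock     // path subject to the DG2/PVC restrictions
};

// Stand-in for simd_mask<1>; only its presence in the signature matters.
struct Pred {
  bool value;
};

// Overload without a predicate: the path is fixed at compile time.
constexpr StorePath choose_store_path() { return StorePath::OWordBlock; }

// Overload with a predicate: always the restricted path, regardless of the
// predicate's runtime value, so only one version of the store is generated.
constexpr StorePath choose_store_path(Pred) { return StorePath::LscBlock; }

static_assert(choose_store_path() == StorePath::OWordBlock, "no predicate");
static_assert(choose_store_path(Pred{true}) == StorePath::LscBlock, "pred set");
static_assert(choose_store_path(Pred{false}) == StorePath::LscBlock,
              "pred clear: same path");
```

The point of the design is that the predicate's value never influences which instruction sequence is selected; only the overload does.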

v-klochkov · 2023-10-31T15:02:13Z

sycl/test-e2e/ESIMD/unified_memory_api/Inputs/block_store.hpp

+         simd<uint32_t, N> PassThruInt(ElemOff, 1);
+         simd<T, N> Vals = PassThruInt;
+         if constexpr (UseMask) {
+           simd_mask<1> Mask = (GlobalID + 1) % 1;


Hi Nick,
I just found an error in block_load.hpp and will fix it soon. It seems you copy-pasted it into this block_store.hpp.
The code that was supposed to be here and in a few other places is: simd_mask<1> Mask = (GlobalID + 1) & 0x1;

(Val % 1) always gives 0.
Can you please fix it in the block_store.hpp file and test whether your patch still works correctly?

Will do this now, thanks for the heads up

Luckily it passes, I'm making a PR now
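The difference between the two mask expressions from this thread can be checked on the host with a small sketch. The helper names mask_with_mod and mask_with_and are hypothetical; only the two expressions themselves come from the review.

```cpp
#include <cassert>
#include <cstdint>

// Expression as originally written in the test: the remainder modulo 1 is
// always 0, so the mask is never set and the masked path is never stored.
constexpr bool mask_with_mod(std::uint32_t global_id) {
  return ((global_id + 1) % 1) != 0;
}

// Expression the reviewer intended: the low bit alternates, so the mask is
// set for even global IDs and clear for odd ones.
constexpr bool mask_with_and(std::uint32_t global_id) {
  return ((global_id + 1) & 0x1) != 0;
}

static_assert(!mask_with_mod(0) && !mask_with_mod(1) && !mask_with_mod(2),
              "% 1 never sets the mask");
static_assert(mask_with_and(0) && !mask_with_and(1) && mask_with_and(2),
              "& 0x1 alternates");
```

With the original expression the test only ever exercised the mask-off case, which is why the patch needed to be re-tested after the fix.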

…11641) This change adds the groundwork for adding overloads of the block_store APIs accepting compile time properties (L1,L2 cache hints, alignment). We have 8 overloads total, with various combinations of offset, predicate and simd_view. --------- Signed-off-by: Sarnie, Nick <[email protected]>

sarnex force-pushed the blockstore branch from e3a4d0e to 64aa375 Compare October 24, 2023 17:38

sarnex force-pushed the blockstore branch from 64aa375 to 3abe32e Compare October 25, 2023 15:35

sarnex force-pushed the blockstore branch from 3abe32e to b7a3f68 Compare October 26, 2023 17:01

sarnex force-pushed the blockstore branch from b7a3f68 to 5225a06 Compare October 26, 2023 17:05

[SYCL][ESIMD] Add compile time properties overload of USM block store

45de16e

Signed-off-by: Sarnie, Nick <[email protected]>

sarnex force-pushed the blockstore branch from 5225a06 to 45de16e Compare October 26, 2023 17:07

sarnex commented Oct 26, 2023

sarnex marked this pull request as ready for review October 26, 2023 20:22

sarnex requested a review from a team as a code owner October 26, 2023 20:22

v-klochkov reviewed Oct 26, 2023

v-klochkov reviewed Oct 27, 2023

Address review feedback

746fca6

Signed-off-by: Sarnie, Nick <[email protected]>

v-klochkov reviewed Oct 27, 2023

sarnex added 2 commits October 27, 2023 10:26

Make overload accepting predicate require element alignment and PVC r…

3ceb242

…estrictions Signed-off-by: Sarnie, Nick <[email protected]>

typo

94d7132

Signed-off-by: Sarnie, Nick <[email protected]>

sarnex requested a review from v-klochkov October 30, 2023 15:05

v-klochkov approved these changes Oct 30, 2023

v-klochkov merged commit d38206c into intel:sycl Oct 30, 2023

v-klochkov reviewed Oct 31, 2023
