-
Notifications
You must be signed in to change notification settings - Fork 788
[SYCL][ESIMD] Add compile time properties overload of USM block store #11641
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Signed-off-by: Sarnie, Nick <[email protected]>
testUSM<T, 33, !CheckMask, CheckProperties>(Q, 2, 4, AlignOnlyProps); | ||
// TODO: Enable after failure fixed | ||
// Passed &= | ||
// testUSM<T, 67, !CheckMask, CheckProperties>(Q, 1, 4, AlignOnlyProps); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test case fails even when using the old API, I reproduced it in a standalone test case. I wanted to see what happens before we moved to intrinsics, but we actually assert that the size is a multiple of 16, so we couldn't do it in the old way. I made an internal tracker for this, it should be unrelated to this PR, it just exposed the test case. Maybe the test is wrong and only this case exposes it, but I don't see where.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My first guess is that GPU BE lowers LLVM IR store <T x 67> incorrectly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That was my guess too but I did not have enough courage to say it :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We still need to analyse/check it on our side first.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Of course, I have an internal tracker assigned to me for this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks really good. I have several comments - all of the are minor.
testUSM<T, 33, !CheckMask, CheckProperties>(Q, 2, 4, AlignOnlyProps); | ||
// TODO: Enable after failure fixed | ||
// Passed &= | ||
// testUSM<T, 67, !CheckMask, CheckProperties>(Q, 1, 4, AlignOnlyProps); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My first guess is that GPU BE lowers LLVM IR store <T x 67> incorrectly.
/// Alignment: If \p props does not specify the 'alignment' property, then | ||
/// the default assumed alignment is the minimally required element-size | ||
/// alignment. Note that additional/temporary restrictions may apply | ||
/// (see Restrictions below). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Major concern here: if apply the statements here, then the new block_store(usm_ptr, value) call may work slower than the old one because on Gen12 STORE requires 16-bytes alignment to produce block_store operation. Otherwise, scatter will be generated.
Assuming 4-byte alignment for store of simd<int, N> is valid, but will be a regression.
I think we need to raise the expected alignment to 16-bytes if cache-hints are not passed and predicate is not used. That will be consistent with the old block_store that had the default alignment = overaligned_tagdetail::OperandSize::OWORD
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe I implemented this in my latest commit and updated comments as necessary, but please review carefully.
Signed-off-by: Sarnie, Nick <[email protected]>
@@ -1536,14 +1539,25 @@ block_store(T *ptr, simd<T, N> vals, simd_mask<1> pred, | |||
static_assert(!PropertyListT::template has_property<cache_hint_L3_key>(), | |||
"L3 cache hint is reserved. The old/experimental L3 LSC cache " | |||
"hint is cache_level::L2 now."); | |||
bool ShouldUseOWordDefaultAlign = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe we should not having this dynamic check/jump and generating 2 different versions of block-load for 1 user's block_load() call.
Let's say that because this variant of the function (accepting a predicate) is called, we use the DG2/PVC restrictions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah good feedback, i wasnt sure which direction to go, thanks
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
addressed in latest commit hopefully
…estrictions Signed-off-by: Sarnie, Nick <[email protected]>
Signed-off-by: Sarnie, Nick <[email protected]>
simd<uint32_t, N> PassThruInt(ElemOff, 1); | ||
simd<T, N> Vals = PassThruInt; | ||
if constexpr (UseMask) { | ||
simd_mask<1> Mask = (GlobalID + 1) % 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi Nick,
I just found an error in block_load.hpp, and will fix it soon and it seems you copy-pasted it to this block_store.hpp.
The code that was supposed to be here and in few other places is: simd_mask<1> Mask = (GlobalID + 1) & 0x1;
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(Val % 1) always gives 0.
Can you please fix it in block_store.hpp file and test if your patch still works correctly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will do this now, thanks for the heads up
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Luckily it passes, I'm making a PR now
…11641) This change adds the groundwork for adding overloads of the block_store APIs accepting compile time properties (L1,L2 cache hints, alignment). We have 8 overloads total, with various combinations of offset, predicate and simd_view. --------- Signed-off-by: Sarnie, Nick <[email protected]>
This change adds the groundwork for adding overloads of the block_store APIs accepting compile time properties. We have 8 overloads total, with various combinations of offset, predicate and simd_view.