
[SYCL] Reduce the time to get a kernel from cache #4186


Merged
merged 5 commits on Aug 11, 2021

Conversation

alexanderfle
Contributor

@alexanderfle alexanderfle commented Jul 26, 2021

The main idea of this patch is to speed up getting information from the cache.

Before this patch:

The regular cache requires too many synchronizations and too many searches across multiple maps to get the information.

First, the regular cache has to get the program from the cache via getBuiltPIProgram: in most cases this requires 3 searches in getKernelSetId, then, under a mutex, getting the program from the cache while checking an atomic variable to see whether it is already built or still in progress. Then, using that program, it performs 2 more searches under a mutex to get the stored info from the cache, again checking the atomic variable to see whether it is already built or still in progress.
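A rough sketch of the lookup cost being described, for illustration only (all type and member names below are hypothetical stand-ins; the real intel/llvm runtime code around getBuiltPIProgram and getKernelSetId is more involved):

```cpp
#include <atomic>
#include <cassert>
#include <map>
#include <memory>
#include <mutex>
#include <string>

// Placeholder for a cache entry whose build may still be in progress.
struct BuildEntry {
  std::atomic<bool> Built{false}; // already built, or still in progress?
  int Object = 0;                 // stands in for a PiProgram/PiKernel handle
};

// Placeholder for the regular (slow) cache: multiple maps per lookup.
struct RegularCache {
  std::mutex Mtx;
  std::map<std::string, int> KernelSetIds;             // resolved first
  std::map<int, std::shared_ptr<BuildEntry>> Programs; // then the program
  std::map<int, std::shared_ptr<BuildEntry>> Kernels;  // then the kernel

  // Even a hit pays for several map searches plus atomic checks.
  std::shared_ptr<BuildEntry> getKernel(const std::string &Name) {
    std::lock_guard<std::mutex> Lock(Mtx);
    auto SetIt = KernelSetIds.find(Name);       // search #1
    if (SetIt == KernelSetIds.end())
      return nullptr;
    auto ProgIt = Programs.find(SetIt->second); // search #2
    if (ProgIt == Programs.end() || !ProgIt->second->Built.load())
      return nullptr;                           // program still being built
    auto KernIt = Kernels.find(SetIt->second);  // search #3
    if (KernIt == Kernels.end() || !KernIt->second->Built.load())
      return nullptr;                           // kernel still being built
    return KernIt->second;
  }
};
```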

After this patch:

With the fast cache, only 1 search under a mutex is required; if the info is not found, it means the info has not been prepared yet, and the standard, more expensive path must be taken.

In addition, the fast cache keeps the program next to the kernel, so that both can be returned together, reducing the search time spent on one PI call in the most common case in commands.cpp.
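The fast-path lookup with fallback could be sketched as follows; this is a minimal illustration, and the key and value types here are hypothetical placeholders, not the actual KernelProgramCache API:

```cpp
#include <cassert>
#include <map>
#include <mutex>
#include <string>
#include <utility>

using FastKernelCacheKey = std::string;        // placeholder key type
using KernelProgramPair = std::pair<int, int>; // stands in for {PiKernel, PiProgram}

struct FastKernelCache {
  std::mutex Mtx;
  std::map<FastKernelCacheKey, KernelProgramPair> Map;

  // One search under one mutex; a miss means the slow path must run.
  bool tryGet(const FastKernelCacheKey &Key, KernelProgramPair &Out) {
    std::lock_guard<std::mutex> Lock(Mtx);
    auto It = Map.find(Key);
    if (It == Map.end())
      return false;   // not prepared yet: fall back to the regular cache
    Out = It->second; // kernel and program returned together
    return true;
  }

  void put(const FastKernelCacheKey &Key, KernelProgramPair Val) {
    std::lock_guard<std::mutex> Lock(Mtx);
    Map.emplace(Key, Val);
  }
};
```

Storing the program alongside the kernel is what lets a fast-cache hit skip the separate program lookup entirely.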

Signed-off-by: Alexander Flegontov [email protected]

@alexanderfle
Contributor Author

/summary:run

@alexanderfle alexanderfle marked this pull request as ready for review July 27, 2021 15:58
@alexanderfle alexanderfle requested a review from a team as a code owner July 27, 2021 15:58
@alexanderfle alexanderfle requested a review from againull July 27, 2021 15:58
@s-kanaev
Contributor

@alexanderfle, could you please elaborate, in the patch description, on the idea employed in this patch to improve the run time of cache operations?

@s-kanaev
Contributor

s-kanaev commented Aug 2, 2021

With the fast cache, only 1 search under a mutex is required; if the info is not found, it means the info has not been prepared yet, and the standard, more expensive path must be taken.

Is the following sequence correct here upon request to get a kernel?

  1. Lock fast cache mutex
  2. Check if data available in fast cache:
    2.1 If it is, return the data. FINISH
    2.2 If not, proceed to p.3
  3. Unlock fast cache mutex
  4. Create data bits
  5. Lock regular cache mutex(-es)
  6. Insert data into regular cache
  7. Lock fast cache mutex
  8. Insert data into fast cache
  9. Unlock fast cache mutex
  10. Unlock regular cache mutex(-es)
  11. Return data bits created in p.4
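The sequence asked about above could be sketched roughly as follows; this is only an illustration of the proposed locking order (with the fast mutex nested inside the regular one, per steps 5-10), and all names are hypothetical, not the patch's actual implementation:

```cpp
#include <cassert>
#include <map>
#include <mutex>
#include <string>

struct TwoLevelCache {
  std::mutex FastMtx, RegularMtx;
  std::map<std::string, int> Fast, Regular;

  // Stand-in for the expensive build step (a real implementation would
  // compile the program and create the kernel here).
  int buildKernel(const std::string &Key) { return static_cast<int>(Key.size()); }

  int getKernel(const std::string &Key) {
    {
      std::lock_guard<std::mutex> FastLock(FastMtx); // 1. lock fast mutex
      auto It = Fast.find(Key);                      // 2. check fast cache
      if (It != Fast.end())
        return It->second;                           // 2.1 hit: return the data
    }                                                // 3. unlock fast mutex
    int Data = buildKernel(Key);                     // 4. create data bits
    std::lock_guard<std::mutex> RegLock(RegularMtx); // 5. lock regular mutex
    Regular.emplace(Key, Data);                      // 6. insert into regular cache
    {
      std::lock_guard<std::mutex> FastLock(FastMtx); // 7. lock fast mutex
      Fast.emplace(Key, Data);                       // 8. insert into fast cache
    }                                                // 9. unlock fast mutex
    return Data;                                     // 10-11. unlock regular, return
  }
};
```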

@alexanderfle
Contributor Author

Is the following sequence correct here upon request to get a kernel?

No, because implementing that would require changes to getOrBuild(), since the fast cache would also have to store the PiProgram. And, currently, the PiProgram is unknown inside getOrBuild() in the case when we call getOrBuild() to get the kernel.

@alexanderfle
Contributor Author

@s-kanaev , ping

Contributor

@s-kanaev s-kanaev left a comment


The changes look good.
Though I believe we need tests for the fast cache here.

Also, this sequence involves two mutexes:

  1. Lock fast cache mutex
  2. Check if data available in fast cache:
    2.1 If it is, return the data. FINISH + Unlock fast cache mutex
    2.2 If not, proceed to p.3
  3. Unlock fast cache mutex
  4. Create data bits
  5. Lock regular cache mutex(-es)
  6. Insert data into regular cache
  7. Unlock regular cache mutex(-es)
  8. Lock fast cache mutex
  9. Insert data into fast cache
  10. Unlock fast cache mutex
  11. Return data bits created in p.4

Hence, we need a thread safety test also.
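A minimal thread-safety stress sketch for the two-mutex sequence above, assuming placeholder types (the real cache types in the runtime differ); every thread must observe the same value for a given key, and nothing must deadlock:

```cpp
#include <cassert>
#include <map>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

struct TwoMutexCache {
  std::mutex FastMtx, RegularMtx;
  std::map<std::string, int> Fast, Regular;

  int getOrBuild(const std::string &Key, int Value) {
    {
      std::lock_guard<std::mutex> L(FastMtx);    // fast path: one search
      auto It = Fast.find(Key);
      if (It != Fast.end())
        return It->second;
    }
    {
      std::lock_guard<std::mutex> L(RegularMtx); // slow path: regular cache
      Regular.emplace(Key, Value);
    }                                            // regular mutex released first,
    std::lock_guard<std::mutex> L(FastMtx);      // then the fast mutex is taken
    Fast.emplace(Key, Value);
    return Fast[Key];
  }
};

// Hammer the cache from several threads concurrently.
bool stressTest() {
  TwoMutexCache Cache;
  std::vector<std::thread> Threads;
  std::mutex OkMtx;
  bool Ok = true;
  for (int T = 0; T < 8; ++T)
    Threads.emplace_back([&Cache, &Ok, &OkMtx] {
      for (int I = 0; I < 1000; ++I) {
        // The value is a function of the key, so every thread must agree.
        int V = Cache.getOrBuild("kernel" + std::to_string(I % 4), I % 4);
        if (V != I % 4) {
          std::lock_guard<std::mutex> L(OkMtx);
          Ok = false;
        }
      }
    });
  for (auto &Th : Threads)
    Th.join();
  return Ok;
}
```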

Signed-off-by: Alexander Flegontov <[email protected]>
@alexanderfle alexanderfle requested a review from s-kanaev August 11, 2021 13:46
Contributor

@s-kanaev s-kanaev left a comment


LGTM

@alexanderfle
Contributor Author

@bader, ping

@bader
Contributor

bader commented Aug 11, 2021

@bader, ping

@alexanderfle, pong?

@alexanderfle
Contributor Author

@bader, no :) I mean, could you please have a look at the PR, as only an authorized user can handle this.

@bader
Contributor

bader commented Aug 11, 2021

could you please have a look at the PR, as only an authorized user can handle this.

What do you mean by "handle this"?

@alexanderfle
Contributor Author

What do you mean by "handle this"?

It means: handle the merge

@bader
Contributor

bader commented Aug 11, 2021

could you please have a look at the PR, as only an authorized user can handle this.

What do you mean by "handle this"?

It means: handle the merge

FYI: There are at least 8 people who can handle the merge (including @againull, who is assigned to review this pull request).
E.g. #4272 was merged by @vladimirlaz and #4281 by @romanovvlad.

@alexanderfle
Contributor Author

FYI: There are at least 8 people who can handle the merge (including @againull, who is assigned to review this pull request).
E.g. #4272 was merged by @vladimirlaz and #4281 by @romanovvlad.

ok, I see.

@alexanderfle
Contributor Author

@againull, could you take a look, please? Please merge if it looks okay to you.

Contributor

@againull againull left a comment


LGTM

@againull againull merged commit c16705a into intel:sycl Aug 11, 2021