Skip to content

[Driver][SYCL] Improve integration header usage when performing offload #3471

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 5 commits into from
Apr 7, 2021

Conversation

mdtoguchi
Copy link
Contributor

When generating the integration header, we were performing an additional
compilation step which is not needed, as we can create the output file
and the integration header at the same time.

Improve this behavior by tracking the integration files based on given
input files, creating and using the files accordingly when needed during
the compilation flow.

When generating the integration header, we were performing an additional
compilation step which is not needed, as we can create the output file
and the integration header at the same time.

Improve this behavior by tracking the integration files based on given
input files, creating and using the files accordingly when needed during
the compilation flow.
@mdtoguchi mdtoguchi requested a review from AGindinson as a code owner April 2, 2021 00:57
@bader
Copy link
Contributor

bader commented Apr 2, 2021

@mdtoguchi, great improvement!
By coincidence, we recently discussed with @Naghasan and @keryell what approach should be upstreamed to llorg.
See these two threads: https://reviews.llvm.org/D99190?id=333780#inline-935611 and https://reviews.llvm.org/D99190?id=333780#inline-935593.

In nutshell, @Naghasan suggests we consider using a host compiler to emit the data from the integration header. OpenMP-offload uses such approach for binding offloaded code with the host application. I assume CUDA and HIP compilers are doing something similar. I was going to check if we can apply this approach to SYCL and gather all pros and cons for both approaches.
This patch eliminates one of the main disadvantages of integration header approach, but we probably should checks with the community if this approach can be accepted to llorg.

Tagging @erichkeane and @AaronBallman.

@@ -5113,6 +5107,21 @@ void Driver::BuildActions(Compilation &C, DerivedArgList &Args,

handleArguments(C, Args, Inputs, Actions);

// When compiling for -fsycl, generate the integration header files that
// will be used during the compilation.
if (Args.hasFlag(options::OPT_fsycl, options::OPT_fno_sycl, false)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this be also working when using -fsycl-device-only ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

-fsycl-device-only doesn't perform the host compilation, so the generated integration header would not be used. Unless there is a use model for it, I don't think it's necessary.

@erichkeane
Copy link
Contributor

@mdtoguchi, great improvement!
By coincidence, we recently discussed with @Naghasan and @keryell what approach should be upstreamed to llorg.
See these two threads: https://reviews.llvm.org/D99190?id=333780#inline-935611 and https://reviews.llvm.org/D99190?id=333780#inline-935593.

In nutshell, @Naghasan suggests we consider using a host compiler to emit the data from the integration header. OpenMP-offload uses such approach for binding offloaded code with the host application. I assume CUDA and HIP compilers are doing something similar. I was going to check if we can apply this approach to SYCL and gather all pros and cons for both approaches.
This patch eliminates one of the main disadvantages of integration header approach, but we probably should checks with the community if this approach can be accepted to llorg.

Tagging @erichkeane and @AaronBallman.

I don't think we can do it in host mode for 2 reasons:
1- the information about the actual kernels themselves is only guaranteed to be present during device compile
2- We don't necessarily control the host compiler. The SYCL design permits arbitrary host compilers, so doing this during integration header isn't particularly possible.

@bader
Copy link
Contributor

bader commented Apr 2, 2021

I created #3474 to discuss the approach for upstreaming to llorg.

@Naghasan
Copy link
Contributor

Naghasan commented Apr 2, 2021

I don't think we can do it in host mode for 2 reasons:
1- the information about the actual kernels themselves is only guaranteed to be present during device compile

Well if they are missing during the host compilation, I don't see how you can invoke them.

2- We don't necessarily control the host compiler. The SYCL design permits arbitrary host compilers, so doing this during integration header isn't particularly possible.

The design permits this but it doesn't mandate it, this is an implementation design choice not a spec requirement. The 1.2.1 use to describe a single pass compiler as an alternative to the multi-pass approach BTW.

@erichkeane
Copy link
Contributor

I don't think we can do it in host mode for 2 reasons:
1- the information about the actual kernels themselves is only guaranteed to be present during device compile

Well if they are missing during the host compilation, I don't see how you can invoke them.

2- We don't necessarily control the host compiler. The SYCL design permits arbitrary host compilers, so doing this during integration header isn't particularly possible.

The design permits this but it doesn't mandate it, this is an implementation design choice not a spec requirement. The 1.2.1 use to describe a single pass compiler as an alternative to the multi-pass approach BTW.

The whole purpose of the integration header is to provide information to the host compiler that is only available at device compilation though.

As far as a single-pass compiler, great! But we don't have one. As far as a separate host compiler, our implementation has that requirement.

stdcpp_compat.cpp was malformed with the options used
layout_accessors_host.cpp was expecting output from 3 compilations, which
is now down to 2.
@mdtoguchi mdtoguchi requested a review from a team as a code owner April 3, 2021 00:51
@mdtoguchi mdtoguchi requested a review from vladimirlaz April 3, 2021 00:51
Copy link
Contributor

@AGindinson AGindinson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I totally agree with @erichkeane on the subject of host compiler-based strategy - emitting the header during the device FE compilation is the only viable approach if we're to adhere to the "custom host compiler" feature requirement. Not to mention that the chosen approach is a simpler one. In a nutshell, I believe that if there is no strong community pushback against device-side emission of the integration header, we should stick to what this PR currently provides.

Overall, the improving changes look good to me - much cleaner and far less hacky than I personally expected from the filename-mapping approach. I'll need to spend some more time on reading the code itself, hitting "Approve" right after that iteration.

P. S. Coming back from a vacation and seeing an important solution like this one adds up to the positive review experience :)

AGindinson
AGindinson previously approved these changes Apr 6, 2021
Copy link
Contributor

@AGindinson AGindinson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The implementation LGTM!

@bader
Copy link
Contributor

bader commented Apr 7, 2021

@vladimirlaz, @intel/llvm-reviewers-runtime, ping.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants