[BuildBot] Uplift GPU RT version for Linux CI Process #3386

Merged

Conversation

bb-sycl

@bb-sycl bb-sycl commented Mar 21, 2021

Uplift GPU RT version for Linux to 21.11.19310

@yanfeng3721

/summary:run

@yanfeng3721

/summary:run

smaslov-intel
smaslov-intel previously approved these changes Mar 24, 2021
@yanfeng3721 yanfeng3721 marked this pull request as ready for review March 24, 2021 14:50
@yanfeng3721 yanfeng3721 requested a review from bader as a code owner March 24, 2021 14:50

@bader bader left a comment

Please, pass pre-commit validation.

@yanfeng3721

yanfeng3721 commented Mar 24, 2021

The following 4 tests unexpectedly pass on level_zero:gpu:
SYCL :: ESIMD/spec_const/spec_const_char.cpp
SYCL :: ESIMD/spec_const/spec_const_short.cpp
SYCL :: ESIMD/spec_const/spec_const_uchar.cpp
SYCL :: ESIMD/spec_const/spec_const_ushort.cpp
The following 2 tests unexpectedly pass on level_zero:gpu/opencl:gpu:
SYCL :: InlineAsm/asm_loop.cpp
SYCL :: InlineAsm/asm_switch.cpp

We need to remove "//XFAIL: level_zero" from the ESIMD tests and "//XFAIL:*" from the InlineAsm tests in llvm-test-suite after #3386 is merged.

@vladimirlaz, could you please help update llvm-test-suite before/after PR3386 is merged? Thanks.
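
For illustration only, the XFAIL removal described above could be scripted roughly as follows. This is a minimal sketch, not the change that was actually made in llvm-test-suite: the helper name `drop_xfail` is hypothetical, and it assumes each directive sits on its own `//` comment line.

```python
import re

# Matches a LIT directive comment such as "//XFAIL: level_zero" or "//XFAIL:*".
XFAIL_LINE = re.compile(r"^\s*//\s*XFAIL\s*:\s*(?P<features>.*?)\s*$")


def drop_xfail(source: str, feature: str) -> str:
    """Return `source` without XFAIL lines whose feature text equals `feature`.

    Hypothetical helper: only exact matches are dropped, so an unrelated
    XFAIL directive in the same file is left in place.
    """
    kept = []
    for line in source.splitlines(keepends=True):
        match = XFAIL_LINE.match(line)
        if match and match.group("features") == feature:
            continue  # drop this directive, keep every other line untouched
        kept.append(line)
    return "".join(kept)
```

Calling `drop_xfail(text, "level_zero")` on an ESIMD test and `drop_xfail(text, "*")` on an InlineAsm test would cover the two cases named in the comment.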

@vladimirlaz

@yanfeng3721, can you do it yourself? It is quite a trivial change.

@yanfeng3721

@vladimirlaz, sure. I have prepared PR intel/llvm-test-suite#193; could you please review it? Thanks.

@bader

bader commented Mar 25, 2021

intel/llvm-test-suite#193 is merged. Could you restart pre-commit testing to confirm that it passes now, please?

yanfeng3721
yanfeng3721 previously approved these changes Mar 26, 2021
@bader

bader commented Mar 26, 2021

It looks like the L0 loader update broke the sycl-ls tool on Windows.
Does it require a Windows driver update too?

@yanfeng3721 yanfeng3721 dismissed stale reviews from smaslov-intel and themself via 3b26b00 March 29, 2021 05:49
@yanfeng3721

yanfeng3721 commented Mar 29, 2021

@bader, yes, sycl-ls crashes with the following error output:
'sycl-ls.exe' (Win32): Loaded 'C:\Windows\System32\DriverStore\FileRepository\iigd_dch.inf_amd64_f48be35b3d221b44\igc64.dll'.
'sycl-ls.exe' (Win32): Loaded 'C:\Windows\System32\DriverStore\FileRepository\iigd_dch.inf_amd64_f48be35b3d221b44\ze_intel_gpu64.dll'.
Exception thrown at 0x00007FF845919129 in sycl-ls.exe: Microsoft C++ exception: cl::sycl::runtime_error at memory location 0x000000084A98F650.
Unhandled exception at 0x00007FF845919129 in sycl-ls.exe: Microsoft C++ exception: cl::sycl::runtime_error at memory location 0x000000084A98F650.

@bader

bader commented Mar 29, 2021

@intel/llvm-reviewers-runtime, FYI.

@yanfeng3721

/summary:run

@yanfeng3721

Hi @bader @smaslov-intel, the regression detected in llvm-test-suite is gone, but the sycl-ls check in buildbot/sycl-win-x64-pr still fails, and the backtrace of the failures is much the same. Has any change been made during the past few days to resolve the regression? Thanks.
Exp run 3 case(s) failed. Failed case(s): [SYCL :: Plugin/sycl-ls-gpu-default.cpp, SYCL :: Plugin/sycl-ls-gpu-level-zero.cpp, SYCL :: Plugin/sycl-ls.cpp]

@smaslov-intel

> May I ask if there is any change has been made during the past few days to resolve the regression?

I am not aware of any changes that were made to resolve this.

> sycl-ls crash with the following error info:
> Unhandled exception at 0x00007FF845919129 in sycl-ls.exe: Microsoft C++ exception: cl::sycl::runtime_error at memory location 0x000000084A98F650.

@againull, does this ring any bells for you?

@yanfeng3721

Is the following information helpful?

---> piPlatformsGet(
: 0

: 000000C7A258F9A8
) ---> pi_result : -999

@smaslov-intel

Yeah, it looks like the zeInit call that initializes the Level-Zero driver failed for an unknown reason. Are you testing with a compatible loader and driver? Which versions? How can this be reproduced locally?

@againull

againull commented Apr 1, 2021

Sorry, unfortunately I don't know the root cause off the top of my head; it needs investigation.

@smaslov-intel

FWIW, #3470 is going to make the SYCL RT handle plugin initialization failure more gracefully.

@yanfeng3721 yanfeng3721 self-requested a review April 2, 2021 02:02
@yanfeng3721

yanfeng3721 commented Apr 7, 2021

@sergey-semenov, thanks for the fix. The crash no longer reproduces, but the sycl-ls check still fails because it is unable to find the L0 GPU device. It looks like some binary is missing from the ~\llvm\build\install\bin folder, because the sycl-ls check passes if I run it from the ~\llvm\build\bin folder.

Unexpected failure:

~\llvm\build\install\bin\sycl-ls.exe --verbose
Platforms: 1
Platform [#1]:
    Version  : OpenCL 3.0
    Name     : Intel(R) OpenCL HD Graphics
    Vendor   : Intel(R) Corporation
    Devices  : 1
        Device [#0]:
        Type       : GPU
        Version    : 3.0
        Name       : Intel(R) UHD Graphics 630
        Vendor     : Intel(R) Corporation
        Driver     : 27.20.100.9030
default_selector()      : GPU : Intel(R) OpenCL HD Graphics 3.0 [27.20.100.9030]
host_selector()         : No device of requested type available. -1 (CL_DEVI...
accelerator_selector()  : No device of requested type available. -1 (CL_DEVI...
cpu_selector()          : No device of requested type available. -1 (CL_DEVI...
gpu_selector()          : GPU : Intel(R) OpenCL HD Graphics 3.0 [27.20.100.9030]
custom_selector(gpu)    : GPU : Intel(R) OpenCL HD Graphics 3.0 [27.20.100.9030]
custom_selector(cpu)    : No device of requested type available. -1 (CL_DEVI...
custom_selector(acc)    : No device of requested type available. -1 (CL_DEVI...

Expected pass:

~\llvm\build\bin\sycl-ls.exe --verbose
Platforms: 3
Platform [#1]:
    Version  : OpenCL 3.0
    Name     : Intel(R) OpenCL HD Graphics
    Vendor   : Intel(R) Corporation
    Devices  : 1
        Device [#0]:
        Type       : GPU
        Version    : 3.0
        Name       : Intel(R) UHD Graphics 630
        Vendor     : Intel(R) Corporation
        Driver     : 27.20.100.9030
Platform [#2]:
    Version  : 1.0
    Name     : Intel(R) Level-Zero
    Vendor   : Intel(R) Corporation
    Devices  : 1
        Device [#1]:
        Type       : GPU
        Version    : 1.0
        Name       : Intel(R) Graphics Gen9 [0x3e92]
        Vendor     : Intel(R) Corporation
        Driver     : 1.0.0
Platform [#3]:
    Version  : 1.2
    Name     : SYCL host platform
    Vendor   :
    Devices  : 1
        Device [#2]:
        Type       : HOST
        Version    : 1.2
        Name       : SYCL host device
        Vendor     :
        Driver     : 1.2
default_selector()      : GPU : Intel(R) Level-Zero 1.0 [1.0.0]
host_selector()         : HOST: SYCL host platform 1.2 [1.2]
accelerator_selector()  : No device of requested type available. -1 (CL_DEVI...
cpu_selector()          : No device of requested type available. -1 (CL_DEVI...
gpu_selector()          : GPU : Intel(R) Level-Zero 1.0 [1.0.0]
custom_selector(gpu)    : GPU : Intel(R) Level-Zero 1.0 [1.0.0]
custom_selector(cpu)    : No device of requested type available. -1 (CL_DEVI...
custom_selector(acc)    : No device of requested type available. -1 (CL_DEVI...
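
One way to test the "some binary is missing" hypothesis is to list what the build folder has that the install folder lacks. The following is a minimal sketch under that assumption; the function name `missing_from_install` is hypothetical and it compares file names only, not contents or versions.

```python
from pathlib import Path


def missing_from_install(build_bin: Path, install_bin: Path) -> list:
    """List file names present in build_bin but absent from install_bin.

    Hypothetical helper: only top-level regular files are compared, by name.
    """
    built = {p.name for p in build_bin.iterdir() if p.is_file()}
    installed = {p.name for p in install_bin.iterdir() if p.is_file()}
    return sorted(built - installed)
```

Pointing it at the two folders mentioned in the comment (build\bin and build\install\bin) would show any DLL or executable that the install step did not copy.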

@yanfeng3721

yanfeng3721 commented Apr 9, 2021

@bader @sergey-semenov, the latest public Windows igfx driver igfx_win10_100.9316 (https://downloadcenter.intel.com/download/30266/Intel-Graphics-Windows-10-DCH-Drivers) is not compatible with L0 loader 1.2.3.
Do you have any suggestions for unblocking the current regular GPU RT version uplift?
Since Linux GPU RT version >= 21.11.19310 requires L0 loader 1.2.3 (otherwise there will be many regressions), could we skip the sycl-ls check in buildbot/sycl-win-x64-pr? I expect the other pre-commit and post-commit tests will not be blocked by the L0 loader init failure, since the L0 loader is actually loaded from the pre-installed Windows igfx drivers (v1.1.0).

@bader

bader commented Apr 9, 2021

Does the L0 loader update break sycl-ls only? If so, I suggest fixing sycl-ls if possible.
I have concerns about whether this is the case, because there are failures in llvm-test-suite and there are no results for check-sycl on Windows.

We might consider using a different L0 loader version on Windows as a back-up plan.

@smaslov-intel

> Does L0 loader update break sycl-ls only? If so, I suggest fixing sycl-ls if possible.

There is nothing in sycl-ls that can fix this.

> We might consider using different L0 loader version on Windows as a back-up plan.

Please consider uplifting the Level-Zero driver along with the v1.2.3 loader.

@bader

bader commented Apr 9, 2021

> Please consider uplifting the Level-Zero driver along with the v1.2.3 loader.

Do you have a link to the driver we can use?
According to this comment there is no such driver!

@yanfeng3721

Can we add some logic to select the L0 loader version by OS? There is currently no released Windows GPU driver that supports v1.1 of the L0 specification, but there is one for Linux.
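
As a rough illustration of what per-OS selection might look like in a CI helper script, here is a minimal sketch. The table and the function name `loader_version_for` are hypothetical; "1.2.3" for Linux is taken from this thread, while the Windows entry is only a placeholder assumption and not a verified pin.

```python
import platform
from typing import Optional

# Illustrative table only. The Linux value comes from the discussion in this
# PR; the Windows value is a placeholder assumption, not a verified version.
LOADER_VERSION_BY_OS = {
    "Linux": "1.2.3",
    "Windows": "1.1.0",
}


def loader_version_for(system: Optional[str] = None) -> str:
    """Return the Level-Zero loader version configured for an OS name.

    `system` uses the names returned by platform.system(), e.g. "Linux"
    or "Windows"; when omitted, the current host OS is used.
    """
    name = system or platform.system()
    try:
        return LOADER_VERSION_BY_OS[name]
    except KeyError:
        raise ValueError(
            f"no Level-Zero loader version configured for {name!r}"
        ) from None
```

Raising on an unknown OS, rather than falling back to a default, keeps a misconfigured builder from silently picking up an incompatible loader.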


@vladimirlaz vladimirlaz left a comment

LGTM


@smaslov-intel smaslov-intel left a comment

LGTM

@yanfeng3721

The sycl-ls check passes on Windows now.
The 6 unexpected passes can be resolved by intel/llvm-test-suite#193.
Hi @vladimirlaz, could you please merge this PR and intel/llvm-test-suite#193 at the same time? Thanks!

@bader

bader commented Apr 14, 2021

@yanfeng3721, @vladimirlaz, please, open another PR to update llvm-test-suite tests - intel/llvm-test-suite#193 is already merged.

@vladimirlaz

vladimirlaz commented Apr 14, 2021

It is a misprint: intel/llvm-test-suite#232 should be used instead. PR193 was merged and then reverted in PR199.

@bader

bader commented Apr 15, 2021

@vladimirlaz, ping.

@vladimirlaz vladimirlaz merged commit 2045052 into intel:sycl Apr 15, 2021