Description
Hi,
following the blog post and the latest instructions from Codeplay here and here, I am trying to use the cuda branch to compile a simple SYCL/oneAPI application for the NVPTX backend.
Minimal background: this stand-alone application is a very small part of the CMS reconstruction software that we have ported to run on GPUs, and that we are now using to evaluate different performance portability libraries and frameworks.
I have built the toolchain with
SYCL_HOME=$PWD
git clone https://github.com/codeplaysoftware/sycl-for-cuda -b cuda llvm
mkdir build
cd build
cmake $SYCL_HOME/llvm/llvm \
-DCMAKE_BUILD_TYPE=Release \
-DLLVM_TARGETS_TO_BUILD="X86;PowerPC;AArch64;NVPTX" \
-DLLVM_EXTERNAL_PROJECTS="llvm-spirv;sycl" \
-DLLVM_ENABLE_PROJECTS="clang;clang-tools-extra;compiler-rt;lld;openmp;llvm-spirv;sycl;libclc" \
-DLLVM_EXTERNAL_SYCL_SOURCE_DIR=$SYCL_HOME/llvm/sycl \
-DLLVM_EXTERNAL_LLVM_SPIRV_SOURCE_DIR=$SYCL_HOME/llvm/llvm-spirv \
-DLLVM_ENABLE_EH=ON \
-DLLVM_ENABLE_PIC=ON \
-DLLVM_ENABLE_RTTI=ON \
-DSYCL_BUILD_PI_CUDA=ON \
-DLIBCLC_TARGETS_TO_BUILD="nvptx64--;nvptx64--nvidiacl" \
-DCUDA_TOOLKIT_ROOT_DIR=/usr/local/cuda-10.2
make -j`nproc` sycl-toolchain
The build seems successful, and I can use it to compile my test program for the SPIR-V backend:
cd ..
git clone git@github.com:fwyzard/pixel-standalone.git -b oneapi test
cd test
$SYCL_HOME/build/bin/clang++ -fsycl -I/opt/intel/inteloneapi/dpcpp-ct/latest/include -O2 -std=c++14 -DDIGI_ONEAPI -o test-oneapi main_oneapi.cc rawtodigi_oneapi.cc
However, building for the NVPTX backend with
$SYCL_HOME/build/bin/clang++ -fsycl -I/opt/intel/inteloneapi/dpcpp-ct/latest/include -fsycl-targets=nvptx64-nvidia-cuda-sycldevice --cuda-path=/usr/local/cuda -O2 -std=c++14 -DDIGI_ONEAPI -o test-oneapi-cuda main_oneapi.cc rawtodigi_oneapi.cc
fails with
ptxas fatal : Unresolved extern function '_Z18__spirv_AtomicLoadPU3AS1KjN5__spv5ScopeENS1_19MemorySemanticsMaskE'
clang-11: error: ptxas command failed with exit code 255 (use -v to see invocation)
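For reference, the unresolved symbol can be demangled to see which overload the device code is looking for. The snippet below is a minimal sketch, assuming `c++filt` from binutils is installed (it falls back to the raw name otherwise); the variable names are only illustrative.

```shell
# Mangled name copied verbatim from the ptxas error above.
symbol='_Z18__spirv_AtomicLoadPU3AS1KjN5__spv5ScopeENS1_19MemorySemanticsMaskE'

# Demangle with c++filt when it is available; otherwise keep the raw name.
if command -v c++filt >/dev/null 2>&1; then
  demangled=$(printf '%s\n' "$symbol" | c++filt)
else
  demangled=$symbol
fi

printf '%s\n' "$demangled"
```

Reading the mangled name by hand under the Itanium C++ ABI rules, it appears to be `__spirv_AtomicLoad` taking a pointer to a `const unsigned int` in address space 1 (conventionally global memory on NVPTX), followed by a `__spv::Scope` and a `__spv::MemorySemanticsMask`. That would suggest the missing piece is the device-side implementation of an atomic load on a global `unsigned int`.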
Is this something that is not supported yet (possibly due to the use of the Intel® DPC++ Compatibility Tool header)?
Or am I missing something while building the toolchain?
Thank you,
.Andrea