[release/2.6] Change gfx110x BLAS preferred backend #2053


Merged: 5 commits merged into release/2.6 on May 13, 2025

Conversation

@amd-imilenko commented Apr 25, 2025

Only AMD Instinct GPUs prefer hipBLASLt by default, but the user can still override this using an env var.

Cherry-picked to release/2.5 branch via #2169
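
For reference, the env-var override the description relies on is resolved once, when the global context is constructed. A minimal sketch of that init-time check, modeled on aten/src/ATen/Context.h (the exact member name and shape are assumptions, not a verbatim copy of the branch):

#include <c10/util/env.h>

// Sketch: seed the preferred BLAS backend from the environment once.
// Under ROCm, TORCH_BLAS_PREFER_HIPBLASLT plays the role of
// TORCH_BLAS_PREFER_CUBLASLT; check_env returns a tri-state
// (set-true / set-false / unset) std::optional<bool>.
at::BlasBackend blas_preferred_backend =
    (c10::utils::check_env("TORCH_BLAS_PREFER_CUBLASLT") == true ||
     c10::utils::check_env("TORCH_BLAS_PREFER_HIPBLASLT") == true)
        ? at::BlasBackend::Cublaslt   // hipBLASLt on ROCm builds
        : at::BlasBackend::Cublas;    // rocBLAS on ROCm builds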

@amd-imilenko amd-imilenko requested a review from jeffdaily April 25, 2025 14:28

rocm-repo-management-api bot commented Apr 25, 2025

Jenkins build for d9b0d061725412951c47dce318dbf10d4823a297 commit finished as FAILURE
Links: Blue Ocean view / Build artifacts

rocm-repo-management-api bot

Jenkins build for d9b0d061725412951c47dce318dbf10d4823a297 commit is in progress
Links: Blue Ocean view / Build artifacts


rocm-repo-management-api bot commented Apr 28, 2025

Jenkins build for d9b0d061725412951c47dce318dbf10d4823a297 commit finished as FAILURE
Links: Blue Ocean view / Build artifacts

@jeffdaily (Collaborator)

For upstream release/2.7 we applied this patch, which adds a Default BLAS backend that then resolves to cublas or cublaslt: pytorch#150212

For release/2.6, it should be as straightforward as this diff:

diff --git a/aten/src/ATen/Context.cpp b/aten/src/ATen/Context.cpp
index a0e3b3d638..fbdbe767e3 100644
--- a/aten/src/ATen/Context.cpp
+++ b/aten/src/ATen/Context.cpp
@@ -320,7 +320,7 @@ at::BlasBackend Context::blasPreferredBackend() {
       static const std::vector<std::string> archs = {
           "gfx90a", "gfx942"
 #if ROCM_VERSION >= 60300
-          , "gfx1100", "gfx1101", "gfx1200", "gfx1201"
+          , "gfx1200", "gfx1201"
 #endif
 #if ROCM_VERSION >= 60500
           , "gfx950"
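
To spell out what that arch list controls: inside Context::blasPreferredBackend() it gates whether a Cublaslt preference is allowed to stand. Roughly, as a simplified sketch inferred from this thread (not copied from the branch):

// Sketch: if the active GPU is not in `archs`, a hipBLASLt preference is
// demoted to the rocBLAS-backed default; otherwise whatever the env var
// (or an explicit setter call) chose is left alone.
if (blas_preferred_backend == at::BlasBackend::Cublaslt) {
  static const bool arch_supported =
      detail::getCUDAHooks().isGPUArch(archs);  // signature differs per branch, see below
  if (!arch_supported) {
    TORCH_WARN_ONCE(
        "Attempting to use hipBLASLt on an unsupported architecture! "
        "Overriding blas backend to hipblas");
    blas_preferred_backend = at::BlasBackend::Cublas;
  }
}
return blas_preferred_backend;

Read this way, trimming gfx1100/gfx1101 from the list would demote gfx11 even when the env var is set, which is exactly the concern raised below.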


rocm-repo-management-api bot commented Apr 28, 2025

Jenkins build for d9b0d061725412951c47dce318dbf10d4823a297 commit finished as FAILURE
Links: Blue Ocean view / Build artifacts


rocm-repo-management-api bot commented Apr 30, 2025

Jenkins build for d9b0d061725412951c47dce318dbf10d4823a297 commit finished as FAILURE
Links: Blue Ocean view / Build artifacts


rocm-repo-management-api bot commented Apr 30, 2025

Jenkins build for d9b0d061725412951c47dce318dbf10d4823a297 commit finished as FAILURE
Links: Blue Ocean view / Build artifacts


rocm-repo-management-api bot commented Apr 30, 2025

Jenkins build for d9b0d061725412951c47dce318dbf10d4823a297 commit finished as ABORTED
Links: Blue Ocean view / Build artifacts


rocm-repo-management-api bot commented Apr 30, 2025

Jenkins build for d9b0d061725412951c47dce318dbf10d4823a297 commit finished as FAILURE
Links: Blue Ocean view / Build artifacts


rocm-repo-management-api bot commented Apr 30, 2025

Jenkins build for d9b0d061725412951c47dce318dbf10d4823a297 commit finished as FAILURE
Links: Blue Ocean view / Build artifacts


rocm-repo-management-api bot commented May 2, 2025

Jenkins build for d9b0d061725412951c47dce318dbf10d4823a297 commit finished as FAILURE
Links: Blue Ocean view / Build artifacts

@amd-imilenko (Author)

@jeffdaily Wouldn't the change demonstrated in the diff cause the preferred backend for gfx11* to always be Cublas, even when the environment variable TORCH_BLAS_PREFER_HIPBLASLT is set to true? The idea of the change was to set the preferred backend to Cublas for gfx11*, but still allow switching to Cublaslt if TORCH_BLAS_PREFER_HIPBLASLT is explicitly set to true.

@apakbin commented May 5, 2025

Given the widespread regression of hipBLASLt on gfx110x, can we disable it for gfx120x as well on release/2.6? (CC @pruthvistony)


rocm-repo-management-api bot commented May 6, 2025

Jenkins build for d9b0d061725412951c47dce318dbf10d4823a297 commit finished as FAILURE
Links: Blue Ocean view / Build artifacts


rocm-repo-management-api bot commented May 6, 2025

Jenkins build for 744671327dcced25d3f72ab8bc7c86e0385106eb commit finished as FAILURE
Links: Blue Ocean view / Build artifacts


rocm-repo-management-api bot commented May 6, 2025

Jenkins build for d9b0d061725412951c47dce318dbf10d4823a297 commit finished as FAILURE
Links: Blue Ocean view / Build artifacts

@jeffdaily (Collaborator)

@amd-imilenko @apakbin I updated this PR with a slightly different approach. Please review. The env var is respected; only Instinct GPUs will default to hipblaslt.


rocm-repo-management-api bot commented May 7, 2025

Jenkins build for 744671327dcced25d3f72ab8bc7c86e0385106eb commit finished as FAILURE
Links: Blue Ocean view / Build artifacts

@apakbin commented May 7, 2025

Thanks @jeffdaily. It seems to me that the flags TORCH_BLAS_PREFER_HIPBLASLT/TORCH_BLAS_PREFER_CUBLASLT are already checked in aten/src/ATen/Context.h, and what Context::blasPreferredBackend() in Context.cpp does is revert the setting back to rocBLAS if the user has indicated they want hipBLASLt but the system does not support it. So, as far as I understand, we don't need to check those flags again in Context::blasPreferredBackend().
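
Putting the two layers side by side, the intended division of labor looks roughly like this (a sketch under the thread's assumptions; hipblasltSupportedOnThisDevice is a hypothetical stand-in for the arch gate shown earlier):

at::BlasBackend Context::blasPreferredBackend() {
#ifdef USE_ROCM
  // Layer 1 already ran at init: blas_preferred_backend was seeded from
  // TORCH_BLAS_PREFER_HIPBLASLT / TORCH_BLAS_PREFER_CUBLASLT in Context.h,
  // so the env vars are deliberately not re-read here.
  // Layer 2: demote a hipBLASLt request the system cannot honor.
  if (blas_preferred_backend == at::BlasBackend::Cublaslt &&
      !hipblasltSupportedOnThisDevice()) {  // hypothetical helper
    blas_preferred_backend = at::BlasBackend::Cublas;  // i.e. rocBLAS
  }
#endif
  return blas_preferred_backend;
}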


rocm-repo-management-api bot commented May 8, 2025

Jenkins build for 744671327dcced25d3f72ab8bc7c86e0385106eb commit finished as FAILURE
Links: Blue Ocean view / Build artifacts

Detected error during Pytorch building:

[5936/8040] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/Dispatch.cpp.o
[5937/8040] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/LegacyVmapMode.cpp.o
[5938/8040] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/ThreadLocalPythonObjects.cpp.o
[5939/8040] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/detail/MAIAHooksInterface.cpp.o
[5940/8040] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/EmptyTensor.cpp.o
FAILED: caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/EmptyTensor.cpp.o 
/opt/cache/bin/sccache /opt/cache/bin/c++ … -o caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/EmptyTensor.cpp.o -c /var/lib/jenkins/pytorch/aten/src/ATen/EmptyTensor.cpp
In file included from /var/lib/jenkins/pytorch/aten/src/ATen/EmptyTensor.cpp:5:
/var/lib/jenkins/pytorch/aten/src/ATen/Context.h: In lambda function:
/var/lib/jenkins/pytorch/aten/src/ATen/Context.h:431:45: error: cannot convert ‘const std::vector<std::__cxx11::basic_string<char> >’ to ‘c10::DeviceIndex’ {aka ‘signed char’}
  431 |       if (!detail::getCUDAHooks().isGPUArch(archs, index)) {

@amd-vlarakic

Hi @jeffdaily and @apakbin,
Correct me if I am wrong, but wouldn't excluding gfx120x from the list of architectures that default to hipblaslt prevent fp8 workloads (GEMMs) from being executed on these devices out of the box, without setting the env variable?
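
To make the worry concrete: if the fp8 GEMM path is only wired through the *lt backend, something like the following hypothetical guard (illustrative only, not the actual check in Blas.cpp) would fire out of the box on a rocBLAS default:

// Hypothetical guard in a scaled-GEMM entry point.
TORCH_CHECK(
    at::globalContext().blasPreferredBackend() == at::BlasBackend::Cublaslt,
    "fp8 GEMM requires hipBLASLt; set TORCH_BLAS_PREFER_HIPBLASLT=1 or add "
    "this architecture back to the default list");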

@fjankovi commented May 8, 2025

> @amd-imilenko @apakbin I updated this PR with a slightly different approach. Please review. The env var is respected; only Instinct GPUs will default to hipblaslt.

@jeffdaily We also want gfx12 to default to hipblaslt (and probably also APUs if added later).


rocm-repo-management-api bot commented May 8, 2025

Jenkins build for 744671327dcced25d3f72ab8bc7c86e0385106eb commit finished as FAILURE
Links: Blue Ocean view / Build artifacts

Detected error during Pytorch building:

[5933/8040] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/detail/CPUGuardImpl.cpp.o
[5934/8040] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/Dispatch.cpp.o
[5935/8040] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/detail/MetaGuardImpl.cpp.o
[5936/8040] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/LegacyVmapMode.cpp.o
[5937/8040] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/DeviceAccelerator.cpp.o
FAILED: caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/DeviceAccelerator.cpp.o 
/opt/cache/bin/sccache /opt/cache/bin/c++ … -o caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/DeviceAccelerator.cpp.o -c /var/lib/jenkins/pytorch/aten/src/ATen/DeviceAccelerator.cpp
In file included from /var/lib/jenkins/pytorch/aten/src/ATen/DeviceAccelerator.cpp:1:
/var/lib/jenkins/pytorch/aten/src/ATen/Context.h: In lambda function:
/var/lib/jenkins/pytorch/aten/src/ATen/Context.h:431:45: error: cannot convert ‘const std::vector<std::__cxx11::basic_string<char> >’ to ‘c10::DeviceIndex’ {aka ‘signed char’}
  431 |       if (!detail::getCUDAHooks().isGPUArch(archs, index)) {


rocm-repo-management-api bot commented May 8, 2025

Jenkins build for 744671327dcced25d3f72ab8bc7c86e0385106eb commit finished as FAILURE
Links: Blue Ocean view / Build artifacts


rocm-repo-management-api bot commented May 8, 2025

Jenkins build for 744671327dcced25d3f72ab8bc7c86e0385106eb commit finished as FAILURE
Links: Blue Ocean view / Build artifacts

@apakbin commented May 12, 2025

CC @pruthvistony

@apakbin commented May 12, 2025

The compile error seems to stem from PR pytorch#150473 not having been cherry-picked into release/2.6. That PR added an index parameter to the isGPUArch() function. If we cherry-pick it here, the error would go away.
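
For anyone hitting the same wall: the mismatch is consistent with the error text above. A sketch of the collision (both signatures are inferred from the log and pytorch#150473, not copied from source):

// release/2.6 hooks, inferred: DeviceIndex comes first.
bool isGPUArch(c10::DeviceIndex device_index,
               const std::vector<std::string>& archs);

// Call site cherry-picked into this PR, using the post-#150473 order:
detail::getCUDAHooks().isGPUArch(archs, index);
// The vector lands on the DeviceIndex parameter, hence:
//   error: cannot convert 'const std::vector<std::string>' to 'c10::DeviceIndex'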

rocm-repo-management-api bot

Jenkins build for 14341d582f9184f6b9556e4252bbe2ccd921e3c6 commit is in progress
Links: Blue Ocean view / Build artifacts


rocm-repo-management-api bot commented May 12, 2025

Jenkins build for 14341d582f9184f6b9556e4252bbe2ccd921e3c6 commit finished as FAILURE
Links: Blue Ocean view / Build artifacts

Detected error during Pytorch building:

[189/8040] Building CXX object third_party/protobuf/cmake/CMakeFiles/libprotoc.dir/__/src/google/protobuf/compiler/objectivec/objectivec_generator.cc.o
[190/8040] Building CXX object third_party/protobuf/cmake/CMakeFiles/protoc.dir/__/src/google/protobuf/compiler/main.cc.o
[191/8040] Building C object confu-deps/cpuinfo/CMakeFiles/cpuinfo_internals.dir/src/x86/name.c.o
[192/8040] Building C object confu-deps/cpuinfo/CMakeFiles/cpuinfo_internals.dir/src/x86/isa.c.o
[193/8040] Performing download step (download, verify and extract) for 'aotriton_external'
FAILED: aotriton_external-prefix/src/aotriton_external-stamp/aotriton_external-download /var/lib/jenkins/pytorch/build/aotriton_external-prefix/src/aotriton_external-stamp/aotriton_external-download 
cd /var/lib/jenkins/pytorch/build && /opt/conda/envs/py_3.12/bin/cmake -DCMAKE_MESSAGE_LOG_LEVEL=VERBOSE -P /var/lib/jenkins/pytorch/build/aotriton_external-prefix/src/aotriton_external-stamp/download-aotriton_external.cmake && /opt/conda/envs/py_3.12/bin/cmake -DCMAKE_MESSAGE_LOG_LEVEL=VERBOSE -P /var/lib/jenkins/pytorch/build/aotriton_external-prefix/src/aotriton_external-stamp/verify-aotriton_external.cmake && /opt/conda/envs/py_3.12/bin/cmake -DCMAKE_MESSAGE_LOG_LEVEL=VERBOSE -P /var/lib/jenkins/pytorch/build/aotriton_external-prefix/src/aotriton_external-stamp/extract-aotriton_external.cmake && /opt/conda/envs/py_3.12/bin/cmake -E touch /var/lib/jenkins/pytorch/build/aotriton_external-prefix/src/aotriton_external-stamp/aotriton_external-download
-- Downloading...
   dst='/var/lib/jenkins/pytorch/build/aotriton_external-prefix/src/aotriton-0.9.2b-manylinux_2_28_x86_64-rocm6.4-shared.tar.gz'
   timeout='none'
   inactivity timeout='none'

@amd-imilenko merged commit 1ded221 into release/2.6 on May 13, 2025
2 of 6 checks passed
@amd-imilenko deleted the change_gfx110_blas_preferred_backend branch May 13, 2025 09:44
@fjankovi

!cherry-pick --onto release/2.7

Created this PR for 2.7: #2125

@apakbin commented May 13, 2025

Great, thanks @fjankovi. I deleted my comment so it isn't applied twice.

@amd-imilenko (Author)

!cherry-pick --onto release/2.5

okakarpa pushed a commit that referenced this pull request May 20, 2025
Only AMD Instinct GPUs and Navi 4x prefer hipblaslt by default, but the user can still override using the env var.

---------

Co-authored-by: Jeff Daily <[email protected]>
@okakarpa (Collaborator)

Created branch autogenerated/release/2.5_cherry-pick_pr-2053 and #2169

pruthvistony pushed a commit that referenced this pull request Jun 6, 2025
…2169)

Cherry-pick of #2053

---------

Co-authored-by: Ilija Milenkovic <[email protected]>
Co-authored-by: Jeff Daily <[email protected]>
Co-authored-by: Arash Pakbin <[email protected]>