[release/2.6] Change gfx110x BLAS preferred backend #2053
Jenkins build for d9b0d061725412951c47dce318dbf10d4823a297 commit finished as FAILURE |
Jenkins build for d9b0d061725412951c47dce318dbf10d4823a297 commit is in progress |
For upstream release/2.7 we applied a patch (pytorch#150212) that adds a Default BLAS backend, which then resolves to cublas or cublaslt. For release/2.6, it should be as straightforward as this diff:
diff --git a/aten/src/ATen/Context.cpp b/aten/src/ATen/Context.cpp
index a0e3b3d638..fbdbe767e3 100644
--- a/aten/src/ATen/Context.cpp
+++ b/aten/src/ATen/Context.cpp
@@ -320,7 +320,7 @@ at::BlasBackend Context::blasPreferredBackend() {
static const std::vector<std::string> archs = {
"gfx90a", "gfx942"
#if ROCM_VERSION >= 60300
- , "gfx1100", "gfx1101", "gfx1200", "gfx1201"
+ , "gfx1200", "gfx1201"
#endif
#if ROCM_VERSION >= 60500
, "gfx950" |
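For readers following the diff, the list is used to decide whether a device's architecture name matches an entry that may use hipBLASLt. A minimal sketch of that kind of check is below. Both `hipblasltArchs` and `archMatches` are hypothetical names for illustration, the substring match is an assumption, and the runtime `rocmVersion` argument stands in for the compile-time `ROCM_VERSION` guards; this is not the actual ATen helper.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: builds the post-diff list of architectures.
// The runtime rocmVersion argument stands in for the compile-time
// ROCM_VERSION guards used in the real source.
std::vector<std::string> hipblasltArchs(int rocmVersion) {
  std::vector<std::string> archs = {"gfx90a", "gfx942"};
  if (rocmVersion >= 60300) {
    archs.push_back("gfx1200");
    archs.push_back("gfx1201");
  }
  if (rocmVersion >= 60500) {
    archs.push_back("gfx950");
  }
  return archs;
}

// Hypothetical helper: true if the device arch name (for example
// "gfx942:sramecc+:xnack-") contains one of the listed arch strings.
bool archMatches(const std::string& deviceArch,
                 const std::vector<std::string>& archs) {
  for (const auto& arch : archs) {
    if (deviceArch.find(arch) != std::string::npos) {
      return true;
    }
  }
  return false;
}
```

With this list, a gfx1100 device no longer matches, so it would not be treated as a hipBLASLt-capable arch in this sketch.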
Jenkins build for d9b0d061725412951c47dce318dbf10d4823a297 commit finished as ABORTED |
@jeffdaily Wouldn't the change shown in the diff cause the preferred backend for gfx11* to always be Cublas, even when the environment variable TORCH_BLAS_PREFER_HIPBLASLT is set to true? The idea of the change was to make Cublas the default preferred backend for gfx11*, while still allowing a switch to Cublaslt if TORCH_BLAS_PREFER_HIPBLASLT is explicitly set to true. |
Given the widespread regression of hipBLASLt on gfx110x, can we disable it for gfx120x as well on rel/2.6? (CC @pruthvistony) |
Jenkins build for 744671327dcced25d3f72ab8bc7c86e0385106eb commit finished as FAILURE |
@amd-imilenko @apakbin I updated this PR with a slightly different approach. Please review. The env var is respected, and only Instinct GPUs will default to hipblaslt. |
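The behavior described in this comment can be sketched roughly as follows. This is an illustration only, not the code in the PR: `Backend`, `parseBoolEnv`, and `chooseBackend` are hypothetical names, the accepted spellings for a true value are an assumption, and the environment value is passed in as a parameter so the logic can be exercised without touching the process environment.

```cpp
#include <optional>
#include <string>

// Hypothetical stand-in for at::BlasBackend.
enum class Backend { Cublas, Cublaslt };

// Hypothetical parser: returns nullopt when the variable is unset,
// otherwise whether its value reads as "true".
// The accepted spellings are an assumption made for this sketch.
std::optional<bool> parseBoolEnv(const char* value) {
  if (value == nullptr) {
    return std::nullopt;
  }
  const std::string v(value);
  return v == "1" || v == "true" || v == "TRUE";
}

// An explicit user setting wins. Without one, only archs flagged as
// defaulting to hipBLASLt (Instinct GPUs in the comment above) pick Cublaslt.
Backend chooseBackend(const char* envValue, bool archDefaultsToLt) {
  if (auto pref = parseBoolEnv(envValue)) {
    return *pref ? Backend::Cublaslt : Backend::Cublas;
  }
  return archDefaultsToLt ? Backend::Cublaslt : Backend::Cublas;
}
```

In a real program the first argument would come from something like `std::getenv("TORCH_BLAS_PREFER_HIPBLASLT")`, which returns a null pointer when the variable is unset.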
Thanks @jeffdaily. It seems to me that the flags TORCH_BLAS_PREFER_HIPBLASLT/TORCH_BLAS_PREFER_CUBLASLT are already checked in aten/src/ATen/Context.h. What Context::blasPreferredBackend() in Context.cpp does is revert the setting to rocBLAS if the user has asked for hipBLASLt but the system does not support it. So, as far as I understand, we don't need to check those flags again in Context::blasPreferredBackend(). |
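The two-stage flow described in this comment might look like the miniature below. It is a sketch under stated assumptions: `MiniContext` and its fields are hypothetical, and `BlasBackend` here is a local enum rather than `at::BlasBackend`. Cublas stands for the rocBLAS path and Cublaslt for the hipBLASLt path.

```cpp
// Local stand-in for at::BlasBackend, for illustration only.
enum class BlasBackend { Cublas, Cublaslt };

// Hypothetical miniature of the flow described above.
struct MiniContext {
  BlasBackend preferred;  // stage 1: set once from the env flags at init
  bool ltSupported;       // whether the device supports hipBLASLt

  // Stage 2: the accessor does not re-read the flags. It only downgrades
  // a Cublaslt request that the device cannot satisfy.
  BlasBackend blasPreferredBackend() {
    if (preferred == BlasBackend::Cublaslt && !ltSupported) {
      preferred = BlasBackend::Cublas;
    }
    return preferred;
  }
};
```

Under this reading, the accessor never upgrades a Cublas preference, which is why re-checking the env flags inside it would be redundant.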
Jenkins build for 744671327dcced25d3f72ab8bc7c86e0385106eb commit finished as FAILURE Detected error during Pytorch building:
|
Hi @jeffdaily and @apakbin, |
@jeffdaily We also want gfx12 to default to hipblaslt (and probably also APUs if added later). |
CC. @pruthvistony |
The compile error seems to stem from PR pytorch#150473 not having been cherry-picked into rel/2.6. That PR added an index to the isGPUArch() function. If we could cherry-pick it here, the error would go away. |
Jenkins build for 14341d582f9184f6b9556e4252bbe2ccd921e3c6 commit is in progress |
Jenkins build for 14341d582f9184f6b9556e4252bbe2ccd921e3c6 commit finished as FAILURE Detected error during Pytorch building:
|
Created this PR for 2.7: #2125 |
Great, thanks @fjankovi. I deleted my comment so that it is not applied twice. |
!cherry-pick --onto release/2.5 |
Only AMD Instinct GPUs and Navi 4x prefer hipblaslt by default, but user can still override using env var. --------- Co-authored-by: Jeff Daily <[email protected]>
Created branch autogenerated/release/2.5_cherry-pick_pr-2053 and #2169 |
…2169) Cherry-pick of #2053 --------- Co-authored-by: Ilija Milenkovic <[email protected]> Co-authored-by: Jeff Daily <[email protected]> Co-authored-by: Arash Pakbin <[email protected]>
Only AMD Instinct GPUs prefer hipblaslt by default, but user can still override using env var.
Cherry-picked to release/2.5 branch via #2169