Commit 90e7375
Unify `cuBLASLt` workspaces with `cuBLAS` workspaces (#145130)
Summary:
Because `cuBLAS` workspaces are already per-stream, and kernels on a single stream execute in order, reusing them for `cuBLASLt` should not cause `cuBLAS` and `cuBLASLt` kernels to overlap on the same workspace.
This PR reuses `cuBLAS` workspaces for `cuBLASLt`, with the following benefits:
+ caching: `cuBLAS` workspaces were already cached, so `cuBLASLt` now gets caching too
+ a "free" workspace size bump for `cuBLASLt`: `cuBLASLt` workspace sizes were previously smaller than those for `cuBLAS` by default, which potentially hurts performance, and we had difficulty increasing the size due to downstream OOMs (see also #120925)
+ fixes broken behavior with the memtracker: pytorch/pytorch#139442 attempted to handle the peaky allocation behavior that broke memtracker equivalence tests, but it didn't seem to fully work; here the cached, reused `cuBLAS` workspace seems to fix it
+ one environment variable to rule them all: `CUBLAS_WORKSPACE_CONFIG` now applies directly to `cuBLASLt`, so users no longer need to also consider a separate, confusing `CUBLASLT_WORKSPACE_SIZE`
X-link: pytorch/pytorch#145130
Approved by: https://github.com/ngimel
Reviewed By: izaitsevfb
Differential Revision: D71711852
fbshipit-source-id: 4f57539b8f37f1f4c92a57c19276e84f81bffa23
Parent: 10a7be3
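As a rough model of the reuse described in the summary (not the PR's actual C++ implementation), the sketch below keeps one workspace per (device, stream) pair and hands the same buffer to both a cuBLAS-style and a cuBLASLt-style caller. `WorkspacePool`, `cublas_gemm_workspace`, and `cublaslt_gemm_workspace` are hypothetical names, and plain `bytearray`s stand in for device allocations.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple

# Hypothetical cache key: (device index, stream id).
Key = Tuple[int, int]


@dataclass
class WorkspacePool:
    """Toy per-(device, stream) workspace cache shared by two BLAS front ends."""
    size_bytes: int
    buffers: Dict[Key, bytearray] = field(default_factory=dict)
    allocations: int = 0  # number of real allocations performed

    def get(self, device: int, stream: int) -> bytearray:
        key = (device, stream)
        buf = self.buffers.get(key)
        if buf is None:
            # Allocate once per stream; later requests on that stream reuse it.
            buf = bytearray(self.size_bytes)
            self.buffers[key] = buf
            self.allocations += 1
        return buf


def cublas_gemm_workspace(pool: WorkspacePool, device: int, stream: int) -> bytearray:
    # Hypothetical stand-in for the cuBLAS call path.
    return pool.get(device, stream)


def cublaslt_gemm_workspace(pool: WorkspacePool, device: int, stream: int) -> bytearray:
    # Hypothetical stand-in for the cuBLASLt call path: same pool, same key.
    # Work on one stream runs in order, so the two paths never use the
    # buffer at the same time.
    return pool.get(device, stream)


pool = WorkspacePool(size_bytes=1024)
a = cublas_gemm_workspace(pool, device=0, stream=7)
b = cublaslt_gemm_workspace(pool, device=0, stream=7)
c = cublaslt_gemm_workspace(pool, device=0, stream=8)
print(a is b, a is c, pool.allocations)  # True False 2
```

Keying by stream is what makes the sharing safe under the summary's reasoning: callers on the same stream are serialized, while a different stream gets its own buffer.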
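`CUBLAS_WORKSPACE_CONFIG` takes a sequence of `:SIZE:COUNT` pairs; the cuBLAS documentation defines SIZE in KiB and COUNT as the number of buffers (for example `:4096:8` or `:16:8`). The hypothetical helper `workspace_bytes` below sums those pairs into a byte count, assuming that format. PyTorch's own C++ parsing may treat malformed values or defaults differently, and the `default` argument is a placeholder, not PyTorch's actual default.

```python
import os
import re
from typing import Optional

# One ":SIZE:COUNT" pair: COUNT buffers of SIZE KiB each.
_PAIR = re.compile(r":(\d+):(\d+)")


def workspace_bytes(config: Optional[str], default: int) -> int:
    """Hypothetical helper: total workspace bytes implied by a CUBLAS_WORKSPACE_CONFIG value.

    Falls back to `default` when the value is unset or contains no valid pairs.
    """
    if not config:
        return default
    pairs = _PAIR.findall(config)
    if not pairs:
        return default
    return sum(int(size_kib) * 1024 * int(count) for size_kib, count in pairs)


# Read the real environment variable; default=0 is an arbitrary placeholder.
print("configured:", workspace_bytes(os.environ.get("CUBLAS_WORKSPACE_CONFIG"), default=0))
print(workspace_bytes(":4096:8", default=0))       # 33554432 (8 buffers of 4 MiB)
print(workspace_bytes(":4096:2:16:8", default=0))  # 8519680
```

Per the summary, after this change the size derived from this one variable also governs the workspace `cuBLASLt` uses, since it reuses the `cuBLAS` buffer.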
1 file changed: 10 lines added, 0 removed (new lines 3595-3604, inserted after original line 3594).