Fix support for NVVM from conda on Windows + other fixes #563

Merged: 7 commits merged into NVIDIA:main from fix_win_conda_nvvm on Apr 22, 2025

leofang
Member

@leofang leofang commented Apr 21, 2025

Description

Checklist

  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

@leofang leofang added bug Something isn't working P0 High priority - Must do! cuda.bindings Everything related to the cuda.bindings module labels Apr 21, 2025
@leofang leofang added this to the cuda-python 12.9.0 & 11.8.7 milestone Apr 21, 2025
@leofang leofang requested a review from rwgk April 21, 2025 22:05
@leofang leofang self-assigned this Apr 21, 2025
Contributor

copy-pr-bot bot commented Apr 21, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@leofang leofang changed the title Fix suport for NVVM from conda on Windows Fix support for NVVM from conda on Windows Apr 21, 2025
@leofang
Member Author

leofang commented Apr 21, 2025

@rwgk do you still have your Windows instance alive? If so could you double check this locally (since we don't test against conda yet, #280)?

@rwgk
Collaborator

rwgk commented Apr 21, 2025

@rwgk do you still have your Windows instance alive? If so could you double check this locally (since we don't test against conda yet, #280)?

I got myself a new instance (for 8 weeks, as you showed me) but only got as far as installing the CUDA driver. (And conda, but I haven't figured out yet how to activate it.)

This is a good motivation for me to continue working on the setup, I'll let you know by tonight (PT) how far I got.

@leofang leofang force-pushed the fix_win_conda_nvvm branch from b86374a to a14d95e Compare April 22, 2025 01:17
@leofang
Member Author

leofang commented Apr 22, 2025

/ok to test a14d95e

@leofang
Member Author

leofang commented Apr 22, 2025

/ok to test a14d95e

@rwgk
Collaborator

rwgk commented Apr 22, 2025

I see one of the tests here is failing, but I downloaded the wheel anyway to my Windows machine.

I used this command (miniforge3):

conda create -n ctk128 -c nvidia -c conda-forge python=3.12 cuda-toolkit=12.8.1

This zip file:

After extracting the .whl file:

(ctk128) PS C:\Users\rgrossekunst\Downloads> pip install cuda_bindings-12.8.0-cp312-cp312-win_amd64.whl
Processing c:\users\rgrossekunst\downloads\cuda_bindings-12.8.0-cp312-cp312-win_amd64.whl
Collecting pywin32 (from cuda-bindings==12.8.0)
  Downloading pywin32-310-cp312-cp312-win_amd64.whl.metadata (9.4 kB)
Downloading pywin32-310-cp312-cp312-win_amd64.whl (9.5 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 9.5/9.5 MB 59.2 MB/s eta 0:00:00
Installing collected packages: pywin32, cuda-bindings
Successfully installed cuda-bindings-12.8.0 pywin32-310
(ctk128) PS C:\Users\rgrossekunst\Downloads> pip freeze
cuda-bindings @ file:///C:/Users/rgrossekunst/Downloads/cuda_bindings-12.8.0-cp312-cp312-win_amd64.whl#sha256=134d06689b07782f0fde60aa23f35b69750b77a7c073853195d28f1572b3eff8
pywin32==310
setuptools==79.0.0
wheel==0.45.1
(ctk128) PS C:\Users\rgrossekunst\Downloads> python
Python 3.12.10 | packaged by conda-forge | (main, Apr 10 2025, 22:08:16) [MSC v.1943 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> from cuda.bindings import nvvm
>>> nvvm.version()
(2, 0)
>>>

Looks like it works?

@rwgk
Collaborator

rwgk commented Apr 22, 2025

Negative test under very similar conditions:

conda create -n ctk128nodev -c nvidia -c conda-forge python=3.12 cuda-toolkit=12.8.1 

Using wheel from current main:

(ctk128nodev) PS C:\Users\rgrossekunst\Downloads\older\cuda-bindings-python312-cuda12.8.0-win-64-d425a8895bd778fffb993263dbf5d9bc631fea22> pip install .\cuda_bindings-12.8.0-cp312-cp312-win_amd64.whl
Processing c:\users\rgrossekunst\downloads\older\cuda-bindings-python312-cuda12.8.0-win-64-d425a8895bd778fffb993263dbf5d9bc631fea22\cuda_bindings-12.8.0-cp312-cp312-win_amd64.whl
Requirement already satisfied: pywin32 in c:\programdata\miniforge3\envs\ctk128nodev\lib\site-packages (from cuda-bindings==12.8.0) (310)
Installing collected packages: cuda-bindings
Successfully installed cuda-bindings-12.8.0
(ctk128nodev) PS C:\Users\rgrossekunst\Downloads\older\cuda-bindings-python312-cuda12.8.0-win-64-d425a8895bd778fffb993263dbf5d9bc631fea22> python
Python 3.12.10 | packaged by conda-forge | (main, Apr 10 2025, 22:08:16) [MSC v.1943 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> from cuda.bindings import nvvm
>>> nvvm.version()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "cuda\\bindings\\nvvm.pyx", line 76, in cuda.bindings.nvvm.version
  File "cuda\\bindings\\nvvm.pyx", line 90, in cuda.bindings.nvvm.version
  File "cuda\\bindings\\cynvvm.pyx", line 15, in cuda.bindings.cynvvm.nvvmVersion
  File "cuda\\bindings\\_internal\\nvvm.pyx", line 258, in cuda.bindings._internal.nvvm._nvvmVersion
  File "cuda\\bindings\\_internal\\nvvm.pyx", line 115, in cuda.bindings._internal.nvvm._check_or_init_nvvm
  File "cuda\\bindings\\_internal\\nvvm.pyx", line 87, in cuda.bindings._internal.nvvm.load_library
RuntimeError: Failed to load nvvm
>>> 

I think in combination with the previous comment, that's conclusive.

@rwgk
Collaborator

rwgk commented Apr 22, 2025

The path in this PR is definitely correct:

(ctk128) PS C:\Users\rgrossekunst\Downloads> dir C:\ProgramData\miniforge3\envs\ctk128\Library\nvvm\bin

    Directory: C:\ProgramData\miniforge3\envs\ctk128\Library\nvvm\bin

Mode                 LastWriteTime         Length Name
----                 -------------         ------ ----
-a----         2/21/2025   9:18 PM       67792896 cicc.exe
-a----         2/21/2025   9:18 PM       52873216 nvvm64_40_0.dll

(ctk128) PS C:\Users\rgrossekunst\Downloads> dir C:\ProgramData\miniforge3\envs\ctk128nodev\Library\nvvm\bin

    Directory: C:\ProgramData\miniforge3\envs\ctk128nodev\Library\nvvm\bin

Mode                 LastWriteTime         Length Name
----                 -------------         ------ ----
-a----         2/21/2025   9:18 PM       67792896 cicc.exe
-a----         2/21/2025   9:18 PM       52873216 nvvm64_40_0.dll

@leofang
Member Author

leofang commented Apr 22, 2025

/ok to test 21a7fb4

@rwgk
Collaborator

rwgk commented Apr 22, 2025

I didn't push this, to not conflict with your work:

commit ceb62f1db9f07187c859f202b02a1e9910ae296f (HEAD -> fix_win_conda_nvvm)
Author: Ralf W. Grosse-Kunstleve <[email protected]>
Date:   Mon Apr 21 22:03:58 2025 -0700

    Fix two bugs: 1. "conda" needs to be skipped if CONDA_PREFIX is not defined (that is a new bug). 2. Existing indentation error (spotted by ChatGPT).

diff --git a/cuda_bindings/cuda/bindings/_internal/nvvm_windows.pyx b/cuda_bindings/cuda/bindings/_internal/nvvm_windows.pyx
index 9243bf07..44ec16f4 100644
--- a/cuda_bindings/cuda/bindings/_internal/nvvm_windows.pyx
+++ b/cuda_bindings/cuda/bindings/_internal/nvvm_windows.pyx
@@ -62,23 +62,26 @@ cdef load_library(const int driver_ver):
 
         # Next, check if DLLs are installed via pip or conda
         for sp in get_site_packages():
-            if sp == "conda" and "CONDA_PREFIX" in os.environ:
+            if sp == "conda":
                 # nvvm is not under $CONDA_PREFIX/lib, so it's not in the default search path
-                mod_path = os.path.join(os.environ["CONDA_PREFIX"], "Library", "nvvm", "bin")
+                conda_prefix = os.environ.get("CONDA_PREFIX")
+                if conda_prefix is None:
+                    continue
+                mod_path = os.path.join(conda_prefix, "Library", "nvvm", "bin")
             else:
                 mod_path = os.path.join(sp, "nvidia", "cuda_nvcc", "nvvm", "bin")
             if not os.path.isdir(mod_path):
                 continue
             os.add_dll_directory(mod_path)
-        try:
-            handle = win32api.LoadLibraryEx(
-                # Note: LOAD_LIBRARY_SEARCH_DLL_LOAD_DIR needs an abs path...
-                os.path.join(mod_path, dll_name),
-                0, LOAD_LIBRARY_SEARCH_DEFAULT_DIRS | LOAD_LIBRARY_SEARCH_DLL_LOAD_DIR)
-        except:
-            pass
-        else:
-            break
+            try:
+                handle = win32api.LoadLibraryEx(
+                    # Note: LOAD_LIBRARY_SEARCH_DLL_LOAD_DIR needs an abs path...
+                    os.path.join(mod_path, dll_name),
+                    0, LOAD_LIBRARY_SEARCH_DEFAULT_DIRS | LOAD_LIBRARY_SEARCH_DLL_LOAD_DIR)
+            except:
+                pass
+            else:
+                break
 
         # Finally, try default search
         try:
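The directory-probing part of this patch can be pulled out into a small pure function, which makes the CONDA_PREFIX handling testable without a Windows machine. This is a minimal sketch, not the actual cuda-bindings code: `candidate_nvvm_dirs` is a hypothetical name, and `site_packages` is assumed to mirror what `get_site_packages()` returns in the patch (real site-packages paths plus the sentinel string "conda").

```python
import os


def candidate_nvvm_dirs(site_packages, environ=None):
    """Yield directories that may contain the NVVM DLL, in search order.

    Hypothetical helper; `site_packages` is assumed to hold real
    site-packages paths plus the sentinel string "conda".
    """
    if environ is None:
        environ = os.environ
    for sp in site_packages:
        if sp == "conda":
            # nvvm is not under $CONDA_PREFIX/lib, so it is not on the default
            # search path. With no active conda env, skip the sentinel instead
            # of treating the string "conda" as a site-packages directory.
            conda_prefix = environ.get("CONDA_PREFIX")
            if conda_prefix is None:
                continue
            yield os.path.join(conda_prefix, "Library", "nvvm", "bin")
        else:
            # pip wheel layout, as used in the patch
            yield os.path.join(sp, "nvidia", "cuda_nvcc", "nvvm", "bin")
```

Calling it with `["conda"]` and an empty environment yields nothing, which is the case the patch's `continue` handles.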

@leofang
Member Author

leofang commented Apr 22, 2025

@rwgk Yes. I fixed conda #563 (comment), but then broke wheels #563 (comment) 😓 It turns out that I had introduced a subtle bug in all of the DLL loading logic (in cuda-bindings and nvmath-python) that went unnoticed until we applied it to NVVM... Both should be fixed now with commit 21a7fb4.

We kept looping over all possible mod_path values but did not stop when hitting a valid one. So by the time we used mod_path to assemble an absolute path to the DLL, it might no longer be valid. This was exposed because we added conda to the end of the loop only for NVVM, and the previously found (valid) wheel path got overwritten. It's now fixed everywhere we do DLL loading. I think the bug is unlikely to be hit: without conda we only search the user/global site-packages, and it's unlikely, though not impossible, to find multiple copies of CUDA wheels there.
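The bug described here is the classic stale-loop-variable pattern. A minimal sketch with hypothetical function names, using a plain `os.path.isdir` check in place of the real probing: the buggy version keeps iterating after a hit, so `mod_path` ends up holding the last candidate; the fixed version returns as soon as a valid directory is found.

```python
import os


def dll_path_buggy(candidate_dirs, dll_name):
    for mod_path in candidate_dirs:
        if not os.path.isdir(mod_path):
            continue
        # (the real code registers mod_path here, but never breaks)
    # BUG: mod_path now holds the *last* candidate, which may not exist
    return os.path.join(mod_path, dll_name)


def dll_path_fixed(candidate_dirs, dll_name):
    for mod_path in candidate_dirs:
        if os.path.isdir(mod_path):
            # Stop at the first valid directory
            return os.path.join(mod_path, dll_name)
    return None  # nothing found; the caller falls back to the default search
```

With a valid wheel directory followed by a missing conda directory (the NVVM ordering described above), the buggy version returns a path inside the missing directory.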

@leofang
Member Author

leofang commented Apr 22, 2025

I didn't push this, to not conflict with your work

Feel free to push and I'll retest locally! @rwgk I think your patch is better in that we actually try to load in every loop iteration, instead of trying it after the first hit.
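Moving the load attempt inside the loop means a directory that exists but holds an unusable DLL does not end the search. A minimal sketch under stated assumptions: `load_first` is a hypothetical name, and `loader` is a hypothetical stand-in for `win32api.LoadLibraryEx` that takes an absolute path and is assumed to raise `OSError` on failure, so the logic can run on any platform.

```python
import os


def load_first(candidate_dirs, dll_name, loader):
    """Try each existing candidate directory; return the first handle loaded."""
    for mod_path in candidate_dirs:
        if not os.path.isdir(mod_path):
            continue
        try:
            # The absolute path is built from this iteration's mod_path,
            # so it always refers to the directory being tried.
            return loader(os.path.join(mod_path, dll_name))
        except OSError:
            continue  # this copy failed to load; try the next candidate
    return None  # caller falls back to the default DLL search
```

Returning on success also removes the need for a separate `break` and keeps `mod_path` from ever being used outside the iteration that produced it.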

@leofang
Member Author

leofang commented Apr 22, 2025

CI is green now. Let me apply your patch and then we can test/merge.

@rwgk
Collaborator

rwgk commented Apr 22, 2025

/ok to test a0baf71

@rwgk
Collaborator

rwgk commented Apr 22, 2025

To explain commit a0baf71:

  • Common to all cases (nvrtc, nvJitLink, nvvm): Ensure that mod_path is used only when it is defined. Note that the loop over site-packages may have zero iterations.

  • For nvrtc: The DLL search order is now consistent with that of nvJitLink and nvvm.

  • For nvvm: Avoid probing a nonsensical mod_path: os.path.join("conda", "nvidia", "cuda_nvcc", "nvvm", "bin")
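The first bullet guards against a separate pitfall: if the site-packages loop runs zero times, a loop variable referenced after the loop was never bound. A minimal sketch with hypothetical function names:

```python
import os


def dll_path_unguarded(site_dirs, dll_name):
    for mod_path in site_dirs:
        if os.path.isdir(mod_path):
            break
    # BUG: if site_dirs is empty, mod_path was never assigned
    return os.path.join(mod_path, dll_name)


def dll_path_guarded(site_dirs, dll_name):
    mod_path = None
    for candidate in site_dirs:
        if os.path.isdir(candidate):
            mod_path = candidate
            break
    if mod_path is None:
        return None  # only use mod_path when it is actually defined
    return os.path.join(mod_path, dll_name)
```

With an empty site-packages list, the unguarded version raises `UnboundLocalError` instead of falling through to the default search.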

Comment on lines +78 to +81
# Note: nvrtc64_120_0.dll calls into nvrtc-builtins64_*.dll which is
# located in the same mod_path.
# Update PATH environ so that the two dlls can find each other
os.environ["PATH"] = os.pathsep.join((os.environ.get("PATH", ""), mod_path))
Collaborator

Should we be manually loading this dll instead of modifying PATH? Regardless, we don't need to fix this in this PR.

Collaborator

nvrtc is very special, because of the builtins runtime dependency.

In one of my ChatGPT chats about it a few days ago, it strongly suggested doing both: updating os.environ["PATH"] and calling os.add_dll_directory(mod_path). Therefore I'm carrying that into the path_finder.
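Doing both might look like the sketch below. It is an illustration under assumptions, not the path_finder code: `expose_dll_dir` is a hypothetical helper, and `os.add_dll_directory` exists only on Windows (Python 3.8+), hence the `hasattr` guard. The PATH update mirrors the snippet under review, which appends `mod_path` so that nvrtc64_120_0.dll can find nvrtc-builtins64_*.dll in the same directory.

```python
import os


def expose_dll_dir(mod_path, environ=None):
    """Make mod_path visible to both DLL search mechanisms (hypothetical helper).

    Returns the object from os.add_dll_directory when that API exists, else
    None. Calling close() on the returned object removes the directory again.
    """
    if environ is None:
        environ = os.environ
    # Append to PATH, matching the snippet under review
    environ["PATH"] = os.pathsep.join((environ.get("PATH", ""), mod_path))
    # Windows-only API; absent on Linux/macOS
    if hasattr(os, "add_dll_directory"):
        return os.add_dll_directory(mod_path)
    return None
```

PATH affects the legacy search used by DLLs that load their own dependencies, while `os.add_dll_directory` only affects loads that request the user-directory search flags, which is presumably why both are suggested here.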

Collaborator

I forgot to add: I somehow have it in my mind that @leofang made a remark that we shouldn't load the builtins ourselves. Leo, does that make sense?

Member Author

@leofang leofang Apr 22, 2025

Pre-loading a DLL works (it's what I did in nvmath). My objection to pre-loading is not that it does not work, but that I am not willing to maintain other libraries' implementation details (a dependency loaded via dlopen has no DT_NEEDED entry or package dependency metadata for us to inspect). Such details can easily go out of date as these libraries evolve.

Collaborator

Thanks Leo, I linked your comment here:

1344621

We can weigh the pros-and-cons of "soft dependency pre-loading" vs os.environ["PATH"] & os.add_dll_directory() management when the path_finder code is complete.
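For comparison, "soft dependency pre-loading" would mean locating and loading the builtins DLL explicitly before nvrtc itself. A sketch under loud assumptions: `preload_soft_deps` and its `loader` parameter are hypothetical (in practice something like `ctypes.CDLL` or `win32api.LoadLibraryEx`), and the `nvrtc-builtins64_*.dll` pattern is exactly the kind of other-library implementation detail Leo objects to maintaining.

```python
import glob
import os


def preload_soft_deps(mod_path, patterns, loader):
    """Load every file in mod_path matching the given glob patterns.

    Hypothetical helper: `patterns` hard-codes another library's private
    file names, which is the maintenance burden discussed in this thread.
    """
    handles = []
    for pattern in patterns:
        # sorted() makes the load order deterministic across runs
        for path in sorted(glob.glob(os.path.join(mod_path, pattern))):
            handles.append(loader(path))
    return handles
```

The trade-off is visible in the call site: every caller must know the dependency's file-name pattern, whereas the PATH/`os.add_dll_directory` approach only needs the directory.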

…site-package DLL was found. Using the most obvious approach to solve this problem: return immediately on success.
@rwgk rwgk force-pushed the fix_win_conda_nvvm branch from f88cbf3 to 03e6e4a Compare April 22, 2025 16:03
@rwgk
Collaborator

rwgk commented Apr 22, 2025

/ok to test 03e6e4a

@leofang leofang changed the title Fix support for NVVM from conda on Windows Fix support for NVVM from conda on Windows + Fix DLL loading inconsistency for NVRTC Apr 22, 2025
@rwgk
Collaborator

rwgk commented Apr 22, 2025

/ok to test 1a05efc

@leofang leofang changed the title Fix support for NVVM from conda on Windows + Fix DLL loading inconsistency for NVRTC Fix support for NVVM from conda on Windows + other fixes Apr 22, 2025
@leofang
Member Author

leofang commented Apr 22, 2025

FWIW, I updated the PR title/description since we fixed multiple issues in this PR. It now LGTM, but I should not self-approve (and perhaps Ralf shouldn't either?), so perhaps @kkraus14 or @vzhurba01 can approve/merge?

@leofang leofang merged commit 0dfae43 into NVIDIA:main Apr 22, 2025
75 checks passed
@leofang leofang deleted the fix_win_conda_nvvm branch April 22, 2025 20:03
@leofang
Member Author

leofang commented Apr 22, 2025

Thanks, Ralf/Keith!


Doc Preview CI
Preview removed because the pull request was closed or merged.

@leofang leofang added the to-be-backported Trigger the bot to raise a backport PR upon merge label Apr 22, 2025

Backport failed for 11.8.x, because it was unable to cherry-pick the commit(s).

Please cherry-pick the changes locally and resolve any conflicts.

git fetch origin 11.8.x
git worktree add -d .worktree/backport-563-to-11.8.x origin/11.8.x
cd .worktree/backport-563-to-11.8.x
git switch --create backport-563-to-11.8.x
git cherry-pick -x 0dfae43ac05f4520b3f9a800f6571704ea43842c

leofang added a commit to leofang/cuda-python that referenced this pull request Apr 23, 2025
Co-authored-by: Ralf W. Grosse-Kunstleve <[email protected]>
(cherry picked from commit 0dfae43)
Labels
bug Something isn't working cuda.bindings Everything related to the cuda.bindings module P0 High priority - Must do! to-be-backported Trigger the bot to raise a backport PR upon merge
Projects
None yet
Development

Successfully merging this pull request may close these issues.

NVVM bindings not working on Windows + CUDA conda packages
3 participants