Fix support for NVVM from conda on Windows + other fixes #563

leofang · 2025-04-21T22:05:50Z

Description

- Fix support for NVVM from conda on Windows
  - closes NVVM bindings not working on Windows + CUDA conda packages #453.
  - xref: https://github.com/NVIDIA/nvmath-python/blob/02d18a3b88c91432b9a4beef48c116c10121166c/nvmath/_utils.py#L177-L183
- Fix DLL loading inconsistency for NVRTC (wheels should be tried first)
- Fix potential loading logic mishandling in NVVM/nvJitLink (Fix support for NVVM from conda on Windows + other fixes #563 (comment))

Checklist

New or existing tests cover these changes.
The documentation is up to date with these changes.

copy-pr-bot · 2025-04-21T22:05:54Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

leofang · 2025-04-21T22:07:21Z

@rwgk do you still have your Windows instance alive? If so could you double check this locally (since we don't test against conda yet, #280)?

rwgk · 2025-04-21T22:16:56Z

> @rwgk do you still have your Windows instance alive? If so could you double check this locally (since we don't test against conda yet, #280)?

I got myself a new instance (for 8 weeks, as you showed me) but only got as far as installing the CUDA driver. (And conda, but I haven't figured out yet how to activate it.)

This is a good motivation for me to continue working on the setup, I'll let you know by tonight (PT) how far I got.

leofang · 2025-04-22T01:18:22Z

/ok to test a14d95e

leofang · 2025-04-22T01:19:04Z

/ok to test a14d95e

rwgk · 2025-04-22T04:21:06Z

I see one of the tests here is failing, but I downloaded the wheel anyway to my Windows machine.

I used this command (miniforge3):

conda create -n ctk128 -c nvidia -c conda-forge python=3.12 cuda-toolkit=12.8.1

This zip file:

https://github.com/NVIDIA/cuda-python/actions/runs/14584592014/artifacts/2982569144

After extracting the .whl file:

(ctk128) PS C:\Users\rgrossekunst\Downloads> pip install cuda_bindings-12.8.0-cp312-cp312-win_amd64.whl
Processing c:\users\rgrossekunst\downloads\cuda_bindings-12.8.0-cp312-cp312-win_amd64.whl
Collecting pywin32 (from cuda-bindings==12.8.0)
  Downloading pywin32-310-cp312-cp312-win_amd64.whl.metadata (9.4 kB)
Downloading pywin32-310-cp312-cp312-win_amd64.whl (9.5 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 9.5/9.5 MB 59.2 MB/s eta 0:00:00
Installing collected packages: pywin32, cuda-bindings
Successfully installed cuda-bindings-12.8.0 pywin32-310
(ctk128) PS C:\Users\rgrossekunst\Downloads> pip freeze
cuda-bindings @ file:///C:/Users/rgrossekunst/Downloads/cuda_bindings-12.8.0-cp312-cp312-win_amd64.whl#sha256=134d06689b07782f0fde60aa23f35b69750b77a7c073853195d28f1572b3eff8
pywin32==310
setuptools==79.0.0
wheel==0.45.1
(ctk128) PS C:\Users\rgrossekunst\Downloads> python
Python 3.12.10 | packaged by conda-forge | (main, Apr 10 2025, 22:08:16) [MSC v.1943 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> from cuda.bindings import nvvm
>>> nvvm.version()
(2, 0)
>>>

Looks like it works?

rwgk · 2025-04-22T04:36:50Z

Negative test under very similar conditions:

conda create -n ctk128nodev -c nvidia -c conda-forge python=3.12 cuda-toolkit=12.8.1

Using wheel from current main:

(ctk128nodev) PS C:\Users\rgrossekunst\Downloads\older\cuda-bindings-python312-cuda12.8.0-win-64-d425a8895bd778fffb993263dbf5d9bc631fea22> pip install .\cuda_bindings-12.8.0-cp312-cp312-win_amd64.whl
Processing c:\users\rgrossekunst\downloads\older\cuda-bindings-python312-cuda12.8.0-win-64-d425a8895bd778fffb993263dbf5d9bc631fea22\cuda_bindings-12.8.0-cp312-cp312-win_amd64.whl
Requirement already satisfied: pywin32 in c:\programdata\miniforge3\envs\ctk128nodev\lib\site-packages (from cuda-bindings==12.8.0) (310)
Installing collected packages: cuda-bindings
Successfully installed cuda-bindings-12.8.0
(ctk128nodev) PS C:\Users\rgrossekunst\Downloads\older\cuda-bindings-python312-cuda12.8.0-win-64-d425a8895bd778fffb993263dbf5d9bc631fea22> python
Python 3.12.10 | packaged by conda-forge | (main, Apr 10 2025, 22:08:16) [MSC v.1943 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> from cuda.bindings import nvvm
>>> nvvm.version()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "cuda\\bindings\\nvvm.pyx", line 76, in cuda.bindings.nvvm.version
  File "cuda\\bindings\\nvvm.pyx", line 90, in cuda.bindings.nvvm.version
  File "cuda\\bindings\\cynvvm.pyx", line 15, in cuda.bindings.cynvvm.nvvmVersion
  File "cuda\\bindings\\_internal\\nvvm.pyx", line 258, in cuda.bindings._internal.nvvm._nvvmVersion
  File "cuda\\bindings\\_internal\\nvvm.pyx", line 115, in cuda.bindings._internal.nvvm._check_or_init_nvvm
  File "cuda\\bindings\\_internal\\nvvm.pyx", line 87, in cuda.bindings._internal.nvvm.load_library
RuntimeError: Failed to load nvvm
>>>

I think in combination with the previous comment, that's conclusive.

rwgk · 2025-04-22T04:46:10Z

The path in this PR is definitely correct:

(ctk128) PS C:\Users\rgrossekunst\Downloads> dir C:\ProgramData\miniforge3\envs\ctk128\Library\nvvm\bin

    Directory: C:\ProgramData\miniforge3\envs\ctk128\Library\nvvm\bin

Mode                 LastWriteTime         Length Name
----                 -------------         ------ ----
-a----         2/21/2025   9:18 PM       67792896 cicc.exe
-a----         2/21/2025   9:18 PM       52873216 nvvm64_40_0.dll

(ctk128) PS C:\Users\rgrossekunst\Downloads> dir C:\ProgramData\miniforge3\envs\ctk128nodev\Library\nvvm\bin

    Directory: C:\ProgramData\miniforge3\envs\ctk128nodev\Library\nvvm\bin

Mode                 LastWriteTime         Length Name
----                 -------------         ------ ----
-a----         2/21/2025   9:18 PM       67792896 cicc.exe
-a----         2/21/2025   9:18 PM       52873216 nvvm64_40_0.dll

leofang · 2025-04-22T04:59:07Z

/ok to test 21a7fb4

rwgk · 2025-04-22T05:04:58Z

I didn't push this, to not conflict with your work:

commit ceb62f1db9f07187c859f202b02a1e9910ae296f (HEAD -> fix_win_conda_nvvm)
Author: Ralf W. Grosse-Kunstleve <[email protected]>
Date:   Mon Apr 21 22:03:58 2025 -0700

    Fix two bugs: 1. "conda" needs to be skipped if CONDA_PREFIX is not defined (that is a new bug). 2. Existing indentation error (spotted by ChatGPT).

diff --git a/cuda_bindings/cuda/bindings/_internal/nvvm_windows.pyx b/cuda_bindings/cuda/bindings/_internal/nvvm_windows.pyx
index 9243bf07..44ec16f4 100644
--- a/cuda_bindings/cuda/bindings/_internal/nvvm_windows.pyx
+++ b/cuda_bindings/cuda/bindings/_internal/nvvm_windows.pyx
@@ -62,23 +62,26 @@ cdef load_library(const int driver_ver):
 
         # Next, check if DLLs are installed via pip or conda
         for sp in get_site_packages():
-            if sp == "conda" and "CONDA_PREFIX" in os.environ:
+            if sp == "conda":
                 # nvvm is not under $CONDA_PREFIX/lib, so it's not in the default search path
-                mod_path = os.path.join(os.environ["CONDA_PREFIX"], "Library", "nvvm", "bin")
+                conda_prefix = os.environ.get("CONDA_PREFIX")
+                if conda_prefix is None:
+                    continue
+                mod_path = os.path.join(conda_prefix, "Library", "nvvm", "bin")
             else:
                 mod_path = os.path.join(sp, "nvidia", "cuda_nvcc", "nvvm", "bin")
             if not os.path.isdir(mod_path):
                 continue
             os.add_dll_directory(mod_path)
-        try:
-            handle = win32api.LoadLibraryEx(
-                # Note: LOAD_LIBRARY_SEARCH_DLL_LOAD_DIR needs an abs path...
-                os.path.join(mod_path, dll_name),
-                0, LOAD_LIBRARY_SEARCH_DEFAULT_DIRS | LOAD_LIBRARY_SEARCH_DLL_LOAD_DIR)
-        except:
-            pass
-        else:
-            break
+            try:
+                handle = win32api.LoadLibraryEx(
+                    # Note: LOAD_LIBRARY_SEARCH_DLL_LOAD_DIR needs an abs path...
+                    os.path.join(mod_path, dll_name),
+                    0, LOAD_LIBRARY_SEARCH_DEFAULT_DIRS | LOAD_LIBRARY_SEARCH_DLL_LOAD_DIR)
+            except:
+                pass
+            else:
+                break
 
         # Finally, try default search
         try:

leofang · 2025-04-22T05:07:06Z

@rwgk Yes. I fixed conda #563 (comment), but then broke wheels #563 (comment) 😓 It turns out that I had introduced a subtle bug in all of the DLL loading logic (in both cuda-bindings and nvmath-python) that wasn't discovered until we applied it to NVVM... Both should be fixed now with commit 21a7fb4.

We kept looping over all possible mod_path values but did not stop when we hit a valid one. By the time we used mod_path to assemble an absolute path to the DLL, it might no longer point to a valid location. This was exposed because we added conda to the end of the loop only for NVVM, so the previously found (valid) wheel path got overwritten. It's now fixed in all places where we do DLL loading. I think this bug is unlikely to be hit: without conda we only search the user/global site-packages, and it's unlikely, though not impossible, to find multiple copies of the CUDA wheels there.
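The failure mode described here can be reproduced in a small, platform-independent sketch. Everything in it is hypothetical and simplified: `fake_load` stands in for `win32api.LoadLibraryEx`, the directory names are made up, and neither function is the actual cuda-bindings code. It only contrasts "load once after the loop" with "try to load in every iteration".

```python
import os


def fake_load(path, existing):
    """Hypothetical stand-in for win32api.LoadLibraryEx: succeeds only for known paths."""
    if path not in existing:
        raise OSError(f"cannot load {path}")
    return f"handle:{path}"


def load_buggy(candidates, dll_name, existing):
    """The load is attempted once, after the loop, so mod_path is whatever came last."""
    mod_path = None
    for mod_path in candidates:
        pass  # the real code registered each directory here, without loading
    if mod_path is None:
        return None
    try:
        return fake_load(os.path.join(mod_path, dll_name), existing)
    except OSError:
        return None


def load_fixed(candidates, dll_name, existing):
    """The load is attempted in every iteration; return on the first success."""
    for mod_path in candidates:
        try:
            return fake_load(os.path.join(mod_path, dll_name), existing)
        except OSError:
            continue
    return None


dll = "nvvm64_40_0.dll"
wheel_dir = os.path.join("site-packages", "nvidia", "cuda_nvcc", "nvvm", "bin")
conda_dir = os.path.join("conda-env", "Library", "nvvm", "bin")
existing = {os.path.join(wheel_dir, dll)}  # assume only the wheel copy exists

candidates = [wheel_dir, conda_dir]  # conda is appended last, as for NVVM
buggy_result = load_buggy(candidates, dll, existing)
fixed_result = load_fixed(candidates, dll, existing)
```

With these inputs the first variant ends up probing the conda directory and finds nothing, while the second one returns as soon as the wheel copy loads.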

leofang · 2025-04-22T05:09:03Z

> I didn't push this, to not conflict with your work

Feel free to push and I'll retest locally! @rwgk I think your patch is better in that we actually try to load in every loop iteration, instead of trying it after the first hit.

leofang · 2025-04-22T14:23:31Z

CI is green now. Let me apply your patch and then we can test/merge.

rwgk · 2025-04-22T15:23:00Z

/ok to test a0baf71

rwgk · 2025-04-22T15:29:11Z

To explain commit a0baf71:

- Common to all cases (nvrtc, nvJitLink, nvvm): Ensure that mod_path is used only when it is defined. Note that the loop over site-packages may have no iterations.
- For nvrtc: The DLL search order is now consistent with that of nvJitLink and nvvm.
- For nvvm: Avoid probing a nonsensical mod_path: os.path.join("conda", "nvidia", "cuda_nvcc", "nvvm", "bin")
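One way to picture these points is a generator that only yields directories worth probing. This is a minimal sketch, not the merged code: `candidate_nvvm_dirs` and its parameters are hypothetical names, and the only details taken from the discussion are the "conda" marker, the `CONDA_PREFIX` variable, and the two directory layouts.

```python
import os


def candidate_nvvm_dirs(site_packages, environ):
    """Yield directories to probe for the NVVM DLL, in search order (sketch)."""
    for sp in site_packages:
        if sp == "conda":
            # "conda" is a marker, not a real directory: resolve it through
            # CONDA_PREFIX, and skip it entirely when that is not set.
            prefix = environ.get("CONDA_PREFIX")
            if prefix is None:
                continue
            yield os.path.join(prefix, "Library", "nvvm", "bin")
        else:
            yield os.path.join(sp, "nvidia", "cuda_nvcc", "nvvm", "bin")


# No CONDA_PREFIX: the marker is skipped instead of becoming a bogus path.
without_conda = list(candidate_nvvm_dirs(["sp", "conda"], {}))

# CONDA_PREFIX set: the conda location is appended after the wheel location.
with_conda = list(candidate_nvvm_dirs(["sp", "conda"], {"CONDA_PREFIX": "env"}))

# The list of site-packages may be empty, so callers must not assume
# that any mod_path was ever produced.
nothing = list(candidate_nvvm_dirs([], {}))
```

Because the caller iterates over whatever is yielded, there is no variable that can be read before it has been assigned.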

kkraus14 · 2025-04-22T15:37:49Z

cuda_bindings/cuda/bindings/_bindings/cynvrtc.pyx.in

+                        # Note: nvrtc64_120_0.dll calls into nvrtc-builtins64_*.dll which is
+                        # located in the same mod_path.
+                        # Update PATH environ so that the two dlls can find each other
+                        os.environ["PATH"] = os.pathsep.join((os.environ.get("PATH", ""), mod_path))


Should we be manually loading this dll instead of modifying PATH? Regardless, we don't need to fix this in this PR.

nvrtc is very special, because of the builtins runtime dependency.

In one of my ChatGPT chats about it a few days ago, it suggested strongly to do both, os.environ["PATH"] update, and os.add_dll_directory(mod_path). Therefore I'm carrying that into the path_finder.

I forgot to add: I somehow have it in my mind that @leofang made a remark that we shouldn't load the builtins ourselves. Leo, does that make sense?

Pre-loading a DLL works (it's what I did in nvmath). My reason against pre-loading is not because it does not work, but because I am not willing to maintain other libraries' implementation details (dlopen, which has no DT_NEEDED entry or package dependency metadata for us to inspect). This can easily go out-of-date as these libraries evolve.

Thanks Leo, I linked your comment here:

1344621

We can weigh the pros-and-cons of "soft dependency pre-loading" vs os.environ["PATH"] & os.add_dll_directory() management when the path_finder code is complete.
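For reference, the "do both" variant mentioned in this thread could look roughly like the sketch below. `register_dll_dir` is a hypothetical helper, not part of cuda-bindings or the path_finder code; it assumes that appending to `PATH` once per directory is enough and that `os.add_dll_directory` should only be called where it exists (Windows, Python 3.8+) and for an existing absolute directory.

```python
import os


def register_dll_dir(mod_path, environ=os.environ):
    """Make mod_path visible to DLL dependency resolution (hypothetical helper)."""
    # Append to PATH only once, and avoid a leading separator when PATH is empty.
    entries = [p for p in environ.get("PATH", "").split(os.pathsep) if p]
    if mod_path not in entries:
        entries.append(mod_path)
        environ["PATH"] = os.pathsep.join(entries)

    # os.add_dll_directory is only available on Windows; guard the call so the
    # sketch also runs elsewhere, and skip directories that do not exist.
    if (
        hasattr(os, "add_dll_directory")
        and os.path.isabs(mod_path)
        and os.path.isdir(mod_path)
    ):
        return os.add_dll_directory(mod_path)
    return None


# Usage with a plain dict so the real process environment is left untouched.
env = {"PATH": "first_dir"}
register_dll_dir("second_dir", env)
register_dll_dir("second_dir", env)  # second call is a no-op for PATH
```

The alternative discussed above, pre-loading the dependent DLL explicitly, avoids touching `PATH` but requires tracking which libraries each DLL opens at runtime.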

rwgk · 2025-04-22T16:04:07Z

/ok to test 03e6e4a

cuda_bindings/cuda/bindings/_bindings/cynvrtc.pyx.in

cuda_bindings/cuda/bindings/_internal/nvvm_windows.pyx

rwgk · 2025-04-22T16:51:55Z

/ok to test 1a05efc

leofang · 2025-04-22T17:26:39Z

FWIW I updated the PR title/description since we fixed multiple issues in this PR, which now LGTM but I should not self-approve (and perhaps Ralf shouldn't, either?), so perhaps @kkraus14 or @vzhurba01 can approve/merge?

leofang · 2025-04-22T20:04:10Z

Thanks, Ralf/Keith!

github-actions · 2025-04-22T20:22:43Z

Doc Preview CI
Preview removed because the pull request was closed or merged.

github-actions · 2025-04-22T21:50:20Z

Backport failed for 11.8.x, because it was unable to cherry-pick the commit(s).

Please cherry-pick the changes locally and resolve any conflicts.

git fetch origin 11.8.x
git worktree add -d .worktree/backport-563-to-11.8.x origin/11.8.x
cd .worktree/backport-563-to-11.8.x
git switch --create backport-563-to-11.8.x
git cherry-pick -x 0dfae43ac05f4520b3f9a800f6571704ea43842c

Co-authored-by: Ralf W. Grosse-Kunstleve <[email protected]> (cherry picked from commit 0dfae43)

leofang added bug Something isn't working P0 High priority - Must do! cuda.bindings Everything related to the cuda.bindings module labels Apr 21, 2025

leofang added this to the cuda-python 12.9.0 & 11.8.7 milestone Apr 21, 2025

leofang requested a review from rwgk April 21, 2025 22:05

leofang self-assigned this Apr 21, 2025

leofang changed the title ~~Fix suport for NVVM from conda on Windows~~ Fix support for NVVM from conda on Windows Apr 21, 2025

leofang added 2 commits April 22, 2025 01:16

also search for conda nvvm on windows

0c71c35

fix comment for conda nvvm on linux

a14d95e

leofang force-pushed the fix_win_conda_nvvm branch from b86374a to a14d95e Compare April 22, 2025 01:17

fix path loop & dll name

21a7fb4

Ensure mod_path is always defined when used. Make DLL search order co…

a0baf71

…nsistent between all three cases.

kkraus14 reviewed Apr 22, 2025

Fix bug in previous commit: need to break out of loop over suffix if …

03e6e4a

…site-package DLL was found. Using the most obvious approach to solve this problem: return immediately on success.

rwgk force-pushed the fix_win_conda_nvvm branch from f88cbf3 to 03e6e4a Compare April 22, 2025 16:03

leofang commented Apr 22, 2025

cuda_bindings/cuda/bindings/_bindings/cynvrtc.pyx.in (outdated)

cuda_bindings/cuda/bindings/_internal/nvvm_windows.pyx (outdated)

leofang changed the title ~~Fix support for NVVM from conda on Windows~~ Fix support for NVVM from conda on Windows + Fix DLL loading inconsistency for NVRTC Apr 22, 2025

rwgk added 2 commits April 22, 2025 09:24

Move LOAD_LIBRARY_SEARCH_* constants outside loop.

98d7a91

Fix oversight (forgot to replace one assignment with return)

1a05efc

leofang changed the title ~~Fix support for NVVM from conda on Windows + Fix DLL loading inconsistency for NVRTC~~ Fix support for NVVM from conda on Windows + other fixes Apr 22, 2025

kkraus14 approved these changes Apr 22, 2025

leofang merged commit 0dfae43 into NVIDIA:main Apr 22, 2025
75 checks passed

leofang deleted the fix_win_conda_nvvm branch April 22, 2025 20:03

leofang added the to-be-backported Trigger the bot to raise a backport PR upon merge label Apr 22, 2025

leofang added a commit to leofang/cuda-python that referenced this pull request Apr 23, 2025

Fix support for NVVM from conda on Windows + other fixes (NVIDIA#563)

e027b05

Co-authored-by: Ralf W. Grosse-Kunstleve <[email protected]> (cherry picked from commit 0dfae43)

leofang mentioned this pull request Apr 23, 2025

[Backport] Fix support for NVVM from conda on Windows + other fixes #574

Merged

2 tasks
