-
Notifications
You must be signed in to change notification settings - Fork 162
Fix support for NVVM from conda on Windows + other fixes #563
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
I got myself a new instance (for 8 weeks, as you showed me) but only got as far as installing the CUDA driver. (And conda, but I haven't figured out yet how to activate it.) This is a good motivation for me to continue working on the setup, I'll let you know by tonight (PT) how far I got. |
b86374a
to
a14d95e
Compare
/ok to test a14d95e |
/ok to test a14d95e |
This comment has been minimized.
This comment has been minimized.
I see one of the tests here is failing, but I downloaded the wheel anyway to my Windows machine. I used this command (miniforge3):
This zip file: After extracting the .whl file:
Looks like it works? |
Negative test under very similar conditions:
Using wheel from current
I think in combination with the previous comment, that's conclusive. |
The path in this PR is definitely correct:
|
/ok to test 21a7fb4 |
I didn't push this, to not conflict with your work: commit ceb62f1db9f07187c859f202b02a1e9910ae296f (HEAD -> fix_win_conda_nvvm)
Author: Ralf W. Grosse-Kunstleve <[email protected]>
Date: Mon Apr 21 22:03:58 2025 -0700
Fix two bugs: 1. "conda" needs to be skipped if CONDA_PREFIX is not defined (that is a new bug). 2. Existing indentation error (spotted by ChatGPT).
diff --git a/cuda_bindings/cuda/bindings/_internal/nvvm_windows.pyx b/cuda_bindings/cuda/bindings/_internal/nvvm_windows.pyx
index 9243bf07..44ec16f4 100644
--- a/cuda_bindings/cuda/bindings/_internal/nvvm_windows.pyx
+++ b/cuda_bindings/cuda/bindings/_internal/nvvm_windows.pyx
@@ -62,23 +62,26 @@ cdef load_library(const int driver_ver):
# Next, check if DLLs are installed via pip or conda
for sp in get_site_packages():
- if sp == "conda" and "CONDA_PREFIX" in os.environ:
+ if sp == "conda":
# nvvm is not under $CONDA_PREFIX/lib, so it's not in the default search path
- mod_path = os.path.join(os.environ["CONDA_PREFIX"], "Library", "nvvm", "bin")
+ conda_prefix = os.environ.get("CONDA_PREFIX")
+ if conda_prefix is None:
+ continue
+ mod_path = os.path.join(conda_prefix, "Library", "nvvm", "bin")
else:
mod_path = os.path.join(sp, "nvidia", "cuda_nvcc", "nvvm", "bin")
if not os.path.isdir(mod_path):
continue
os.add_dll_directory(mod_path)
- try:
- handle = win32api.LoadLibraryEx(
- # Note: LOAD_LIBRARY_SEARCH_DLL_LOAD_DIR needs an abs path...
- os.path.join(mod_path, dll_name),
- 0, LOAD_LIBRARY_SEARCH_DEFAULT_DIRS | LOAD_LIBRARY_SEARCH_DLL_LOAD_DIR)
- except:
- pass
- else:
- break
+ try:
+ handle = win32api.LoadLibraryEx(
+ # Note: LOAD_LIBRARY_SEARCH_DLL_LOAD_DIR needs an abs path...
+ os.path.join(mod_path, dll_name),
+ 0, LOAD_LIBRARY_SEARCH_DEFAULT_DIRS | LOAD_LIBRARY_SEARCH_DLL_LOAD_DIR)
+ except:
+ pass
+ else:
+ break
# Finally, try default search
try: |
@rwgk Yes. I fixed conda #563 (comment), but then broke wheels #563 (comment) 😓 It turns out that I actually introduced a pedantic bug in all DLL loading logic (to cuda-bindings and nvmath-python) that's not discovered until we apply it to NVVM... Both should be fixed now with commit 21a7fb4. We keep looping over all possible |
Feel free to push and I'll retest locally! @rwgk I think your patch is better in that we actually try to load in every loop iteration, instead of trying it after the first hit. |
CI is green now. Let me apply your patch and then we can test/merge. |
…nsistent between all three cases.
/ok to test a0baf71 |
To explain commit a0baf71:
|
# Note: nvrtc64_120_0.dll calls into nvrtc-builtins64_*.dll which is | ||
# located in the same mod_path. | ||
# Update PATH environ so that the two dlls can find each other | ||
os.environ["PATH"] = os.pathsep.join((os.environ.get("PATH", ""), mod_path)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we be manually loading this dll instead of modifying PATH
? Regardless, we don't need to fix this in this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nvrtc is very special, because of the builtins runtime dependency.
In one of my ChatGPT chats about it a few days ago, it suggested strongly to do both, os.environ["PATH"]
update, and os.add_dll_directory(mod_path)
. Therefore I'm carrying that into the path_finder.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I forgot to add: I somehow have it in my mind that @leofang made a remark that we shouldn't load the builtins
ourselves. Leo, does that make sense?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pre-loading a DLL works (it's what I did in nvmath). My reason against pre-loading is not because it does not work, but because I am not willing to maintain other libraries' implementation details (dlopen, which has no DT_NEEDED
entry or package dependency metadata for us to inspect). This can easily go out-of-date as these libraries evolve.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks Leo, I linked your comment here:
We can weigh the pros-and-cons of "soft dependency pre-loading" vs os.environ["PATH"]
& os.add_dll_directory()
management when the path_finder code is complete.
…site-package DLL was found. Using the most obvious approach to solve this problem: return immediately on success.
f88cbf3
to
03e6e4a
Compare
/ok to test 03e6e4a |
/ok to test 1a05efc |
FWIW I updated the PR title/description since we fixed multiple issues in this PR, which now LGTM but I should not self-approve (and perhaps Ralf shouldn't, either?), so perhaps @kkraus14 or @vzhurba01 can approve/merge? |
Thanks, Ralf/Keith! |
|
Backport failed for Please cherry-pick the changes locally and resolve any conflicts. git fetch origin 11.8.x
git worktree add -d .worktree/backport-563-to-11.8.x origin/11.8.x
cd .worktree/backport-563-to-11.8.x
git switch --create backport-563-to-11.8.x
git cherry-pick -x 0dfae43ac05f4520b3f9a800f6571704ea43842c |
Co-authored-by: Ralf W. Grosse-Kunstleve <[email protected]> (cherry picked from commit 0dfae43)
Description
Checklist