Skip to content

clean up device initialization in test #507

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 6 commits into from
Mar 12, 2025

Conversation

ksimpson-work
Copy link
Contributor

@ksimpson-work ksimpson-work commented Mar 11, 2025

I suspect the issue was related to the fact that we had a dangling device initialization. In any case we should test against our actual guarantee which is that Device() is current device when device_ordinal is unsupplied

Copy link
Contributor

copy-pr-bot bot commented Mar 11, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@ksimpson-work
Copy link
Contributor Author

/ok to test

rwgk
rwgk previously approved these changes Mar 11, 2025

This comment has been minimized.

@leofang leofang added bug Something isn't working P0 High priority - Must do! test Improvements or additions to tests cuda.core Everything related to the cuda.core module labels Mar 12, 2025
@leofang leofang added this to the cuda.core beta 3 milestone Mar 12, 2025
@leofang
Copy link
Member

leofang commented Mar 12, 2025

I think we miss an import.

@rwgk
Copy link
Collaborator

rwgk commented Mar 12, 2025

I only have a minute right now. Just quick, I saw this in one of the failing tests:

______________________________ test_device_alloc _______________________________

deinit_cuda = None

    def test_device_alloc(deinit_cuda):
        device = Device()
        device.set_current()
        buffer = device.allocate(1024)
        device.sync()
        assert buffer.handle != 0
        assert buffer.size == 1024
>       assert buffer.device_id == cuda.cudaGetDevice()

tests/test_device.py:53: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

name = 'cudaGetDevice'

    def __getattr__(name):
        if name == "__version__":
            import warnings
            warnings.warn("accessing cuda.__version__ is deprecated, "
                          "please switch to use cuda.bindings.__version__ instead",
                          DeprecationWarning, stacklevel=2)
            from . import bindings
            return bindings.__version__
    
>       raise AttributeError(f"module {__name__!r} has no attribute {name!r}")
E       AttributeError: module 'cuda' has no attribute 'cudaGetDevice'

/opt/hostedtoolcache/Python/3.9.21/x64/lib/python3.9/site-packages/cuda/__init__.py:10: AttributeError

I see this elsewhere:

handle_return(runtime.cudaGetDevice())

We have the runtime and handle_return imports already in test_device.py

Maybe @leofang saw something else?

@ksimpson-work
Copy link
Contributor Author

@rwgk You're totally right, I didn't test locally before running the CI since it was so small but I missed some obvious issues

@ksimpson-work
Copy link
Contributor Author

/ok to test

rwgk
rwgk previously approved these changes Mar 12, 2025
buffer = device.allocate(1024)
device.sync()
assert buffer.handle != 0
assert buffer.size == 1024
assert buffer.device_id == 0
assert buffer.device_id == handle_return(runtime.cudaGetDevice())
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at this with a fresh eye, I'm wondering:

  • We're in cuda.core but we're reaching back into a cuda.bindings API (runtime). Do we have to?
  • Could/should we do this instead? (does it matter?)
    assert buffer.device_id == int(device)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah that also works. I'll add it, and also a new test for the device id against our guarantee

@ksimpson-work
Copy link
Contributor Author

/ok to test

@ksimpson-work ksimpson-work enabled auto-merge (squash) March 12, 2025 19:02
Copy link
Collaborator

@rwgk rwgk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reapproving, assuming tests pass.

The new test looks great.

@ksimpson-work ksimpson-work merged commit 8cda903 into NVIDIA:main Mar 12, 2025
73 checks passed
Copy link

Doc Preview CI
Preview removed because the pull request was closed or merged.

@ksimpson-work ksimpson-work deleted the bug-device-ordinal branch March 12, 2025 22:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working cuda.core Everything related to the cuda.core module P0 High priority - Must do! test Improvements or additions to tests
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants