@Andy-Jost
Contributor

@Andy-Jost Andy-Jost commented Oct 31, 2025

Major refactoring of the memory package.

Overview
This PR refactors the _memory.pyx module into a dedicated package (_memory/) to address its growing size and complexity, which were hindering further development. The primary goals are to physically separate the code into more manageable submodules, simplify the internal logic, and enhance the overall structure, including the addition of .pxd headers for better Cython integration.

Major Changes

  • Split _memory.pyx into submodules, the major ones being the following:
    • Buffers: _buffer.*
    • Device memory resources: _dmr.*
    • IPC (Inter-Process Communication): _ipc.*
    • Virtual memory management: _vmm.*
  • Introduced Cython headers (.pxd) for public definitions to improve modularity and type safety.
  • Refactored DeviceMemoryResource to isolate IPC-related code, reducing coupling.
  • Simplified IPC implementation by adding an IPCData class to encapsulate relevant data members and eliminating a redundant uuid field.
  • Streamlined the class hierarchy by removing unnecessary classes.
  • Simplified the Cython interface for memory allocation and deallocation operations.

Minor Improvements

  • Added __all__ lists to modules for explicit control over exports.
  • Extracted long implementation functions from class definitions to make classes more concise and readable.
  • Renamed various private attributes and methods for consistency (e.g., _handle instead of _mempool_handle).
  • Consolidated and alphabetized property definitions for better organization.
  • Converted additional classes and functions to Cython for performance gains.

@copy-pr-bot
Contributor

copy-pr-bot bot commented Oct 31, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@Andy-Jost Andy-Jost requested review from cpcloud, leofang and mdboom and removed request for cpcloud October 31, 2025 22:41
@Andy-Jost Andy-Jost force-pushed the memory-refactor branch 2 times, most recently from feda70e to 52164c0 Compare November 3, 2025 17:30
@Andy-Jost
Contributor Author

/ok to test f13a44e

@github-actions

github-actions bot commented Nov 3, 2025

@Andy-Jost
Contributor Author

/ok to test 7c97d22

@rparolin
Collaborator

rparolin commented Nov 4, 2025

@Andy-Jost To frame the code review, can you fill in the PR description with more details about the goals of the refactor?

Contributor Author

@Andy-Jost Andy-Jost left a comment

This one is a beast. I tried to leave some helpful comments.

Contributor Author

This file assembles the _memory package by combining the public elements of each submodule. This should match the public interface of the old _memory.pyx module.

Contributor Author

This file contains Cython declarations related to Buffer. I also put the declaration of MemoryResource here because I couldn't find a better place.

Classes _cyBuffer and _cyMemoryResource were eliminated.

Contributor Author

This file contains the implementation of Buffer.

Comment on lines 44 to 49
def _clear(self):
    self._ptr = 0
    self._size = 0
    self._mr = None
    self._ptr_obj = None
    self._alloc_stream = None
Contributor Author

This is used by VMM. I did not try to change the logic of that code.

    stream: Stream = None
) -> Buffer:
    """Import a buffer that was exported from another process."""
    return _ipc.Buffer_from_ipc_descriptor(cls, mr, ipc_buffer, stream)
Contributor Author

Implementation of IPC functions has been moved to the _ipc module.
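
The delegation pattern in the snippet above can be sketched in plain Python as follows. This is a minimal illustration under assumptions: `ToyResource`, `import_pointer`, the dictionary-shaped descriptor, the method name `from_ipc_descriptor`, and the `PinnedBuffer` subclass are hypothetical; only the call shape `Buffer_from_ipc_descriptor(cls, mr, ipc_buffer, stream)` comes from the diff.

```python
class ToyResource:
    """Hypothetical stand-in for a memory resource that can map a descriptor."""

    def import_pointer(self, descriptor):
        return descriptor["ptr"]


def Buffer_from_ipc_descriptor(cls, mr, ipc_buffer, stream):
    """Module-level implementation; receiving `cls` keeps subclasses intact."""
    ptr = mr.import_pointer(ipc_buffer)
    return cls(ptr, ipc_buffer["size"], mr)


class Buffer:
    def __init__(self, ptr, size, mr):
        self.ptr, self.size, self.mr = ptr, size, mr

    @classmethod
    def from_ipc_descriptor(cls, mr, ipc_buffer, stream=None):
        """Import a buffer that was exported from another process."""
        # The class body stays short: the real work lives in the free function.
        return Buffer_from_ipc_descriptor(cls, mr, ipc_buffer, stream)


class PinnedBuffer(Buffer):
    """Hypothetical subclass, used only to show that `cls` is forwarded."""


buf = PinnedBuffer.from_ipc_descriptor(ToyResource(), {"ptr": 0x1000, "size": 256})
print(type(buf).__name__, hex(buf.ptr), buf.size)
```

Passing `cls` explicitly means the free function can live in a separate module without importing the class it constructs.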

raise_if_driver_error(res2)

# Invalidate the old buffer so its destructor won't try to free again
buf._clear()
Contributor Author

This was the only change to the VMM code. Cf. _memory.pyx:1432-5
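
For readers unfamiliar with the invalidation step in the snippet above, here is a minimal pure-Python sketch of why clearing the old buffer prevents a second free. `ToyResource`, `ToyBuffer`, `close`, and `grow` are hypothetical names for illustration; the real VMM code works with driver handles and is not reproduced here.

```python
class ToyResource:
    """Hypothetical resource that records every deallocation it performs."""

    def __init__(self):
        self.freed = []

    def deallocate(self, ptr, size):
        self.freed.append((ptr, size))


class ToyBuffer:
    def __init__(self, ptr, size, mr):
        self._ptr, self._size, self._mr = ptr, size, mr

    def _clear(self):
        self._ptr = 0
        self._size = 0
        self._mr = None

    def close(self):
        # A cleared buffer owns nothing, so closing it is a no-op.
        if self._mr is not None:
            self._mr.deallocate(self._ptr, self._size)
            self._clear()


def grow(buf, new_size):
    """Hand the allocation to a larger buffer and invalidate the old one."""
    new_buf = ToyBuffer(buf._ptr, new_size, buf._mr)
    buf._clear()  # without this, both buffers would free the same pointer
    return new_buf


mr = ToyResource()
old = ToyBuffer(0x1000, 64, mr)
new = grow(old, 128)
old.close()
new.close()
print(mr.freed)
```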

)
)
if attr == 1:
    from cuda.core.experimental._memory import DeviceMemoryResource
Contributor Author

I believe this import should be delayed (like this) to avoid a circular dependency.
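
A small self-contained sketch of the deferred-import idea follows. It writes two throwaway modules to a temporary directory; `demo_memory`, `demo_device`, and `default_resource` are hypothetical names chosen for the demonstration and do not correspond to files in this PR.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Two hypothetical modules that depend on each other.
SOURCES = {
    "demo_memory.py": (
        "import demo_device  # memory needs device at import time\n"
        "\n"
        "class DeviceMemoryResource:\n"
        "    def __init__(self, dev_id):\n"
        "        self.dev_id = dev_id\n"
    ),
    "demo_device.py": (
        "def default_resource(dev_id):\n"
        "    # Deferred import: runs at call time, once demo_memory is loaded.\n"
        "    from demo_memory import DeviceMemoryResource\n"
        "    return DeviceMemoryResource(dev_id)\n"
    ),
}

tmp = tempfile.mkdtemp()
for filename, source in SOURCES.items():
    Path(tmp, filename).write_text(source)
sys.path.insert(0, tmp)
importlib.invalidate_caches()

demo_memory = importlib.import_module("demo_memory")
demo_device = importlib.import_module("demo_device")

resource = demo_device.default_resource(0)
print(type(resource).__name__, resource.dev_id)
```

If `demo_device` instead imported `DeviceMemoryResource` at module level, importing `demo_memory` first would fail with an `ImportError`, because `demo_memory` would still be partially initialized when `demo_device` asked for the class.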

    from cuda.core.experimental._memory import DeviceMemoryResource
    device._mr = DeviceMemoryResource(dev_id)
else:
    from cuda.core.experimental._memory import _SynchronousMemoryResource
Contributor Author

ditto


@memory_resource.setter
def memory_resource(self, mr):
    from cuda.core.experimental._memory import MemoryResource
Contributor Author

ditto

# Receive the memory resource.
handle = mp.reduction.recv_handle(conn)
mr = DeviceMemoryResource.from_allocation_handle(device, handle)
os.close(handle)
Contributor Author

A small functional change.

Contributor Author

Fixed in 0d5f08b

@Andy-Jost
Contributor Author

/ok to test 0fac800

@Andy-Jost
Contributor Author

/ok to test 0d5f08b

@Andy-Jost
Contributor Author

/ok to test 567ea2c

@Andy-Jost Andy-Jost added the cuda.core (Everything related to the cuda.core module) label Nov 4, 2025

def _clear(self):
    self._ptr = 0
    self._size = 0
    self._mr = None
Collaborator

Nit: Consider renaming _mr -> _memory_resource or mem_resource.

    stream: Stream | None = None
):
    cdef Buffer self = Buffer.__new__(cls)
    self._ptr = <intptr_t>(int(ptr))
Collaborator

Shouldn't this be a uintptr_t or a uint64_t?
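
As background for the question above, the sketch below shows the range difference between signed and unsigned 64-bit integers using Python's `struct` module, which rejects out-of-range values. The address is a made-up example with the top bit set; whether such addresses occur in practice depends on the platform, and this does not settle which type the PR should use.

```python
import struct


def fits(fmt, value):
    """Return True if `value` can be packed with the given struct format."""
    try:
        struct.pack(fmt, value)
    except struct.error:
        return False
    return True


# Hypothetical address with the most significant bit set.
high_address = 0xFFFF_8000_0000_0000

fits_signed = fits("=q", high_address)    # signed 64-bit, like intptr_t on 64-bit targets
fits_unsigned = fits("=Q", high_address)  # unsigned 64-bit, like uintptr_t / uint64_t

print(fits_signed, fits_unsigned)
```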

@Andy-Jost
Contributor Author

/ok to test cf4dc9d

@leofang leofang added this to the cuda.core beta 9 milestone Nov 10, 2025
@leofang leofang added the enhancement (Any code-related improvements) and P0 (High priority - Must do!) labels Nov 10, 2025
@Andy-Jost
Contributor Author

/ok to test 19e4b8f

Labels

cuda.core (Everything related to the cuda.core module), enhancement (Any code-related improvements), P0 (High priority - Must do!)
