Support CUDA pinned memory in DataLoader #139

CUDA pinned memory is important for efficient execution because it allows for faster data transfers and non-blocking CUDA copies.

The copy from normal (pageable) memory to pinned memory can take significant time. A `256x3x224x224` FloatTensor batch takes about 110 ms to copy on my computer. Currently we can only do the copy in the main process, because inter-process shared Tensors/Storages are copied to shared memory that is not page-locked. For small conv nets on fast GPUs, we probably need to do the copy in the background.
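For a rough sense of scale, the arithmetic below turns those numbers into a batch size and an effective copy rate. It assumes 4 bytes per element (a FloatTensor holds 32-bit floats) and takes the 110 ms figure at face value. The variable names are only for illustration.

```python
# Back-of-the-envelope estimate for the batch described above.
shape = (256, 3, 224, 224)   # batch x channels x height x width
bytes_per_element = 4        # 32-bit float
copy_seconds = 0.110         # measured copy time quoted above

num_elements = 1
for dim in shape:
    num_elements *= dim

batch_bytes = num_elements * bytes_per_element
rate_gb_per_s = batch_bytes / copy_seconds / 1e9

print(f"elements:   {num_elements}")
print(f"batch size: {batch_bytes / 1e6:.1f} MB")
print(f"copy rate:  {rate_gb_per_s:.2f} GB/s")
```

That is roughly 154 MB per batch, copied at about 1.4 GB/s, and all of that time is spent in the main process.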

I believe we can page-lock the shared memory via `cudaHostRegister`. We would probably need to unregister it via `cudaHostUnregister` before freeing the memory.

This would require some knowledge of CUDA in the shared memory code, or at least a free hook to call `cudaHostUnregister`.
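To make the free-hook idea concrete, here is a minimal sketch in plain Python using the standard library's `multiprocessing.shared_memory`. It is not how the actual shared memory code is structured: `HookedSharedBuffer`, `on_alloc`, `on_free`, `fake_register`, and `fake_unregister` are all hypothetical names, and the two callbacks only record that they ran. In a real implementation they would be where `cudaHostRegister` and `cudaHostUnregister` get called.

```python
from multiprocessing import shared_memory


class HookedSharedBuffer:
    """Shared-memory block that runs a hook right before it is freed.

    Hypothetical illustration only; the allocator itself knows nothing
    about CUDA and just calls whatever hooks it was given.
    """

    def __init__(self, nbytes, on_alloc=None, on_free=None):
        self._shm = shared_memory.SharedMemory(create=True, size=nbytes)
        self._on_free = on_free
        self._freed = False
        if on_alloc is not None:
            # Stand-in for page-locking the region (cudaHostRegister).
            on_alloc(self._shm)

    @property
    def name(self):
        return self._shm.name

    @property
    def buf(self):
        return self._shm.buf

    def free(self):
        """Run the free hook first, then release the shared memory."""
        if self._freed:
            return
        self._freed = True
        if self._on_free is not None:
            # Stand-in for cudaHostUnregister; must run before the unmap.
            self._on_free(self._shm)
        self._shm.close()
        self._shm.unlink()


events = []


def fake_register(shm):
    # Placeholder: a real hook would page-lock the mapped region here.
    events.append(("register", shm.name))


def fake_unregister(shm):
    # Placeholder: a real hook would undo the page-locking here.
    events.append(("unregister", shm.name))


buffer = HookedSharedBuffer(1024, on_alloc=fake_register, on_free=fake_unregister)
buffer_name = buffer.name
buffer.buf[:4] = b"data"
payload = bytes(buffer.buf[:4])
buffer.free()
buffer.free()  # second call is a no-op, so the hook runs only once
print(events)
```

The point of the hook is that the shared memory code stays CUDA-agnostic: it only guarantees that the callback runs before the memory is released, and the CUDA-aware side decides what that callback does.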

