[DOCS] Clarify that cooperative is for device, parallel for host #522
I'm not sure if this helps move things in the right direction towards a clear and succinct summary of these two modules, but I'm interested in others' thoughts.
README.md (outdated)

```diff
- * [cuda.cooperative](https://nvidia.github.io/cccl/cuda_cooperative/): Pythonic exposure of CUB cooperative algorithms
- * [cuda.parallel](https://nvidia.github.io/cccl/cuda_parallel/): Pythonic exposure of Thrust parallel algorithms
+ * [cuda.cooperative](https://nvidia.github.io/cccl/cuda_cooperative/): Exposure of CUB cooperative algorithms for use in Python kernels
+ * [cuda.parallel](https://nvidia.github.io/cccl/cuda_parallel/): Pythonic exposure of Thrust parallel algorithms launched from the host
```
Question, based on this observation:

```shell
$ cd cccl/c/parallel
$ git grep -i '#include.*cub' | wc -l
44
$ git grep -i '#include.*thrust' | wc -l
0
```

I.e., currently Thrust headers aren't used at all in cuda.parallel. Could it be better to write "Pythonic exposure of CCCL parallel algorithms launched from the host"?
How about being a bit more explicit here?

- `cuda.parallel`: A Python package offering highly efficient and customizable implementations of parallel algorithms like sort, reduce, scan, transform, etc.
- `cuda.cooperative`: A Python package providing CUB's reusable block-wide and warp-wide primitives for use within Numba CUDA kernels.

I like putting `cuda.parallel` first, as it's probably a more recognizable feature set compared to `cuda.cooperative`.
I like Ashwin's version. Maybe replace "offering" with "for easy access to"?
I don't mind "for easy access to".
> currently thrust headers aren't used at all in cuda.parallel

An accurate reflection of implementation detail in the package name was not the intent. On the GPU side, everything is actually CUB, and Thrust is a sort of C++-friendly wrapper on top. The intent was to capture the spirit that we get to answer the question "I need a sort that can be launched from the host, like `thrust::sort`" with "no problem, use `cuda.parallel.sort`."
Looks like a big improvement!
cuda_python/docs/source/index.rst (outdated)

```
- `cuda.cooperative`_: Exposure of CUB cooperative algorithms for use in Python kernels
- `cuda.parallel`_: Pythonic exposure of Thrust parallel algorithms launched from the host
```
Should we adopt the new README.md wording here, too?
Adopted.
README.md (outdated)

```diff
- * [cuda.cooperative](https://nvidia.github.io/cccl/cuda_cooperative/): Pythonic exposure of CUB cooperative algorithms
- * [cuda.parallel](https://nvidia.github.io/cccl/cuda_parallel/): Pythonic exposure of Thrust parallel algorithms
+ * [cuda.cooperative](https://nvidia.github.io/cccl/cuda_cooperative/): A Python package providing CUB's reusable block-wide and warp-wide primitives for use within Numba CUDA kernels.
+ * [cuda.parallel](https://nvidia.github.io/cccl/cuda_parallel/): A Python package for easy access to highly efficient and customizable parallel algorithms, like `sort`, `scan`, `reduce`, `transform`, etc.
```
No dot at end (for self-consistency within this file)?
Removed.
Since we are touching these two files, I am thinking of also adding numba-cuda to the list. Thoughts?
/ok to test