
Feature request: allow to compile CuPy code #2075


Open
certik opened this issue Jul 2, 2023 · 2 comments
Comments

@certik
Contributor

certik commented Jul 2, 2023

See #1964 (comment).

The idea would be to support CuPy just like NumPy. The missing feature is probably some kind of "device" annotation for arrays: we would represent it in ASR, and then in the backend we would emit CUDA, probably using the mechanism from #1996.

Relevant: #1458
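To make the idea concrete, here is a minimal sketch of what the "device" annotation could carry at the ASR level. None of these names (`ArrayType`, the `device` field) are real LPython or CuPy APIs; they are hypothetical, purely to illustrate what information the frontend would need to record for the backend.

```python
from dataclasses import dataclass

@dataclass
class ArrayType:
    """Hypothetical ASR-level array type carrying a device annotation."""
    element: str                  # element type, e.g. "f64"
    rank: int                     # number of dimensions
    device: str = "cpu"           # proposed annotation: "cpu" or "gpu"

# A frontend could mark CuPy arrays as GPU-resident...
x_type = ArrayType(element="f64", rank=1, device="gpu")

# ...and the backend could then dispatch on the annotation.
def pick_backend(t: ArrayType) -> str:
    return "cuda" if t.device == "gpu" else "llvm"

print(pick_backend(x_type))  # cuda
```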

@namannimmo10
Collaborator

Hi Ondrej,

Sorry for the delay. Here is a very simple comparison I ran on a Tesla V100S-PCIE-32GB GPU that shows the power of CUDA kernels on large arrays.

$ cat np_vs_cp.py 
import numpy as np
import cupy as cp
import time

# NumPy implementation
x = np.random.rand(1_000_000_000)
y = np.random.rand(1_000_000_000)
start = time.time()
result = np.sum((x - y) ** 2)
end = time.time()
print("NumPy time:", end - start)

# CuPy implementation
x = cp.random.rand(1_000_000_000)
y = cp.random.rand(1_000_000_000)
start = time.time()
result = cp.sum((x - y) ** 2)
end = time.time()
print("CuPy time: ", end - start)


squared_diff_kernel = cp.ReductionKernel(
    "T x, T y",             # input parameters
    "T result",             # output parameter
    "(x - y) * (x - y)",    # map: per-element expression
    "a + b",                # reduce: combine two mapped values
    "result = a",           # post-reduction: store the accumulator
    "0",                    # identity element of the reduction
    "squared_diff_kernel",  # kernel name
)
start = time.time()
result = squared_diff_kernel(x, y)
end = time.time()
print("CuPy kernel time:", end - start)
$
$ python np_vs_cp.py
NumPy time: 4.179391860961914
CuPy time:  0.3070840835571289
CuPy kernel time: 0.06063723564147949
$

A simple reduction kernel implementation is almost 70 times faster than the NumPy implementation (4.18 s vs. 0.061 s). Hope this helps.
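For anyone without a GPU who wants to see what the kernel computes, here is a CPU-only emulation of the `ReductionKernel` semantics above (map `(x - y) * (x - y)`, reduce `a + b`, identity `0`, post-map `result = a`), checked against the plain NumPy expression. This is just an illustration of the kernel's semantics, not how CuPy executes it.

```python
import numpy as np

def emulate_squared_diff_kernel(x, y):
    """CPU emulation of the ReductionKernel arguments above."""
    acc = 0.0                             # identity "0"
    for xi, yi in zip(x, y):
        mapped = (xi - yi) * (xi - yi)    # map_expr "(x - y) * (x - y)"
        acc = acc + mapped                # reduce_expr "a + b"
    result = acc                          # post_map_expr "result = a"
    return result

x = np.random.rand(1_000)
y = np.random.rand(1_000)
print(np.isclose(emulate_squared_diff_kernel(x, y), np.sum((x - y) ** 2)))
```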

@certik
Contributor Author

certik commented Jul 17, 2023

Nice. I think our (to be written) CUDA backend should be able to generate an equivalent of the custom kernel, since it has all the information it needs at compile time.
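As a rough sketch of that idea: since the compiler sees the whole expression `sum((x - y) ** 2)` at compile time, it could lower it directly into the argument strings of a `cp.ReductionKernel`. The `lower_sum_of_map` helper below is hypothetical (it is not part of LPython); it only shows the kind of string emission a backend could do.

```python
def lower_sum_of_map(map_expr_c: str, name: str) -> tuple:
    """Hypothetical lowering: produce cp.ReductionKernel arguments
    for sum(map_expr(x, y)) over two arrays x and y."""
    return (
        "T x, T y",    # in_params
        "T result",    # out_params
        map_expr_c,    # map_expr, emitted from the ASR expression
        "a + b",       # reduce_expr for a sum reduction
        "result = a",  # post_map_expr
        "0",           # identity of +
        name,          # generated kernel name
    )

args = lower_sum_of_map("(x - y) * (x - y)", "squared_diff_kernel")
print(args[2])  # (x - y) * (x - y)
# On a machine with CuPy and a GPU, this could be instantiated as:
#   kernel = cp.ReductionKernel(*args)
```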
