After merging #1446, dpctl.tensor.sum became significantly slower (observed when running the L2-norm benchmark for dpnp on PVC).
Before the PR:
import dpctl, dpctl.tensor as dpt, numpy
dpctl.__version__
# Out: '0.15.1dev0+62.g2eba93eac'
sh = (134217728, 3)
dt = numpy.float32
a = dpt.ones(sh, dtype=dt)
%timeit _ = dpt.sum(a, axis=1, dtype=dt)
# 6.67 ms ± 9 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit _ = dpt.sum(a, axis=1, dtype=dt)
# 6.64 ms ± 11.2 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
The new times:
import dpctl, dpctl.tensor as dpt, numpy
dpctl.__version__
# Out: '0.15.1dev0+63.g03fd73794'
sh = (134217728, 3)
dt = numpy.float32
a = dpt.ones(sh, dtype=dt)
%timeit _ = dpt.sum(a, axis=1, dtype=dt)
# 2.35 s ± 3.68 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%timeit _ = dpt.sum(a, axis=1, dtype=dt)
# 2.35 s ± 6.04 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
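To put the two measurements side by side, the following is a small arithmetic sketch using only the numbers reported above (the first timing of each run, the array shape, and float32 itemsize). The helper names are hypothetical, and the "effective bandwidth" counts only the bytes of the input array, which is a simplifying assumption that ignores the output write.

```python
# Rough comparison of the timings reported in this issue.
# Assumption: effective bandwidth = input bytes read / wall time.

SHAPE = (134217728, 3)   # array shape from the reproducer
ITEMSIZE = 4             # bytes per float32 element

T_BEFORE_S = 6.67e-3     # 6.67 ms, before the PR
T_AFTER_S = 2.35         # 2.35 s, after the PR


def input_bytes(shape, itemsize):
    """Total size of the input array in bytes."""
    n = 1
    for dim in shape:
        n *= dim
    return n * itemsize


def effective_bandwidth_gb_s(nbytes, seconds):
    """Input bytes processed per second, in GB/s (10**9 bytes)."""
    return nbytes / seconds / 1e9


nbytes = input_bytes(SHAPE, ITEMSIZE)
slowdown = T_AFTER_S / T_BEFORE_S
bw_before = effective_bandwidth_gb_s(nbytes, T_BEFORE_S)
bw_after = effective_bandwidth_gb_s(nbytes, T_AFTER_S)

print(f"input size: {nbytes / 2**30:.2f} GiB")
print(f"slowdown:   {slowdown:.0f}x")
print(f"bandwidth:  {bw_before:.1f} GB/s before, {bw_after:.3f} GB/s after")
```

Under these assumptions the input is 1.5 GiB and the regression is roughly a 350x slowdown in this reduction over the short trailing axis.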
Devices info:
$ python -m dpctl -f
Platform 0 ::
Name Intel(R) OpenCL
Version OpenCL 3.0 LINUX
Vendor Intel(R) Corporation
Backend opencl
Num Devices 1
# 0
Name Intel(R) Xeon(R) Platinum 8469 CPU @2.00GHz
Version 2023.16.6.0.22_223734
Filter string opencl:cpu:0
Platform 1 ::
Name Intel(R) OpenCL Graphics
Version OpenCL 3.0
Vendor Intel(R) Corporation
Backend opencl
Num Devices 1
# 0
Name Intel(R) Data Center GPU Max 1100
Version 23.35.27191.25
Filter string opencl:gpu:0
Platform 2 ::
Name Intel(R) FPGA Emulation Platform for OpenCL(TM)
Version OpenCL 1.2 Intel(R) FPGA SDK for OpenCL(TM), Version 20.3
Vendor Intel(R) Corporation
Backend opencl
Num Devices 1
# 0
Name Intel(R) FPGA Emulation Device
Version 2023.16.6.0.22_223734
Filter string opencl:accelerator:0
Platform 3 ::
Name Intel(R) Level-Zero
Version 1.3
Vendor Intel(R) Corporation
Backend ext_oneapi_level_zero
Num Devices 1
# 0
Name Intel(R) Data Center GPU Max 1100
Version 1.3.27191
Filter string level_zero:gpu:0
Host info:
$ uname -a
Linux DUT7050PVC 5.15.0-73-generic #80-Ubuntu SMP Mon May 15 15:18:26 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux