Sensible performance degradation in dpt.tensor.sum #1461

Closed
antonwolfy opened this issue Oct 30, 2023 · 0 comments · Fixed by #1463

After #1446 was merged, dpt.sum (dpctl.tensor.sum) became significantly slower. This was observed while running the L2-norm benchmark for dpnp on PVC (Intel Data Center GPU Max 1100).
Before the PR:

import dpctl, dpctl.tensor as dpt, numpy

dpctl.__version__
# Out: '0.15.1dev0+62.g2eba93eac'

sh = (134217728, 3)
dt = numpy.float32
a = dpt.ones(sh, dtype=dt)

%timeit _ = dpt.sum(a, axis=1, dtype=dt)
# 6.67 ms ± 9 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

%timeit _ = dpt.sum(a, axis=1, dtype=dt)
# 6.64 ms ± 11.2 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

After the PR:

import dpctl, dpctl.tensor as dpt, numpy

dpctl.__version__
# Out: '0.15.1dev0+63.g03fd73794'

sh = (134217728, 3)
dt = numpy.float32
a = dpt.ones(sh, dtype=dt)

%timeit _ = dpt.sum(a, axis=1, dtype=dt)
# 2.35 s ± 3.68 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

%timeit _ = dpt.sum(a, axis=1, dtype=dt)
# 2.35 s ± 6.04 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
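
To put the two measurements side by side, the arithmetic below derives the slowdown factor and a rough read throughput from the timings above. It is only a back-of-the-envelope sketch: it assumes each call reads the whole input exactly once and ignores the output write, and the variable names are illustrative rather than part of dpctl.

```python
# Figures copied from the timings above; names are illustrative only.
rows, cols = 134217728, 3
itemsize = 4  # bytes per float32 element

t_before = 6.67e-3  # seconds per call at 0.15.1dev0+62
t_after = 2.35      # seconds per call at 0.15.1dev0+63

# Assumption: each call reads every input element exactly once.
bytes_read = rows * cols * itemsize

slowdown = t_after / t_before
gbps_before = bytes_read / t_before / 1e9
gbps_after = bytes_read / t_after / 1e9

print(f"input size: {bytes_read / 2**30:.2f} GiB")
print(f"slowdown:   {slowdown:.0f}x")
print(f"read rate:  {gbps_before:.0f} GB/s before, {gbps_after:.2f} GB/s after")
```

Under those assumptions the regression is roughly a 350x slowdown, with the effective read rate dropping from about 240 GB/s to under 1 GB/s.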

Devices info:

$ python -m dpctl -f
Platform  0 ::
    Name        Intel(R) OpenCL
    Version     OpenCL 3.0 LINUX
    Vendor      Intel(R) Corporation
    Backend     opencl
    Num Devices 1
      # 0
        Name                Intel(R) Xeon(R) Platinum 8469 CPU @2.00GHz
        Version             2023.16.6.0.22_223734
        Filter string       opencl:cpu:0
Platform  1 ::
    Name        Intel(R) OpenCL Graphics
    Version     OpenCL 3.0
    Vendor      Intel(R) Corporation
    Backend     opencl
    Num Devices 1
      # 0
        Name                Intel(R) Data Center GPU Max 1100
        Version             23.35.27191.25
        Filter string       opencl:gpu:0
Platform  2 ::
    Name        Intel(R) FPGA Emulation Platform for OpenCL(TM)
    Version     OpenCL 1.2 Intel(R) FPGA SDK for OpenCL(TM), Version 20.3
    Vendor      Intel(R) Corporation
    Backend     opencl
    Num Devices 1
      # 0
        Name                Intel(R) FPGA Emulation Device
        Version             2023.16.6.0.22_223734
        Filter string       opencl:accelerator:0
Platform  3 ::
    Name        Intel(R) Level-Zero
    Version     1.3
    Vendor      Intel(R) Corporation
    Backend     ext_oneapi_level_zero
    Num Devices 1
      # 0
        Name                Intel(R) Data Center GPU Max 1100
        Version             1.3.27191
        Filter string       level_zero:gpu:0
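
Since several devices are listed, it may help to pin the measurement to one filter string and run it outside IPython. The script below is a minimal sketch, not the benchmark used in this report: `best_time` and `bench_sum` are hypothetical helper names, the choice of `level_zero:gpu:0` as the default is an assumption (the report does not say which device was selected), and the explicit queue wait is a precaution in case the call returns before the work completes.

```python
import timeit

try:
    import dpctl.tensor as dpt  # optional: only needed to run the benchmark
except ImportError:
    dpt = None


def best_time(fn, repeat=7, number=1):
    """Return the best per-call wall time in seconds over `repeat` runs."""
    return min(timeit.repeat(fn, repeat=repeat, number=number)) / number


def bench_sum(device="level_zero:gpu:0", shape=(134217728, 3)):
    """Time dpt.sum along axis 1 on the device named by a filter string.

    The default device is an assumption; pass any filter string from the
    listing above, e.g. "opencl:gpu:0" or "opencl:cpu:0".
    """
    if dpt is None:
        raise RuntimeError("dpctl is not installed")
    dt = dpt.float32
    a = dpt.ones(shape, dtype=dt, device=device)

    def run():
        dpt.sum(a, axis=1, dtype=dt)
        a.sycl_queue.wait()  # precaution: wait for any queued work

    run()  # warm-up call, excluded from the timing
    return best_time(run)
```

Calling `bench_sum()` on each dpctl build and on each filter string of interest would show whether the regression is specific to one backend. Note that the default shape allocates about 1.5 GiB on the device.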

Host info:

$ uname -a
Linux DUT7050PVC 5.15.0-73-generic #80-Ubuntu SMP Mon May 15 15:18:26 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux