Test array-api-tests and add ones_like #23

Merged 20 commits on Jun 28, 2023
1 change: 1 addition & 0 deletions _doc/index.rst
@@ -33,6 +33,7 @@ well as to execute it.

tutorial/index
api/index
tech/index
auto_examples/index
../CHANGELOGS

113 changes: 113 additions & 0 deletions _doc/tech/aapi.rst
@@ -0,0 +1,113 @@

Difficulty to implement an Array API for ONNX
================================================

Implementing the full Array API is not always easy with :epkg:`onnx`.
Python is not strongly typed and many different types can be used
to represent a value. Argument *axis* can be an integer or a tuple
(see `min from Array API
<https://data-apis.org/array-api/2022.12/API_specification/
generated/array_api.min.html>`_
for example). On the other hand, `ReduceMin from ONNX
<https://onnx.ai/onnx/operators/onnx__ReduceMin.html>`_
expects the axes as a tensor.
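
The mismatch described above can be illustrated with a minimal sketch,
assuming a hypothetical helper ``normalize_axis`` (it is not part of
onnx-array-api). It turns every accepted form of *axis* into a tuple of
non-negative integers, which could then be stored in an int64 tensor.

```python
from typing import Optional, Tuple, Union

AxisLike = Union[None, int, Tuple[int, ...]]


def normalize_axis(axis: AxisLike, ndim: int) -> Tuple[int, ...]:
    """Hypothetical helper: returns the axes as a sorted tuple
    of non-negative integers for a tensor of rank ``ndim``."""
    if axis is None:
        # None means "reduce over every axis".
        return tuple(range(ndim))
    if isinstance(axis, int):
        axis = (axis,)
    axes = set()
    for a in axis:
        if not -ndim <= a < ndim:
            raise ValueError(f"axis {a} is out of bounds for rank {ndim}")
        # Negative axes count from the end.
        axes.add(a % ndim)
    return tuple(sorted(axes))
```

Whatever the caller passes, the rest of the code only has to deal
with one representation.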

Performance
+++++++++++

The Array API must work in eager mode: for every operation,
it generates an ONNX graph and executes it with a specific
backend. It can be :epkg:`numpy`, :epkg:`onnxruntime` or any other
backend. Generating a graph takes a significant amount of time
and must be avoided whenever possible, so the graphs are cached.
A cached graph can be reused as long as only the inputs, in the
ONNX sense, change. If a parameter changes, a new graph must be
created and cached. Method :meth:`JitEager.make_key`
generates a unique key based on the inputs it receives and
the signature of the function to call. If the key is the same,
the cached ONNX graph is reused on the second call.
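
The caching idea can be sketched as follows. This is not the actual
implementation of ``JitEager.make_key``: the names ``TensorSpec``,
``make_key`` and ``GraphCache`` are hypothetical, and the sketch assumes
that the key depends on the element type and rank of every input and on
the value of every parameter.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Tuple


@dataclass(frozen=True)
class TensorSpec:
    """Hypothetical stand-in for a tensor: only dtype and shape matter here."""

    dtype: str
    shape: Tuple[int, ...]


def make_key(*args: Any, **kwargs: Any) -> Tuple[Any, ...]:
    """Builds a hashable key: inputs contribute (dtype, rank),
    parameters contribute their value (assumed hashable)."""
    key = []
    for a in args:
        if isinstance(a, TensorSpec):
            key.append(("input", a.dtype, len(a.shape)))
        else:
            key.append(("param", type(a).__name__, a))
    for name in sorted(kwargs):
        key.append((name, kwargs[name]))
    return tuple(key)


class GraphCache:
    """Calls the (slow) graph builder only when the key is new."""

    def __init__(self, builder: Callable[..., Any]):
        self.builder = builder
        self.cache: Dict[Tuple[Any, ...], Any] = {}
        self.n_builds = 0

    def get(self, *args: Any, **kwargs: Any) -> Any:
        key = make_key(*args, **kwargs)
        if key not in self.cache:
            self.n_builds += 1
            self.cache[key] = self.builder(*args, **kwargs)
        return self.cache[key]
```

Two inputs with the same type and rank share a graph, while a different
value for a parameter such as *axis* triggers a new build.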

However, eager mode, which uses a small ONNX graph for every single
operation, is not the most efficient one. The design must therefore
also allow merging all the needed operations into a bigger graph.
Bigger graphs can be more easily optimized by the backend.

Input vs parameter
++++++++++++++++++

An input is a tensor or array, a parameter is any other type.
Following ONNX semantics, an input is variable, a parameter is frozen
and cannot be changed: it is a constant. A good design would be
to consider any named argument (`**kwargs`) a parameter and
any positional argument (`*args`) a tensor. But the Array API
does not follow that design. Function `astype
<https://data-apis.org/array-api/2022.12/API_specification/
generated/array_api.astype.html>`_
takes two inputs. Operator `Cast
<https://onnx.ai/onnx/operators/onnx__Cast.html>`_
takes one input and a frozen parameter `to`.
Python allows `astype(x, dtype)` as well as `astype(x, dtype=dtype)`
unless the signature enforces one calling form over the other.
There may be ambiguities from time to time.
Besides, from the ONNX point of view, argument *dtype* should be named.
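
One possible way to remove the ambiguity between both calling forms
is to bind the arguments to the signature before separating inputs
from parameters. The sketch below only relies on the standard module
``inspect``. The functions ``astype`` and ``split_inputs_parameters``
are hypothetical stand-ins written for this illustration.

```python
import inspect
from typing import Any, Callable, Dict, Set, Tuple


def astype(x, dtype):
    """Hypothetical stand-in for an Array API function (no real casting)."""
    return ("astype", x, dtype)


def split_inputs_parameters(
    fct: Callable[..., Any], input_names: Set[str], *args: Any, **kwargs: Any
) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Binds the call to the signature of ``fct``, then separates
    tensor inputs (listed in ``input_names``) from frozen parameters."""
    bound = inspect.signature(fct).bind(*args, **kwargs)
    bound.apply_defaults()
    inputs = {k: v for k, v in bound.arguments.items() if k in input_names}
    params = {k: v for k, v in bound.arguments.items() if k not in input_names}
    return inputs, params


# Both calling forms lead to the same split.
positional = split_inputs_parameters(astype, {"x"}, [1, 2], "int64")
named = split_inputs_parameters(astype, {"x"}, [1, 2], dtype="int64")
```

The list of input names has to be known for every function, which is
the information the Array API signature alone does not give.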

Tensor type
+++++++++++

An :class:`EagerTensor` must be used to represent any tensor.
This class defines the backend to use as well:
`EagerNumpyTensor` for :epkg:`numpy`, `EagerOrtTensor`
for :epkg:`onnxruntime`. Since the Array API is new,
existing packages that support it (:epkg:`scikit-learn`)
do not fully support it yet. Some numpy arrays may still be used.

Inplace
+++++++

ONNX has no notion of inplace computation. Therefore something
like `coefs[:, 1] = 1` is not valid unless some code is written
to create another tensor. The current design supports some of these
by storing every call to `__setitem__`. The user sees `coefs`
but the framework sees that `coefs` holds a reference to another
tensor. That's the one the framework needs to use. However,
`__setitem__` is usually chosen for efficiency, and with this design
it becomes inefficient and should be avoided. The assumption that
inplace updates are efficient may hold when the backend relies on CPU
but not on GPU.
A function such as `empty
<https://data-apis.org/array-api/2022.12/API_specification/
generated/array_api.empty.html>`_ should be avoided as it
has to be followed by calls to `__setitem__`.
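
The mechanism can be sketched in pure Python. The class
``FunctionalArray`` below is hypothetical and much simpler than the
actual tensors: it wraps an immutable tuple, and every call to
``__setitem__`` creates a new tuple and rebinds the wrapper to it.

```python
from typing import Any, Iterable


class FunctionalArray:
    """Hypothetical wrapper: the user sees one object, but every
    assignment replaces the immutable value it refers to."""

    def __init__(self, values: Iterable[Any]):
        self._values = tuple(values)
        self.n_copies = 0

    def __setitem__(self, index: int, value: Any) -> None:
        # No inplace update: a full copy is built for every assignment.
        updated = list(self._values)
        updated[index] = value
        self._values = tuple(updated)
        self.n_copies += 1

    def __getitem__(self, index: int) -> Any:
        return self._values[index]

    def tolist(self) -> list:
        return list(self._values)
```

Every assignment costs a full copy in this sketch, which illustrates
why a long sequence of calls to ``__setitem__`` is better avoided.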

Eager or compilation
++++++++++++++++++++

Eager mode is what the Array API implies.
Every function is converted into an ONNX graph based
on its inputs without any knowledge of how these inputs
were obtained. This graph is then executed before going
to the next call of a function from the API.
The conversion of a machine learned model
into ONNX implies the gathering of all these operations
into a graph. It means using a mode that records all the function
calls to compile every tiny onnx graph into a unique graph.
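
The difference between both modes can be sketched with a tiny tracer.
All the names below (``Var``, ``trace``, ``run``) are hypothetical and
unrelated to the classes of the package. Instead of computing a result,
every operation is appended to a single list of nodes, which plays the
role of the unique graph.

```python
import operator
from typing import Any, Callable, Dict, List, Tuple

Node = Tuple[str, str, str, str]  # (operator, left, right, output)
OPS = {"Add": operator.add, "Mul": operator.mul}


class Var:
    """Symbolic value: records operations instead of computing them."""

    def __init__(self, name: str, graph: List[Node]):
        self.name = name
        self.graph = graph

    def _record(self, op_name: str, other: "Var") -> "Var":
        out = Var(f"r{len(self.graph)}", self.graph)
        self.graph.append((op_name, self.name, other.name, out.name))
        return out

    def __add__(self, other: "Var") -> "Var":
        return self._record("Add", other)

    def __mul__(self, other: "Var") -> "Var":
        return self._record("Mul", other)


def trace(fct: Callable[..., Var], input_names: List[str]) -> Tuple[List[Node], str]:
    """Calls ``fct`` once on symbolic inputs and returns the recorded graph."""
    graph: List[Node] = []
    output = fct(*[Var(n, graph) for n in input_names])
    return graph, output.name


def run(graph: List[Node], output_name: str, feeds: Dict[str, Any]) -> Any:
    """Executes the whole recorded graph in one pass."""
    values = dict(feeds)
    for op_name, left, right, out in graph:
        values[out] = OPS[op_name](values[left], values[right])
    return values[output_name]
```

In eager mode, each of the two operations of ``(x + y) * x`` would be
a separate graph. Here, ``trace`` returns a single graph with two nodes
which can be executed, or optimized, as a whole.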

Iterators and Reduction
+++++++++++++++++++++++

An efficient implementation of function
:func:`numpy.any` or :func:`numpy.all` returns
as soon as the result is known. :func:`numpy.all` is
false as soon as the first false condition is met.
The same goes for :func:`numpy.any`, which is true
as soon as the first true condition is met.
There is no such operator in ONNX (<= 20) because
it is unlikely to appear in a machine learned model.
However, it is heavily used when two results are
compared in unit tests. For that reason, the ONNX implementation
is not efficient, but it only impacts the unit tests.
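
The following sketch contrasts both strategies in pure Python.
The two functions are hypothetical and only count how many elements
are visited. The second one mimics a reduction which has to go through
every element, as a graph made of reduce operators would.

```python
from typing import Iterable, Tuple


def all_short_circuit(values: Iterable[bool]) -> Tuple[bool, int]:
    """Stops at the first false value, returns (result, visited)."""
    visited = 0
    for v in values:
        visited += 1
        if not v:
            return False, visited
    return True, visited


def all_full_reduction(values: Iterable[bool]) -> Tuple[bool, int]:
    """Visits every value, as a reduction over the whole tensor does."""
    visited = 0
    result = True
    for v in values:
        visited += 1
        result = result and bool(v)
    return result, visited
```

Both functions return the same boolean, only the number of visited
elements differs.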

Types
+++++

:epkg:`onnx` supports more types than :epkg:`numpy` does.
It is not always easy to deal with bfloat16 or float8 types.
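
As an illustration, numpy has no native bfloat16 type, so a value of
that type has to be stored in another container, for example a 16-bit
integer. The sketch below is a simplified conversion written for this
page: it assumes plain truncation of the float32 representation, with
no rounding and no special handling of NaN.

```python
import struct


def float32_to_bfloat16_bits(value: float) -> int:
    """Keeps the 16 most significant bits of the float32 representation
    (truncation, not round-to-nearest)."""
    bits = struct.unpack("<I", struct.pack("<f", value))[0]
    return bits >> 16


def bfloat16_bits_to_float32(bits: int) -> float:
    """Rebuilds a float32 by padding the 16 missing bits with zeros."""
    return struct.unpack("<f", struct.pack("<I", bits << 16))[0]
```

A value such as ``1.0`` survives the round trip, while most values
lose precision because only 7 bits of mantissa are kept.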
7 changes: 7 additions & 0 deletions _doc/tech/index.rst
@@ -0,0 +1,7 @@
Technical Details
=================

.. toctree::
    :maxdepth: 2

    aapi
5 changes: 3 additions & 2 deletions _unittests/onnx-numpy-skips.txt
@@ -1,13 +1,14 @@
# API failures
# see https://github.com/data-apis/array-api-tests/blob/master/numpy-skips.txt
array_api_tests/test_creation_functions.py::test_asarray_scalars
# array_api_tests/test_creation_functions.py::test_arange
array_api_tests/test_creation_functions.py::test_arange
array_api_tests/test_creation_functions.py::test_asarray_arrays
array_api_tests/test_creation_functions.py::test_empty
array_api_tests/test_creation_functions.py::test_empty_like
array_api_tests/test_creation_functions.py::test_eye
array_api_tests/test_creation_functions.py::test_full_like
array_api_tests/test_creation_functions.py::test_linspace
array_api_tests/test_creation_functions.py::test_meshgrid
array_api_tests/test_creation_functions.py::test_ones_like
# Issue with CastLike and bfloat16 on onnx <= 1.15.0
# array_api_tests/test_creation_functions.py::test_ones_like
array_api_tests/test_creation_functions.py::test_zeros_like
4 changes: 2 additions & 2 deletions _unittests/test_array_api.sh
@@ -1,4 +1,4 @@
export ARRAY_API_TESTS_MODULE=onnx_array_api.array_api.onnx_numpy
pytest ../array-api-tests/array_api_tests/test_creation_functions.py::test_arange || exit 1
pytest -v -rxXfE ../array-api-tests/array_api_tests/test_creation_functions.py::test_ones_like || exit 1
# pytest ../array-api-tests/array_api_tests/test_creation_functions.py --help
pytest ../array-api-tests/array_api_tests/test_creation_functions.py --hypothesis-explain --skips-file=_unittests/onnx-numpy-skips.txt || exit 1
pytest -v -rxXfE ../array-api-tests/array_api_tests/test_creation_functions.py --hypothesis-explain --skips-file=_unittests/onnx-numpy-skips.txt || exit 1
2 changes: 1 addition & 1 deletion _unittests/ut_array_api/test_array_apis.py
@@ -18,7 +18,7 @@ def test_zeros_numpy_1(self):
    def test_zeros_ort_1(self):
        c = xpo.zeros(1)
        d = c.numpy()
        self.assertEqualArray(np.array([0], dtype=np.float32), d)
        self.assertEqualArray(np.array([0], dtype=np.float64), d)

    def test_ffinfo(self):
        dt = np.float32
120 changes: 120 additions & 0 deletions _unittests/ut_array_api/test_hypothesis_array_api.py
@@ -0,0 +1,120 @@
import unittest
import warnings
from os import getenv
from functools import reduce
from operator import mul
from hypothesis import given
from onnx_array_api.ext_test_case import ExtTestCase
from onnx_array_api.array_api import onnx_numpy as onxp
from hypothesis import strategies
from hypothesis.extra import array_api


def prod(seq):
    return reduce(mul, seq, 1)


@strategies.composite
def array_api_kwargs(draw, **kw):
    result = {}
    for k, strat in kw.items():
        if draw(strategies.booleans()):
            result[k] = draw(strat)
    return result


def shapes(xp, **kw):
    kw.setdefault("min_dims", 0)
    kw.setdefault("min_side", 0)

    def sh(x):
        return x

    return xp.array_shapes(**kw).filter(
        lambda shape: prod(i for i in sh(shape) if i)
        < TestHypothesisArraysApis.MAX_ARRAY_SIZE
    )


class TestHypothesisArraysApis(ExtTestCase):
    MAX_ARRAY_SIZE = 10000
    VERSION = "2021.12"

    @classmethod
    def setUpClass(cls):
        with warnings.catch_warnings():
            warnings.simplefilter("ignore")
            from numpy import array_api as xp

        api_version = getenv(
            "ARRAY_API_TESTS_VERSION",
            getattr(xp, "__array_api_version__", TestHypothesisArraysApis.VERSION),
        )
        cls.xps = array_api.make_strategies_namespace(xp, api_version=api_version)
        api_version = getenv(
            "ARRAY_API_TESTS_VERSION",
            getattr(onxp, "__array_api_version__", TestHypothesisArraysApis.VERSION),
        )
        cls.onxps = array_api.make_strategies_namespace(onxp, api_version=api_version)

    def test_strategies(self):
        self.assertNotEmpty(self.xps)
        self.assertNotEmpty(self.onxps)

    def test_scalar_strategies(self):
        dtypes = dict(
            integer_dtypes=self.xps.integer_dtypes(),
            uinteger_dtypes=self.xps.unsigned_integer_dtypes(),
            floating_dtypes=self.xps.floating_dtypes(),
            numeric_dtypes=self.xps.numeric_dtypes(),
            boolean_dtypes=self.xps.boolean_dtypes(),
            scalar_dtypes=self.xps.scalar_dtypes(),
        )

        dtypes_onnx = dict(
            integer_dtypes=self.onxps.integer_dtypes(),
            uinteger_dtypes=self.onxps.unsigned_integer_dtypes(),
            floating_dtypes=self.onxps.floating_dtypes(),
            numeric_dtypes=self.onxps.numeric_dtypes(),
            boolean_dtypes=self.onxps.boolean_dtypes(),
            scalar_dtypes=self.onxps.scalar_dtypes(),
        )

        for k, vnp in dtypes.items():
            vonxp = dtypes_onnx[k]
            anp = self.xps.arrays(dtype=vnp, shape=shapes(self.xps))
            aonxp = self.onxps.arrays(dtype=vonxp, shape=shapes(self.onxps))
            self.assertNotEmpty(anp)
            self.assertNotEmpty(aonxp)

        args_np = []

        @given(
            x=self.xps.arrays(dtype=dtypes["integer_dtypes"], shape=shapes(self.xps)),
            kw=array_api_kwargs(dtype=strategies.none() | self.xps.scalar_dtypes()),
        )
        def fct(x, kw):
            args_np.append((x, kw))

        fct()
        self.assertEqual(len(args_np), 100)

        args_onxp = []

        xshape = shapes(self.onxps)
        xx = self.onxps.arrays(dtype=dtypes_onnx["integer_dtypes"], shape=xshape)
        kw = array_api_kwargs(dtype=strategies.none() | self.onxps.scalar_dtypes())

        @given(x=xx, kw=kw)
        def fctonx(x, kw):
            args_onxp.append((x, kw))

        fctonx()
        self.assertEqual(len(args_onxp), len(args_np))


if __name__ == "__main__":
    cl = TestHypothesisArraysApis()
    cl.setUpClass()
    cl.test_scalar_strategies()
    unittest.main(verbosity=2)
30 changes: 29 additions & 1 deletion _unittests/ut_array_api/test_onnx_numpy.py
@@ -1,8 +1,11 @@
import sys
import unittest
from packaging.version import Version
import numpy as np
from onnx import TensorProto, __version__ as onnx_ver
from onnx_array_api.ext_test_case import ExtTestCase
from onnx_array_api.array_api import onnx_numpy as xp
from onnx_array_api.npx.npx_types import DType
from onnx_array_api.npx.npx_numpy_tensors import EagerNumpyTensor as EagerTensor


@@ -52,6 +55,13 @@ def test_ones_none(self):
        self.assertNotEmpty(matnp[0, 0])
        self.assertEqualArray(matnp, np.ones((4, 5)))

    def test_ones_like(self):
        x = np.array([5, 6], dtype=np.int8)
        y = np.ones_like(x)
        a = EagerTensor(x)
        b = xp.ones_like(a)
        self.assertEqualArray(y, b.numpy())

    def test_full(self):
        c = EagerTensor(np.array([4, 5], dtype=np.int64))
        mat = xp.full(c, fill_value=5, dtype=xp.int64)
@@ -89,7 +99,25 @@ def test_arange_int00(self):
expected = expected.astype(np.int64)
self.assertEqualArray(matnp, expected)

    @unittest.skipIf(
        Version(onnx_ver) < Version("1.15.0"),
        reason="Reference implementation of CastLike is bugged.",
    )
    def test_ones_like_uint16(self):
        x = EagerTensor(np.array(0, dtype=np.uint16))
        y = np.ones_like(x.numpy())
        z = xp.ones_like(x)
        self.assertEqual(y.dtype, x.numpy().dtype)
        self.assertEqual(x.dtype, z.dtype)
        self.assertEqual(x.dtype, DType(TensorProto.UINT16))
        self.assertEqual(z.dtype, DType(TensorProto.UINT16))
        self.assertEqual(x.numpy().dtype, np.uint16)
        self.assertEqual(z.numpy().dtype, np.uint16)
        self.assertNotIn("bfloat16", str(z.numpy().dtype))
        expected = np.array(1, dtype=np.uint16)
        self.assertEqualArray(expected, z.numpy())


if __name__ == "__main__":
    TestOnnxNumpy().test_arange_int00()
    # TestOnnxNumpy().test_ones_like()
    unittest.main(verbosity=2)
37 changes: 37 additions & 0 deletions _unittests/ut_array_api/test_onnx_ort.py
@@ -2,6 +2,7 @@
import numpy as np
from onnx_array_api.ext_test_case import ExtTestCase
from onnx_array_api.array_api import onnx_ort as xp
from onnx_array_api.npx.npx_numpy_tensors import EagerNumpyTensor
from onnx_array_api.ort.ort_tensors import EagerOrtTensor as EagerTensor


@@ -15,6 +16,42 @@ def test_abs(self):
        a = xp.absolute(mat)
        self.assertEqualArray(np.absolute(mat.numpy()), a.numpy())

    def test_matmul(self):
        for cls in [EagerTensor, EagerNumpyTensor]:
            for dtype in (np.float32, np.float64):
                X = cls(
                    np.array(
                        [[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]],
                        dtype=dtype,
                    )
                )
                coef = cls(np.array([[1e-13, 8]], dtype=dtype).T)
                self.assertEqualArray(
                    np.array([[1e-13, 8]], dtype=dtype), coef.numpy().T
                )
                expected = X.numpy() @ coef.numpy()
                got = X @ coef
                try:
                    self.assertEqualArray(expected, got.numpy())
                except AssertionError as e:
                    raise AssertionError(
                        f"Discrepancies (1) with cls={cls.__name__}, dtype={dtype}"
                    ) from e

                coef = np.array([[1e-13, 8]], dtype=dtype).T
                expected = X.numpy() @ coef
                got = X @ coef
                try:
                    self.assertEqualArray(expected, got.numpy())
                except AssertionError as e:
                    raise AssertionError(
                        f"Discrepancies (2) with cls={cls.__name__}, dtype={dtype}"
                    ) from e


if __name__ == "__main__":
    # import logging

    # logging.basicConfig(level=logging.DEBUG)
    # TestOnnxOrt().test_matmul()
    unittest.main(verbosity=2)