assert types at runtime #114

twoertwein · 2022-07-08T00:22:57Z

Replace assert_type(...) with assert isinstance(assert_type(...), ...) or assert assert_type(...) is None.
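A minimal sketch of the pattern described above, using only the standard library. It relies on the fact that assert_type() returns its first argument unchanged at runtime, so one statement can serve both the static type checker and a runtime isinstance check. The functions parse_port and log are hypothetical, and the fallback definition for interpreters older than Python 3.11 is an assumption of this sketch, not part of the PR.

```python
try:
    from typing import assert_type  # available in Python 3.11+
except ImportError:
    # Fallback stand-in (assumption of this sketch): at runtime,
    # assert_type simply returns its first argument unchanged.
    def assert_type(val, typ, /):
        return val


def parse_port(text: str) -> int:
    """Hypothetical function under test."""
    return int(text)


def log(message: str) -> None:
    """Hypothetical function that returns None."""
    return None


# One statement covers the static check (mypy/pyright) and the runtime check.
port = parse_port("8080")
assert isinstance(assert_type(port, int), int)

# For functions annotated to return None, compare against None instead.
assert assert_type(log("hello"), None) is None
```

Without the surrounding assert, assert_type() alone only verifies the annotation; it would not notice a stub that promises int while the runtime value is something else.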

Dr-Irv

A couple of ideas here to enhance this:

1. I think we need to be consistent in how we use assert_type(), in terms of whether the second argument is in quotes. For certain types, like "Series[bool]", we have to put that in quotes because Series[bool] is not known to pandas. But the current code is inconsistent, so maybe we should always put the types in quotes as the second argument to assert_type().
2. In some of my comments you'll see that I suggested that we could enhance the test. For example, if we expect Series[bool], we could not only check isinstance(assert_type(result, "Series[bool]"), pd.Series), but also check the dtype of the result. We could also check the dtype of some numpy results.

For (2), I didn't catch them all, but can you see if you can add that to this PR?
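To illustrate why the quotes matter in (1), here is a hedged sketch that avoids pandas. Box is a hypothetical class that, like Series in this discussion, is assumed to be generic only in its stubs and not subscriptable at runtime. The fallback assert_type for Python versions before 3.11 is also an assumption of the sketch.

```python
try:
    from typing import assert_type  # available in Python 3.11+
except ImportError:
    def assert_type(val, typ, /):  # runtime stand-in: returns val unchanged
        return val


class Box:
    """Hypothetical class that is generic only in its stubs, not at runtime."""

    def __init__(self, item):
        self.item = item


box = Box(True)

# Quoted: the string is never evaluated at runtime, so this line runs fine.
assert isinstance(assert_type(box, "Box[bool]"), Box)

# In the spirit of (2): also check the runtime type of the contained value.
assert isinstance(box.item, bool)

# Unquoted: Box[bool] is evaluated as an ordinary expression and raises
# TypeError, because Box does not define __class_getitem__.
raised = False
try:
    assert_type(box, Box[bool])
except TypeError:
    raised = True
```

Always quoting the second argument, as suggested in (1), avoids having to remember which types are subscriptable at runtime.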

tests/test_indexes.py

tests/test_series.py

tests/test_timefuncs.py

Dr-Irv

So I like this change, but I'm wondering if we should create some standard functions in tests/__init__.py that help shorten up the assert statements. For example, we could have

def check_series(s: pd.Series, dtype: Optional[DTypeArg] = None):
    assert isinstance(s, pd.Series)
    if dtype is not None:
        assert s.dtype is dtype

then instead of

assert isinstance(
    assert_type(series, "pd.Series[bool]"), pd.Series
) and series.dtype is np.dtype(bool)
assert isinstance(assert_type(s, pd.Series), pd.Series)

could do this:

    check_series(assert_type(series, "pd.Series[bool]"), np.dtype(bool))
    check_series(assert_type(s, pd.Series))

Then we have a standard way of checking series with types. Could do same with the generic Interval as well.

Before assert_type() was introduced, I did have something like that in the MS stubs. See
https://github.com/microsoft/python-type-stubs/blob/6b800063bde687cd1846122431e2a729a9de625a/tests/pandas/__init__.py

But now that we know that assert_type() returns the value, we could combine that with the ideas there.

I could let this PR go as it is, and we make my suggested change in a later PR. Open to your opinion.

twoertwein · 2022-07-08T13:40:50Z

I like your suggestion! Let's keep this PR open for now. Could probably even have only one check function and use isinstance calls inside of it.

twoertwein · 2022-07-08T17:26:25Z

tests/__init__.py

+    if hasattr(actual, "__iter__"):
+        value = next(iter(actual))  # type: ignore[call-overload]
+    else:
+        value = actual.left  # type: ignore[attr-defined]


Checking the actual value is, IMHO, better than checking .dtype: users care about the actual values, and it simplifies the type checking (otherwise we sometimes have a np.dtype object, sometimes an actual type, and sometimes a function).

Maybe we should check both the value and .dtype. For things like Series[bool], we want to make sure that the dtype corresponds to the generic attribute.

I will try it, but I think that dtype is often a numpy dtype while the values (when retrieving them) are builtin types.

I'd prefer to revisit that in the future: dtype is often a numpy dtype, which is not compatible with the type of the value (str vs. numpy.dtype[object_], ...).

Dr-Irv

Suggested improvements to check()

Dr-Irv · 2022-07-09T14:43:36Z

tests/__init__.py

+    if hasattr(actual, "__iter__"):
+        value = next(iter(actual))  # type: ignore[call-overload]
+    else:
+        value = actual.left  # type: ignore[attr-defined]


The .left attribute is arbitrary, so we could add a parameter to check(), attr: Optional[str] = None, and then do:

    if attr is None:
        value = next(iter(actual))  # type: ignore[call-overload]
    else:
        value = getattr(actual, attr)

Then the caller can decide which attribute to check.

Added the attr argument. I gave it the default value "left" to keep the check(...) calls short.
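Putting the pieces of this thread together, a single check() helper could look roughly like the sketch below. This is not the code merged in the PR: the signature is pieced together from the diff fragments and comments above, and FakeInterval is a hypothetical stand-in for a non-iterable object such as pd.Interval so that the example runs without pandas.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FakeInterval:
    """Hypothetical stand-in for a non-iterable container such as pd.Interval."""

    left: int
    right: int


def check(
    actual: object,
    klass: type,
    dtype: Optional[type] = None,
    attr: str = "left",
) -> None:
    """Assert that actual is a klass and, optionally, that one value is a dtype."""
    assert isinstance(actual, klass), f"expected {klass}, got {type(actual)}"
    if dtype is None:
        return
    if hasattr(actual, "__iter__"):
        # Iterable containers: inspect the first element.
        value = next(iter(actual))
    else:
        # Non-iterable objects: inspect the attribute chosen by the caller.
        value = getattr(actual, attr)
    assert isinstance(value, dtype), f"expected {dtype}, got {type(value)}"


check([1, 2, 3], list, int)
check(FakeInterval(0, 5), FakeInterval, int)  # uses the default attr="left"
check(FakeInterval(0, 5), FakeInterval, int, attr="right")
```

Note that this sketch raises StopIteration for an empty iterable; a real helper would need to decide how to treat that case.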

Dr-Irv · 2022-07-09T14:46:27Z

tests/__init__.py

+    if hasattr(actual, "__iter__"):
+        value = next(iter(actual))  # type: ignore[call-overload]
+    else:
+        value = actual.left  # type: ignore[attr-defined]


Maybe we should check both the value and .dtype. For things like Series[bool], we want to make sure that the dtype corresponds to the generic attribute.

Dr-Irv · 2022-07-09T14:48:08Z

tests/__init__.py

+from typing import Callable
+
+
+def check(actual: object, klass: type, dtype: type | None = None) -> None:


One advantage of having separate check functions for each type (as was done prior to the introduction of assert_type()) is that we get extra type checking. So check_dataframe_result(result: pd.DataFrame) is then checked by the type checker if we do check_dataframe_result(pd.DataFrame({"x": [1,2,3]})).

If the check function becomes too large (maybe when checking the dtype), it would definitely be good to have dedicated check functions. At the moment I slightly prefer one function with a short name to keep the tests somewhat more skimmable (avoiding additional line breaks).
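The trade-off discussed here can be sketched as follows. Frame and Column are hypothetical stand-ins for pd.DataFrame and pd.Series so that the example needs no third-party packages; check_frame_result mirrors the check_dataframe_result idea mentioned above.

```python
class Frame:
    """Hypothetical stand-in for pd.DataFrame."""


class Column:
    """Hypothetical stand-in for pd.Series."""


def check(actual: object, klass: type) -> None:
    # actual is typed as object, so a type checker accepts any argument;
    # a wrong type is only detected when the test runs.
    assert isinstance(actual, klass)


def check_frame_result(result: Frame) -> None:
    # The annotation lets mypy/pyright reject non-Frame arguments statically,
    # before the runtime assertion is ever reached.
    assert isinstance(result, Frame)


check(Frame(), Frame)
check_frame_result(Frame())

# A type checker would flag the following call; at runtime it would also
# fail the assertion:
# check_frame_result(Column())
```

The generic helper keeps call sites short, while the dedicated helper adds a second, static layer of checking at the cost of one function per type.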

Dr-Irv

Thanks @twoertwein

assert types at runtime

70a3734

Dr-Irv requested changes Jul 8, 2022


twoertwein added 3 commits July 7, 2022 22:12

use classes isntead of strings where possible

d5999bf

check dtype

4b947bd

Merge remote-tracking branch 'upstream/main' into assert

5599afa

Dr-Irv reviewed Jul 8, 2022


twoertwein added 2 commits July 8, 2022 13:08

check()

848155d

unused imports

4793a27

twoertwein requested a review from Dr-Irv July 8, 2022 17:19

twoertwein commented Jul 8, 2022


check a few unused variables

0278b64

Dr-Irv requested changes Jul 9, 2022


twoertwein added 2 commits July 9, 2022 12:42

Merge remote-tracking branch 'upstream/main' into assert

c7473d1

attr

2bf9b64

Dr-Irv approved these changes Jul 10, 2022


Dr-Irv merged commit 2fd9697 into pandas-dev:main Jul 10, 2022

twoertwein deleted the assert branch September 21, 2022 15:27
