-
-
Notifications
You must be signed in to change notification settings - Fork 143
assert types at runtime #114
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A couple of ideas here to enhance this:
- I think we need to be consistent with how we use
assert_type()
in terms of the second argument being in quotes or not. For certain types, like"Series[bool]"
, we have to put that in quotes becauseSeries[bool]
is not known to pandas. But the current code is inconsistent, so maybe we should always put the types in quotes as the second argument toassert_type()
. - In some of my comments you'll see that I suggested that we could enhance the test. For example, if we expect
Series[bool]
, we could not only checkisinstance(assert_type(result, "Series[bool]"), pd.Series)
, but also check thedtype
of the result. Can also checkdtype
of some numpy results as well.
For (2), I didn't catch them all, but can you see if you can add that to this PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So I like this change, but I'm wondering if we should create some standard functions in tests/__init__.py
that help shorten up the assert
statements. For example, we could have
def check_series(s: pd.Series, dtype: Optional[DTypeArg] = None):
assert isinstance(s, pd.Series)
if dtype is not None:
assert s.dtype is dtype
then instead of
assert isinstance(
assert_type(series, "pd.Series[bool]"), pd.Series
) and series.dtype is np.dtype(bool)
assert isinstance(assert_type(s, pd.Series), pd.Series)
could do this:
check_series(assert_type(series, "pd.Series[bool]"), np.dtype(bool))
check_series(assert_type(s, pd.Series))
Then we have a standard way of checking series with types. Could do same with the generic Interval
as well.
Before assert_type()
was introduced, I did have something like that in the MS stubs. See
https://github.com/microsoft/python-type-stubs/blob/6b800063bde687cd1846122431e2a729a9de625a/tests/pandas/__init__.py
But now that we know that assert_type()
returns the value, we could combine that with the ideas there.
I could let this PR go as it is, and we make my suggested change in a later PR. Open to your opinion.
I like your suggestion! Let's keep this PR open for now. Could probably even have only one check function and use |
tests/__init__.py
Outdated
if hasattr(actual, "__iter__"): | ||
value = next(iter(actual)) # type: ignore[call-overload] | ||
else: | ||
value = actual.left # type: ignore[attr-defined] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
checking the actual value is imho better than checking .dtype
:
- users care about the actual values and
- it simplified the type checking (otherwise we have sometimes a np.dtype object, sometimes an actual type, and sometimes a function)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we should check value and .dtype
. For things like Series[bool]
, we want to make sure that the dtype
correspods to the generic attribute.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will try it but I think that dtype is often a numpy dtype while the values (when retrieving them) are builtin types
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I prefer to revisit that maybe in the future: dtype is often a numpy dtype which is not compatible with the type of the value (str vs. numpy.dtype[object_], ...).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggested improvements to check()
tests/__init__.py
Outdated
if hasattr(actual, "__iter__"): | ||
value = next(iter(actual)) # type: ignore[call-overload] | ||
else: | ||
value = actual.left # type: ignore[attr-defined] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The .left
attribute is arbitrary. So could add parameter to check
which is attr: Optional[str] = None
and then do:
if attr is None:
value = next(iter(actual)) # type: ignore[call-overload]
else:
value = actual.__getattr__(attr)
Then the caller can decide which attribute to check.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added the attr
argument. I gave it the default value "left"
to keep the check(...)
calls short.
tests/__init__.py
Outdated
if hasattr(actual, "__iter__"): | ||
value = next(iter(actual)) # type: ignore[call-overload] | ||
else: | ||
value = actual.left # type: ignore[attr-defined] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we should check value and .dtype
. For things like Series[bool]
, we want to make sure that the dtype
correspods to the generic attribute.
tests/__init__.py
Outdated
from typing import Callable | ||
|
||
|
||
def check(actual: object, klass: type, dtype: type | None = None) -> None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One advantage of having separate check
functions for each type (as was done prior to the introduction of assert_type()
) is that we get extra type checking. So check_dataframe_result(result: pd.DataFrame)
is then checked by the type checker if we do check_dataframe_result(pd.DataFrame({"x": [1,2,3]})
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the check
functions becomes too large (maybe when checking the dtype), it would definitely be good to have dedicated check functions. At the moment I slightly prefer one function with a short name to keep the tests somewhat more skimmable (avoiding more line break).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @twoertwein
replace
assert_type(...)
withassert isinstance(assert_type(...), ...)
orassert assert_type(...) is None