Skip to content

assert types at runtime #114

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 9 commits into from
Jul 10, 2022
Merged

assert types at runtime #114

merged 9 commits into from
Jul 10, 2022

Conversation

twoertwein
Copy link
Member

replace assert_type(...) with assert isinstance(assert_type(...), ...) or assert assert_type(...) is None

Copy link
Collaborator

@Dr-Irv Dr-Irv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A couple of ideas here to enhance this:

  1. I think we need to be consistent with how we use assert_type() in terms of the second argument being in quotes or not. For certain types, like "Series[bool]", we have to put that in quotes because Series[bool] is not known to pandas. But the current code is inconsistent, so maybe we should always put the types in quotes as the second argument to assert_type().
  2. In some of my comments you'll see that I suggested that we could enhance the test. For example, if we expect Series[bool], we could not only check isinstance(assert_type(result, "Series[bool]"), pd.Series), but also check the dtype of the result. Can also check dtype of some numpy results as well.

For (2), I didn't catch them all, but can you see if you can add that to this PR?

Copy link
Collaborator

@Dr-Irv Dr-Irv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I like this change, but I'm wondering if we should create some standard functions in tests/__init__.py that help shorten up the assert statements. For example, we could have

def check_series(s: pd.Series, dtype: Optional[DTypeArg] = None):
    assert isinstance(s, pd.Series)
    if dtype is not None:
        assert s.dtype is dtype

then instead of

assert isinstance(
        assert_type(series, "pd.Series[bool]"), pd.Series
    ) and series.dtype is np.dtype(bool)
assert isinstance(assert_type(s, pd.Series), pd.Series)

could do this:

    check_series(assert_type(series, "pd.Series[bool]"), np.dtype(bool))
    check_series(assert_type(s, pd.Series))

Then we have a standard way of checking series with types. Could do same with the generic Interval as well.

Before assert_type() was introduced, I did have something like that in the MS stubs. See
https://github.com/microsoft/python-type-stubs/blob/6b800063bde687cd1846122431e2a729a9de625a/tests/pandas/__init__.py

But now that we know that assert_type() returns the value, we could combine that with the ideas there.

I could let this PR go as it is, and we make my suggested change in a later PR. Open to your opinion.

@twoertwein
Copy link
Member Author

I like your suggestion! Let's keep this PR open for now. Could probably even have only one check function and use isinstance calls inside of it.

@twoertwein twoertwein requested a review from Dr-Irv July 8, 2022 17:19
if hasattr(actual, "__iter__"):
value = next(iter(actual)) # type: ignore[call-overload]
else:
value = actual.left # type: ignore[attr-defined]
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

checking the actual value is imho better than checking .dtype:

  • users care about the actual values and
  • it simplified the type checking (otherwise we have sometimes a np.dtype object, sometimes an actual type, and sometimes a function)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we should check value and .dtype. For things like Series[bool], we want to make sure that the dtype correspods to the generic attribute.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will try it but I think that dtype is often a numpy dtype while the values (when retrieving them) are builtin types

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer to revisit that maybe in the future: dtype is often a numpy dtype which is not compatible with the type of the value (str vs. numpy.dtype[object_], ...).

Copy link
Collaborator

@Dr-Irv Dr-Irv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested improvements to check()

if hasattr(actual, "__iter__"):
value = next(iter(actual)) # type: ignore[call-overload]
else:
value = actual.left # type: ignore[attr-defined]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The .left attribute is arbitrary. So could add parameter to check which is attr: Optional[str] = None and then do:

if attr is None:
    value = next(iter(actual)) # type: ignore[call-overload]
else:
    value = actual.__getattr__(attr)

Then the caller can decide which attribute to check.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added the attr argument. I gave it the default value "left" to keep the check(...) calls short.

if hasattr(actual, "__iter__"):
value = next(iter(actual)) # type: ignore[call-overload]
else:
value = actual.left # type: ignore[attr-defined]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we should check value and .dtype. For things like Series[bool], we want to make sure that the dtype correspods to the generic attribute.

from typing import Callable


def check(actual: object, klass: type, dtype: type | None = None) -> None:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One advantage of having separate check functions for each type (as was done prior to the introduction of assert_type()) is that we get extra type checking. So check_dataframe_result(result: pd.DataFrame) is then checked by the type checker if we do check_dataframe_result(pd.DataFrame({"x": [1,2,3]})

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the check functions becomes too large (maybe when checking the dtype), it would definitely be good to have dedicated check functions. At the moment I slightly prefer one function with a short name to keep the tests somewhat more skimmable (avoiding more line break).

Copy link
Collaborator

@Dr-Irv Dr-Irv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @twoertwein

@Dr-Irv Dr-Irv merged commit 2fd9697 into pandas-dev:main Jul 10, 2022
@twoertwein twoertwein deleted the assert branch September 21, 2022 15:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants