Turn isna() and notna() into TypeGuards #339

gandhis1 · 2022-09-28T02:01:21Z

Tests added: Please use assert_type() to assert the type of any return value

Putting up a draft first because I actually am not quite sure what the best way to go about this is. I wrote the tests first, as to what the behavior intuitively should be, but getting the annotations to match is the challenging part.

Further, I'm still not perfectly happy with these annotations. I have seen tons of code use pd.isnull on any random object, which the annotations currently do not support: they only really allow pandas objects and a curated list of Scalar objects.

gandhis1 · 2022-09-28T02:03:23Z

pandas-stubs/core/dtypes/missing.pyi

+@overload
+def isna(obj: Scalar | NAType) -> TypeGuard[NAType]: ...
+@overload
+def isna(obj: Scalar | None) -> TypeGuard[None]: ...


These are overlapping overloads. But what's the right way to do this? If you pass a ScalarT | None, the type guard should only ever give you a None, it should never give you a NAType back.

If I remove the last 3 overloads, then you always get back a TypeGuard[NaTType | NAType | None], which isn't correct.

I think (but am not sure), based on my reading of the docs, that you don't need all the overloads. If you do

def isna(
    obj: NaTType | NAType | None,
) -> TypeGuard[NaTType | NAType | None]: ...

that tells the type checker that isna() returns True for just those types, and False for anything else.

What's not clear to me is how to use (and whether we can use) assert_type() here. Functions annotated as TypeGuard always return bool, but what will assert_type() do, both for the type checker and at runtime?

Keep experimenting.

What's not clear to me is how to use (and whether we can use) assert_type() here. Functions annotated as TypeGuard always return bool, but what will assert_type() do, both for the type checker and at runtime?

Could probably have a test like

if not pd.isna(None):
    some_terrible_type_error  # type: ignore[some-unused-code-error]  # but not the terrible error which should not be analyzed by mypy/pyright

I did the following simple test:

from typing_extensions import assert_type, TypeGuard

def isastr(s: str) -> TypeGuard[str]:
    return isinstance(s, str)

assert_type(isastr("abc"), bool)
assert_type(isastr(3.4), bool)
print(assert_type(isastr("asdf"), bool))

Both pyright and mypy indicate that isastr(3.4) is invalid.

At runtime, assert_type(isastr("asdf"), bool) returns True

So for testing purposes, we can use check(assert_type(pd.isna(foo), bool), True) where foo is one of None, pd.NA, pd.NaT, np.nan.

Then could do something like check(assert_type(pd.isna("abc"), bool), False) # type: ignore[arg-type] to check that we are catching invalid values.

To start, the part I'm not following is why we are trying to check the type of the return of the function itself. The new tests I have added to this PR don't bother doing this because that's more or less already covered by existing tests. Instead, they check the type of the object that was passed to pd.isna. Because that is the object whose type should be influenced by the TypeGuard.

The limiting factor I see right now is that yes, pd.isna has a variable return type, and TypeGuard always needs to return bool. But this seems to me a glaring gap in the capability of the type checker - we should be able to narrow types.

If I have this code, which is essentially the same as the test I added:

x = random.choice(["value", None])  # Optional[str]
if pd.notnull(x):
    ...  # x is guaranteed to be a str

We know for a fact that x is a str in the body of the if statement. I am not aware of any exceptions to that. Anyway, rather than giving up here due to type checking limitations, I'm wondering if this is something worth escalating to the typing mailing list. Is the restriction that TypeGuard always returns a bool actually necessary? Or can we support @overload with TypeGuard?

The second, somewhat separate issue, is how the narrowing works. I have a test case illustrating this as well. But to copy the example, this first block of code already works as intended and passes MyPy at least:

x = random.choice(["value", None, pd.NA])  # Union[str, None, pd.NA]
if pd.isnull(x):
    ...  # x is Union[None, pd.NA]

The above code works because I annotated all of the null-equivalent types in a single Union. However, if you do that, then this following example does NOT work. It should work. But it doesn't:

x = random.choice(["value", None])  # Union[str, None]
if pd.isnull(x):
    ...  # x should be None, but because we had a single TypeGuard overload that covered pd.NA as well, this is actually Union[None, pd.NA]

The question I then asked myself is whether the solution is to have a separate overload for each null type. But you can't, because that overlaps with the original broader overload, and pyright complains about overlapping overloads.

So that's essentially where I left off.

To start, the part I'm not following is why we are trying to check the type of the return of the function itself.

This is the methodology we've come up with to test the stubs. Especially in the case of overloaded functions/methods, we want to make sure that we are getting the expected result for each possible input.

The limiting factor I see right now is that yes, pd.isna has a variable return type, and TypeGuard always needs to return bool. But this seems to me a glaring gap in the capability of the type checker - we should be able to narrow types.

Is it the capability of the type checker, or the capability of the spec for TypeGuard ?

We know for a fact that x is a str in the body of the if statement. I am not aware of any exceptions to that. Anyway, rather than giving up here due to type checking limitations, I'm wondering if this is something worth escalating to the typing mailing list. Is the restriction that TypeGuard always returns a bool actually necessary? Or can we support @overload with TypeGuard?

Yes, worth bringing up with the typing gods.

This is the methodology we've come up with to test the stubs. Especially in the case of overloaded functions/methods, we want to make sure that we are getting the expected result for each possible input.

I don't think that methodology is by itself sufficient for testing a TypeGuard. TypeGuard changes the type of the object passed to it - and that's what I've been trying to test here. It's arguably the most relevant aspect to test, otherwise a TypeGuard is no better than a bool return. The existing tests, which are associated with the previous bool return, still work. To the extent that the previous tests were incomplete I can fill in the gaps. Although at this point seems like this whole thing is on hold since TypeGuard in its current state doesn't seem to work here.

Is it the capability of the type checker, or the capability of the spec for TypeGuard ?

Seems like the spec to me.

Yes, worth bringing up with the typing gods.

I'm not sure how quick the typing mailing list is, but the discussion page of pyright might be a good first stop where you get feedback very quickly.

gandhis1 · 2022-09-29T17:32:00Z

Closing this for now. Until TypeGuard is compatible with overloads, this is not viable.

gandhis1 · 2022-10-05T15:47:25Z

Reopening

Dr-Irv · 2022-10-06T01:22:16Z

tests/test_pandas.py

-    assert check(assert_type(pd.isna(None), Literal[True]), bool)
-    assert not check(assert_type(pd.notna(None), Literal[False]), bool)
+    assert check(assert_type(pd.isna(None), bool), bool)
+    assert not check(assert_type(pd.notna(None), bool), bool)

    check(assert_type(pd.isna(2.5), bool), bool)


Can you make this assert not check(assert_type(pd.isna(2.5), bool), bool), and do something similar on the next line?

Just want to make sure we check all the results given the use of TypeGuard

gandhis1 · 2022-10-06T03:14:23Z

tests/test_pandas.py

+        assert_type(nullable3, Union[bool, NAType, None])  # TODO: Mypy result
+        assert_type(
+            nullable3, Union[bool, NaTType, NAType, None]
+        )  # TODO: Pyright result


I added a lot more test cases and tested the negative condition as well. I also documented what the correct type should be, which isn't always supported by the way TypeGuard currently works (see the comments on StrictTypeGuard).

However, this uncovered conflicting behavior in Mypy and Pyright. My 2 cents is that I believe Mypy is incorrect and Pyright is correct. This is just an extension of how TypeGuard generally behaves: a simple type ~~narrowing~~ translation from A to B, with no regard for unioned types and set arithmetic, and no requirement that the type be strictly narrowed. As long as that's true, Mypy shouldn't be narrowing beyond that, as it does on lines 203 and 183, for example.

Dr-Irv · 2022-10-06T12:35:47Z

tests/test_pandas.py

+    # and as a result the type narrowing does not always work as it intuitively should
+    # There is a proposal being floated for a StrictTypeGuard that will have more rigid narrowing semantics
+    # In the test cases below, a commented out assertion will be included to document the optimal test result
+    nullable1 = random.choice(["value", None, pd.NA, pd.NaT])


I'm not sure that using random.choice is the right thing to use here. Consider this simple script:

import random
x = random.choice([1, "abc", None])
reveal_type(x)

pyright reports int | str | None whereas mypy reports builtins.object

So already you have a different kind of test happening based on whether pyright or mypy is being used.

Maybe you have to add a type to nullable1 as in:

nullable1: Union[str, None, NAType, NaTType] = random.choice(["value", None, pd.NA, pd.NaT])

to avoid this.
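A small sketch of that suggestion, using plain built-in types so it runs without pandas. The variable names `inferred` and `declared` are made up for the example.

```python
import random
from typing import Union

# Without an annotation, the inferred type of a mixed list's element can differ
# between checkers (the thread reports `int | str | None` vs `builtins.object`).
inferred = random.choice([1, "abc", None])

# Declaring the type up front gives every checker the same starting point,
# so later narrowing tests begin from an identical declared type.
declared: Union[int, str, None] = random.choice([1, "abc", None])
```

With the explicit annotation, any later divergence between checkers can be attributed to narrowing rather than to the initial inference.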

So I do see that behavior in your example, I am assuming probably due to the two types of built-in scalars involved, however the examples I have chosen do not have this issue:

MyPy:

tests/test_pandas.py:383: note: Revealed type is "Union[pandas._libs.tslibs.nattype.NaTType, pandas._libs.missing.NAType, builtins.str, None]"
tests/test_pandas.py:385: note: Revealed type is "Union[builtins.int, None]"
tests/test_pandas.py:387: note: Revealed type is "Union[pandas._libs.missing.NAType, builtins.bool, None]"

Pyright:

tests/test_pandas.py:383:17 - information: Type of "nullable1" is "str | NAType | NaTType | None"
tests/test_pandas.py:385:17 - information: Type of "nullable2" is "int | None"
tests/test_pandas.py:387:17 - information: Type of "nullable3" is "bool | NAType | None"

I can still annotate it explicitly if that makes the test a bit more clear, but the conflict between MyPy / Pyright is after this

I think an explicit annotation would be best here, because my simple example shows something funky going on, and I'm concerned if we make a modification in the future, we won't catch it.

gandhis1 · 2022-10-07T23:39:46Z

TODO: I will probably have to just silence the conflicting test by commenting it out or removing it. That said, MyPy and Pyright ought to have the same behavior, so I'll bring it up on Pyright's forums first. Given that I suspect Pyright is right here, I may end up posting it as a Mypy issue. In any case, whichever type checker is wrong probably isn't going to get fixed anytime soon, which is why I will silence it.

Dr-Irv · 2022-10-08T02:04:49Z

tests/test_pandas.py

+        # assert_type(nullable3, Union[NAType, None])
+        assert_type(nullable3, Union[bool, NAType, None])  # TODO: Mypy result
+        assert_type(
+            nullable3, Union[bool, NaTType, NAType, None]


Currently failing here because nullable3 is of type Union[bool, NAType, None]

Dr-Irv · 2022-10-08T02:06:14Z

tests/test_pandas.py

+        # check(assert_type(nullable2, None), type(None))
+        assert_type(nullable2, Union[int, None])  # TODO: MyPy result
+        assert_type(
+            nullable2, Union[int, NaTType, NAType, None]


currently failing here because nullable2 is Union[int, None] (although it should be None based on the test of pd.notna() )

gandhis1 · 2022-10-12T01:18:00Z

So I ended up commenting out the tests that had conflicting behavior. In summary:

The current spec of TypeGuard does not enable as precise type narrowing as would be ideal. This is unlikely to be remedied anytime soon, as I'd imagine this is something a PEP would be needed to address.
I took the step of annotating as a comment what the correct/optimal type assertion should be. These remain commented solely for documentation for now, although if and when in the future they become viable, they can be used as actual assertions.
There are also certain cases in which MyPy and Pyright produce conflicting behavior. These have likewise been commented out for documentation purposes. If, in the future, the two type checkers converge to the same behavior, we may want to enable these assertions.

Dr-Irv

Thanks @gandhis1

Just a note. PEP 647 says the following about the negative case: "User-defined type guards apply narrowing only in the positive case (the if clause). The type is not narrowed in the negative case."

So that explains why our "ideal" test in the else clauses isn't happening.

gandhis1 · 2022-10-25T14:31:59Z

So I upgraded to the latest version in conda today. It's definitely an improvement as this uncovered a new legitimate type error in my code base. Unfortunately, it's not perfect due to the lack of strict type narrowing semantics, as well as lack of type narrowing in the negative case. I hope StrictTypeGuard gains some traction to improve this.

twoertwein reviewed Sep 28, 2022

gandhis1 closed this Sep 29, 2022

gandhis1 reopened this Oct 5, 2022

gandhis1 force-pushed the typeguard branch from e06a33b to 1a87dae Compare October 5, 2022 15:47

gandhis1 marked this pull request as ready for review October 5, 2022 15:48

Dr-Irv requested changes Oct 6, 2022

gandhis1 force-pushed the typeguard branch from 16677ca to 39a4606 Compare October 6, 2022 02:22

Dr-Irv requested changes Oct 6, 2022

Dr-Irv requested changes Oct 8, 2022

gandhis1 added 5 commits October 11, 2022 21:07

Turn isna() and notna() into TypeGuards

7226c9d

Assert the true/false return of isna/notna on a scalar value

4a329f3

Adjust TypeGuard tests to document limitations of notna/isna type nar…

534f518

…rowing

Document conflicting MyPy and pyright results

7ffcf82

Comment out the tests which produce conflicting results in Mypy and P…

6a2e123

…yright Keep the tests as comments in order to document the behavior for future reference

gandhis1 force-pushed the typeguard branch from bec9763 to 6a2e123 Compare October 12, 2022 01:13

Use explicit type annotations per feedback

39e4311

Dr-Irv approved these changes Oct 13, 2022

Dr-Irv merged commit 617529a into pandas-dev:main Oct 13, 2022

gandhis1 mentioned this pull request Apr 24, 2024

Use the new TypeIs feature to update the notna/notnull/isna/isnull type guards #911

Closed

gandhis1 deleted the typeguard branch April 24, 2024 14:47
