Skip to content

Use TypeIs in is_dataclass #11929

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
May 26, 2024
Merged

Use TypeIs in is_dataclass #11929

merged 1 commit into from
May 26, 2024

Conversation

NeilGirdhar
Copy link
Contributor

Fixes #9723

@NeilGirdhar NeilGirdhar force-pushed the isdataclass branch 2 times, most recently from 7721a02 to 3f47361 Compare May 17, 2024 10:37

This comment has been minimized.

@NeilGirdhar
Copy link
Contributor Author

@erictraut Is this a problem with Pyright?

@erictraut
Copy link
Contributor

@NeilGirdhar, can you be more specific? I'm not sure what you're asking.

@JelleZijlstra
Copy link
Member

I think it's about this CI failure:

/home/runner/work/typeshed/typeshed/stdlib/@tests/test_cases/check_dataclasses.py
  /home/runner/work/typeshed/typeshed/stdlib/@tests/test_cases/check_dataclasses.py:60:21 - error: "assert_type" mismatch: expected "type[DataclassInstance]" but received "<subclass of type and DataclassInstance>" (reportAssertTypeFailure)

I haven't looked at this code in detail to figure out whether pyright is correct here.

@NeilGirdhar
Copy link
Contributor Author

NeilGirdhar commented May 17, 2024

So, I removed one of the overloads, but from my testing it didn't seem necessary anymore. I'll quickly try replacing it to see if it makes a difference. (Edit: still broken. Removing it since it doesn't belong anymore IMO.)

The failure in PyRIght seems to be because Pyright isn't evaluating the overloads for TypeIs functions? Or else it's picking the wrong one. But even if it's picking the wrong one, why is type & (DataclassInstance | type[DataclassInstance]) equal to type & DataclassInstanc (or Pyright's approximation to the intersection). Seems odd to me.

This comment has been minimized.

This comment has been minimized.

@erictraut
Copy link
Contributor

@NeilGirdhar, PEP 742 allows some variation between type checkers. What you're seeing here is an example of this. Pyright's evaluation is arguably more accurate (narrower) and more consistent than mypy's, but both results are defensible. I think you'll need to change your test case to accommodate this variation.

PEP 742 says:

As a rule of thumb, a type checker should use the same type narrowing logic – and get results that are consistent with – its handling of isinstance().

If you try this self-contained code sample in both pyright and mypy, you'll see that mypy is inconsistent but pyright is consistent.

from dataclasses import Field
from typing import Any, ClassVar, Protocol, runtime_checkable
from typing_extensions import TypeIs

@runtime_checkable
class DataclassInstance(Protocol):
    __dataclass_fields__: ClassVar[dict[str, Field[Any]]]

def is_dataclass(obj: type) -> TypeIs[type[DataclassInstance]]: ...

def test1(x: type) -> None:
    if isinstance(x, DataclassInstance):
        reveal_type(x) # Both mypy and pyright reveal <subclass of "type" and "DataclassInstance">

def test2(x: type) -> None:
    if is_dataclass(x):
        reveal_type(x) # Mypy reveals "DataclassInstance" but pyright gives results consistent with "isinstance"

@NeilGirdhar
Copy link
Contributor Author

NeilGirdhar commented May 18, 2024

@erictraut Sorry, but I don't understand how these two functions are supposed to be the same. In the first function, you are checking against DataclassInstance whereas in the second you are checking against type[DataclassInstance].

If you change is_dataclass to return TypeIs[DataclassInstance], then MyPy and PyRight agree with each other and are consistent between the two functions.

Similarly, if you change the isinstance line to issubclass(x, DataclassInstance), then you will see that MyPy is consistent between the two functions whereas Pyright is not.

Therefore, I believe this is a bug in Pyright even if as you say PEP 742 doesn't specify this case.

@erictraut
Copy link
Contributor

Ah yes, you're correct. This is a bug in pyright.

@erictraut
Copy link
Contributor

@NeilGirdhar, this will be fixed in the next release of pyright. I typically publish a new version every Tuesday evening PST. You could either wait until then to land this PR or you could temporarily comment out this test case.

@NeilGirdhar NeilGirdhar force-pushed the isdataclass branch 2 times, most recently from 27bb3ff to e48608a Compare May 22, 2024 09:06

This comment has been minimized.

@NeilGirdhar
Copy link
Contributor Author

@JelleZijlstra There appear to be some errors with updating PyRight. Would you mind updating PyRight when you have a chance?

@srittau
Copy link
Collaborator

srittau commented May 22, 2024

Dependencies are updated automatically by renovate each UTC night.

Copy link
Contributor

Diff from mypy_primer, showing the effect of this PR on open source code:

pydantic (https://github.com/samuelcolvin/pydantic)
+ pydantic/v1/json.py:80: error: No overload variant of "asdict" matches argument type "type[DataclassInstance]"  [call-overload]
+ pydantic/v1/json.py:80: note: Possible overload variants:
+ pydantic/v1/json.py:80: note:     def asdict(obj: DataclassInstance) -> dict[str, Any]
+ pydantic/v1/json.py:80: note:     def [_T] asdict(obj: DataclassInstance, *, dict_factory: Callable[[list[tuple[str, Any]]], _T]) -> _T
+ pydantic/deprecated/json.py:98: error: No overload variant of "asdict" matches argument type "type[DataclassInstance]"  [call-overload]
+ pydantic/deprecated/json.py:98: note: Possible overload variants:
+ pydantic/deprecated/json.py:98: note:     def asdict(obj: DataclassInstance) -> dict[str, Any]
+ pydantic/deprecated/json.py:98: note:     def [_T] asdict(obj: DataclassInstance, *, dict_factory: Callable[[list[tuple[str, Any]]], _T]) -> _T

hydra-zen (https://github.com/mit-ll-responsible-ai/hydra-zen)
- src/hydra_zen/structured_configs/_implementations.py:1129: error: Argument 2 to "builds" of "BuildsFn" has incompatible type "**dict[str, int | float | Path | DataClass_ | type[DataClass_] | Enum | Any | Sequence[HydraSupportedType] | Mapping[Any, HydraSupportedType] | None]"; expected "Literal[False] | None"  [arg-type]
- src/hydra_zen/structured_configs/_implementations.py:1129: error: Argument 2 to "builds" of "BuildsFn" has incompatible type "**dict[str, int | float | Path | DataClass_ | type[DataClass_] | Enum | Any | Sequence[HydraSupportedType] | Mapping[Any, HydraSupportedType] | None]"; expected "bool"  [arg-type]
- src/hydra_zen/structured_configs/_implementations.py:1129: error: Argument 2 to "builds" of "BuildsFn" has incompatible type "**dict[str, int | float | Path | DataClass_ | type[DataClass_] | Enum | Any | Sequence[HydraSupportedType] | Mapping[Any, HydraSupportedType] | None]"; expected "Builds[Callable[[Callable[..., Any]], Callable[..., Any]]] | ZenPartialBuilds[Callable[[Callable[..., Any]], Callable[..., Any]]] | HydraPartialBuilds[Callable[[Callable[..., Any]], Callable[..., Any]]] | Just[Callable[[Callable[..., Any]], Callable[..., Any]]] | type[Builds[Callable[[Callable[..., Any]], Callable[..., Any]]]] | type[ZenPartialBuilds[Callable[[Callable[..., Any]], Callable[..., Any]]]] | type[HydraPartialBuilds[Callable[[Callable[..., Any]], Callable[..., Any]]]] | type[Just[Callable[[Callable[..., Any]], Callable[..., Any]]]] | Callable[[Callable[..., Any]], Callable[..., Any]] | str | None | Sequence[Builds[Callable[[Callable[..., Any]], Callable[..., Any]]] | ZenPartialBuilds[Callable[[Callable[..., Any]], Callable[..., Any]]] | HydraPartialBuilds[Callable[[Callable[..., Any]], Callable[..., Any]]] | Just[Callable[[Callable[..., Any]], Callable[..., Any]]] | type[Builds[Callable[[Callable[..., Any]], Callable[..., Any]]]] | type[ZenPartialBuilds[Callable[[Callable[..., Any]], Callable[..., Any]]]] | type[HydraPartialBuilds[Callable[[Callable[..., Any]], Callable[..., Any]]]] | type[Just[Callable[[Callable[..., Any]], Callable[..., Any]]]] | Callable[[Callable[..., Any]], Callable[..., Any]] | str | None]"  [arg-type]
- src/hydra_zen/structured_configs/_implementations.py:1129: error: Argument 2 to "builds" of "BuildsFn" has incompatible type "**dict[str, int | float | Path | DataClass_ | type[DataClass_] | Enum | Any | Sequence[HydraSupportedType] | Mapping[Any, HydraSupportedType] | None]"; expected "Mapping[str, SupportedPrimitive] | None"  [arg-type]
- src/hydra_zen/structured_configs/_implementations.py:1129: error: Argument 2 to "builds" of "BuildsFn" has incompatible type "**dict[str, int | float | Path | DataClass_ | type[DataClass_] | Enum | Any | Sequence[HydraSupportedType] | Mapping[Any, HydraSupportedType] | None]"; expected "list[str | DataClass_ | type[DataClass_] | Mapping[str, str | Sequence[str] | None]] | None"  [arg-type]
- src/hydra_zen/structured_configs/_implementations.py:1129: error: Argument 2 to "builds" of "BuildsFn" has incompatible type "**dict[str, int | float | Path | DataClass_ | type[DataClass_] | Enum | Any | Sequence[HydraSupportedType] | Mapping[Any, HydraSupportedType] | None]"; expected "str | None"  [arg-type]
- src/hydra_zen/structured_configs/_implementations.py:1129: error: Argument 2 to "builds" of "BuildsFn" has incompatible type "**dict[str, int | float | Path | DataClass_ | type[DataClass_] | Enum | Any | Sequence[HydraSupportedType] | Mapping[Any, HydraSupportedType] | None]"; expected "tuple[type[DataClass_], ...]"  [arg-type]
- src/hydra_zen/structured_configs/_implementations.py:1129: error: Argument 2 to "builds" of "BuildsFn" has incompatible type "**dict[str, int | float | Path | DataClass_ | type[DataClass_] | Enum | Any | Sequence[HydraSupportedType] | Mapping[Any, HydraSupportedType] | None]"; expected "ZenConvert | None"  [arg-type]
- src/hydra_zen/structured_configs/_implementations.py:1129: error: Argument 2 to "builds" of "BuildsFn" has incompatible type "**dict[str, int | float | Path | DataClass_ | type[DataClass_] | Enum | Any | Sequence[HydraSupportedType] | Mapping[Any, HydraSupportedType] | None]"; expected "T"  [arg-type]
- src/hydra_zen/structured_configs/_implementations.py:1161: error: Incompatible types in assignment (expression has type "Just", variable has type "type[Builds[Any]]")  [assignment]

pytest (https://github.com/pytest-dev/pytest)
+ src/_pytest/_io/pprint.py:117: error: Right operand of "and" is never evaluated  [unreachable]
+ src/_pytest/_io/pprint.py:125: error: Statement is unreachable  [unreachable]

streamlit (https://github.com/streamlit/streamlit)
+ lib/streamlit/runtime/caching/hashing.py: note: In member "_to_bytes" of class "_CacheFuncHasher":
+ lib/streamlit/runtime/caching/hashing.py:409:34: error: No overload variant of "asdict" matches argument type "Type[DataclassInstance]"  [call-overload]
+ lib/streamlit/runtime/caching/hashing.py:409:34: note: Possible overload variants:
+ lib/streamlit/runtime/caching/hashing.py:409:34: note:     def asdict(obj: DataclassInstance) -> Dict[str, Any]
+ lib/streamlit/runtime/caching/hashing.py:409:34: note:     def [_T] asdict(obj: DataclassInstance, *, dict_factory: Callable[[List[Tuple[str, Any]]], _T]) -> _T

@NeilGirdhar
Copy link
Contributor Author

NeilGirdhar commented May 24, 2024

Updated. Ready for pull.

@srittau
Copy link
Collaborator

srittau commented May 24, 2024

Looking at the primer output: pydantic and streamlit are true positives. They use is_dataclass() and immediately call asdict() if true. This can lead to errors if is_dataclass() is a dataclass and not a dataclass instance. (Weird design choice to have one function to check for both types and instances ...)

pytest is a false positive:

https://github.com/pytest-dev/pytest/blob/889d9b28d786c75b66f4d1acb80123bdb341639c/src/_pytest/_io/pprint.py#L117C13-L118C45

It's basically complaining about this code (where object is Any):

if is_dataclass(object) and not isinstance(object, type): ...

I'm unsure why that is. Maybe it's assuming that is_dataclass(Any) is always True?

@NeilGirdhar
Copy link
Contributor Author

(Weird design choice to have one function to check for both types and instances ...)

Definitely a weird choice.

So, can the PR be pulled, or should we wait for the dependent projects to fix bugs?

@srittau
Copy link
Collaborator

srittau commented May 25, 2024

I'm fine with merging this, despite the false positive, but I'll leave it open for a day or two if someone has an idea how to fix/work around it.

@NeilGirdhar
Copy link
Contributor Author

Sounds good 😄 !

@srittau srittau merged commit 6246a38 into python:main May 26, 2024
63 checks passed
@NeilGirdhar NeilGirdhar deleted the isdataclass branch May 26, 2024 17:09
@NeilGirdhar
Copy link
Contributor Author

Thanks for the quick merge. If anyone has time, it may be worth mentioning that most of the other uses of TypeGuard should actually be changed to TypeIs.

@hauntsaninja
Copy link
Collaborator

hauntsaninja commented Jul 19, 2024

I'm seeing a lot of new errors on work codebase from code like the case srittau mentions:

if dataclasses.is_dataclass(value):
    dataclasses.asdict(value)

Not sure if anyone has clever suggestions here

@DanielNoord
Copy link

I also see a lot of errors on this at work and it has been reported for mypy at python/mypy#17550.

I also see (what I would call) weird behaviour between pyright and mypy.

Code sample in pyright playground

import dataclasses
from typing import reveal_type
def func(x: object) -> None:
    if dataclasses.is_dataclass(x):
        reveal_type(x)
        print(dataclasses.asdict(x))

Gives:

Type of "x" is "DataclassInstance | type[DataclassInstance]"
Argument of type "DataclassInstance | type[DataclassInstance]" cannot be assigned to parameter "obj" of type "DataclassInstance" in function "asdict"
  Type "DataclassInstance | type[DataclassInstance]" is incompatible with type "DataclassInstance"
    "__dataclass_fields__" is defined as a ClassVar in protocol  (reportArgumentType)

That seems fair and is in line with mypy at https://mypy-play.net/?mypy=latest&python=3.12&gist=93cb73d00a2865787ca2763123ecde67

Which gives (on 1.11):

main.py:5: note: Revealed type is "Union[_typeshed.DataclassInstance, type[_typeshed.DataclassInstance]]"
main.py:6: error: Argument 1 to "asdict" has incompatible type "DataclassInstance | type[DataclassInstance]"; expected "DataclassInstance"  [arg-type]
Found 1 error in 1 file (checked 1 source file)

Now I want to narrow x down to DataclassInstance. Based on the overload that doesn't seem possible since isinstance(type, object) is True and there isn't an overload which only gives DataclassInstance.
However, pyright does allow me to do this.

Code sample in pyright playground

import dataclasses
from typing import reveal_type
def func(x: object) -> None:
    if dataclasses.is_dataclass(x) and isinstance(x, object):
        reveal_type(x)
        print(dataclasses.asdict(x))

This has no errors on pyright. Which seems like a bug, but @erictraut please correct me if I'm wrong here. Shouldn't x already be inferred to be object based on the function signature? Why does the isinstance check change the behaviour?

mypy on 1.11 at https://mypy-play.net/?mypy=latest&python=3.12&gist=895d35ce66419cb70dd64ef4c5d900bd gives

main.py:5: note: Revealed type is "Union[_typeshed.DataclassInstance, type[_typeshed.DataclassInstance]]"
main.py:6: error: Argument 1 to "asdict" has incompatible type "DataclassInstance | type[DataclassInstance]"; expected "DataclassInstance"  [arg-type]
Found 1 error in 1 file (checked 1 source file)

Even if it changes the behaviour I don't really understand why narrowing x to be object should work since a type[DataclassInstance] should also pass that check.

I don't want to point fingers as I have deep respect for the much more extensive knowledge that you maintainers have of the Python typing system but if I were to summarize this I would say:

  1. pyright has some strange behaviour where adding a seemingly spurious and isinstance(x, object) after dataclasses.is_dataclass(x) narrows x down to DataclassInstance
  2. After this PR there doesn't seem to be a good way to narrow down an assignment to something that is acceptable for dataclasses.asdict (basically to DataclassInstance), which seems like a usability regression and is at least for us at work a reason why we can't bump mypy and experiment with the support for PEP 695.

@erictraut
Copy link
Contributor

@DanielNoord, I agree this looks like a bug in pyright. An isinstance(x, object) should have no effect on type narrowing because all objects in Python derive from object (including type objects)`. I've filed a bug.

@erictraut
Copy link
Contributor

erictraut commented Jul 24, 2024

There is a second bug in pyright here as well. This one is related to its overload matching logic. When overload matching is ambiguous because of an Any argument, pyright should fall back on the union of all potential overloads rather than picking the first matching overload. It's not doing that in this case because it is treating the two TypeIs[x] return types as equivalent. They are both subtypes of bool, but they are distinct types. I've filed another bug.

Mypy appears to have the same bug here. I've filed a separate bug in the mypy issue tracker.

@DanielNoord
Copy link

Thanks @erictraut (I must say I'm always amazed at the speed at which you and the pyright team triage and fix bugs).

I'm wondering what you think of the second issue I "identified". With the fixes marked for "addressed in next version" merged it seems we still don't have a good way to type guard an object to be passable to asdict(). Is this something that can be easily fixed? Like I said, the typing is correct but adding an additional utility function to only work on instances instead of types feels a bit convoluted...

@erictraut
Copy link
Contributor

Once these bugs are fixed in pyright, you will be able to check for a TypedDict instance in a type-safe manner by verifying that x isn't an instance of type, like this:

import dataclasses

def func(x: object) -> None:
    if dataclasses.is_dataclass(x) and not isinstance(x, type):
        print(dataclasses.asdict(x))

It's unfortunate that is_dataclass returns True for both TypedDict class objects and TypedDict instances. I think that was a poor API choice, but I don't see a way to fix it at this point. That means checking for dataclasses in a type-safe manner will often require this extra isinstance check.

@JelleZijlstra
Copy link
Member

We could consider adding is_dataclass_instance and is_dataclass_type to https://github.com/hauntsaninja/useful_types.

@DanielNoord
Copy link

Thanks Eric, that does indeed seem to work in the playgrounds.

I don't think I'll be able to convince people to add an additional dependency to our stack for such a small utility. It is indeed too bad that the API is like this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Repair is_dataclass definition?
6 participants