Skip to content

dataclasses: Synthetic __init__ method on dataclass has broken ClassVar annotation under some conditions. #133956

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
emcd opened this issue May 12, 2025 · 12 comments
Labels
stdlib Python modules in the Lib dir topic-dataclasses type-bug An unexpected behavior, bug, or error

Comments

@emcd
Copy link

emcd commented May 12, 2025

Bug report

Bug description:

When using automatically-stringified annotations (from __future__ import annotations), typing_extensions.ClassVar is not treated the same as typing.ClassVar even though those two objects are identical. Specifically, the dataclasses.dataclass-produced __init__ method does not exclude typing_extensions.ClassVar-wrapped variables from its annotations, but does exclude typing.ClassVar-wrapped variables from its annotations. Among other problems, this causes typing.get_type_hints to choke.

Reproducer below:

from __future__ import annotations

import dataclasses as dcls
import inspect
import typing
import typing_extensions as typx


assert typing.ClassVar is typx.ClassVar


@dcls.dataclass
class Foo:

    y: int
    x: typing.ClassVar[ int ] = 1


print( "\n=== typing.ClassVar (ok) ===" )

print( 'anno', Foo.__init__.__annotations__ )
print( 'inspect', inspect.get_annotations( Foo.__init__, eval_str = True ) )
print( 'typing', typing.get_type_hints( Foo.__init__ ) )
print( 'typx', typx.get_type_hints( Foo.__init__ ) )


@dcls.dataclass
class Bar:

    y: int
    x: typx.ClassVar[ int ] = 1


print( "\n=== typing_extensions.ClassVar (breaks) ===" )

print( 'anno', Bar.__init__.__annotations__ )
print( 'inspect', inspect.get_annotations( Bar.__init__, eval_str = True ) )
print( 'typing', typing.get_type_hints( Bar.__init__ ) )
print( 'typx', typx.get_type_hints( Bar.__init__ ) )

This results in the following output:

=== typing.ClassVar (ok) ===
anno {'y': 'int', 'return': None}
inspect {'y': <class 'int'>, 'return': None}
typing {'y': <class 'int'>, 'return': <class 'NoneType'>}
typx {'y': <class 'int'>, 'return': <class 'NoneType'>}

=== typing_extensions.ClassVar (breaks) ===
anno {'y': 'int', 'x': 'typx.ClassVar[int]', 'return': None}
inspect {'y': <class 'int'>, 'x': typing.ClassVar[int], 'return': None}
Traceback (most recent call last):
  File "/home/me/src/somepkg/bugs/dcls-classvar-hints.py", line 38, in <module>
    print( 'typing', typing.get_type_hints( Bar.__init__ ) )
  File "/usr/lib/python3.10/typing.py", line 1871, in get_type_hints
    value = _eval_type(value, globalns, localns)
  File "/usr/lib/python3.10/typing.py", line 327, in _eval_type
    return t._evaluate(globalns, localns, recursive_guard)
  File "/usr/lib/python3.10/typing.py", line 693, in _evaluate
    type_ = _type_check(
  File "/usr/lib/python3.10/typing.py", line 167, in _type_check
    raise TypeError(f"{arg} is not valid as type argument")
TypeError: typing.ClassVar[int] is not valid as type argument

(Credit to @Daraan for helping me pinpoint the problem.)

This is reproducible on both Python 3.10 and 3.13.

Possibly related to #89687.

CPython versions tested on:

3.10, 3.13

Operating systems tested on:

No response

Linked PRs

@emcd emcd added the type-bug An unexpected behavior, bug, or error label May 12, 2025
@JelleZijlstra
Copy link
Member

This isn't just about the annotations, the dataclass also generates the wrong parameters. The Bar class in your example is inferred as having two parameters:

>>> Bar(1, 2)
Bar(y=1, x=2)

dataclasses does some complicated dance to identify typing.ClassVar in string annotations; see the dataclasses._is_type function. That function doesn't appear to support cases where typing.ClassVar is re-exported from another module. Maybe it could be fixed.

My general recommendation though is to not use from __future__ import annotations if you want your annotations to be introspected reliably.

@JelleZijlstra
Copy link
Member

cc @ericvsmith @carljm for dataclasses

@emcd
Copy link
Author

emcd commented May 13, 2025

My general recommendation though is to not use from __future__ import annotations if you want your annotations to be introspected reliably.

Not to distract from the topic at hand, but can you elaborate on this advice? Not using stringified forward references can impose a lot of additional positioning constraints on code. I already have to dance around the fact that the values, which are bound to TypeAlias variables, are not forward references. Would prefer to not to make the dance any more complicated.

I have seen comments about how a change in Python 3.14+ is supposed to fix this, but that is years away from being a baseline version in terms of what needs to be supported. What is the advice in the meantime?

(For the use case which triggered the original report, I am just going to fall back to get_annotations and walk the MRO of classes myself to merge inherited annotations. I cannot control how users of my library may surface annotations on their objects.)

@JelleZijlstra
Copy link
Member

can you elaborate on this advice? Not using stringified forward references can impose a lot of additional positioning constraints on code.

Manually add strings only for specific types that need it.

@emcd
Copy link
Author

emcd commented May 13, 2025

Manually add strings only for specific types that need it.

Fair. That is cognitive overhead during development, but... if one has a type checker's language server hooked up to their development environment, then it is probably bearable.

@ericvsmith
Copy link
Member

I'd prefer not to add workarounds in _is_type for third party packages. I think that going forward, the best approach is @JelleZijlstra ’s suggestion to not use from __future__ import annotations and to use explicit strings where necessary. Maybe we should discuss this on d.p.o. and come up with a policy on it.

@emcd
Copy link
Author

emcd commented May 15, 2025

I'd prefer not to add workarounds in _is_type for third party packages.

I've had a chance to look at the code and the old b.p.o. issue mentioned in the comments. While this does give me some sympathy for your position, it really does not change the fact that this is a sharp edge on which developers can easily slice their hands. If I'm someone who has not inspected the internals and historical discussions of dataclasses and just go by the Python documentation, this behavior is very surprising. Why?

  • Most people have become quite accustomed to using the PEP 563 functionality (from __future__ import annotations) by now and are probably blissfully unaware that PEP 749 was accepted only mere days ago or what that means for their code in the longer term. (I was not aware of this until I looked for more context around your and Jelle's advice in this thread.)
  • The typing_extensions third-party package is mentioned in three different places within the official Python standard library documentation on typing, including once in the opening paragraphs. Why wouldn't a developer want to use this forward-compatibility layer instead of typing? (Especially as it seems to be officially blessed.)

Also, this jumped out at me while reading the old discussion:

Maybe we could do some additional checking if typing has been imported? If we see "T.ClassVar" ("[identifier].ClassVar" appears at the beginning of the string) then if typing has been imported, further check that "T is typing"? Although this isn't going to help with "from typing import ClassVar as CV" (but only 0.00004% of users would do that [3]_) and I don't want to use regex's for this. str.partition() is likely good enough, if we decide to go this route.

What happened to the str.partition idea? Check to see if the module name (possibly an import alias) exists in the module namespace of the dataclass. Then, if that name points to a module, inspect that module's __dict__ to see if ClassVar/InitVar exists in that namespace and is exactly typing.ClassVar/dataclasses.InitVar?

Also, was the performance difference between a regex match and an eval ever quantified? What if the code did annotation.split('[', maxsplit=1)[0].strip() on string annotations and used that as a cache key for eval-resolved objects? While this might make the dataclass-with-few-fields case marginally slower, it could actually boost the performance of dataclass-with-many-fields case. We're talking about an interpreted language here - is the performance difference that sensitive of an issue?

I think that going forward, the best approach is @JelleZijlstra ’s suggestion to not use from __future__ import annotations and to use explicit strings where necessary. Maybe we should discuss this on d.p.o. and come up with a policy on it.

If multiple CPython maintainers are saying do not use from __future__ import annotations and that is the official position of the steering council, or whoever holds the "sovereignty" of Python these days, then it could definitely use better publicizing. (Marking PEP 563 as Withdrawn is not going to be good enough.)

@JelleZijlstra
Copy link
Member

I think the particular issue here is probably fixable, something like this:

diff --git a/Lib/dataclasses.py b/Lib/dataclasses.py
index 86d29df0639..91cbe06342c 100644
--- a/Lib/dataclasses.py
+++ b/Lib/dataclasses.py
@@ -763,8 +763,8 @@ def _is_type(annotation, cls, a_module, a_type, is_type_predicate):
         else:
             # Look up module_name in the class's module.
             module = sys.modules.get(cls.__module__)
-            if module and module.__dict__.get(module_name) is a_module:
-                ns = sys.modules.get(a_type.__module__).__dict__
+            if module and isinstance(module.__dict__.get(module_name), types.ModuleType):
+                ns = module.__dict__.get(module_name).__dict__
         if ns and is_type_predicate(ns.get(match.group(2)), a_module):
             return True
     return False

If someone is interested in confirming this works, writing tests, and putting up a PR, I'd be happy to review it.

PEP 749 was accepted only mere days ago

PEP-649, which already said that the future import would be deprecated, was accepted two years ago. PEP 749 merely made the deprecation more concrete.

@dzherb
Copy link

dzherb commented May 15, 2025

I'd like to work on this

@sobolevn
Copy link
Member

@dzherb thanks! You can experiment with #133956 (comment) and add tests for this case. Probably, some other corner-cases can be found :)

@emcd
Copy link
Author

emcd commented May 15, 2025

I think the particular issue here is probably fixable, something like this:

-            if module and module.__dict__.get(module_name) is a_module:
-                ns = sys.modules.get(a_type.__module__).__dict__
+            if module and isinstance(module.__dict__.get(module_name), types.ModuleType):
+                ns = module.__dict__.get(module_name).__dict__

Yes, looks good. This is what I had in mind when I wrote "if that name points to a module, inspect that module's __dict__ ..." in my previous response. I can work on this in a few days from now, but am focused on getting a project across the line currently.

I'd like to work on this

Thanks. Go for it. If you don't get to it in a few days, then I can pick it up.

Not sure what the policy is on back-porting fixes, but it would be nice to see this in the next patch/micro releases of 3.9 - 3.13 too.

@sobolevn
Copy link
Member

We can backport fixes to 3.13 and 3.14 only :(
Others are security-only at the moment.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
stdlib Python modules in the Lib dir topic-dataclasses type-bug An unexpected behavior, bug, or error
Projects
None yet
Development

No branches or pull requests

7 participants