`dataclasses`: Synthetic `init` method on dataclass has broken `ClassVar` annotation under some conditions. #133956

emcd · 2025-05-12T23:49:56Z

Bug report

Bug description:

When using automatically-stringified annotations (from __future__ import annotations), typing_extensions.ClassVar is not treated the same as typing.ClassVar even though those two objects are identical. Specifically, the dataclasses.dataclass-produced __init__ method does not exclude typing_extensions.ClassVar-wrapped variables from its annotations, but does exclude typing.ClassVar-wrapped variables from its annotations. Among other problems, this causes typing.get_type_hints to choke.

Reproducer below:

from __future__ import annotations

import dataclasses as dcls
import inspect
import typing
import typing_extensions as typx


assert typing.ClassVar is typx.ClassVar


@dcls.dataclass
class Foo:

    y: int
    x: typing.ClassVar[ int ] = 1


print( "\n=== typing.ClassVar (ok) ===" )

print( 'anno', Foo.__init__.__annotations__ )
print( 'inspect', inspect.get_annotations( Foo.__init__, eval_str = True ) )
print( 'typing', typing.get_type_hints( Foo.__init__ ) )
print( 'typx', typx.get_type_hints( Foo.__init__ ) )


@dcls.dataclass
class Bar:

    y: int
    x: typx.ClassVar[ int ] = 1


print( "\n=== typing_extensions.ClassVar (breaks) ===" )

print( 'anno', Bar.__init__.__annotations__ )
print( 'inspect', inspect.get_annotations( Bar.__init__, eval_str = True ) )
print( 'typing', typing.get_type_hints( Bar.__init__ ) )
print( 'typx', typx.get_type_hints( Bar.__init__ ) )

This results in the following output:

=== typing.ClassVar (ok) ===
anno {'y': 'int', 'return': None}
inspect {'y': <class 'int'>, 'return': None}
typing {'y': <class 'int'>, 'return': <class 'NoneType'>}
typx {'y': <class 'int'>, 'return': <class 'NoneType'>}

=== typing_extensions.ClassVar (breaks) ===
anno {'y': 'int', 'x': 'typx.ClassVar[int]', 'return': None}
inspect {'y': <class 'int'>, 'x': typing.ClassVar[int], 'return': None}
Traceback (most recent call last):
  File "/home/me/src/somepkg/bugs/dcls-classvar-hints.py", line 38, in <module>
    print( 'typing', typing.get_type_hints( Bar.__init__ ) )
  File "/usr/lib/python3.10/typing.py", line 1871, in get_type_hints
    value = _eval_type(value, globalns, localns)
  File "/usr/lib/python3.10/typing.py", line 327, in _eval_type
    return t._evaluate(globalns, localns, recursive_guard)
  File "/usr/lib/python3.10/typing.py", line 693, in _evaluate
    type_ = _type_check(
  File "/usr/lib/python3.10/typing.py", line 167, in _type_check
    raise TypeError(f"{arg} is not valid as type argument")
TypeError: typing.ClassVar[int] is not valid as type argument

(Credit to @Daraan for helping me pinpoint the problem.)

This is reproducible on both Python 3.10 and 3.13.

Possibly related to #89687.

CPython versions tested on:

3.10, 3.13

Operating systems tested on:

No response

Linked PRs

gh-133956 fix bug where ClassVar string annotation in @dataclass caused incorrect __init__ generation #134073

The text was updated successfully, but these errors were encountered:

JelleZijlstra · 2025-05-13T00:02:41Z

This isn't just about the annotations, the dataclass also generates the wrong parameters. The Bar class in your example is inferred as having two parameters:

>>> Bar(1, 2)
Bar(y=1, x=2)

dataclasses does some complicated dance to identify typing.ClassVar in string annotations; see the dataclasses._is_type function. That function doesn't appear to support cases where typing.ClassVar is re-exported from another module. Maybe it could be fixed.

My general recommendation though is to not use from __future__ import annotations if you want your annotations to be introspected reliably.

JelleZijlstra · 2025-05-13T00:03:00Z

cc @ericvsmith @carljm for dataclasses

emcd · 2025-05-13T00:30:07Z

My general recommendation though is to not use from __future__ import annotations if you want your annotations to be introspected reliably.

Not to distract from the topic at hand, but can you elaborate on this advice? Not using stringified forward references can impose a lot of additional positioning constraints on code. I already have to dance around the fact that the values, which are bound to TypeAlias variables, are not forward references. Would prefer to not to make the dance any more complicated.

I have seen comments about how a change in Python 3.14+ is supposed to fix this, but that is years away from being a baseline version in terms of what needs to be supported. What is the advice in the meantime?

(For the use case which triggered the original report, I am just going to fall back to get_annotations and walk the MRO of classes myself to merge inherited annotations. I cannot control how users of my library may surface annotations on their objects.)

JelleZijlstra · 2025-05-13T00:32:19Z

can you elaborate on this advice? Not using stringified forward references can impose a lot of additional positioning constraints on code.

Manually add strings only for specific types that need it.

emcd · 2025-05-13T00:51:40Z

Manually add strings only for specific types that need it.

Fair. That is cognitive overhead during development, but... if one has a type checker's language server hooked up to their development environment, then it is probably bearable.

ericvsmith · 2025-05-14T09:30:10Z

I'd prefer not to add workarounds in _is_type for third party packages. I think that going forward, the best approach is @JelleZijlstra ’s suggestion to not use from __future__ import annotations and to use explicit strings where necessary. Maybe we should discuss this on d.p.o. and come up with a policy on it.

emcd · 2025-05-15T01:54:43Z

I'd prefer not to add workarounds in _is_type for third party packages.

I've had a chance to look at the code and the old b.p.o. issue mentioned in the comments. While this does give me some sympathy for your position, it really does not change the fact that this is a sharp edge on which developers can easily slice their hands. If I'm someone who has not inspected the internals and historical discussions of dataclasses and just go by the Python documentation, this behavior is very surprising. Why?

Most people have become quite accustomed to using the PEP 563 functionality (from __future__ import annotations) by now and are probably blissfully unaware that PEP 749 was accepted only mere days ago or what that means for their code in the longer term. (I was not aware of this until I looked for more context around your and Jelle's advice in this thread.)
The typing_extensions third-party package is mentioned in three different places within the official Python standard library documentation on typing, including once in the opening paragraphs. Why wouldn't a developer want to use this forward-compatibility layer instead of typing? (Especially as it seems to be officially blessed.)

Also, this jumped out at me while reading the old discussion:

Maybe we could do some additional checking if typing has been imported? If we see "T.ClassVar" ("[identifier].ClassVar" appears at the beginning of the string) then if typing has been imported, further check that "T is typing"? Although this isn't going to help with "from typing import ClassVar as CV" (but only 0.00004% of users would do that [3]_) and I don't want to use regex's for this. str.partition() is likely good enough, if we decide to go this route.

What happened to the str.partition idea? Check to see if the module name (possibly an import alias) exists in the module namespace of the dataclass. Then, if that name points to a module, inspect that module's __dict__ to see if ClassVar/InitVar exists in that namespace and is exactly typing.ClassVar/dataclasses.InitVar?

Also, was the performance difference between a regex match and an eval ever quantified? What if the code did annotation.split('[', maxsplit=1)[0].strip() on string annotations and used that as a cache key for eval-resolved objects? While this might make the dataclass-with-few-fields case marginally slower, it could actually boost the performance of dataclass-with-many-fields case. We're talking about an interpreted language here - is the performance difference that sensitive of an issue?

I think that going forward, the best approach is @JelleZijlstra ’s suggestion to not use from __future__ import annotations and to use explicit strings where necessary. Maybe we should discuss this on d.p.o. and come up with a policy on it.

If multiple CPython maintainers are saying do not use from __future__ import annotations and that is the official position of the steering council, or whoever holds the "sovereignty" of Python these days, then it could definitely use better publicizing. (Marking PEP 563 as Withdrawn is not going to be good enough.)

JelleZijlstra · 2025-05-15T02:04:12Z

I think the particular issue here is probably fixable, something like this:

diff --git a/Lib/dataclasses.py b/Lib/dataclasses.py
index 86d29df0639..91cbe06342c 100644
--- a/Lib/dataclasses.py
+++ b/Lib/dataclasses.py
@@ -763,8 +763,8 @@ def _is_type(annotation, cls, a_module, a_type, is_type_predicate):
         else:
             # Look up module_name in the class's module.
             module = sys.modules.get(cls.__module__)
-            if module and module.__dict__.get(module_name) is a_module:
-                ns = sys.modules.get(a_type.__module__).__dict__
+            if module and isinstance(module.__dict__.get(module_name), types.ModuleType):
+                ns = module.__dict__.get(module_name).__dict__
         if ns and is_type_predicate(ns.get(match.group(2)), a_module):
             return True
     return False

If someone is interested in confirming this works, writing tests, and putting up a PR, I'd be happy to review it.

PEP 749 was accepted only mere days ago

PEP-649, which already said that the future import would be deprecated, was accepted two years ago. PEP 749 merely made the deprecation more concrete.

dzherb · 2025-05-15T07:14:40Z

I'd like to work on this

sobolevn · 2025-05-15T07:29:39Z

@dzherb thanks! You can experiment with #133956 (comment) and add tests for this case. Probably, some other corner-cases can be found :)

emcd · 2025-05-15T12:10:30Z

I think the particular issue here is probably fixable, something like this:

-            if module and module.__dict__.get(module_name) is a_module:
-                ns = sys.modules.get(a_type.__module__).__dict__
+            if module and isinstance(module.__dict__.get(module_name), types.ModuleType):
+                ns = module.__dict__.get(module_name).__dict__

Yes, looks good. This is what I had in mind when I wrote "if that name points to a module, inspect that module's __dict__ ..." in my previous response. I can work on this in a few days from now, but am focused on getting a project across the line currently.

I'd like to work on this

Thanks. Go for it. If you don't get to it in a few days, then I can pick it up.

Not sure what the policy is on back-porting fixes, but it would be nice to see this in the next patch/micro releases of 3.9 - 3.13 too.

sobolevn · 2025-05-15T12:29:25Z

We can backport fixes to 3.13 and 3.14 only :(
Others are security-only at the moment.

…ther than 'typing_extensions.ClassVar'. (python/cpython#133956)

emcd added the type-bug An unexpected behavior, bug, or error label May 12, 2025

AlexWaygood added the topic-dataclasses label May 13, 2025

picnixz added the stdlib Python modules in the Lib dir label May 15, 2025

bedevere-app bot mentioned this issue May 15, 2025

gh-133956 fix bug where ClassVar string annotation in @dataclass caused incorrect __init__ generation #134073

Open

BrianPugh mentioned this issue May 21, 2025

NameError: name 'Annotated' is not defined BrianPugh/cyclopts#439

Open

emcd added a commit to emcd/python-dynadoc that referenced this issue May 25, 2025

Interfaces: Adjust cache fields to use different 'typing.ClassVar' ra…

238f009

…ther than 'typing_extensions.ClassVar'. (python/cpython#133956)

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

`dataclasses`: Synthetic `init` method on dataclass has broken `ClassVar` annotation under some conditions. #133956

`dataclasses`: Synthetic `init` method on dataclass has broken `ClassVar` annotation under some conditions. #133956

emcd commented May 12, 2025 •

edited by bedevere-app bot

Loading

JelleZijlstra commented May 13, 2025

Uh oh!

JelleZijlstra commented May 13, 2025

Uh oh!

emcd commented May 13, 2025

Uh oh!

JelleZijlstra commented May 13, 2025

Uh oh!

emcd commented May 13, 2025

Uh oh!

ericvsmith commented May 14, 2025

Uh oh!

emcd commented May 15, 2025

Uh oh!

JelleZijlstra commented May 15, 2025

Uh oh!

dzherb commented May 15, 2025

Uh oh!

sobolevn commented May 15, 2025

Uh oh!

emcd commented May 15, 2025

Uh oh!

sobolevn commented May 15, 2025

Uh oh!

Uh oh!

dataclasses: Synthetic __init__ method on dataclass has broken ClassVar annotation under some conditions. #133956

dataclasses: Synthetic __init__ method on dataclass has broken ClassVar annotation under some conditions. #133956

Comments

emcd commented May 12, 2025 • edited by bedevere-app bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Bug report

Bug description:

CPython versions tested on:

Operating systems tested on:

Linked PRs

JelleZijlstra commented May 13, 2025

Uh oh!

JelleZijlstra commented May 13, 2025

Uh oh!

emcd commented May 13, 2025

Uh oh!

JelleZijlstra commented May 13, 2025

Uh oh!

emcd commented May 13, 2025

Uh oh!

ericvsmith commented May 14, 2025

Uh oh!

emcd commented May 15, 2025

Uh oh!

JelleZijlstra commented May 15, 2025

Uh oh!

dzherb commented May 15, 2025

Uh oh!

sobolevn commented May 15, 2025

Uh oh!

emcd commented May 15, 2025

Uh oh!

sobolevn commented May 15, 2025

Uh oh!

`dataclasses`: Synthetic `init` method on dataclass has broken `ClassVar` annotation under some conditions. #133956

`dataclasses`: Synthetic `init` method on dataclass has broken `ClassVar` annotation under some conditions. #133956

emcd commented May 12, 2025 •

edited by bedevere-app bot

Loading