Skip to content

Model subclass instances #492

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 8 commits into from
Mar 30, 2023
Merged

Model subclass instances #492

merged 8 commits into from
Mar 30, 2023

Conversation

samuelcolvin
Copy link
Member

@samuelcolvin samuelcolvin commented Mar 29, 2023

TODO:

  • make sure revalidate works without from_attributes
  • do we need to remove warnings from model serialization on extra fields? - turns out I already thought about this!
  • implement the same logic on dataclasses
  • I think this now means we're calling __instancecheck__ in python on every validation, that's going to be super slow, we might need our own fast isinstance check, this in turn will need to work with the nascent PydanticGenericAlias - this only seems to add ~10%, I think fine

This allows both instances of the model, and instances of subclasses of the model to be validated.

In strict mode, only instances of the exact model type are allowed - this is up for discussion.

In both strict and lax mode, regardless of whether we have an exact instance or subclass instance, the validate_instances config flag (can also be set on the validator directly) forces fields to be revalidated.

When re-validating, __fields_set__ from the original instance is reused.

Leaking private data and "the FastAPI problem"

A big problem FastAPI had is was that allowing subclasses meant private information on subclasses was being leaked during serialisation.

E.g.:

from pydantic import BaseModel


class DataModel(BaseModel):
    name: str
    age: int


class ResponseModel(BaseModel):
    data: DataModel


class PrivateDataModel(DataModel):
    massive_secret: str


def my_view():
    data = PrivateDataModel(
        name='John',
        age=30,
        massive_secret=(
            'generative AI is good at inferring meaning and bad a '
            'calculating an answer LLMs are over-hyped'
        ),
    )
    response = ResponseModel(data=data)
    return response.json(indent=2)


print(my_view())
# output includes "massive_secret" in V1, but not in V2

However this is not a problem with Pydantic V2 since the serializer build for ResponseModel uses DataModel to build the serialization logic for .data, and only fields of DataModel are included.

You can run the above on pydantic main now and it works correctly (you'll need to install pydantic-core from this branch, and change .json to .model_dump_json, but otherwise it'll work.

@tiangolo please confirm you're happy with this approach.

@codspeed-hq
Copy link

codspeed-hq bot commented Mar 29, 2023

CodSpeed Performance Report

Merging #492 model-subclass-instances (865d1fb) will not alter performances.

Summary

🔥 0 improvements
❌ 0 regressions
✅ 93 untouched benchmarks

🆕 2 new benchmarks
⁉️ 0 dropped benchmarks

Benchmarks breakdown

Benchmark main model-subclass-instances Change
🆕 test_model_instance N/A 47.2 µs N/A
🆕 test_model_instance_abc N/A 48.3 µs N/A

@codecov-commenter
Copy link

codecov-commenter commented Mar 29, 2023

Codecov Report

Merging #492 (64d2efb) into main (12f596d) will decrease coverage by 0.09%.
The diff coverage is 87.77%.

📣 This organization is not using Codecov’s GitHub App Integration. We recommend you install it so Codecov can continue to function properly for your repositories. Learn more

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #492      +/-   ##
==========================================
- Coverage   94.73%   94.65%   -0.09%     
==========================================
  Files          93       93              
  Lines       11665    11686      +21     
  Branches       25       25              
==========================================
+ Hits        11051    11061      +10     
- Misses        607      618      +11     
  Partials        7        7              
Impacted Files Coverage Δ
src/input/input_abstract.rs 87.62% <33.33%> (ø)
src/validators/dataclass.rs 95.69% <81.81%> (-1.46%) ⬇️
src/validators/model.rs 98.61% <97.56%> (+1.55%) ⬆️
pydantic_core/core_schema.py 96.72% <100.00%> (+<0.01%) ⬆️
src/input/input_python.rs 98.26% <100.00%> (-0.01%) ⬇️

... and 1 file with indirect coverage changes


Continue to review full report in Codecov by Sentry.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 12f596d...64d2efb. Read the comment docs.

@tiangolo
Copy link
Collaborator

I'm happy with this approach! Thanks for pinging and tagging me! 🙇


More details:

I was having a small doubt I wanted to confirm on the laptop, thinking about non-model fields. E.g. list[X], and how serialization would work in that case for a non-model.

But I realized, first, I would probably take whatever is declared in the FastAPI path operation function as a Pydantic field, and I'm gonna be the one that creates ResponseModel with that field behind the scenes (which is what I already do in v1).

And then I double checked/confirmed it worked as expected by slightly modifying the example to use a list[DataModel] and it worked, so I think this is perfect. This will allow me to remove a lot of (potentially fragile) workaround cruft in FastAPI and definitely improve performance on that side. 🚀 🎉

Thanks again for doing this! 🙇

Modified example with list[DataModel] for completeness:

from pydantic import BaseModel


class DataModel(BaseModel):
    name: str
    age: int


class ResponseModel(BaseModel):
    data: list[DataModel]


class PrivateDataModel(DataModel):
    massive_secret: str


def my_view():
    data = PrivateDataModel(
        name='John',
        age=30,
        massive_secret=(
            'generative AI is good at inferring meaning and bad a '
            'calculating an answer LLMs are over-hyped'
        ),
    )
    response = ResponseModel(data=[data])
    return response.model_dump_json(indent=2)


print(my_view())
# output includes "massive_secret" in V1, but not in V2

@dmontagu
Copy link
Collaborator

dmontagu commented Mar 30, 2023

In both strict and lax mode, regardless of whether we have an exact instance or subclass instance, the validate_instances config flag (can also be set on the validator directly) forces fields to be revalidated.

As discussed elsewhere (but want to make sure it doesn't get lost) — I think that it is more likely to be useful to have strict mode allow subclass instances. (And I think it will definitely be an early feature request if we don't do that now.)

The reason I say this is that, over the years, many people requested that FastAPI be stricter about what is accepted during parsing (i.e., not coercing strings to int, etc.). But I think that is largely orthogonal to the behavior of how handling subclasses works, since that is really a pure-python-API consideration (you don't have class information while parsing, of course).

Considering that even with the strictest possible settings, type checkers such as mypy and pyright will not produce errors if you pass proper subclass instances to fields, I expect that, at least among the users it affects, it will be much more popular to allow subclass instances than not. (I could be convinced otherwise if someone could share a compelling use case for this, I haven't seen one though; you can use Final to make it a type error to subclass a class, and it just seems pretty contrived to me that you'd want to allow a class to be subclassed, but disallow subclass instances from being used in your pydantic models.)

I know I personally would have preferred both to have strict parsing of primitives and to allow subclass instances to pass validation from the python API (i.e., being allowed as inputs to __init__) in all of the FastAPI application codebases I have built over the past years.


I'll just add a brief note about invariance, to head off any potential attempts to draw an analogy with how mypy will give a type error for List[Subclass] if you try to pass it to a field annotated as type List[Class]:

The invariance of List means that List[Subclass] is not considered a subclass of List[Class]. It's still the case that subclasses are always allowed as instances of fields, it's just that instances of MyGeneric[Subclass] are allowed to be used as instances of MyGeneric[Class] precisely when MyGeneric is covariant (i.e., the TypeVar used to define it is marked as covariant). And List is not covariant.

@samuelcolvin samuelcolvin force-pushed the model-subclass-instances branch from 082ccdd to f6ea4c9 Compare March 30, 2023 10:11
@samuelcolvin samuelcolvin enabled auto-merge (squash) March 30, 2023 11:23
@samuelcolvin samuelcolvin merged commit 956a235 into main Mar 30, 2023
@samuelcolvin samuelcolvin deleted the model-subclass-instances branch March 30, 2023 11:27
@samuelcolvin
Copy link
Member Author

Merged this by mistake.

@dmontagu your logic matches the "wisdom" of the twitter crowd, I'll change to match that behaviour.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants