Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Initial work on generic models for pydantic v2 #4970

Closed
wants to merge 12 commits into from

Conversation

dmontagu
Copy link
Contributor

Relevant issue: #4665

I have opened this as a draft pull request to get some initial feedback on:

  • The inclusion of typevars_map as an argument to various schema-building functions
  • The addition of some generics-related methods to BaseModel
  • Moving some of the existing generics functionality to pydantic.main and some to pydantic.generics_utils. (Part of the motivation of creating a new file was to keep the PR small/easy-to-review for now. I am imagining that this file will eventually replace the current pydantic.generics.)
  • The details of pydantic.generics_utils.create_generic_submodel and BaseModel.__class_getitem__

I am sure that there are still a variety of edge cases that are better-handled in pydantic v1 because I haven't tested them (e.g., anything depending on _assigned_parameters, which I haven't needed to migrate yet).

If I get positive feedback on the approach in this PR, I will review those edge cases more thoroughly.

I have tried to bias toward copying less of the existing code when I didn't fully understand what it was used for, though I intend to check more thoroughly for any regressions soon (definitely before merging). Hopefully the mostly-disabled tests of GenericModel cover most of these..

pydantic/main.py Outdated Show resolved Hide resolved
pydantic/generics_utils.py Outdated Show resolved Hide resolved
pydantic/generics_utils.py Outdated Show resolved Hide resolved
Copy link
Member

@samuelcolvin samuelcolvin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall looks great.

Quite satisfying to see that the new core/core-schema approach has simplified this a lot, at least that's how I see it?!

Might be worth looking through the 1.10.X-fixes branch and see if there are any bug fixes there that should be included here? E.g. #4551 mentioned below, you might just copy any new tests and check they pass.

Please update.

pydantic/_internal/_generate_schema.py Outdated Show resolved Hide resolved
pydantic/_internal/_generate_schema.py Outdated Show resolved Hide resolved
pydantic/_internal/_generate_schema.py Outdated Show resolved Hide resolved
pydantic/generics_utils.py Outdated Show resolved Hide resolved
pydantic/generics_utils.py Outdated Show resolved Hide resolved
pydantic/main.py Outdated Show resolved Hide resolved
pydantic/main.py Outdated Show resolved Hide resolved
pydantic/main.py Outdated Show resolved Hide resolved
pydantic/main.py Outdated Show resolved Hide resolved
pydantic/main.py Show resolved Hide resolved
@dmontagu
Copy link
Contributor Author

dmontagu commented Jan 23, 2023

@samuelcolvin while investigating #4146 (comment), I noticed the following unintuitive (to me) behavior:

from typing import Generic, TypeVar
from pydantic import BaseModel

DataT = TypeVar("DataT")

class MyGenericModel(BaseModel, Generic[DataT]):
    generic: list[DataT] | None


class A(BaseModel):
    wrapper: MyGenericModel[int]

m = MyGenericModel(generic=None)
print(A(wrapper=m))
Traceback (most recent call last):
  File "scratch_66.py", line 14, in <module>
    print(A(wrapper=m))
          ^^^^^^^^^^^^
  File "pydantic/pydantic/main.py", line 155, in __init__
    values, fields_set = __pydantic_self__.__pydantic_validator__.validate_python(data)
                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
pydantic_core._pydantic_core.ValidationError: 1 validation error for MyGenericModel[int]
wrapper
  Input should be a valid dictionary [type=dict_type, input_value=MyGenericModel(generic=None), input_type=MyGenericModel]

This is happening because m here is an instance of MyGenericModel, which is a superclass of MyGenericModel[int]. There's nothing specific to generics with this error — this is basically the same as doing:

from pydantic import BaseModel

class Inner(BaseModel):
    x: int

class SubInner(Inner):
    pass

class Outer(BaseModel):
    wrapper: SubInner

Outer(wrapper=Inner(x=1))

which produces essentially the same error.

However, I can't help but feel it's a little awkward that you have to add the annotation to the type when initializing it to get it to pass validation as a field value.

I'm not sure I have a great proposal for a resolution to this (though I think we might be able to do something where we re-validate the data for concrete parametrizations of generic models, even if we don't do that for more general model superclasses). I don't think it needs to be resolved now (or possibly ever), as things are still plenty functional if you have to use the right parametrization when instantiating the class, it just feels a little awkward since you don't need to do that for non-pydantic generic classes.

I do think others will end up spending some time confused by getting that error message though if they accidentally-or-not try to pass instances of the non-concrete generic model into fields with concrete parametrizations. (because I did, and I'm the one working on the generics feature 😄). Even if we don't change the validation behavior, this might merit some special casing for the error message?

Curious if you have any thoughts.

@samuelcolvin
Copy link
Member

Very easy to add a flag to either config or a model to skip the isinstancd check - just run validation against .__dict__ and see what happens.

With this nascent flag set, any class instance with the right fields would pass, maybe that's okay.

The problem is when to enable it? I guess we could enable it for generics by default?

We could also do some kind of "is super class" check, which sounds weird but could be useful.

@dmontagu
Copy link
Contributor Author

We could also do some kind of "is super class" check, which sounds weird but could be useful.

While addressing some of the more complex tests I realized I was running into some awkwardnesses with the class hierarchy being generated (M[int, T][str] would be two "layers" down from M, whereas M[int, str] is only one; M[int, T][str] was a different class than M[T, str][int] was a different class than M[int, str]).

To handle this, I made some more extensive changes to the implementation (adding fields like __pydantic_generic_parent__ to hold a reference to the last class created the "normal" way, which then gets reparametrized using the full set of arguments when you do partial parametrizations). This means that no matter how you go about doing a partial parametrization, you should get class equality if the parametrization is the same, and parametrization only adds one layer to the class hierarchy no matter how many times __class_getitem__ is called (with unsubstituted TypeVars).

I think it may be possible to leverage this approach for a more (appropriately) limited version of the "superclass" check you are describing in a straightforward way -- basically allowing instances of the unparametrized parent class. But, I'm not totally sure that this is something that makes sense.

Also, just to be clear, I'm concerned about the increase in complexity I've added but I think it's going to be necessary to handle some cases of deeply nested models with the same TypeVar used at multiple levels in the hierarchy; the more complex edge cases for which bugfixes and, thankfully, tests have been added over the last couple years :). I am pushing my progress directly here but I want to be clear that right now I'm mostly trying to find a path forward that works to handle the various edge cases, and will work on simplifying / reducing the amount of new/unnecessary fields once I figure out something that works.

@dmontagu
Copy link
Contributor Author

One interesting thing I noticed -- when using a bounded or constrained TypeVar, we don't do class-creation-time validation of parameter compatibility.

In particular:

from typing import TypeVar, Generic

from pydantic import BaseModel

T = TypeVar("T", bound=str)


class Model(BaseModel, Generic[T]):
    data: T

Model[int](data=1)  # no error due to TypeVar substitution with `int`, despite being a type system error
Model(data=1)  # error thanks to bound on TypeVar requiring an instance of `str`

This would be caught by mypy, and I'm thinking we probably don't want to get into the business of being a runtime type system in addition to being a runtime type validator, but just wanted to point this out.


On a semi-related note, when it comes time to implement subclass checking, we may want to use information about the covariance/contravariance of the TypeVars involved; I'm not sure whether 1.X does that currently (I'm assuming not, but I need to check how that all works). I guess it would come down to how hard it is to do runtime validation of subclass relationships of type parameters, writing that out makes me think probably not going to happen, but I'll look into it a bit.

@dmontagu
Copy link
Contributor Author

dmontagu commented Jan 26, 2023

On the note of compatibility with runtime subclass checking of generics:

from typing import Sequence

issubclass(Sequence[int], Sequence[float])

# TypeError: Subscripted generics cannot be used with class and instance checks

@samuelcolvin I'm wondering if disallowing subclass checks entirely with parameterized generics might be smart for reducing the amount of complex behavior we need to support; notice PyCharm even shows an error when you try (this is the case even if you introduce an alias for A[int] below):

image

I can imagine some advantages to supporting this, but I'm wondering if it's smarter to leave this to the type system and use a discriminated-union-esque approach if you really need to differentiate between specific generic parametrizations at runtime.

Note, we could still aim to support putting the parametrized generic in the first position of the subclasscheck (this currently works on this branch), which I would think would be enough for most practical cases, but "standard" (i.e., from typing) generics don't seem to support this either.

@samuelcolvin
Copy link
Member

We've agreed that we don't need subclass stuff to work since it doesn't work with typing generics.

@caniko
Copy link
Contributor

caniko commented Feb 13, 2023

I don't see PEP646 or variadic generics being mentioned in this draft. Are you planning on adding this later?

@dmontagu
Copy link
Contributor Author

dmontagu commented Feb 14, 2023

I've paused work on this temporarily while trying to work through JSON schema and some other smaller things, and trying to familiarize myself more with pydantic_core, where I want to move at least some of the generics logic to get around some of the more complex challenges (such as recursive generic models with long cycles).

That said, I am hoping to add support for variadic generics as part of this. If it turns out to be particularly complicated I may postpone that in favor of getting non-variadic generic models working, but I am (cautiously?) optimistic.

To be clear, the use case I have in mind would be something like:

class MyGeneric(BaseModel, Generic[*Shape]):
    x: int
    y: tuple[*Shape]

@caniko — if you have a different or more complex use case in mind than just passing the TypeVarTuple through to a generic field, I would appreciate if you could share, to make sure it is considered during implementation.

@caniko
Copy link
Contributor

caniko commented Feb 14, 2023

Your example is fine:

class MyGeneric(BaseModel, Generic[*Shape]):
    x: int
    y: tuple[*Shape]

Moreover, in >=3.11:

Tuple[str, *Tuple[int, int], bool] -> Tuple[str, int, int, bool]

This is an extension of the logic from your example.

Also keep in mind that normal types like TypeVar instances can precede a TypeVarTuple instance:

T = TypeVar('T')    # or NewType('T')
Y = TypeVar('Y')    # or NewType('Y')
Shape = TypeVarTuple('Shape')

class MyGeneric(BaseModel, Generic[T, Y, *Shape]):
    x: T
    y: tuple[*Shape]
    z: list[Y, *Shape]

@dmontagu
Copy link
Contributor Author

dmontagu commented Mar 1, 2023

Closing in favor of a new branch that I think is closer.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants