New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Initial work on generic models for pydantic v2 #4970
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall looks great.
Quite satisfying to see that the new core/core-schema approach has simplified this a lot, at least that's how I see it?!
Might be worth looking through the 1.10.X-fixes
branch and see if there are any bug fixes there that should be included here? E.g. #4551 mentioned below, you might just copy any new tests and check they pass.
Please update.
@samuelcolvin while investigating #4146 (comment), I noticed the following unintuitive (to me) behavior: from typing import Generic, TypeVar
from pydantic import BaseModel
DataT = TypeVar("DataT")
class MyGenericModel(BaseModel, Generic[DataT]):
generic: list[DataT] | None
class A(BaseModel):
wrapper: MyGenericModel[int]
m = MyGenericModel(generic=None)
print(A(wrapper=m))
This is happening because from pydantic import BaseModel
class Inner(BaseModel):
x: int
class SubInner(Inner):
pass
class Outer(BaseModel):
wrapper: SubInner
Outer(wrapper=Inner(x=1)) which produces essentially the same error. However, I can't help but feel it's a little awkward that you have to add the annotation to the type when initializing it to get it to pass validation as a field value. I'm not sure I have a great proposal for a resolution to this (though I think we might be able to do something where we re-validate the data for concrete parametrizations of generic models, even if we don't do that for more general model superclasses). I don't think it needs to be resolved now (or possibly ever), as things are still plenty functional if you have to use the right parametrization when instantiating the class, it just feels a little awkward since you don't need to do that for non-pydantic generic classes. I do think others will end up spending some time confused by getting that error message though if they accidentally-or-not try to pass instances of the non-concrete generic model into fields with concrete parametrizations. (because I did, and I'm the one working on the generics feature 😄). Even if we don't change the validation behavior, this might merit some special casing for the error message? Curious if you have any thoughts. |
Very easy to add a flag to either config or a model to skip the isinstancd check - just run validation against With this nascent flag set, any class instance with the right fields would pass, maybe that's okay. The problem is when to enable it? I guess we could enable it for generics by default? We could also do some kind of "is super class" check, which sounds weird but could be useful. |
While addressing some of the more complex tests I realized I was running into some awkwardnesses with the class hierarchy being generated (M[int, T][str] would be two "layers" down from M, whereas M[int, str] is only one; M[int, T][str] was a different class than M[T, str][int] was a different class than M[int, str]). To handle this, I made some more extensive changes to the implementation (adding fields like I think it may be possible to leverage this approach for a more (appropriately) limited version of the "superclass" check you are describing in a straightforward way -- basically allowing instances of the unparametrized parent class. But, I'm not totally sure that this is something that makes sense. Also, just to be clear, I'm concerned about the increase in complexity I've added but I think it's going to be necessary to handle some cases of deeply nested models with the same TypeVar used at multiple levels in the hierarchy; the more complex edge cases for which bugfixes and, thankfully, tests have been added over the last couple years :). I am pushing my progress directly here but I want to be clear that right now I'm mostly trying to find a path forward that works to handle the various edge cases, and will work on simplifying / reducing the amount of new/unnecessary fields once I figure out something that works. |
One interesting thing I noticed -- when using a bounded or constrained TypeVar, we don't do class-creation-time validation of parameter compatibility. In particular: from typing import TypeVar, Generic
from pydantic import BaseModel
T = TypeVar("T", bound=str)
class Model(BaseModel, Generic[T]):
data: T
Model[int](data=1) # no error due to TypeVar substitution with `int`, despite being a type system error
Model(data=1) # error thanks to bound on TypeVar requiring an instance of `str` This would be caught by mypy, and I'm thinking we probably don't want to get into the business of being a runtime type system in addition to being a runtime type validator, but just wanted to point this out. On a semi-related note, when it comes time to implement subclass checking, we may want to use information about the covariance/contravariance of the TypeVars involved; I'm not sure whether 1.X does that currently (I'm assuming not, but I need to check how that all works). I guess it would come down to how hard it is to do runtime validation of subclass relationships of type parameters, writing that out makes me think probably not going to happen, but I'll look into it a bit. |
On the note of compatibility with runtime subclass checking of generics: from typing import Sequence
issubclass(Sequence[int], Sequence[float])
# TypeError: Subscripted generics cannot be used with class and instance checks @samuelcolvin I'm wondering if disallowing subclass checks entirely with parameterized generics might be smart for reducing the amount of complex behavior we need to support; notice PyCharm even shows an error when you try (this is the case even if you introduce an alias for I can imagine some advantages to supporting this, but I'm wondering if it's smarter to leave this to the type system and use a discriminated-union-esque approach if you really need to differentiate between specific generic parametrizations at runtime. Note, we could still aim to support putting the parametrized generic in the first position of the subclasscheck (this currently works on this branch), which I would think would be enough for most practical cases, but "standard" (i.e., from |
We've agreed that we don't need subclass stuff to work since it doesn't work with typing generics. |
I don't see PEP646 or variadic generics being mentioned in this draft. Are you planning on adding this later? |
I've paused work on this temporarily while trying to work through JSON schema and some other smaller things, and trying to familiarize myself more with pydantic_core, where I want to move at least some of the generics logic to get around some of the more complex challenges (such as recursive generic models with long cycles). That said, I am hoping to add support for variadic generics as part of this. If it turns out to be particularly complicated I may postpone that in favor of getting non-variadic generic models working, but I am (cautiously?) optimistic. To be clear, the use case I have in mind would be something like: class MyGeneric(BaseModel, Generic[*Shape]):
x: int
y: tuple[*Shape] @caniko — if you have a different or more complex use case in mind than just passing the TypeVarTuple through to a generic field, I would appreciate if you could share, to make sure it is considered during implementation. |
Your example is fine:
Moreover, in Tuple[str, *Tuple[int, int], bool] -> Tuple[str, int, int, bool] This is an extension of the logic from your example. Also keep in mind that normal types like T = TypeVar('T') # or NewType('T')
Y = TypeVar('Y') # or NewType('Y')
Shape = TypeVarTuple('Shape')
class MyGeneric(BaseModel, Generic[T, Y, *Shape]):
x: T
y: tuple[*Shape]
z: list[Y, *Shape] |
Closing in favor of a new branch that I think is closer. |
Relevant issue: #4665
I have opened this as a draft pull request to get some initial feedback on:
typevars_map
as an argument to various schema-building functionsBaseModel
pydantic.main
and some topydantic.generics_utils
. (Part of the motivation of creating a new file was to keep the PR small/easy-to-review for now. I am imagining that this file will eventually replace the currentpydantic.generics
.)pydantic.generics_utils.create_generic_submodel
andBaseModel.__class_getitem__
I am sure that there are still a variety of edge cases that are better-handled in pydantic v1 because I haven't tested them (e.g., anything depending on
_assigned_parameters
, which I haven't needed to migrate yet).If I get positive feedback on the approach in this PR, I will review those edge cases more thoroughly.
I have tried to bias toward copying less of the existing code when I didn't fully understand what it was used for, though I intend to check more thoroughly for any regressions soon (definitely before merging). Hopefully the mostly-disabled tests of
GenericModel
cover most of these..