New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add support for arbitrary class instances #520
Conversation
Codecov Report
@@ Coverage Diff @@
## master #520 +/- ##
=====================================
Coverage 100% 100%
=====================================
Files 14 14
Lines 2271 2283 +12
Branches 447 449 +2
=====================================
+ Hits 2271 2283 +12 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@samuelcolvin There's a subtle point I would like you to check and give me your opinion.
@@ -111,9 +111,9 @@ class Foo(BaseModel): | |||
} | |||
|
|||
with pytest.raises(ValidationError) as exc_info: | |||
module.Foo(b={'a': '321'}, c=[{'b': 234}], d={'bar': {'a': 345}}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's a caveat here: for the inner b
, it would try to use 234
as an arbitrary class instance.
It would then try to extract each of the fields of the self-referencing Foo
model.
As 234
is just a number, it wouldn't have any of the fields.
But, as all the fields are optional or with defaults, the final value of b
would be just a Foo
model with all the default values.
So, in a similar way that an int
would be converted to True
when a field has type bool
, any arbitrary object (of any value, including int
) without attributes would be treated as an empty dict
, and converted to the equivalent of a model with default values.
If you don't like it @samuelcolvin , other option I see would be to accept an object as an arbitrary only if it is not a sub-class of the known singular classes (int
, str
, etc).
@tiangolo - have you investigated the performance implications of this change? If I follow the change correctly, you're replacing one straight I'd hope most ORMs provide a better way to get this info, for example a super dirty SQLAlchemy example would be |
@cjw296 that's very ORM-dependent. They all have their own way of doing it, it's not very standard. If they all implemented Let's wait for @samuelcolvin's opinion. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will need to review it much more when I have time, but I'm happy with the principal and I think my small below should probably be addressed before further review.
Thanks or the contribution and the obvious time you've put into thinking about this.
pydantic/main.py
Outdated
@@ -228,15 +227,17 @@ class BaseModel(metaclass=MetaModel): | |||
Config = BaseConfig | |||
__slots__ = ('__values__', '__fields_set__') | |||
|
|||
def __init__(self, **data: Any) -> None: | |||
def __init__(self, _object: Any = None, **data: Any) -> None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm trying to maintain a hard rule that only data is passed to __init__
to end users using pydantic APIs being able to achieve unexpected behaviour by setting special fields.
In this case every use of pydantic models what didn't want to use _object
would need a manual check before calling Model(**mydata)
that mydata
doesn't include _object
.
Better to add another classmethod
a bit like construct
that creates a new model instead without calling __init__
(or maybe does use __init__
?), but leave __init__
's signature unchanged.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here the problem is that Field
's implementation instantiates the class directly (calling __init__
), then we would need checks in Field
s validators to check which method to call in several places.
But I just realized that this could be achieved with *args
, that way we avoid that problem. Let me try implementing that and see if that works.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You mean MyModel
accepting (*args, **kwargs)
like dataclasses?
I'm not opposed to that but I think it might end up being a big change. Maybe worth doing it as a separate PR before this?
More generally if you you can use *args
, maybe you'd be best to just use pydantic dataclasses for all of this, then Modal
could maybe remain (mostly) unchanged?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just updated it. Nope, no special support. It only avoids the problem of Model(_object="foo", b="bar")
.
*args
is used only to avoid having a possible keyword argument. The first value would be extracted from args
, only when there's no **data
. But it doesn't add any special support for *args
. And actually, only the first value is used. But by having *args
, it can't be passed as a keyword argument.
So, it would keep working as normally, with Model(a="foo", b="bar")
.
But when using Model(my_object)
, my_object
would be taken from the *args
, only the first element.
So, Model(_object="foo", b="bar")
wouldn't be a problem.
I can't use only dataclasses because what I need is to support it when Field.validate
creates a model from a class.
Field.validate
calls Field._validate_singleton
, etc.
They then call Field.apply_validators
.
And then those will call (later at some point) cls(data)
. With cls
being a Pydantic model (normally a standard model inheriting from BaseModel
).
But by implementing it this way, it has the effect that it can be used in many other cases (with ORMs or other data objects, standard dataclasses, etc).
pydantic/main.py
Outdated
model: Union[BaseModel, Type[BaseModel]], | ||
input_data: 'DictStrAny', | ||
raise_exc: bool = True, | ||
return_fields_set: bool = False, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
surely we can make validate_model
always return fields_set
? see #518 (comment)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool! That's easier/simpler.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
agreed, I'll try to merge #518 before this PR so you might get some conflicts.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No prob, I'll fix them.
Still, if this PR is accepted, the implementation of #518 would change (except the tests).
If you merge this first, and then #518 only adds the tests, there wouldn't be any conflicts (as I'm intentionally not adding those tests).
Or you can copy the code in here to #518 (or some part of it).
@@ -593,11 +613,15 @@ def validate_model( # noqa: C901 (ignore complexity) | |||
f'you might need to call {model.__class__.__name__}.update_forward_refs().' | |||
) | |||
|
|||
value = input_data.get(field.alias, _missing) | |||
value = _get_input_value(input_data, field.alias) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't really care about the performance of the new ORM functionality since I won't ever be using an ORM (unless I end up in hell or working at a bank).
But I do care a lot about the performance of pydantic. Will this have a performance impact? If so how much and how can we mitigate it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool. I'll implement the changes and then we can check if it's OK for Pydantic's standard use.
And later, after this PR, I or anyone else can do more benchmarks for specific ORMs/tools and see if things can be improved there. But we'll have a baseline to start.
You probably want to rebase to avoid conflicts with #518 now it's merged |
31df355
to
82c3400
Compare
Great! Done ✔️ |
Humm, I'm very keen to try and help here, but I'm quite scared of this, I think it will change the behaviour of pydantic in some subtle ways that will be very hard to debug. I also think it will also damage performance and (worse) make further performance optimisations (eg. rewriting some of Please could you provide a self contained example of what you're trying to allow here and I'll take it away and hack at it. Maybe I can find another way, maybe I can't and I'll be persuaded that this solution is ok. Sorry if that's an annoying response. |
https://www.youtube.com/watch?v=GiPe1OiKQuk I get this feeling quite often, here I feel there are edge cases that will break - I can only partially think what they are right now, but I'd rather just avoid them completely. |
Hehe, I know (unknow). And from the changes proposed it's not obvious how this all actually affects what I'm trying to achieve. Let me write something that can show the use case better here. |
The simplest/most generic use case: from pydantic import BaseModel
from typing import List
class PetCls:
def __init__(self, *, name: str, species: str):
self.name = name
self.species = species
class PersonCls:
def __init__(self, *, name: str, age: float = None, pets: List[PetCls]):
self.name = name
self.age = age
self.pets = pets
class Pet(BaseModel):
name: str
species: str
class Person(BaseModel):
name: str
age: float = None
# Having recursive models/sub-models is what makes it necessary for Pydantic models
# To receive arbitrary objects too
pets: List[Pet]
bones = PetCls(name="Bones", species="dog")
orion = PetCls(name="Orion", species="cat")
alice = PersonCls(name="Alice", age=20, pets=[bones, orion])
alice_model = Person(alice)
assert alice_model.dict() == {
"name": "Alice",
"pets": [{"name": "Bones", "species": "dog"}, {"name": "Orion", "species": "cat"}],
"age": 20.0,
} |
The closest to my actual use case in FastAPI: from pydantic import BaseModel, Schema, BaseConfig
from pydantic.json import pydantic_encoder
from pydantic.fields import Field
from typing import List
class PetCls:
def __init__(self, *, name: str, species: str):
self.name = name
self.species = species
class PersonCls:
def __init__(self, *, name: str, age: float = None, pets: List[PetCls]):
self.name = name
self.age = age
self.pets = pets
class Pet(BaseModel):
name: str
species: str
class Person(BaseModel):
name: str
age: float = None
# Having recursive models/sub-models is what makes it necessary for Pydantic models
# To receive arbitrary objects too
pets: List[Pet]
# In FastAPI, a response is declared with a Python type
# the same as for Pydantic model's attributes
# This allows using, e.g. List[Person]
response_type = List[Person]
# Because the type is List[Person], FastAPI creates a Field from it, not a Pydantic model
field = Field(
name="person_list",
type_=response_type,
class_validators={},
default=None,
required=False,
model_config=BaseConfig,
schema=Schema(None),
)
bones = PetCls(name="Bones", species="dog")
orion = PetCls(name="Orion", species="cat")
goose = PetCls(name="Goose", species="flerken")
snowball = PetCls(name="Snowball", species="cat")
alice = PersonCls(name="Alice", age=20, pets=[bones, orion])
bob = PersonCls(name="Blice", age=25, pets=[goose, snowball])
people = [alice, bob]
# Afterwards, FastAPI validates/serializes the data using the field.validate method
# Because it calls the classes (Pet, Person) with the data directly, adding a new
# method (like some_pydantic_model.from_arbitrary(alice)) wouldn't work. Because
# field.validate() internally ends up calling Person(alice).
values, errors = field.validate(people, {}, loc="root")
print(values)
# [<Person name='Alice' pets=[<Pet name='Bones' species='dog'>, <Pet name='Orion' species='cat'>] age=20.0>, <Person name='Blice' pets=[<Pet name='Goose' species='flerken'>, <Pet name='Snowball' species='cat'>] age=25.0>]
# FastAPI uses its own jsonable_encoder, but it does something very similar to this:
jsonable_people = [pydantic_encoder(p) for p in values]
# This data is what would be returned by an endpoint (converted to JSON)
assert jsonable_people == [
{
"name": "Alice",
"pets": [
{"name": "Bones", "species": "dog"},
{"name": "Orion", "species": "cat"},
],
"age": 20.0,
},
{
"name": "Blice",
"pets": [
{"name": "Goose", "species": "flerken"},
{"name": "Snowball", "species": "cat"},
],
"age": 25.0,
},
] |
The second example is the closest to my actual use case, simplifying as much as possible, without mixing SQLAlchemy nor FastAPI itself in it. The first example is what I suspect would be the most obvious "benefit"/use case for other Pydantic users, being able to call: alice_model = Person(alice) A side benefit is that even when using ORMs/tools that have recursive/circular references, by declaring models without circular references the original data can be converted to Pydantic models that can be serialized to JSON without the circular reference problem. It also solves several cases with dynamic attributes, it would work for Python If that still was not desirable/acceptable, another option that I considered was being able to inject the function to use for But that would imply changing the signature in all the I don't know if that would also affect performance (maybe more). After trying something with that style, I realized that if it was implemented at this level (as proposed in this PR) it would work with fewer changes in Pydantic, and would have that side-effect/"benefit". |
please review #562 and let me know if that works for you. It's quite a small change but I hope it covers this use case. |
replaced by #562 |
Change Summary
Add support for arbitrary class instances.
From the new docs:
This makes Pydantic compatible with virtually all ORMs and many other tools.
A Pydantic model would be able to receive a SQLAlchemy model, a MongoEngine (MongoDB) model, etc.
It doesn't make any assumption about the underlying data, nor any compromise with any third party library.
I also tried to minimize the code change surface and the code flow alterations. So, using it as normally should be close to unaltered, but this new use case would be supported too.
This solves several things requested by FastAPI users (I think some have requested similar things here too).
I planned and started the code initially in FastAPI directly but I ended up having to duplicate most of the
Field
validation functions, after trying several different things, to modify a single line in them. It would become a lot of code duplication and desynchronization with the upstream code here in Pydantic. So, I thought it could be added here.Related issue number
Slightly related to #491
Related to/fixes in FastAPI:
tiangolo/fastapi#218
tiangolo/fastapi#214
tiangolo/fastapi#194
tiangolo/fastapi#84
Checklist
HISTORY.rst
has been updated#<number>
@<whomever>