Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for arbitrary class instances #520

Closed
wants to merge 12 commits into from

Conversation

tiangolo
Copy link
Member

@tiangolo tiangolo commented May 13, 2019

Change Summary

Add support for arbitrary class instances.

From the new docs:

from pydantic import BaseModel

class DatabaseFoo:
    def __init__(self, count, size=None):
        self.count = count
        self.size = size

class Foo(BaseModel):
    count: int = ...
    size: float = None

database_foo = DatabaseFoo(count=2, size=4.2)
m = Foo(database_foo)
print(m)
# Foo count=2 size=4.2
print(m.dict())
# {'count': 2, 'size': 4.2}

This makes Pydantic compatible with virtually all ORMs and many other tools.

A Pydantic model would be able to receive a SQLAlchemy model, a MongoEngine (MongoDB) model, etc.

It doesn't make any assumption about the underlying data, nor any compromise with any third party library.

I also tried to minimize the code change surface and the code flow alterations. So, using it as normally should be close to unaltered, but this new use case would be supported too.


This solves several things requested by FastAPI users (I think some have requested similar things here too).

I planned and started the code initially in FastAPI directly but I ended up having to duplicate most of the Field validation functions, after trying several different things, to modify a single line in them. It would become a lot of code duplication and desynchronization with the upstream code here in Pydantic. So, I thought it could be added here.

Related issue number

Slightly related to #491

Related to/fixes in FastAPI:

tiangolo/fastapi#218
tiangolo/fastapi#214
tiangolo/fastapi#194
tiangolo/fastapi#84

Checklist

  • Unit tests for the changes exist
  • Tests pass on CI and coverage remains at 100%
  • Documentation reflects the changes where applicable
  • HISTORY.rst has been updated
    • if this is the first change since a release, please add a new section
    • include the issue number or this pull request number #<number>
    • include your github username @<whomever>

@codecov
Copy link

codecov bot commented May 13, 2019

Codecov Report

Merging #520 into master will not change coverage.
The diff coverage is 100%.

@@          Coverage Diff          @@
##           master   #520   +/-   ##
=====================================
  Coverage     100%   100%           
=====================================
  Files          14     14           
  Lines        2271   2283   +12     
  Branches      447    449    +2     
=====================================
+ Hits         2271   2283   +12

Copy link
Member Author

@tiangolo tiangolo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@samuelcolvin There's a subtle point I would like you to check and give me your opinion.

@@ -111,9 +111,9 @@ class Foo(BaseModel):
}

with pytest.raises(ValidationError) as exc_info:
module.Foo(b={'a': '321'}, c=[{'b': 234}], d={'bar': {'a': 345}})
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a caveat here: for the inner b, it would try to use 234 as an arbitrary class instance.

It would then try to extract each of the fields of the self-referencing Foo model.

As 234 is just a number, it wouldn't have any of the fields.

But, as all the fields are optional or with defaults, the final value of b would be just a Foo model with all the default values.

So, in a similar way that an int would be converted to True when a field has type bool, any arbitrary object (of any value, including int) without attributes would be treated as an empty dict, and converted to the equivalent of a model with default values.


If you don't like it @samuelcolvin , other option I see would be to accept an object as an arbitrary only if it is not a sub-class of the known singular classes (int, str, etc).

@cjw296
Copy link

cjw296 commented May 14, 2019

@tiangolo - have you investigated the performance implications of this change? If I follow the change correctly, you're replacing one straight dict.get() with a functional call, a try/except on .get and then a fall through to attribute lookup for every attribute.

I'd hope most ORMs provide a better way to get this info, for example a super dirty SQLAlchemy example would be MyPydanticModel(**my_orm_obj.__dict__), @zzzeek can probably tell us the right way to do that, as that will have the SQLCalchemy instance state variable in it.

@tiangolo
Copy link
Member Author

@cjw296 that's very ORM-dependent. They all have their own way of doing it, it's not very standard.

If they all implemented my_orm_obj.__iter__, just dict(my_orm_obj) would work. ...but they don't.

Let's wait for @samuelcolvin's opinion.

Copy link
Member

@samuelcolvin samuelcolvin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will need to review it much more when I have time, but I'm happy with the principal and I think my small below should probably be addressed before further review.

Thanks or the contribution and the obvious time you've put into thinking about this.

pydantic/main.py Outdated
@@ -228,15 +227,17 @@ class BaseModel(metaclass=MetaModel):
Config = BaseConfig
__slots__ = ('__values__', '__fields_set__')

def __init__(self, **data: Any) -> None:
def __init__(self, _object: Any = None, **data: Any) -> None:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm trying to maintain a hard rule that only data is passed to __init__ to end users using pydantic APIs being able to achieve unexpected behaviour by setting special fields.

In this case every use of pydantic models what didn't want to use _object would need a manual check before calling Model(**mydata) that mydata doesn't include _object.

Better to add another classmethod a bit like construct that creates a new model instead without calling __init__ (or maybe does use __init__?), but leave __init__'s signature unchanged.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here the problem is that Field's implementation instantiates the class directly (calling __init__), then we would need checks in Fields validators to check which method to call in several places.

But I just realized that this could be achieved with *args, that way we avoid that problem. Let me try implementing that and see if that works.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You mean MyModel accepting (*args, **kwargs) like dataclasses?

I'm not opposed to that but I think it might end up being a big change. Maybe worth doing it as a separate PR before this?

More generally if you you can use *args, maybe you'd be best to just use pydantic dataclasses for all of this, then Modal could maybe remain (mostly) unchanged?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just updated it. Nope, no special support. It only avoids the problem of Model(_object="foo", b="bar").

*args is used only to avoid having a possible keyword argument. The first value would be extracted from args, only when there's no **data. But it doesn't add any special support for *args. And actually, only the first value is used. But by having *args, it can't be passed as a keyword argument.

So, it would keep working as normally, with Model(a="foo", b="bar").

But when using Model(my_object), my_object would be taken from the *args, only the first element.

So, Model(_object="foo", b="bar") wouldn't be a problem.


I can't use only dataclasses because what I need is to support it when Field.validate creates a model from a class.

Field.validate calls Field._validate_singleton, etc.

They then call Field.apply_validators.

And then those will call (later at some point) cls(data). With cls being a Pydantic model (normally a standard model inheriting from BaseModel).

But by implementing it this way, it has the effect that it can be used in many other cases (with ORMs or other data objects, standard dataclasses, etc).

pydantic/main.py Outdated
model: Union[BaseModel, Type[BaseModel]],
input_data: 'DictStrAny',
raise_exc: bool = True,
return_fields_set: bool = False,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

surely we can make validate_model always return fields_set? see #518 (comment)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool! That's easier/simpler.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I implemented it, and I just included a fix for #517 (removing 2 lines). I tested it locally with @sommd's tests implemented in PR #518 and they passed.

I didn't include his tests in this PR because I think he deserves the credit for those tests, for finding the bug and writing the tests for it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agreed, I'll try to merge #518 before this PR so you might get some conflicts.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No prob, I'll fix them.

Still, if this PR is accepted, the implementation of #518 would change (except the tests).

If you merge this first, and then #518 only adds the tests, there wouldn't be any conflicts (as I'm intentionally not adding those tests).

Or you can copy the code in here to #518 (or some part of it).

@@ -593,11 +613,15 @@ def validate_model( # noqa: C901 (ignore complexity)
f'you might need to call {model.__class__.__name__}.update_forward_refs().'
)

value = input_data.get(field.alias, _missing)
value = _get_input_value(input_data, field.alias)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't really care about the performance of the new ORM functionality since I won't ever be using an ORM (unless I end up in hell or working at a bank).

But I do care a lot about the performance of pydantic. Will this have a performance impact? If so how much and how can we mitigate it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool. I'll implement the changes and then we can check if it's OK for Pydantic's standard use.

And later, after this PR, I or anyone else can do more benchmarks for specific ORMs/tools and see if things can be improved there. But we'll have a baseline to start.

tests/test_edge_cases.py Outdated Show resolved Hide resolved
@samuelcolvin
Copy link
Member

samuelcolvin commented May 18, 2019

You probably want to rebase to avoid conflicts with #518 now it's merged

@tiangolo
Copy link
Member Author

You probably want to rebase to avoid conflicts with #518 now it's merged

Great! Done ✔️

@samuelcolvin
Copy link
Member

Humm, I'm very keen to try and help here, but I'm quite scared of this, I think it will change the behaviour of pydantic in some subtle ways that will be very hard to debug.

I also think it will also damage performance and (worse) make further performance optimisations (eg. rewriting some of validate_model in c) much more difficult.

Please could you provide a self contained example of what you're trying to allow here and I'll take it away and hack at it. Maybe I can find another way, maybe I can't and I'll be persuaded that this solution is ok.

Sorry if that's an annoying response.

@samuelcolvin
Copy link
Member

https://www.youtube.com/watch?v=GiPe1OiKQuk

I get this feeling quite often, here I feel there are edge cases that will break - I can only partially think what they are right now, but I'd rather just avoid them completely.

@tiangolo
Copy link
Member Author

Hehe, I know (unknow).

And from the changes proposed it's not obvious how this all actually affects what I'm trying to achieve.

Let me write something that can show the use case better here.

@tiangolo
Copy link
Member Author

The simplest/most generic use case:

from pydantic import BaseModel
from typing import List


class PetCls:
    def __init__(self, *, name: str, species: str):
        self.name = name
        self.species = species


class PersonCls:
    def __init__(self, *, name: str, age: float = None, pets: List[PetCls]):
        self.name = name
        self.age = age
        self.pets = pets


class Pet(BaseModel):
    name: str
    species: str


class Person(BaseModel):
    name: str
    age: float = None
    # Having recursive models/sub-models is what makes it necessary for Pydantic models
    # To receive arbitrary objects too
    pets: List[Pet]


bones = PetCls(name="Bones", species="dog")

orion = PetCls(name="Orion", species="cat")

alice = PersonCls(name="Alice", age=20, pets=[bones, orion])

alice_model = Person(alice)

assert alice_model.dict() == {
    "name": "Alice",
    "pets": [{"name": "Bones", "species": "dog"}, {"name": "Orion", "species": "cat"}],
    "age": 20.0,
}

@tiangolo
Copy link
Member Author

The closest to my actual use case in FastAPI:

from pydantic import BaseModel, Schema, BaseConfig
from pydantic.json import pydantic_encoder
from pydantic.fields import Field
from typing import List


class PetCls:
    def __init__(self, *, name: str, species: str):
        self.name = name
        self.species = species


class PersonCls:
    def __init__(self, *, name: str, age: float = None, pets: List[PetCls]):
        self.name = name
        self.age = age
        self.pets = pets


class Pet(BaseModel):
    name: str
    species: str


class Person(BaseModel):
    name: str
    age: float = None
    # Having recursive models/sub-models is what makes it necessary for Pydantic models
    # To receive arbitrary objects too
    pets: List[Pet]

# In FastAPI, a response is declared with a Python type
# the same as for Pydantic model's attributes
# This allows using, e.g. List[Person]
response_type = List[Person]

# Because the type is List[Person], FastAPI creates a Field from it, not a Pydantic model
field = Field(
    name="person_list",
    type_=response_type,
    class_validators={},
    default=None,
    required=False,
    model_config=BaseConfig,
    schema=Schema(None),
)

bones = PetCls(name="Bones", species="dog")

orion = PetCls(name="Orion", species="cat")

goose = PetCls(name="Goose", species="flerken")

snowball = PetCls(name="Snowball", species="cat")

alice = PersonCls(name="Alice", age=20, pets=[bones, orion])

bob = PersonCls(name="Blice", age=25, pets=[goose, snowball])

people = [alice, bob]

# Afterwards, FastAPI validates/serializes the data using the field.validate method
# Because it calls the classes (Pet, Person) with the data directly, adding a new 
# method (like some_pydantic_model.from_arbitrary(alice)) wouldn't work. Because
# field.validate() internally ends up calling Person(alice).
values, errors = field.validate(people, {}, loc="root")

print(values)
# [<Person name='Alice' pets=[<Pet name='Bones' species='dog'>, <Pet name='Orion' species='cat'>] age=20.0>, <Person name='Blice' pets=[<Pet name='Goose' species='flerken'>, <Pet name='Snowball' species='cat'>] age=25.0>]

# FastAPI uses its own jsonable_encoder, but it does something very similar to this:
jsonable_people = [pydantic_encoder(p) for p in values]

# This data is what would be returned by an endpoint (converted to JSON)
assert jsonable_people == [
    {
        "name": "Alice",
        "pets": [
            {"name": "Bones", "species": "dog"},
            {"name": "Orion", "species": "cat"},
        ],
        "age": 20.0,
    },
    {
        "name": "Blice",
        "pets": [
            {"name": "Goose", "species": "flerken"},
            {"name": "Snowball", "species": "cat"},
        ],
        "age": 25.0,
    },
]

@tiangolo
Copy link
Member Author

The second example is the closest to my actual use case, simplifying as much as possible, without mixing SQLAlchemy nor FastAPI itself in it.

The first example is what I suspect would be the most obvious "benefit"/use case for other Pydantic users, being able to call:

alice_model = Person(alice)

A side benefit is that even when using ORMs/tools that have recursive/circular references, by declaring models without circular references the original data can be converted to Pydantic models that can be serialized to JSON without the circular reference problem.

It also solves several cases with dynamic attributes, it would work for Python @property decorated methods, etc.


If that still was not desirable/acceptable, another option that I considered was being able to inject the function to use for Field.validate. Having a default of the current one, but allowing to inject a custom one (that would do the check I'm proposing here).

But that would imply changing the signature in all the Field.validate methods: Field._validate_sequence_like, Field._validate_tuple, etc.

I don't know if that would also affect performance (maybe more).

After trying something with that style, I realized that if it was implemented at this level (as proposed in this PR) it would work with fewer changes in Pydantic, and would have that side-effect/"benefit".

@samuelcolvin
Copy link
Member

please review #562 and let me know if that works for you.

It's quite a small change but I hope it covers this use case.

@samuelcolvin
Copy link
Member

replaced by #562

samuelcolvin added a commit that referenced this pull request Jun 6, 2019
* Support ORM objects to 'parse_obj', replace #520

* switch to GetterDict and orm_mode

* tweaks

* update docs

* split tests and add @tiangolo's suggestion

* split tests and add @tiangolo's suggestion

* fix coverage
alexdrydew pushed a commit to alexdrydew/pydantic that referenced this pull request Dec 23, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants