Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

First instantiation of generic and nested Union model defines type coercion of models instantiated later #4519

Closed
7 of 15 tasks
sveinugu opened this issue Sep 14, 2022 · 9 comments
Assignees
Labels
bug V1 Bug related to Pydantic V1.X

Comments

@sveinugu
Copy link
Contributor

Initial Checks

  • I have searched GitHub for a duplicate issue and I'm sure this is something new
  • I have searched Google & StackOverflow for a solution and couldn't find anything
  • I have read and followed the docs and still think this is a bug
  • I am confident that the issue is with pydantic (not my code, or another library in the ecosystem like FastAPI or mypy)

Description

Reopening of nested variant of bug described in issue #4474, for now mostly as a reference point for other repos. The basic variant of the bug was fixed in version 1.10.2.

Example code:

from typing import Generic, TypeVar, List, Union

from pydantic.generics import GenericModel

NumberT = TypeVar('NumberT')


class NumberModel(GenericModel, Generic[NumberT]):
    data: NumberT


list_of_floats_or_ints_model = NumberModel[List[Union[float, int]]]
print(list_of_floats_or_ints_model(data=['1']).data)

list_of_ints_or_floats_model = NumberModel[List[Union[int, float]]]
print(list_of_ints_or_floats_model(data=['1']).data)

Still prints:

[1.0]
[1.0]

While changing the order of the models prints:

[1]
[1]

See this #4474 (comment) for more info on the remaining issue, which is not trivial to fix and possibly depends on changes in the typing package.

A skipped test documenting the issue was also added to the pydantic source code in version 1.10.2.

Example Code

No response

Python, Pydantic & OS Version

pydantic version: 1.10.2
            pydantic compiled: True
                 install path: /Users/sveinugu/Library/Caches/pypoetry/virtualenvs/unifair-myuXak-6-py3.10/lib/python3.10/site-packages/pydantic
               python version: 3.10.4 | packaged by conda-forge | (main, Mar 24 2022, 17:42:03) [Clang 12.0.1 ]
                     platform: macOS-11.2.3-arm64-arm-64bit
     optional deps. installed: ['typing-extensions']

Affected Components

@sveinugu sveinugu added bug V1 Bug related to Pydantic V1.X unconfirmed Bug not yet confirmed as valid/applicable labels Sep 14, 2022
@samuelcolvin samuelcolvin removed the unconfirmed Bug not yet confirmed as valid/applicable label Sep 16, 2022
@samuelcolvin
Copy link
Member

Thanks 👍.

Obviously not ideal, but let us know if you think there's anything can be easily be fixed here, or if we just need to wait for v2?

@sveinugu
Copy link
Contributor Author

sveinugu commented Sep 16, 2022

Some immediate thoughts, given my limited knowledge of the pydantic source code:

I believe one workaround could be to define a variant of Union within pydantic, e.g. OrderedUnion. This would also make more explicit the fact that you base type coercion on the order of the types in a Union, even though Union is considered unordered by Python. However, I don't know how much work that would be or how important it is in the grand scheme of things.

Alternatively, you might want to reconsider the type coercion logic, however I don't have any ideas at this moment of good alternatives.

I do not have enough overview of v2 to see if this would fix the issue in itself. As the typing package itself makes use of a similar type of caching as in GenericModel, the same issue appears in pure Python (see #4474 (comment)). However, this is not a bug in Python as Unions are defined to be unordered. I don't see how v2 in itself would fix this (I'm thinking here mainly of the Rust reimplementation), as the user would still provide pydantic with identical copies of the same Union due to caching in typing, even though they are ordered differently.

@sveinugu
Copy link
Contributor Author

sveinugu commented Sep 16, 2022

Come to think of it now, the pydantic bug as described here is relatively minor as you would need to:

  1. use GenericModel
  2. have type coercion logic that depends on the order of types in a Union
  3. that Union needs to be nested below the top type

However, the Python issue happens independently of the usage of GenericModel (point 1.). This makes the Python issue in practice a larger issue than the pydantic GenericModel caching issue described here, and even more so because it is not really a bug at all and thus more difficult to handle. The combination of only 2. and 3. is not that rare, I would assume.

@sveinugu
Copy link
Contributor Author

sveinugu commented Sep 16, 2022

An illustration on how the Python issue might affect pydantic even if it is not due to a pydantic bug per se:

from pydantic import BaseModel
from typing import List, Union

# import something_else

JSON_List = List[Union[int, float, bool, str]]


class JSON(BaseModel):
    json_list: JSON_List


assert JSON(json_list=('abc', '1', '2.0', 'True')).json() == '{"json_list": ["abc", 1, 2.0, true]}'

Here, the types are ordered at least relatively sensibly. But let's say the contents of something_else.py is as follows:

from typing import List, Union

SomethingElse = List[Union[bool, int, float, str]]

Then removing the comment on the import line in the first file makes the assertion fail ('1' will be parsed as True). This is not really a bug in either Python or pydantic but instead an unfortunate consequence of a pydantic design choice combined with a casual decision by the Python devs (see python/cpython#86483).

@sydney-runkle
Copy link
Member

We’re no longer actively developing Pydantic V1 (although it will continue to receive security fixes for the next year or so), so we’re closing issues solely related to V1. If you really think this shouldn’t be closed, please comment or create a new issue 🚀.

@sveinugu
Copy link
Contributor Author

sveinugu commented Dec 7, 2023

@sydney-runkle I don't think this should be closed, as it is still relevant for Pydantic v2. Perhaps we should even open another issue, as there are really two issues at play here. The first issue is really a core Python issue that has consequences for pydantic. The problem is that type caching in the typing library does not maintain the order of the type arguments, e.g.:

from typing import List, Optional, Union, get_args

print(get_args(List[Union[float, int]]))
print(get_args(List[Union[int, float]]))
print(get_args(List[float | int]))
print(get_args(List[int | float]))
print(get_args(Optional[Union[float, int]]))
print(get_args(Optional[Union[int, float]]))
print(get_args(Optional[float | int]))
print(get_args(Optional[int | float]))

Prints:

(typing.Union[float, int],)
(typing.Union[float, int],)
(typing.Union[float, int],)
(typing.Union[float, int],)
(<class 'float'>, <class 'int'>, <class 'NoneType'>)
(<class 'float'>, <class 'int'>, <class 'NoneType'>)
(<class 'float'>, <class 'int'>, <class 'NoneType'>)
(<class 'float'>, <class 'int'>, <class 'NoneType'>)

However, newer syntax variations do not have this issue, e.g.:

from typing import Union, get_args

print(get_args(list[Union[float, int]]))
print(get_args(list[Union[int, float]]))
print(get_args(list[float | int]))
print(get_args(list[int | float]))
print(get_args(float | int | None))
print(get_args(int | float | None))

Prints:

(typing.Union[float, int],)
(typing.Union[int, float],)
(float | int,)
(int | float,)
(<class 'float'>, <class 'int'>, <class 'NoneType'>)
(<class 'int'>, <class 'float'>, <class 'NoneType'>)

There are a couple of related threads in the cpython repo:

AFAIK, there is no issue in the pydantic repo to A) follow python development, B) add documentation in pydantic about the issue to help developers figure out the reason for strange behaviour. An example for such strange behaviour is as follows (in pydantic 2.5.2):

from typing import List
from pydantic import BaseModel

class A(BaseModel):
    list_of_numbers: List[int | float]

class B(BaseModel):
    list_of_numbers: List[float | int]

print(A(list_of_numbers=['1']), B(list_of_numbers=['1']))

Prints:

list_of_numbers=[1] list_of_numbers=[1]

While moving B before A in the above code, and with no other syntax change, prints:

list_of_numbers=[1.0] list_of_numbers=[1.0]

The issue is only related to the order in which the Python process reads the code, also across files, potentially giving rise to very strange errors. Hence, I think the issue should be documented in the pydantic docs with a recommendation to make use of new type notation if allowed by the Python version (list[], dict[] from Python 3.9 onwards, X | Y from Python 3.10).

@sveinugu
Copy link
Contributor Author

sveinugu commented Dec 7, 2023

The other issue is a pydantic one, which is still a bug in pydantic v2 (and even a regression). If we now focus on the new type notation (so that the broken behaviour in Python documented above in #4519 (comment) does not interfere) there are still issues with the pydantic caching of generic types. For example:

from typing import Generic, TypeVar, List, Union
from pydantic import BaseModel

NumberT = TypeVar('NumberT')

class NumberModel(BaseModel, Generic[NumberT]):
    data: NumberT

NumberModel[list[float | int]](data=['1']), NumberModel[list[int | float]](data=['1'])

produces (interactively):

(NumberModel[list[Union[float, int]]](data=[1.0]),
 NumberModel[list[Union[float, int]]](data=[1.0]))

While switching the order of the items in the last line produces (in a new process):

(NumberModel[list[Union[int, float]]](data=[1]),
 NumberModel[list[Union[int, float]]](data=[1]))

list[Union[]] has the same broken behaviour

Even worse is that it seems there is a regression (still in v2) of the simpler variant of the bug that was fixed for pydantic v1 in #4474. For example:

 NumberModel[int | float](data='1'), NumberModel[float | int](data='1')

produces:

(NumberModel[Union[int, float]](data=1),
 NumberModel[Union[int, float]](data=1))

While, strangely, Union[] now works:

NumberModel[Union[int, float]](data='1'), NumberModel[Union[float, int]](data='1')

gives:

(NumberModel[Union[int, float]](data=1),
 NumberModel[Union[float, int]](data=1.0))

Both of these latter two variants worked in pydantic v1.

@sveinugu
Copy link
Contributor Author

sveinugu commented Dec 7, 2023

I can also add that the issue IS less problematic in v2, as smart_union=True is now default behaviour. Still, the above failing code examples do not include any config changes, so the order of unions is still important also in a default setup.

@sveinugu
Copy link
Contributor Author

sveinugu commented Jan 30, 2024

@sydney-runkle I did not receive a reply on my comments objecting to the closing of this issue. This just caused a bug in my code due to the fact that pydantic did not differentiate between two models like below:

from typing import Generic, TypeVar
from pydantic import BaseModel

T = TypeVar('T')

class MyModel(BaseModel, Generic[T]):
    data: T

print(MyModel[tuple[str | bytes, str]])
print(MyModel[tuple[bytes | str, str]])

This prints:

<class '__main__.MyModel[tuple[Union[str, bytes], str]]'>
<class '__main__.MyModel[tuple[Union[str, bytes], str]]'>

which caused a hard-to-debug issue in my code.

sveinugu added a commit to fairtracks/omnipy that referenced this issue Jan 30, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug V1 Bug related to Pydantic V1.X
Projects
None yet
Development

No branches or pull requests

4 participants