Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Pydantic fails to parse object: invents a new irrelevant object instead #2513

Closed
3 tasks done
HansBrende opened this issue Mar 12, 2021 · 13 comments
Closed
3 tasks done
Assignees
Labels
bug V1 Bug related to Pydantic V1.X

Comments

@HansBrende
Copy link

HansBrende commented Mar 12, 2021

Checks

  • I added a descriptive title to this issue
  • I have searched (google, github) for similar issues and couldn't find anything
  • I have read and followed the docs and still think this is a bug

Bug

Output of python -c "import pydantic.utils; print(pydantic.utils.version_info())":

             pydantic version: 1.8.1
            pydantic compiled: True
                 install path: /Users/hansbrende/opt/miniconda3/envs/testenv/lib/python3.7/site-packages/pydantic
               python version: 3.7.10 | packaged by conda-forge | (default, Feb 19 2021, 15:59:12)  [Clang 11.0.1 ]
                     platform: Darwin-19.0.0-x86_64-i386-64bit
     optional deps. installed: ['typing-extensions']
import pydantic
from typing import Union, List

class Item(pydantic.BaseModel):
    name: str

class Items(pydantic.BaseModel):
    items: Union[Item, List[Item]]

print(Items(items=[{'name': 'MY NAME', 'irrelevant': 'foobar'}]))

OUTPUT:

items=Item(name='irrelevant')

EXPECTED OUTPUT:

items=[Item(name='MY NAME')]

EDIT: After discussion & reflection, I think the presence of the Union type in my example is a red herring. Here is an even more minimal example (with Union absent) that reproduces the fundamental bug here:

import pydantic

class Item(pydantic.BaseModel):
    name: str

print(Item.parse_obj([{'name': 'MY NAME', 'irrelevant': 'foobar'}]))

OUTPUT:

name='irrelevant'

EXPECTED OUTPUT: ValidationError

Note: if I swap the order of fields in the dict to: print(Item.parse_obj([{'irrelevant': 'foobar', 'name': 'MY NAME'}])), I do get a ValidationError as expected:

pydantic.error_wrappers.ValidationError: 1 validation error for Item
name
  field required (type=value_error.missing)

And if I add or remove a field from the dict (so that the dict no longer contains 2 items), I get the expected ValidationError:

pydantic.error_wrappers.ValidationError: 1 validation error for Item
__root__
  Item expected dict not list (type=type_error)

I have proposed a couple quick-fixes for this bug; I can open a PR for one of them if desired. As pointed out in the comments, I realize this will be fixed automatically in v2 by #1268; however, 3 of my proposed fixes are pretty much backwards-compatible with existing behavior, so I would think one of them could be introduced before v2.

@HansBrende HansBrende added the bug V1 Bug related to Pydantic V1.X label Mar 12, 2021
@PrettyWood
Copy link
Member

Hi @HansBrende
This is a known issue that may be fixed in v1.9
See #2092 (comment) for the proposal

@HansBrende
Copy link
Author

@PrettyWood Although the issue you linked to also involves unions, I don't believe this is the same issue AT ALL... unless I am totally missing something!

@HansBrende
Copy link
Author

HansBrende commented Mar 12, 2021

@PrettyWood to give you a better idea of how weird this is, if I change the dictionary order to put the irrelevant field first:

print(Items(items=[{'irrelevant': 'foobar', 'name': 'MY NAME'}]))

Then the output is correct:

items=[Item(name='MY NAME')]

If I preserve the order but add a second irrelevant key to the dictionary:

print(Items(items=[{'name': 'MY NAME', 'irrelevant': 'foobar', 'irrelevant2': 'hello'}]))

Then the output is still correct:

items=[Item(name='MY NAME')]

@PrettyWood
Copy link
Member

Yes I know it's the exact same issue that I linked even though it's hard to see the relationship. Pydantic tries to match types in the order of the union. And for this can coerce.
The issue is that it first calls dict() to match a BaseModel.
And dict([{a: 1, b: 2}]) == {a: b}. If the number of items is odd the coercion fails but if it's even it works.
Hence the solution I linked with cases like https://github.com/PrettyWood/pydantic/pull/66/files#diff-bd467a363c9f155a2e6a7716aeca0c351089135a53d09a66138cea4f16506ebdR2728-R2738

@HansBrende
Copy link
Author

@PrettyWood Gotcha, that makes sense. Thanks for the explanation!

@PrettyWood
Copy link
Member

I keep it open since I'll need to add extra logic for this in the PR for SmartUnion

@HansBrende
Copy link
Author

HansBrende commented Mar 12, 2021

@PrettyWood coming back to this now that I've had some time to mull it over: the real bug in this specific example (in my mind) is not so much the absence of SmartUnion but the ability of pydantic to parse such a contrived Item from a list containing a dict. So if we remove the Union type altogether to avoid the red herring, what this really boils down to is the following bug:

import pydantic

class Item(pydantic.BaseModel):
    name: str

print(Item.parse_obj([{'name': 'MY NAME', 'irrelevant': 'foobar'}]))

OUTPUTS:

name='irrelevant'

EXPECTED OUTPUT: Validation Error

Whereas, swapping the order of the dict to

print(Item.parse_obj([{'irrelevant': 'foobar', 'name': 'MY NAME'}]))

We get the expected validation error, as we should:

pydantic.error_wrappers.ValidationError: 1 validation error for Item
name
  field required (type=value_error.missing)

Current behavior is very much non-deterministic (since, for example, the order of objects in a dict is not guaranteed to be consistent between versions of python, or especially if it is parsed from, say, client-supplied json)... which should qualify as a bug in its own right independent of how unions are implemented.

@HansBrende
Copy link
Author

HansBrende commented Mar 12, 2021

@PrettyWood here is a rough quick-fix that would completely eliminate this problem. Rather than unconditionally calling dict(input) on everything, why not do something along these lines?

def convert_to_a_REASONABLE_dict(input) -> dict:
    if not hasattr(input, 'keys'):
       raise ValidationError
    return dict(input)

Quick-fix #​2 (preserves ability to coerce lists into dicts):

Or, if preserving the ability to construct a dict from a list of tuples is absolutely necessary (which I don't see why it would be--I really like pydantic's lenient mindset in general, but lists should not be coercible into dicts), then the above function could be made slightly more lenient by doing something like:

def _checked_kv_pair(kv):
    if type(kv) is not dict:
        return kv
    raise TypeError('Expected a 2-tuple, got a dict')

def convert_to_a_REASONABLE_dict(input) -> dict:
    if not hasattr(input, 'keys'):
       input = (_checked_kv_pair(kv) for kv in input)
    return dict(input)

[Note: why did I use the 'keys' attribute to determine whether the input is a mapping? After some experimenting I figured out that the presence of the 'keys' attribute on an object is how the dict() function itself decides whether the object is a mapping or an iterable of 2-tuples. If there is no 'keys' attribute and the object is not iterable, the dict() function will fail with a TypeError: object is not iterable.]

Quick-fix #​3 (like #​2 but applies to all mappings)

For funsies, here is a somewhat pathological tweak to approach #​2 which doesn't limit itself exclusively to dicts, but also avoids the edge-case of a named tuple having an element named keys:

def _checked_kv_pair(kv):
    if not hasattr(kv, 'keys') or not hasattr(kv, '__getitem__'):  # short-circuit: cannot be a valid mapping
        return kv
    # allow edge cases such as a named tuple having an element named "keys":
    k, v = kv
    try:
        if k is kv[0] and v is kv[1]:
            return k, v
    except:
        pass
    raise TypeError(f'Expected a 2-tuple, got a {type(kv)}')

I can submit a PR for one of these approaches if desired...

@PrettyWood
Copy link
Member

This will be done for v2 #1268

@HansBrende
Copy link
Author

HansBrende commented Mar 13, 2021

@PrettyWood Great to hear! In that case, my approach #​2 or #​3 (or something similar) could potentially be snuck in before 2.0, since it is not really a breaking change, correct?

@HansBrende
Copy link
Author

HansBrende commented Mar 14, 2021

Approach #​4: use existing validators.tuple_validator()!

I just noticed as I was browsing the pydantic.validators source code that you already have a tuple_validator written. So a 4th approach would be to simply tweak your existing dict_validator via your existing tuple_validator (importantly, the tuple_validator does not allow dicts to be coerced into tuples)!

def dict_validator(v: Any) -> Dict[Any, Any]:
    if isinstance(v, dict):
        return v
    try:
        if not hasattr(v, 'keys'):   # <-- only modification to existing code is this extra validation step
            v = (tuple_validator(t) for t in v)  # in pydantic v2, simply replace this line with: raise TypeError
        return dict(v)
    except (TypeError, ValueError):
        raise errors.DictError()

@HansBrende
Copy link
Author

HansBrende commented Mar 14, 2021

@PrettyWood @samuelcolvin What do you think? Should I submit a PR for one of these approaches? (And if so, which one?)

@dmontagu
Copy link
Contributor

This is fixed in v2

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug V1 Bug related to Pydantic V1.X
Projects
None yet
Development

No branches or pull requests

3 participants