Pydantic fails to parse object: invents a new irrelevant object instead #2513

HansBrende · 2021-03-12T06:52:18Z

Checks

I added a descriptive title to this issue
I have searched (google, github) for similar issues and couldn't find anything
I have read and followed the docs and still think this is a bug

Bug

Output of python -c "import pydantic.utils; print(pydantic.utils.version_info())":

             pydantic version: 1.8.1
            pydantic compiled: True
                 install path: /Users/hansbrende/opt/miniconda3/envs/testenv/lib/python3.7/site-packages/pydantic
               python version: 3.7.10 | packaged by conda-forge | (default, Feb 19 2021, 15:59:12)  [Clang 11.0.1 ]
                     platform: Darwin-19.0.0-x86_64-i386-64bit
     optional deps. installed: ['typing-extensions']

import pydantic
from typing import Union, List

class Item(pydantic.BaseModel):
    name: str

class Items(pydantic.BaseModel):
    items: Union[Item, List[Item]]

print(Items(items=[{'name': 'MY NAME', 'irrelevant': 'foobar'}]))

OUTPUT:

items=Item(name='irrelevant')

EXPECTED OUTPUT:

items=[Item(name='MY NAME')]

EDIT: After discussion & reflection, I think the presence of the Union type in my example is a red herring. Here is an even more minimal example (with Union absent) that reproduces the fundamental bug here:

import pydantic

class Item(pydantic.BaseModel):
    name: str

print(Item.parse_obj([{'name': 'MY NAME', 'irrelevant': 'foobar'}]))

OUTPUT:

name='irrelevant'

EXPECTED OUTPUT: ValidationError

Note: if I swap the order of fields in the dict to: print(Item.parse_obj([{'irrelevant': 'foobar', 'name': 'MY NAME'}])), I do get a ValidationError as expected:

pydantic.error_wrappers.ValidationError: 1 validation error for Item
name
  field required (type=value_error.missing)

And if I add or remove a field from the dict (so that the dict no longer contains 2 items), I get the expected ValidationError:

pydantic.error_wrappers.ValidationError: 1 validation error for Item
__root__
  Item expected dict not list (type=type_error)

I have proposed a couple quick-fixes for this bug; I can open a PR for one of them if desired. As pointed out in the comments, I realize this will be fixed automatically in v2 by #1268; however, 3 of my proposed fixes are pretty much backwards-compatible with existing behavior, so I would think one of them could be introduced before v2.

The text was updated successfully, but these errors were encountered:

PrettyWood · 2021-03-12T06:55:23Z

Hi @HansBrende
This is a known issue that may be fixed in v1.9
See #2092 (comment) for the proposal

HansBrende · 2021-03-12T06:57:41Z

@PrettyWood Although the issue you linked to also involves unions, I don't believe this is the same issue AT ALL... unless I am totally missing something!

HansBrende · 2021-03-12T07:04:20Z

@PrettyWood to give you a better idea of how weird this is, if I change the dictionary order to put the irrelevant field first:

print(Items(items=[{'irrelevant': 'foobar', 'name': 'MY NAME'}]))

Then the output is correct:

items=[Item(name='MY NAME')]

If I preserve the order but add a second irrelevant key to the dictionary:

print(Items(items=[{'name': 'MY NAME', 'irrelevant': 'foobar', 'irrelevant2': 'hello'}]))

Then the output is still correct:

items=[Item(name='MY NAME')]

PrettyWood · 2021-03-12T07:17:00Z

Yes I know it's the exact same issue that I linked even though it's hard to see the relationship. Pydantic tries to match types in the order of the union. And for this can coerce.
The issue is that it first calls dict() to match a BaseModel.
And dict([{a: 1, b: 2}]) == {a: b}. If the number of items is odd the coercion fails but if it's even it works.
Hence the solution I linked with cases like https://github.com/PrettyWood/pydantic/pull/66/files#diff-bd467a363c9f155a2e6a7716aeca0c351089135a53d09a66138cea4f16506ebdR2728-R2738

HansBrende · 2021-03-12T07:21:11Z

@PrettyWood Gotcha, that makes sense. Thanks for the explanation!

PrettyWood · 2021-03-12T11:49:35Z

I keep it open since I'll need to add extra logic for this in the PR for SmartUnion

HansBrende · 2021-03-12T23:11:27Z

@PrettyWood coming back to this now that I've had some time to mull it over: the real bug in this specific example (in my mind) is not so much the absence of SmartUnion but the ability of pydantic to parse such a contrived Item from a list containing a dict. So if we remove the Union type altogether to avoid the red herring, what this really boils down to is the following bug:

import pydantic

class Item(pydantic.BaseModel):
    name: str

print(Item.parse_obj([{'name': 'MY NAME', 'irrelevant': 'foobar'}]))

OUTPUTS:

name='irrelevant'

EXPECTED OUTPUT: Validation Error

Whereas, swapping the order of the dict to

print(Item.parse_obj([{'irrelevant': 'foobar', 'name': 'MY NAME'}]))

We get the expected validation error, as we should:

pydantic.error_wrappers.ValidationError: 1 validation error for Item
name
  field required (type=value_error.missing)

Current behavior is very much non-deterministic (since, for example, the order of objects in a dict is not guaranteed to be consistent between versions of python, or especially if it is parsed from, say, client-supplied json)... which should qualify as a bug in its own right independent of how unions are implemented.

HansBrende · 2021-03-12T23:53:25Z

@PrettyWood here is a rough quick-fix that would completely eliminate this problem. Rather than unconditionally calling dict(input) on everything, why not do something along these lines?

def convert_to_a_REASONABLE_dict(input) -> dict:
    if not hasattr(input, 'keys'):
       raise ValidationError
    return dict(input)

Quick-fix #2 (preserves ability to coerce lists into dicts):

Or, if preserving the ability to construct a dict from a list of tuples is absolutely necessary (which I don't see why it would be--I really like pydantic's lenient mindset in general, but lists should not be coercible into dicts), then the above function could be made slightly more lenient by doing something like:

def _checked_kv_pair(kv):
    if type(kv) is not dict:
        return kv
    raise TypeError('Expected a 2-tuple, got a dict')

def convert_to_a_REASONABLE_dict(input) -> dict:
    if not hasattr(input, 'keys'):
       input = (_checked_kv_pair(kv) for kv in input)
    return dict(input)

[Note: why did I use the 'keys' attribute to determine whether the input is a mapping? After some experimenting I figured out that the presence of the 'keys' attribute on an object is how the dict() function itself decides whether the object is a mapping or an iterable of 2-tuples. If there is no 'keys' attribute and the object is not iterable, the dict() function will fail with a TypeError: object is not iterable.]

Quick-fix #3 (like #2 but applies to all mappings)

For funsies, here is a somewhat pathological tweak to approach #2 which doesn't limit itself exclusively to dicts, but also avoids the edge-case of a named tuple having an element named keys:

def _checked_kv_pair(kv):
    if not hasattr(kv, 'keys') or not hasattr(kv, '__getitem__'):  # short-circuit: cannot be a valid mapping
        return kv
    # allow edge cases such as a named tuple having an element named "keys":
    k, v = kv
    try:
        if k is kv[0] and v is kv[1]:
            return k, v
    except:
        pass
    raise TypeError(f'Expected a 2-tuple, got a {type(kv)}')

I can submit a PR for one of these approaches if desired...

PrettyWood · 2021-03-13T10:58:46Z

This will be done for v2 #1268

HansBrende · 2021-03-13T19:48:52Z

@PrettyWood Great to hear! In that case, my approach #2 or #3 (or something similar) could potentially be snuck in before 2.0, since it is not really a breaking change, correct?

HansBrende · 2021-03-14T00:18:01Z

Approach #4: use existing `validators.tuple_validator()`!

I just noticed as I was browsing the pydantic.validators source code that you already have a tuple_validator written. So a 4th approach would be to simply tweak your existing dict_validator via your existing tuple_validator (importantly, the tuple_validator does not allow dicts to be coerced into tuples)!

def dict_validator(v: Any) -> Dict[Any, Any]:
    if isinstance(v, dict):
        return v
    try:
        if not hasattr(v, 'keys'):   # <-- only modification to existing code is this extra validation step
            v = (tuple_validator(t) for t in v)  # in pydantic v2, simply replace this line with: raise TypeError
        return dict(v)
    except (TypeError, ValueError):
        raise errors.DictError()

HansBrende · 2021-03-14T00:26:13Z

@PrettyWood @samuelcolvin What do you think? Should I submit a PR for one of these approaches? (And if so, which one?)

dmontagu · 2023-04-28T17:44:44Z

This is fixed in v2

HansBrende added the bug V1 Bug related to Pydantic V1.X label Mar 12, 2021

This was referenced Mar 14, 2021

Make dict validation stricter #1268

Closed

feat: make dict validator strict by default #2211

Closed

dmontagu self-assigned this Apr 28, 2023

dmontagu closed this as completed Apr 28, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Pydantic fails to parse object: invents a new irrelevant object instead #2513

Pydantic fails to parse object: invents a new irrelevant object instead #2513

HansBrende commented Mar 12, 2021 •

edited

PrettyWood commented Mar 12, 2021

HansBrende commented Mar 12, 2021

HansBrende commented Mar 12, 2021 •

edited

PrettyWood commented Mar 12, 2021

HansBrende commented Mar 12, 2021

PrettyWood commented Mar 12, 2021

HansBrende commented Mar 12, 2021 •

edited

HansBrende commented Mar 12, 2021 •

edited

PrettyWood commented Mar 13, 2021

HansBrende commented Mar 13, 2021 •

edited

HansBrende commented Mar 14, 2021 •

edited

HansBrende commented Mar 14, 2021 •

edited

dmontagu commented Apr 28, 2023

Pydantic fails to parse object: invents a new irrelevant object instead #2513

Pydantic fails to parse object: invents a new irrelevant object instead #2513

Comments

HansBrende commented Mar 12, 2021 • edited

Checks

Bug

PrettyWood commented Mar 12, 2021

HansBrende commented Mar 12, 2021

HansBrende commented Mar 12, 2021 • edited

PrettyWood commented Mar 12, 2021

HansBrende commented Mar 12, 2021

PrettyWood commented Mar 12, 2021

HansBrende commented Mar 12, 2021 • edited

HansBrende commented Mar 12, 2021 • edited

Quick-fix #​2 (preserves ability to coerce lists into dicts):

Quick-fix #​3 (like #​2 but applies to all mappings)

PrettyWood commented Mar 13, 2021

HansBrende commented Mar 13, 2021 • edited

HansBrende commented Mar 14, 2021 • edited

Approach #​4: use existing validators.tuple_validator()!

HansBrende commented Mar 14, 2021 • edited

dmontagu commented Apr 28, 2023

HansBrende commented Mar 12, 2021 •

edited

HansBrende commented Mar 12, 2021 •

edited

HansBrende commented Mar 12, 2021 •

edited

HansBrende commented Mar 12, 2021 •

edited

Quick-fix #2 (preserves ability to coerce lists into dicts):

Quick-fix #3 (like #2 but applies to all mappings)

HansBrende commented Mar 13, 2021 •

edited

HansBrende commented Mar 14, 2021 •

edited

Approach #4: use existing `validators.tuple_validator()`!

HansBrende commented Mar 14, 2021 •

edited