Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Multiple issues when trying to wrap an existing dataclass as pydantic #2555

Closed
3 tasks done
yoavg opened this issue Mar 21, 2021 · 9 comments · Fixed by #2557
Closed
3 tasks done

Multiple issues when trying to wrap an existing dataclass as pydantic #2555

yoavg opened this issue Mar 21, 2021 · 9 comments · Fixed by #2557
Labels
bug V1 Bug related to Pydantic V1.X

Comments

@yoavg
Copy link

yoavg commented Mar 21, 2021

Checks

  • I added a descriptive title to this issue
  • I have searched (google, github) for similar issues and couldn't find anything
  • I have read and followed the docs and still think this is a bug

Bug

Output of python -c "import pydantic.utils; print(pydantic.utils.version_info())":

             pydantic version: 1.8.1
            pydantic compiled: True
                 install path: /Users/yogo/.local/share/virtualenvs/server-Fv_PcELd/lib/python3.7/site-packages/pydantic
               python version: 3.7.9 (default, Nov 20 2020, 18:45:38)  [Clang 12.0.0 (clang-1200.0.32.27)]
                     platform: Darwin-19.6.0-x86_64-i386-64bit
     optional deps. installed: ['typing-extensions']

We are trying to use pydantic (as part of FastAPI) with our own dataclass-based datamodel, using pydantic's automatic conversion from dataclasses. While doing so, we encountered several places where the code crashes in a cryptic way. Three of them we managed to resolve (by changing our code), but not the fourth one.

This is (a stripped down version of) our dataclasses (definitions.py). (+ See a more minimal one at the end.)

from __future__ import annotations
from typing import List, Dict, Optional, Union, Sequence, Mapping
from dataclasses import dataclass, field

TokenIndex = int

PrimitiveMetadataValue = Union[str, int, float, bool]
MetadataValue = Union[PrimitiveMetadataValue, Sequence[PrimitiveMetadataValue]]

@dataclass(frozen=True)
class Span:
    first: TokenIndex
    last: TokenIndex

@dataclass(frozen=True)
class LabeledSpan(Span):
    label: str

@dataclass(frozen=True)
class BinaryRelation:
    subject: LabeledSpan
    object: LabeledSpan
    label: str

@dataclass(frozen=True)
class Entity(LabeledSpan):
    priority: float = field(default=0, compare=False, hash=False)
    source: Optional[str] = None

@dataclass(frozen=True)
class KbEntity(LabeledSpan):
    ontology: str
    score: Optional[float] = None
    canonical: Optional[str] = None

@dataclass(frozen=True)
class Event(LabeledSpan):
    args: List[LabeledSpan]
    trigger: Span

@dataclass(frozen=True)
class Edge:
    source: TokenIndex
    target: TokenIndex
    label: str

@dataclass(frozen=True)
class Graph:
    roots: List[int]
    edges: List[Edge]

@dataclass(frozen=True, order=True)
class SentenceSource:
    filename: str
    offset: int

@dataclass(frozen=True)
class Sentence:
    words: List[str]
    document_start_char_offsets: List[int]
    document_end_char_offsets: List[int]
    lemmas: List[str]
    tags: List[str]

    overlapping_entities: List[Entity]
    kb_entities: List[List[KbEntity]]
    chunks: List[LabeledSpan]
    graphs: Dict[str, Graph]
    relations: Dict[str, List[BinaryRelation]]
    events: List[Event]
    source: Optional[SentenceSource] = field(default=None, compare=False, hash=False)
    metadata: Dict[str, MetadataValue] = field(default_factory=dict, compare=False, hash=False)

And this is the minimal pydantic code:

import pydantic
from definitions import Sentence

class M(pydantic.BaseModel):
    s: Sentence #pydantic.dataclasses.dataclass(Sentence)
    # the commented out option fails similarly.
     
print(M.schema())

On the initial run, we get:

TypeError: non-default argument 'metadata' follows default argument

The error message is clear, and the issue is easily resolved by switching the order of the source and metadata fields in Sentence.
However, we think it might be a bug to consider default_factory as a non-default argument.

After switching the order, we now get:

TypeError: cannot inherit non-frozen dataclass from a frozen one

This has to do with nested frozen dataclasses, and I opened a separate issue about it (#2541).

After removing all frozen annotations, we now get the really cryptic message:

TypeError: issubclass() arg 1 must be a class

This is due to using type-aliases (TokenIndex = int) and then using them as field types.
After removing these as well, we get:

  File "pydantic/main.py", line 704, in pydantic.main.BaseModel.schema
  File "pydantic/schema.py", line 167, in pydantic.schema.model_schema
  File "pydantic/schema.py", line 548, in pydantic.schema.model_process_schema
  File "pydantic/schema.py", line 589, in pydantic.schema.model_type_schema
  File "pydantic/schema.py", line 241, in pydantic.schema.field_schema
  File "pydantic/schema.py", line 495, in pydantic.schema.field_type_schema
  File "pydantic/schema.py", line 841, in pydantic.schema.field_singleton_schema
  File "pydantic/schema.py", line 548, in pydantic.schema.model_process_schema
  File "pydantic/schema.py", line 589, in pydantic.schema.model_type_schema
  File "pydantic/schema.py", line 241, in pydantic.schema.field_schema
  File "pydantic/schema.py", line 440, in pydantic.schema.field_type_schema
  File "pydantic/schema.py", line 773, in pydantic.schema.field_singleton_schema
  File "pydantic/schema.py", line 667, in pydantic.schema.field_singleton_sub_fields_schema
  File "pydantic/schema.py", line 495, in pydantic.schema.field_type_schema
  File "pydantic/schema.py", line 839, in pydantic.schema.field_singleton_schema
KeyError: <class 'pydantic.dataclasses.LabeledSpan'>

This one we couldn't solve by changing our code, though I will note that commenting out some fields did help.
However, the issue is not about any specific field, but about their combinations. Here are minimal Sentence classes the exhibits this behavior:

@dataclass
class Sentence(Transferable):
    chunks: List[LabeledSpan]
    events: List[Event]

Having either chunks or events by themselves, do work. Having BinaryRelation in itself fails, presumably because it uses two LabeledSpans internally.

@dataclass
class Sentence(Transferable):
    relations: Dict[str, List[BinaryRelation]]

Below is a minimal definitions.py file with a Sentence class that exhibits the final issue:

from dataclasses import dataclass

@dataclass
class Span:
   first: int
   last: int

@dataclass
class LabeledSpan(Span):
   label: str

@dataclass
class BinaryRelation:
   subject: LabeledSpan
   object: LabeledSpan
   label: str

@dataclass
class Sentence:
   relations: BinaryRelation

We will be happy to provide further details, and/or help to resolve these issues.

@yoavg yoavg added the bug V1 Bug related to Pydantic V1.X label Mar 21, 2021
@PrettyWood
Copy link
Member

Hi @yoavg
FYI I started to work again on the whole stdlib dataclass to validated dataclass #2557. Feedback more than welcome! 🙏

@yoavg
Copy link
Author

yoavg commented Mar 22, 2021

Thanks @PrettyWood !
I would just say that our primary interest is not validation---we actually got to pydantic through FastAPI, and would like to expose our existing data model through FastAPI, which in turn requires pydantic (with a validation as a nice added side effect).

So I don't have much to offer in terms of feedback on the API, short of listing the areas that we found that currently break the pydantic conversion code (which are also listed above):

  • handling of type aliases
  • nested frozen data-classes
  • repetition of the same dataclass-type in multiple levels of the hierarchy
  • default_factory vs default

We just implemented a working workaround that does less than what you desire to achieve, but works for our needs of converting our dataclasses hierarchy to pydantic objects, using create_model, and a bunch of python "reflection tricks". I'll be happy to share this (rather ugly) code if you are interested.

@yoavg
Copy link
Author

yoavg commented Mar 28, 2021

I see that #2541 was closed, but note my new comment there: the proposed solution doesn't work, the issue remains and is real.

@PrettyWood
Copy link
Member

I reopened it

@yoavg
Copy link
Author

yoavg commented Mar 28, 2021

Thanks! Sorry for notifying about this also here and not only there, it is just that I don't know how your github project is set up, and do know that in our repos, I never notice responses on closed issues... So just wanted to be safe.

@PrettyWood
Copy link
Member

@yoavg Btw I took the whole code in description and just switched dataclasses.dataclass with pydantic.dataclasses.dataclass with the refactoring in #2557 and it works as expected :)
If you have more code to test before this is reviewed and hopefully merged it'd be great!

@yoavg
Copy link
Author

yoavg commented Apr 13, 2021

@PrettyWood Yes, if we use pydantic dataclasses this will work, but we are interfacing with an existing code-base that uses "regular" dataclasses, which cannot change. So we need the ability to wrap regular dataclasses. I assume this will become a common use-case as people start using FastAPI to expose their existing domain models as web endpoints.

@yoavg
Copy link
Author

yoavg commented Apr 13, 2021

btw, it would have made it easier to switch from dataclasses to pydantic.dataclasses if it weren't for the runtime performance penalty due to validation, which is really not needed for the internal apis, and is substantial.

@PrettyWood
Copy link
Member

No worries I just said this to explain it will work ! You can keep your raw dataclasses ;)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug V1 Bug related to Pydantic V1.X
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants