[Feature Request] Provide a discriminated union type (OpenAPI 3) #619

Closed
sm-Fifteen opened this issue Jun 24, 2019 · 59 comments · Fixed by #2336
Labels
feature request help wanted Pull Request welcome Schema JSON schema

Comments

@sm-Fifteen

Feature Request

Pydantic currently has decent support for union types through the typing.Union type from PEP 484, but it does not cover all the cases covered by the JSON Schema and OpenAPI specifications, most likely because the two specifications diverge on those points.

OpenAPI supports something similar to tagged unions where a certain field is designated to serve as a "discriminator", which is then matched against literal values to determine which of multiple schemas to use for payload validation. In order to allow Pydantic to support those, I suppose there would have to be a specific type similar to typing.Union in order to specify what discriminator field to use and how to match it. Such a type would then be rendered into a schema object (oneOf) with an OpenAPI discriminator object built into it, and would validate incoming JSON into the correct type based on the value of the discriminator field. This change would only impact OpenAPI, as JSON Schema (draft 7 onwards) uses conditional types instead, which would probably need to be the topic of a different feature request, as both methods appear mutually incompatible.
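
As a rough illustration of the matching step described above, here is a minimal sketch in plain Python. It is not Pydantic API: Foo, Bar, MAPPING and parse_tagged are hypothetical names, and dataclasses merely stand in for models.

```python
from dataclasses import dataclass
from typing import Any, Dict, Type


@dataclass
class Foo:
    type: str
    size: int


@dataclass
class Bar:
    type: str
    label: str


# Hypothetical mapping from discriminator value to variant class.
MAPPING: Dict[str, Type[Any]] = {"foo": Foo, "bar": Bar}


def parse_tagged(payload: Dict[str, Any], discriminator: str = "type") -> Any:
    """Pick the variant named by the discriminator field, then build it."""
    try:
        tag = payload[discriminator]
    except KeyError:
        raise ValueError(f"missing discriminator field {discriminator!r}") from None
    try:
        variant = MAPPING[tag]
    except KeyError:
        raise ValueError(f"unknown discriminator value {tag!r}") from None
    # Only the selected variant is ever constructed.
    return variant(**payload)


result = parse_tagged({"type": "bar", "label": "hello"})
print(result)
```

The point of the sketch is that only one variant is attempted per payload, which is what distinguishes a tagged union from trying each member of a plain Union in turn.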

Implementation ideas

I'd imagine the final result to be something like this.

MyUnion = Union[Foo, Bar]
MyTaggedUnion = TaggedUnion(Union[Foo, Bar], discriminator='type', mapping={'foo': Foo, 'bar': Bar})

Python doesn't have a feature like TypeScript to let you statically ensure that discriminator exists as a field for all variants of that union, though that shouldn't be a problem since this is going to be raised during validation regardless.

discriminator and mapping could also simply be added to Schema, though I'm not sure about whether it's a good idea to add OpenAPI-specific extensions there.

PEP 593 would also have been a nice alternative, since it would hypothetically allow tagged unions to be implemented as a regular union with annotations specific to Pydantic for that purpose. However, it is still only a draft and most likely won't make it until Python 3.9 (if at all).

@samuelcolvin samuelcolvin added feature request help wanted Pull Request welcome Feedback Wanted and removed help wanted Pull Request welcome labels Jun 25, 2019
@samuelcolvin
Member

The validation component of this can already be accomplished via the const kwarg to Schema, and will be permitted as annotations via Literal once #582 is released.

This allows you to add a field which is constrained to one or multiple values, then use that field to discriminate between two models.

However at the moment this doesn't extend to a discriminator object in the schema.

Perhaps it would be possible to either:

  • automatically spot the discriminator when building schema, or
  • add a property to Config or a kwarg to Schema to tell pydantic about it

?

@sm-Fifteen
Author

Guessing what the discriminator field may be based on const or literal values might lead to unexpected behavior; using Config properties or schema parameters would probably be preferable.

A union field could have a discriminator parameter on its schema object to indicate which field to match against, which would then have to be const/literal values on each of the variants. Each literal value could then be rendered as a separate key/value pair in the mapping dictionary, which would raise an error in case of collision.
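
A minimal sketch of that idea, in plain Python rather than Pydantic: each variant's Literal values become keys of the mapping, and a value claimed by two variants is rejected. build_mapping and the bare Cat and Dog classes are hypothetical, and typing.Literal with get_args assumes Python 3.8 or later.

```python
from typing import Dict, Literal, Sequence, get_args, get_type_hints


class Cat:
    petType: Literal["cat", "robot_cat"]


class Dog:
    petType: Literal["dog", "robot_dog"]


def build_mapping(variants: Sequence[type], discriminator: str) -> Dict[str, type]:
    """Map every Literal value of the discriminator field to its variant."""
    mapping: Dict[str, type] = {}
    for variant in variants:
        annotation = get_type_hints(variant)[discriminator]
        for value in get_args(annotation):
            if value in mapping:
                # Two variants claim the same value: the union is ambiguous.
                raise ValueError(
                    f"{value!r} is claimed by both {mapping[value].__name__} "
                    f"and {variant.__name__}"
                )
            mapping[value] = variant
    return mapping


mapping = build_mapping([Cat, Dog], "petType")
print(mapping)
```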

This would leave the default case (the first key in the discriminator mapping is considered the default option if nothing matches, IIRC) undefined, though...

I'll see if I can try coming up with something that can be easily validated against while not being too cumbersome to work with.

@sm-Fifteen
Author

sm-Fifteen commented Jun 30, 2019

I've taken another look at the OpenAPI spec on this and tried a number of potential syntaxes.

The Literal syntax for Python 3.8 combined with an inheritance model flows really well. Considering discriminator objects are only allowed as fields of non-inlined schema objects in OpenAPI, there should be no issue with allowing them as BaseModel properties only. I'm not entirely sure, however, of how well mypy will tolerate shadowing with narrower types like that.

from pydantic import BaseModel, Schema
from typing import Literal

class Pet(BaseModel):
	petType: str = Schema(...)

	# One or the other
	__discriminator__ = 'petType'

	class Config:
		discriminator = 'petType'

class Cat(Pet):
	petType: Literal['cat', 'robot_cat'] # Should render as a string enum
	name: str = Schema(...)

class Dog(Pet):
	petType: Literal['dog', 'robot_dog']
	bark: str = Schema(...)

There would however be the problem of dealing with versions of Python without support for Literal, which could be accomplished with const as you've suggested, but this would mean having no more than one matched value per type. "Mapping keys MUST be string values, but tooling MAY convert response values to strings for comparison." would indicate that the discriminator property has to be of type string and that the values for subclasses can be validated as enums (of strings) as well.

The OpenAPI spec also mentions that the schema names can act as implicit discriminators, whether or not an explicit mapping is present (so 'Dog' and 'Cat' could technically be valid if there is no other constraint), however I don't really believe supporting that use case is of much concern.

@dmontagu
Contributor

Literal is supported in python 3.7 (and 3.6 if I recall correctly) by importing from the package typing_extensions.

@sm-Fifteen
Author

sm-Fifteen commented Jul 2, 2019

Ah, well, that's sure to make things a lot simpler, then.

That leaves the question of whether to use model config or __dunder__ attributes to define the discriminator property. Pydantic has some examples for both styles and I can't find any info on when one should be preferred over the other in style guidelines.

EDIT: All __dunder__ attributes and methods are considered reserved by the interpreter and can break without warning, so adding more such attributes may not be such a good idea after all.

I might try filing a PR for this in the coming days if I have time.

@tiangolo
Member

I will propose a PR to include additional schema (JSON Schema) data for models.

This will allow you to create the validation required using, e.g. Literal or the Generics functionality @dmontagu added. Then you can do the validation in your model using standard Pydantic validators.

And then you can describe it in the extra schema data (as JSON Schema/OpenAPI schema) as an OpenAPI discriminator, etc. purely for documentation.


I think adding discriminator support directly to Pydantic wouldn't be convenient, as the discriminator ideas are very specific to OpenAPI, are a bit constrained (not that generalizable), and have conflicts with some ideas in JSON Schema (if, else, etc).

But combining these things I described above (with the extra schema I'll PR) you should be able to achieve what you need, with the OpenAPI documentation you expect @sm-Fifteen .

@tiangolo tiangolo mentioned this issue Jul 15, 2019
@tiangolo
Member

tiangolo commented Aug 7, 2019

@sm-Fifteen I think you can now perform the validation as you need in a validator and generate the schema the way you want it using schema_extra. Could you check if that solves your use case?

@samuelcolvin
Member

Thinking about this more (and running into a similar problem myself), I think some kind of discriminator field is required:

  1. To speed up parsing so that validation only needs to be attempted against one model
  2. To make the error message less verbose - currently if you try to validate against multiple models, all but one are likely to have multiple errors. All those errors get added to the error output making it extremely verbose.

Personally I think this should be done by adding a discriminator argument to Field / Schema (#577) rather than creating a custom Union type that will never play well with mypy or IDEs.
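
The two points above can be sketched in plain Python. This is not how Pydantic validates internally; the validators below are hypothetical stand-ins that simply return a list of error strings, so that the "try every variant" and "dispatch on the discriminator" strategies can be compared.

```python
from typing import Any, Callable, Dict, List


def validate_foo(data: Dict[str, Any]) -> List[str]:
    errors = []
    if data.get("model_type") != "foo":
        errors.append("Foo.model_type: expected 'foo'")
    if not isinstance(data.get("size"), int):
        errors.append("Foo.size: expected an integer")
    return errors


def validate_bar(data: Dict[str, Any]) -> List[str]:
    errors = []
    if data.get("model_type") != "bar":
        errors.append("Bar.model_type: expected 'bar'")
    if not isinstance(data.get("label"), str):
        errors.append("Bar.label: expected a string")
    return errors


VALIDATORS: Dict[str, Callable[[Dict[str, Any]], List[str]]] = {
    "foo": validate_foo,
    "bar": validate_bar,
}


def errors_try_all(data: Dict[str, Any]) -> List[str]:
    """Plain-union strategy: try every variant and report all failures."""
    collected: List[str] = []
    for validate in VALIDATORS.values():
        errors = validate(data)
        if not errors:
            return []
        collected.extend(errors)
    return collected


def errors_discriminated(data: Dict[str, Any], discriminator: str = "model_type") -> List[str]:
    """Discriminated strategy: validate against the selected variant only."""
    tag = data.get(discriminator)
    validate = VALIDATORS.get(tag)
    if validate is None:
        return [f"{discriminator}: no variant for {tag!r}"]
    return validate(data)


bad = {"model_type": "bar", "label": 42}
all_errors = errors_try_all(bad)
tagged_errors = errors_discriminated(bad)
print(len(all_errors), len(tagged_errors))
```

With the plain-union strategy the errors from the non-matching variant are reported alongside the relevant one, whereas dispatching on the discriminator runs a single validator and reports only its errors.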

@samuelcolvin samuelcolvin added help wanted Pull Request welcome and removed Feedback Wanted labels Aug 11, 2019
@samuelcolvin
Member

So usage would be something like

class Foo(BaseModel):
    model_type: Literal['foo']

class Bar(BaseModel):
    model_type: Literal['bar']

class MyModel(BaseModel):
    foobar: Union[Foo, Bar] = Field(..., discriminator='model_type')

@sm-Fifteen
Author

I think adding discriminator support directly to Pydantic wouldn't be convenient, as the discriminator ideas are very specific to OpenAPI, are a bit constrained (not that generalizable), and have conflicts with some ideas in JSON Schema (if, else, etc).

@tiangolo: I don't know if we could have a solution that would work for both OpenAPI and JSON Schema without losing the benefits of mypy validation. I don't even know if JSON Schema's fully conditional validation system can cleanly be mapped to a type system at all. Being able to specify fields unknown to Pydantic in your generated schema is nice, but discriminators affect validation logic, and not being able to get mypy to tell apart the subtypes would be unfortunate.

So usage would be something like

class Foo(BaseModel):
    model_type: Literal['foo']

class Bar(BaseModel):
    model_type: Literal['bar']

class MyModel(BaseModel):
    foobar: Union[Foo, Bar] = Field(..., discriminator='model_type')

@samuelcolvin: I like the idea and proposed syntax, though I see the "running into a similar problem" issue was closed after you posted your reply, so I'm not sure if you still think the addition is warranted?

@samuelcolvin
Member

I still definitely want a discriminator argument to Field.

@samuelcolvin
Member

but it might need to be a function or a field name.
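
A small sketch of what accepting either form could look like, in plain Python. resolve_tag and the Discriminator alias are hypothetical names used only for illustration, not part of Pydantic.

```python
from typing import Any, Callable, Dict, Union

# A discriminator is either a field name or a function computing the tag.
Discriminator = Union[str, Callable[[Dict[str, Any]], str]]


def resolve_tag(data: Dict[str, Any], discriminator: Discriminator) -> str:
    """Return the tag for `data`, looked up by name or computed by a callable."""
    if callable(discriminator):
        return discriminator(data)
    return data[discriminator]


by_name = resolve_tag({"model_type": "foo"}, "model_type")
by_func = resolve_tag({"kind": "FOO_V2"}, lambda d: d["kind"].split("_")[0].lower())
print(by_name, by_func)
```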

@samuelcolvin
Member

maybe determinant would be a better name than discriminator? The docs should probably talk about both at least.

(I've just spent some time looking for this issue as I looked for "determinant" rather than "discriminator")

@sm-Fifteen
Author

maybe determinant would be a better name than discriminator? The docs should probably talk about both at least.

(I've just spent some time looking for this issue as I looked for "determinant" rather than "discriminator")

Considering this would mainly be there to map to OpenAPI's discriminator field, I figure it would make more sense to call it that, unless you're maybe trying to figure out how to make it work the same way with JSON Schema's model?

Considering discriminator objects are only allowed as fields of non-inlined schema objects in OpenAPI, there should be no issue with allowing them as BaseModel properties only.

I figured I should restate that part, since it would probably affect the resulting API design.

@samuelcolvin
Member

My interest in using discriminators is not related to OpenAPI or JSON Schema; I want a way of using Unions that lets me:

  1. Avoid having to run validation for all union types where it's trivial to work out what type the object should have
  2. Avoid the massively long error messages that currently result from failed validation against unions with many options.

I'm fine with discriminator as a name, although I actually think that determinant would be a more appropriate name, as long as both are used in the docs.

Considering discriminator objects are only allowed as fields of non-inlined schema objects in OpenAPI, there should be no issue with allowing them as BaseModel properties only.

Can't say I entirely understand what this means, but sounds good to me.

@sm-Fifteen
Author

Never mind that, I thought the spec essentially said something along the lines of "Only schemas that are in components/schemas can have a discriminator field.", which would have required an explicit superclass for all of those Unions.

The spec actually says something more like "The discriminator can only map towards schemas that have IDs (i.e: the subclasses must appear in components/schemas).", which makes a lot more sense considering how the discriminator mapping can only contain references.

MyResponseType:
  discriminator:
    propertyName: petType
    mapping:
      # Notice how there's no $ref, it's just a direct reference to the target type
      dog: '#/components/schemas/Dog'
      monster: 'https://gigantic-server.com/schemas/Monster/schema.json'
  oneOf:
  - $ref: '#/components/schemas/Cat'
  - $ref: '#/components/schemas/Dog'
  - $ref: '#/components/schemas/Lizard'
  - $ref: 'https://gigantic-server.com/schemas/Monster/schema.json'
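
As a sketch of how a schema object of this shape could be assembled from a value-to-schema-name mapping, here is a small plain-Python helper. discriminated_schema is a hypothetical function, and it assumes every variant is registered under #/components/schemas.

```python
import json
from typing import Any, Dict


def discriminated_schema(property_name: str, variants: Dict[str, str]) -> Dict[str, Any]:
    """Build a oneOf schema carrying an OpenAPI-style discriminator object.

    `variants` maps each discriminator value to a schema name that is
    assumed to exist under components/schemas.
    """
    refs = {value: f"#/components/schemas/{name}" for value, name in variants.items()}
    # Several values may point at the same schema; list each schema once.
    unique_refs = list(dict.fromkeys(refs.values()))
    return {
        "discriminator": {"propertyName": property_name, "mapping": refs},
        "oneOf": [{"$ref": ref} for ref in unique_refs],
    }


schema = discriminated_schema(
    "petType", {"cat": "Cat", "dog": "Dog", "robot_dog": "Dog"}
)
print(json.dumps(schema, indent=2))
```

Note that the mapping values are bare reference strings while the oneOf entries are $ref objects, mirroring the YAML above.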

@ashears
Contributor

ashears commented Oct 4, 2019

This feature is highly desired in my team's implementation :)

@Congee

Congee commented Dec 19, 2019

I don't get how schema_extra can solve this problem. Is there any workaround that lets me override the BaseModel.parse_raw behavior?

@kontsaki

@dgasmith

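from typing import List

from pydantic import BaseModel


# MalformedAction is not defined in the original snippet; it is assumed here to
# be a ValueError subclass so that pydantic reports it as a validation error.
class MalformedAction(ValueError):
    pass


class ActionModel(BaseModel):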
    class Config:
        fields = {"action": dict(const=True)}
        extra = "forbid"

class Something(ActionModel):
    action = "something"

ACTIONS = {
    "something": Something,
}

class Action:
    @classmethod
    def __get_validators__(cls):
        yield cls.return_action

    @classmethod
    def return_action(cls, values):
        try:
            action = values["action"]
        except KeyError:
            raise MalformedAction(
                f"Missing required 'action' field for action: {values}"
            )
        try:
            return ACTIONS[action](**values)
        except KeyError:
            raise MalformedAction(f"Incorrect action: {action}")

class Flow(BaseModel):
    actions: List[Action]

@levrik

levrik commented Feb 9, 2021

@samuelcolvin @PrettyWood Any updates on this? I'm in the process of switching away from GraphQL (for a lot of reasons, unrelated) but the lack of proper support for tagged unions is kinda an issue, as is the fact that tagged unions aren't properly exposed to OpenAPI.

@GavanWilhite

Not sure if there are any bounty programs that pydantic supports, but I'd be happy to put up some $ for this

@wgriffin13

@ghostbody inspired our solution. We used the Literal type from typing_extensions and an Enum class.

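from enum import Enum, unique
from typing import List, Optional, Union

from pydantic import BaseModel, Field
from typing_extensions import Literal

# Option (used below) is not shown in the original comment and is assumed to be
# a model defined elsewhere.


@unique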
class SelectFieldTypes(str, Enum):
    date = "date"
    multiple_choice = "multiple-choice"
    multiple_select = "multiple-select"


class DateModel(BaseModel):
    id: str
    type: Literal[SelectFieldTypes.date]
    text: str
    description: str
    required: bool
    value: Optional[str]


class MultipleChoice(BaseModel):
    id: str
    type: Literal[SelectFieldTypes.multiple_choice]
    text: str
    description: str
    required: bool
    options: List[Option]
    value: Optional[str]


class MultipleSelect(BaseModel):
    id: str
    type: Literal[SelectFieldTypes.multiple_select]
    text: str
    description: str
    required: bool
    options: List[Option]
    value: Optional[str]

class SelectTemplate(BaseModel):
    id: str
    text: str
    fields_: List[
        Union[
            DateModel,
            MultipleChoice,
            MultipleSelect,
        ]
    ] = Field(..., alias="fields")

@PrettyWood
Member

PrettyWood commented Feb 9, 2021

Hi everyone 😃
I just opened a small PR to see if that's the desired behaviour. If I understand correctly it's really meant to be used with Literal (like in TypeScript) to have

  • faster validation (I hence added directly in the metaclass the discriminator mapping)
  • smaller error messages
  • generated schema with the discriminator key

If that's all we want I guess my PR should be a good first draft 👍
Feedback more than welcome 😃
Do not hesitate to try it out

@levrik

levrik commented Feb 10, 2021

@PrettyWood That PR looks great. I'll give it a try later!

@sm-Fifteen
Author

@PrettyWood: That looks like a pretty good way of handling things: having types with literal fields that could be validated just fine without the use of discriminators, and then specifying a discriminator field on the parent union of those types, in a way that builds on top of regular validation. This is actually similar to what has been proposed as a potential replacement for discriminator on the OpenAPI side of things (see OAI/OpenAPI-Specification#2143 (comment)).

I'm not seeing any logic or tests for handling such discriminated unions as the root element, though. Is this supported by this PR?

@PrettyWood
Member

@sm-Fifteen

    class Cat(BaseModel):
        pet_type: Literal['cat']
        name: str

    class Dog(BaseModel):
        pet_type: Literal['dog']
        name: str

    class Pet(BaseModel):
        __root__: Union[Cat, Dog] = Field(..., discriminator='pet_type')

    my_dog = Pet.parse_obj({'pet_type': 'dog', 'name': 'woof'}).__root__
    assert isinstance(my_dog, Dog)

works. Not sure if that's what you had in mind

@PrettyWood
Member

Btw @sm-Fifteen in fact it should be even easier with #2147.
I added two new utils schema and schema_json in the same PR.
Now you can write directly this

Pet = Annotated[Union[Cat, Dog], Field(discriminator='pet_type')]

try:
    parse_obj_as(Pet, {'pet_type': 'dog', 'nam': 'woof'})
except ValidationError as e:
    print(e)
# 1 validation error for ParsingModel[Annotated[Union[__main__.Cat, __main__.Dog], FieldInfo(default=Ellipsis, extra={'discriminator': 'pet_type'})]]
# __root__ (Dog) -> name
#   field required (type=value_error.missing)

print(schema_json(Pet, title='Pet', indent=2))
# {
#   "title": "Pet",
#   "discriminator": {
#     "propertyName": "pet_type",
#     "mapping": {
#       "cat": "#/definitions/Cat",
#       "dog": "#/definitions/Dog"
#     }
#   },
#   "anyOf": [
#     {
#       "$ref": "#/definitions/Cat"
#     },
#     {
#       "$ref": "#/definitions/Dog"
#     }
#   ],
#   "definitions": {
#  ...

@sm-Fifteen
Author

sm-Fifteen commented Feb 24, 2021

Btw @sm-Fifteen in fact it should be even easier with #2147.
I added two new utils schema and schema_json in the same PR.

I wasn't super convinced by the other root union example, although I figured it was at least serviceable, but using PEP 593 Annotated types now that they are available really does make for something fairly ergonomic. I like it.

@KevOrr

KevOrr commented Mar 11, 2021

Thank you so much for that PR. Testing it out, it seems like you can call Cat(name='Felix') directly without having to supply the tag if you add default values to the tags:

class Cat(BaseModel):
    pet_type: Literal['cat'] = 'cat'
    name: str

class Dog(BaseModel):
    pet_type: Literal['dog'] = 'dog'
    name: str

Pet = Annotated[Union[Cat, Dog], Field(discriminator='pet_type')]

# parse_obj_as(Pet, dict(name='Felix'))  # fails as expected
parse_obj_as(Pet, dict(pet_type='cat', name='Felix'))
Cat(name='Felix')

Is this an expected use case or are there problems with this pattern?

@PrettyWood
Member

@KevOrr Yep the discriminator is used for the schema and to improve validation (faster and more explicit)
But IMO it's normal to be able to set the default value of your discriminator field to instantiate directly your submodels easily

@LyleScott

I've been using this branch for weeks to add a discriminator to several types and it seems like it would be a handy complement to Union. Is there more work to be done to solidify some part of it? I'm here to help if possible. The branch is falling further behind by the day and it seems like such a great feature.

@tgpfeiffer

#619 (comment)
This is great, thank you @PrettyWood! Works for me like this even with a plain Union in pydantic 1.8.2.

However, with this code, it seems as if (1) root validators (even non-pre ones) on the non-matching classes are still executed, and (2) they don't get passed the discriminator field value:

    class Cat(BaseModel):
        pet_type: Literal['cat']
        name: str
        age: int
        
        @root_validator
        def height_constraints(cls, values: Dict[str, Any]) -> Dict[str, Any]:
            print(values)
            assert str(values["age"]) in values["name"]  # silly check
            return values

    class Dog(BaseModel):
        pet_type: Literal['dog']
        name: str

    class Pet(BaseModel):
        __root__: Union[Cat, Dog] = Field(..., discriminator='pet_type')

    my_dog = Pet.parse_obj({'pet_type': 'dog', 'name': 'woof'}).__root__
    assert isinstance(my_dog, Dog)

Running this code prints:

{'name': 'woof'}
Traceback (most recent call last):
  File "test_pyd.py", line 23, in <module>
    my_dog = Pet.parse_obj({'pet_type': 'dog', 'name': 'woof'}).__root__
  File "pydantic/main.py", line 578, in pydantic.main.BaseModel.parse_obj
  File "pydantic/main.py", line 404, in pydantic.main.BaseModel.__init__
  File "pydantic/main.py", line 1040, in pydantic.main.validate_model
  File "pydantic/fields.py", line 723, in pydantic.fields.ModelField.validate
  File "pydantic/fields.py", line 899, in pydantic.fields.ModelField._validate_singleton
  File "pydantic/fields.py", line 723, in pydantic.fields.ModelField.validate
  File "pydantic/fields.py", line 906, in pydantic.fields.ModelField._validate_singleton
  File "pydantic/fields.py", line 913, in pydantic.fields.ModelField._apply_validators
  File "pydantic/class_validators.py", line 310, in pydantic.class_validators._generic_validator_basic.lambda12
  File "pydantic/main.py", line 735, in pydantic.main.BaseModel.validate
  File "pydantic/main.py", line 404, in pydantic.main.BaseModel.__init__
  File "pydantic/main.py", line 1066, in pydantic.main.validate_model
  File "test_pyd.py", line 13, in height_constraints
    assert str(values["age"]) in values["name"]  # silly check
KeyError: 'age'  

Does anyone know if that is being addressed in #2336?

@PrettyWood
Member

Hi @tgpfeiffer
I can't reproduce with your code snippet and the branch f/discriminated-union.
The whole purpose of the discriminated union is to NOT validate extra classes so if this is actually happening I will definitely look into it
