Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add oneOf+const JSON Schema Option for Literals #9029

Open
wants to merge 9 commits into
base: main
Choose a base branch
from

Conversation

rmehyde
Copy link

@rmehyde rmehyde commented Mar 16, 2024

Change Summary

  • Adds a json_schema_literal_type config field with a oneof-const option to generate properties using oneOf: [{'const': value'}...] instead of the default enum: [value...]
  • In the oneof-const option, if multiple values are sourced from Enums with __doc__ attributes on their cases, use those descriptions in the oneOf options

Related issue number

fix #8888

Checklist

  • The pull request title is a good summary of the changes - it will be used in the changelog
  • Unit tests for the changes exist
  • Tests pass on CI
  • Documentation reflects the changes where applicable
  • My PR is ready to review, please add a comment including the phrase "please review" to assign reviewers

Selected Reviewer: @samuelcolvin

Copy link

codspeed-hq bot commented Mar 16, 2024

CodSpeed Performance Report

Merging #9029 will not alter performance

Comparing rmehyde:rmehyde/enum-docstrings (7bdf02f) with main (f794f65)

Summary

✅ 10 untouched benchmarks

Comment on lines 742 to 743
# TODO (reesehyde): do we want this condition or not? why do we still produce 'enum' for single values?
if len(expected) > 1:
Copy link
Author

@rmehyde rmehyde Mar 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems intentional that the "enum": [<value>] tag gets added as well as "const": <value> when there's only a single value, but I'm not quite sure why. I assume that we wouldn't want:

{
    "const": "SingleValue",
    "oneOf": [{"const": "SingleValue"}]
}

over just the "const": "SingleValue", but since I don't follow why we add the enum in the single-value case I wanted to check.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Per this comment

we recently re-added enum even when it is a single item (where we used const alone previously) because the orval API client generator mishandled the case where const was specified and enum was not. (And I was not aware of any backwards incompatibility that that might have been introducing.)

@rmehyde
Copy link
Author

rmehyde commented Mar 16, 2024

please review

Thanks! :)

@@ -83,6 +83,7 @@ class ConfigWrapper:
regex_engine: Literal['rust-regex', 'python-re']
validation_error_cause: bool
use_attribute_docstrings: bool
json_schema_literal_type: Literal['enum', 'oneof-const']
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd been following a great suggestion to look to the newly added opt-in attribute docstring support for inspiration here.

But I'm realizing that I followed that paradigm a bit too closely by putting the config option here, I think the best place for it is in a BaseModel.model_json_schema() parameter. Will refactor.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rmehyde,

Ah yeah, that makes sense re moving to model_json_schema. That being said, you should still be able to keep most / all of the docs / tests that you've written, which is nice. Ping me when you've refactored, and I'd be more than happy to review! Looks like great work so far :).

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @sydney-runkle, I've made that update!

There's one check failing: a segfault when running the Ubuntu 3.10 tests. I don't believe this is directly related to my changes, and I don't get that behavior running the command in a 3.10 environment on my own Ubuntu machine. But I did re-run the CI/CD and the behavior is consistent, let me know what the right next step there is.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All good, that test is annoyingly flaky. I just reran it again, hopefully it passes this time 🍀

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At a first glance, looks great. I'll get back to you with a final confirmation tomorrow regarding if this is the best place to have this parameter. I think it is, but I suppose I could see an argument for just creating a custom instance of GenerateJsonSchema with this logic, and passing that to the model_json_schema() function.

Could you re-add that comprehensive docs section that you added? Maybe to the JSON schema docs (instead of where you originally had it, in the config docs)

@dmontagu
Copy link
Contributor

dmontagu commented Mar 19, 2024

See this comment #8888 (comment) — in particular, is there a reason that overriding literal_schema in a subclass of GenerateJsonSchema doesn't work for you? And passing that subclass into the schema_generator argument of the model_json_schema function? I'd rather not proliferate the arguments to model_json_schema if at all possible to avoid, for the sake of avoiding combinatorial test explosion.

(Not to mention, the whole point of the schema_generator argument was to make it easy to override behaviors exactly like this.)

@dmontagu
Copy link
Contributor

dmontagu commented Mar 19, 2024

In particular, just to make it explicit, the following implementation passes at least the first of the tests you added (as you can see at the bottom here):

from enum import Enum
from typing import Any

from pydantic import BaseModel
from pydantic.json_schema import GenerateJsonSchema, JsonSchemaValue
from pydantic_core import core_schema, to_jsonable_python


class MyGenerateJsonSchema(GenerateJsonSchema):
    def literal_schema(self, schema: core_schema.LiteralSchema) -> JsonSchemaValue:
        """Generate a oneOf const schema, rather than the default behavior
        """
        expected = [v.value if isinstance(v, Enum) else v for v in schema['expected']]
        # jsonify the expected values
        expected = [to_jsonable_python(v) for v in expected]

        result: dict[str, Any] = {}
        if len(expected) == 1:
            result['const'] = expected[0]

        if len(expected) > 1:
            descriptions = schema.get('metadata', {}).get('enum_case_descriptions', [])
            members = []
            for e in expected:
                member = {'const': e}

                try:
                    description_idx = [d[0] for d in descriptions].index(e)
                    member['description'] = descriptions[description_idx][1]
                except ValueError:
                    pass

                members.append(member)
            result['oneOf'] = members

        types = {type(e) for e in expected}
        if types == {str}:
            result['type'] = 'string'
        elif types == {int}:
            result['type'] = 'integer'
        elif types == {float}:
            result['type'] = 'number'
        elif types == {bool}:
            result['type'] = 'boolean'
        elif types == {list}:
            result['type'] = 'array'
        return result


class FooBar(str, Enum):
    """
    This enum Foos and Bars
    """

    foo = 'foo'
    bar = 'bar'


class Model(BaseModel):
    enum: FooBar


assert Model.model_json_schema(schema_generator=MyGenerateJsonSchema) == {
    'title': 'Model',
    'type': 'object',
    'properties': {'enum': {'$ref': '#/$defs/FooBar'}},
    'required': ['enum'],
    '$defs': {
        'FooBar': {
            'title': 'FooBar',
            'description': 'This enum Foos and Bars',
            'oneOf': [
                {'const': 'foo'},
                {'const': 'bar'},
            ],
            'type': 'string',
        }
    },
}

(I didn't check the others but I'd assume they would also work with this tweak to the .model_json_schema(...) call.)

@sydney-runkle
Copy link
Member

@dmontagu,

Thanks for the feedback. Upon further review, it does seem like using a custom GenerateJsonSchema class could be the best path forward here. That being said, perhaps we could keep this PR open as an opportunity to document this pattern / example, for others also hoping to modify the literal schema in such a way!

@rmehyde
Copy link
Author

rmehyde commented Mar 19, 2024

Upon further review, it does seem like using a custom GenerateJsonSchema class could be the best path forward here.

Agreed, thanks for pointing me in the right direction @dmontagu!

That being said, perhaps we could keep this PR open as an opportunity to document this pattern / example, for others also hoping to modify the literal schema in such a way!

Happy to close this and the issue — they should still be findable!

@rmehyde rmehyde closed this Mar 19, 2024
@rmehyde
Copy link
Author

rmehyde commented Mar 24, 2024

Apologies @dmontagu I should have waited to respond until I could give this my full attention.

This change can't be implemented through a custom GenerateJsonSchema class because that class only has access to information captured in core_schema.LiteralSchema. The other half of this code change includes updating SchemaTransformer.get_enum_core_schema() to read the docstrings and add them to the LiteralSchema metadata in the enum_case_descriptions field.

The GenerateJsonSchema subclass you provided above relies on that field, so it only works with this branch's changes in pydantic._internal.

@rmehyde rmehyde reopened this Mar 24, 2024
@dmontagu
Copy link
Contributor

Makes sense, I think maybe the right thing to do is add a reference to the enum type in the core schema to make this possible. Want to think about it a bit more but I think we can figure something out.

@rmehyde
Copy link
Author

rmehyde commented Mar 24, 2024

Sounds right to me, and I like that that opens the option of having a properly typed actual field in the LiteralSchema. Shoving the docs into a shared key in a metadata dictionary felt less than ideal.

@sydney-runkle
Copy link
Member

@rmehyde and @dmontagu,

We've recently moved enum validation to rust, and thus changed the logic on the pydantic end of things as well. That might affect this significantly. See #9064.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Render Enum Member Docstrings in JSON Schema
4 participants