Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Render Enum Member Docstrings in JSON Schema #8888

Open
3 of 13 tasks
rmehyde opened this issue Feb 25, 2024 · 9 comments · May be fixed by #9029
Open
3 of 13 tasks

Render Enum Member Docstrings in JSON Schema #8888

rmehyde opened this issue Feb 25, 2024 · 9 comments · May be fixed by #9029

Comments

@rmehyde
Copy link

rmehyde commented Feb 25, 2024

Initial Checks

  • I have searched Google & GitHub for similar requests and couldn't find anything
  • I have read and followed the docs and still think this feature is missing

Description

BLUF

Proposal to:

  1. Migrate JSON Schema output of multiple literal values from enum to oneOf + const
  2. Source descriptions for literal values from enum attributes' __doc__ when available

What & Why

If we have a field which can take a fixed number of values, we might want to add some detail around the meaning of the
values.

For example, suppose we have a parameter called execution_mode which can take on values of Synchronous or Asynchronous. Perhaps we want to offer these notes to the user:

Synchronous: execute synchronously and block until the task is completed
Asynchronous: kick off execution in the background and return immediately

In JSON Schema, this is best achieved through the use of oneOf and const:

execution_mode:
  type: string
  oneOf:
    - const: Synchronous
      description: execute synchronously and block until the task is completed
    - const: Asynchronous
      description: kick off execution in the background and return immediately

It would be valuable if Pydantic could support this JSON Schema output, and capture the member details in any other places they belong.

How: Reading From Enum Members' __doc__

Of course, we can't provide these details to typing.Literal. But when defining our literal values in an enum, __doc__ seems like a natural place to source them. While there is no standard syntax to set attributes' __doc__, there is active interest in documenting attributes. PEP 727 is under discussion
and it revived interest in the PEP 224 style supported by supported by Sphinx and now Pydantic:

class SomeClass:
    class_attribute = 42
    """An integer."""

Despite the lack of syntax to do so, we can still set the __doc__ dunder on enum members if we'd like. For example:

from enum import StrEnum


class DocumentedStrEnum(StrEnum):
    """
    Courtesy of Ethan Furman: https://stackoverflow.com/a/50473952
    """
    def __new__(cls, value, doc=None):
        self = str.__new__(cls)
        self._value_ = value
        if doc is not None:
            self.__doc__ = doc
        return self


class EvalMode(DocumentedStrEnum):
    SYNCHRONOUS = "Synchronous", "execute synchronously and block until the task is completed"
    ASYNCHRONOUS = "Asynchronous", "kick off execution in the background and return immediately"


assert EvalMode.SYNCHRONOUS.__doc__ == "execute synchronously and block until the task is completed"

Pydantic Functionality

I would hope to see the following changes to make this happen:

  • Pydantic model_json_schema() produces oneOf + const fields instead of enum fields when multiple there are multiple literal value, since JSON Schema's enum doesn't support member descriptions
  • Enum.attribute.__doc__ values are passed through to the description field in the output schema

Implementation

I'm new to the Pydantic codebase so would need some guidance if there are places outside of Model JSON Schema this data
would need to be used. But the first thing that comes to my mind is:

Risks & Alternatives

I'm sure I'm missing things so looking forward to the discussion! Here's what comes to mind for me:

Sourcing from PEP 224 Style Docstrings

Pydantic recently added support for parsing PEP 224-style attribute docstrings in Model Fields. It would be worth considering supporting that syntax alternatively or additionally.

Previous Migration from anyOf + const to enum JSON Schema

Pydantic moved from anyOf + const to enum to better support OpenAPI. Some proper validation should be done but since OpenAPI's 3.1.0 version primarily focused on better alignment with JSON Schema I expect this wouldn't be an issue.

Standardization Around Non-attr.__doc__

There is fresh appetite to document symbols, but not all the proposals involve setting the __doc__ attribute on the class attribute itself. If a different standard were adopted in the future, Pydantic might have to support both the new standard and the nonstandard __doc__ dunder.

Affected Components

@sydney-runkle
Copy link
Member

@rmehyde,

Thanks for the well thought out feature request!

We recently added support for opt-in attribute docstring support. See the docs on that here: https://docs.pydantic.dev/dev/api/config/#pydantic.config.ConfigDict.use_attribute_docstrings. I think that support for this could be added in a similar way.

Feel free to submit a PR adding support for this feature, I'd be happy to review! We'd want to make sure that this is opt in, as to avoid breaking changes for other users.

@rmehyde
Copy link
Author

rmehyde commented Mar 10, 2024

Sounds great, thanks @sydney-runkle! I'll submit a PR using the same opt-in mechanism as as the new docstring support.

@sydney-runkle
Copy link
Member

Awesome! We're going to release 2.7 next week, so if you get it in before then, we can get it into this next release :)

@rmehyde rmehyde linked a pull request Mar 16, 2024 that will close this issue
5 tasks
@rmehyde
Copy link
Author

rmehyde commented Mar 16, 2024

Excellent, thanks again! I've opened #9029 to implement this

@dmontagu
Copy link
Contributor

dmontagu commented Mar 19, 2024

Pydantic model_json_schema() produces oneOf + const fields instead of enum fields when multiple there are multiple literal value, since JSON Schema's enum doesn't support member descriptions

The main reason I don't want to necessarily do that by default is that it may not play nice with third party JSON schema tooling.

In particular, we recently re-added enum even when it is a single item (where we used const alone previously) because the orval API client generator mishandled the case where const was specified and enum was not. (And I was not aware of any backwards incompatibility that that might have been introducing.) Before changing the default generated schema I would want to make sure that this doesn't lead to issues.

I am not necessarily opposed to adding this option, but I would feel better about it if it didn't require adding an argument to the model_json_schema function. In particular:

  • Would it be possible to use both the oneOf + const and enum in the same JSON schema? I wouldn't be surprised if this caused issues, but also wouldn't be surprised if it didn't, and if it worked, we could potentially just add both
  • Is there a reason you are unwilling to use a subclass of GenerateJsonSchema to implement this? The reason we added that class was to make it easier to do things like this in your own code without needing to modify the pydantic implementation.

@rmehyde
Copy link
Author

rmehyde commented Mar 19, 2024

The main reason I don't want to necessarily do that by default is that it may not play nice with third party JSON schema tooling.

Agreed, I think no matter the behavior of third-party tooling this would be best treated as a breaking change.

Is there a reason you are unwilling to use a subclass of GenerateJsonSchema to implement this? The reason we added that class was to make it easier to do things like this in your own code without needing to modify the pydantic implementation.

Just lack of awareness of that option! I didn't see from the outset that I could achieve this result that way, so started exploring the code and implementing from the perspective of a Pydantic change. Happy to just go that route if this isn't valuable to have in the library. Feel free to close if so.

@rmehyde
Copy link
Author

rmehyde commented Mar 19, 2024

Would it be possible to use both the oneOf + const and enum in the same JSON schema? I wouldn't be surprised if this caused issues, but also wouldn't be surprised if it didn't, and if it worked, we could potentially just add both

I think it would be reasonable for third-party tools to produce unpredictable behavior in this case, whether or not they cause issues today.

@rmehyde
Copy link
Author

rmehyde commented Mar 19, 2024

Closing this per the discussion here and in the PR.

I might bring this back up as a change in default behavior when V3 is on the horizon if it plays nicely enough with third-party tools.

@rmehyde rmehyde closed this as completed Mar 19, 2024
@rmehyde
Copy link
Author

rmehyde commented Mar 24, 2024

Apologies for the back and forth here, but on a closer look I don't think GenerateJsonSchema subclassing is an option here.

@rmehyde rmehyde reopened this Mar 24, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants