Allow different serializer to an Annotated type for "python" and "json" mode #8086

Open
3 of 13 tasks
Tracked by #9102
ChillPC opened this issue Nov 11, 2023 · 6 comments · May be fixed by #8432
Labels
feature request help wanted Pull Request welcome

@ChillPC

ChillPC commented Nov 11, 2023

Initial Checks

  • I have searched Google & GitHub for similar requests and couldn't find anything
  • I have read and followed the docs and still think this feature is missing

Description

A demo of how code might look when using the feature

Either make it possible to have as many serialization modes as you want:

from typing import Annotated

from pydantic import BaseModel, PlainSerializer

# `mode=` is the proposed new PlainSerializer parameter
WeirdInt = Annotated[
    int,
    PlainSerializer(lambda i: f"{i} in mode python", return_type=str, mode={"python"}),
    PlainSerializer(lambda i: f"{i} in mode json", return_type=str, mode={"json"}),
    PlainSerializer(lambda i: f"{i} in mode my-custom-mode", return_type=str, mode={"my-custom-mode"}),
    # One serializer shared by several modes
    PlainSerializer(lambda i: f"{i} in mode mode1 or mode2", return_type=str, mode={"mode1", "mode2"}),
]

class WeirdIntModel(BaseModel):
    i: WeirdInt

m = WeirdIntModel(i=1)
assert m.i == 1
assert m.model_dump() == {"i": "1 in mode python"}
assert m.model_dump(mode="json") == {"i": "1 in mode json"}  # used by `.model_dump_json()`
assert m.model_dump(mode="my-custom-mode") == {"i": "1 in mode my-custom-mode"}

assert m.model_dump(mode="mode1") == {"i": "1 in mode mode1 or mode2"}
assert m.model_dump(mode="mode2") == {"i": "1 in mode mode1 or mode2"}
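Until something like this exists, a similar effect can be approximated with the current API by a single PlainSerializer whose function takes a second SerializationInfo argument and dispatches on info.mode. This is a minimal sketch, not the proposed API: MODE_SERIALIZERS and serialize_weird_int are hypothetical names, and returning the raw int for modes without an entry is an assumed stand-in for an "always"/"default" fallback. Only the python and json modes are exercised here; custom mode strings passed to model_dump() should also show up in info.mode, but that is worth verifying.

```python
from typing import Annotated, Any, Callable

from pydantic import BaseModel, PlainSerializer, SerializationInfo

# Hypothetical dispatch table: one formatter per serialization mode name.
MODE_SERIALIZERS: dict[str, Callable[[int], str]] = {
    "python": lambda i: f"{i} in mode python",
    "json": lambda i: f"{i} in mode json",
    "mode1": lambda i: f"{i} in mode mode1 or mode2",
    "mode2": lambda i: f"{i} in mode mode1 or mode2",
}


def serialize_weird_int(i: int, info: SerializationInfo) -> Any:
    # Two positional parameters, so pydantic also passes the SerializationInfo.
    formatter = MODE_SERIALIZERS.get(info.mode)
    # Modes without an entry fall back to the raw int.
    return formatter(i) if formatter is not None else i


WeirdInt = Annotated[int, PlainSerializer(serialize_weird_int)]


class WeirdIntModel(BaseModel):
    i: WeirdInt


m = WeirdIntModel(i=1)
print(m.model_dump())       # {'i': '1 in mode python'}
print(m.model_dump_json())  # {"i":"1 in mode json"}
```

The trade-off is that all modes live in one function instead of one annotation per mode, so the per-mode return types are no longer declared.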

Or at least distinguish between the "python" and "json" modes:

from typing import Annotated

from pydantic import BaseModel, PlainSerializer

WeirdInt = Annotated[
    int,
    PlainSerializer(lambda i: f"{i} in mode python", return_type=str),
    PlainSerializer(lambda i: f"{i} in mode json", return_type=str, when_used="json"),  # should not erase the previous serializer
]

class WeirdIntModel(BaseModel):
    i: WeirdInt

m = WeirdIntModel(i=1)
assert m.i == 1
assert m.model_dump() == {"i": "1 in mode python"}
assert m.model_dump(mode="json") == {"i": "1 in mode json"}
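When only the JSON output needs a custom representation, one possible workaround with the current API is a WrapSerializer that acts in JSON mode and otherwise delegates to the default serializer through the handler, so python mode is left untouched. A sketch under that assumption; json_only is a hypothetical name, and unlike the proposal above, python mode here yields the plain int rather than a formatted string:

```python
from typing import Annotated, Any

from pydantic import BaseModel, SerializationInfo, SerializerFunctionWrapHandler, WrapSerializer


def json_only(value: int, handler: SerializerFunctionWrapHandler, info: SerializationInfo) -> Any:
    if info.mode_is_json():
        # Custom representation for model_dump(mode="json") and model_dump_json().
        return f"{value} in mode json"
    # Every other mode: fall through to the standard int serialization.
    return handler(value)


WeirdInt = Annotated[int, WrapSerializer(json_only)]


class WeirdIntModel(BaseModel):
    i: WeirdInt


m = WeirdIntModel(i=1)
print(m.model_dump())                # {'i': 1}
print(m.model_dump(mode="json"))     # {'i': '1 in mode json'}
```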

Your use case(s) for the feature

I work with MongoDB and with dates. Each date needs three different representations:

  • datetime.date in the business logic
  • datetime.datetime when dumping into MongoDB, because BSON has no type that represents only the date part of a datetime
  • str in YYYYMMDD format for the API

My first idea was to put multiple serializers on an Annotated type, like this:

from datetime import date, datetime, time
from typing import Annotated, Any

from pydantic import BeforeValidator, PlainSerializer, WithJsonSchema

def validate_date(v: Any) -> date:
    # Check datetime first: datetime is a subclass of date
    if isinstance(v, datetime):
        return v.date()
    if isinstance(v, date):
        return v

    match v:
        case str(s):
            return date.fromisoformat(s)
        case int(x) | float(x):
            return date.fromtimestamp(x)
    raise ValueError(
        f"'{v}' should be a valid date, a string in ISO 8601 format "
        "or an integer/float epoch timestamp in seconds."
    )

GtfsDate = Annotated[
    date,
    BeforeValidator(validate_date),
    PlainSerializer(lambda d: datetime.combine(d, time()), return_type=datetime),
    PlainSerializer(lambda d: d.strftime("%Y%m%d"), return_type=str, when_used="json"),
    WithJsonSchema({"type": "string"}, mode="serialization"),
]

I expected the second PlainSerializer to override the first one only in "json" mode, but serializing a BaseModel with this field gives:

class T(BaseModel):
    d: GtfsDate

T(d=date.today()).model_dump()             # => {'d': datetime.date(2023, 11, 10)}, expected {'d': datetime.datetime(2023, 11, 10, 0, 0)}
T(d=date.today()).model_dump(mode="json")  # => {'d': '20231110'}
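With the current API, one way to get all three representations is a single PlainSerializer whose function also receives SerializationInfo and branches on info.mode_is_json(). This sketch assumes that python-mode model_dump() output is what gets handed to MongoDB and that JSON mode is what the API returns; serialize_date is a hypothetical name, and the BeforeValidator from above is left out for brevity (it can be added back unchanged).

```python
from datetime import date, datetime, time
from typing import Annotated, Any

from pydantic import BaseModel, PlainSerializer, SerializationInfo, WithJsonSchema


def serialize_date(d: date, info: SerializationInfo) -> Any:
    if info.mode_is_json():
        # API representation, used by model_dump(mode="json") and model_dump_json().
        return d.strftime("%Y%m%d")
    # Python mode, e.g. before handing the document to MongoDB: midnight datetime.
    return datetime.combine(d, time())


GtfsDate = Annotated[
    date,
    PlainSerializer(serialize_date),
    WithJsonSchema({"type": "string"}, mode="serialization"),
]


class T(BaseModel):
    d: GtfsDate


t = T(d=date(2023, 11, 10))
print(t.model_dump())       # {'d': datetime.datetime(2023, 11, 10, 0, 0)}
print(t.model_dump_json())  # {"d":"20231110"}
```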

Why the feature should be added to pydantic (as opposed to another library or just implemented in your code)

This touches serialization at the field level, not the model level. Implementing it in user code would certainly be too cumbersome.

Affected Components

@sydney-runkle
Member

Hi @ChillPC,

This seems like a great idea. You've brought up some great points about varied use cases for this kind of feature.

Do you have any interest in opening a PR adding support for this kind of logic? Perhaps this is something we could fit into our next minor release 😄.

@ChillPC
Author

ChillPC commented Nov 14, 2023

Hello @sydney-runkle!

I would certainly be interested, but:

  • the codebase is quite large, and I will certainly need help to grok it
  • I don't think this feature would be easy to implement
  • it will certainly touch the Rust code in pydantic-core

Difficulties

BaseModel.model_dump(mode: Literal['json', 'python'] | str = 'python') is already flexible enough, so it would not be a problem.

The problem lies in the signature of PlainSerializer: its when_used parameter is not flexible enough, and it is passed all the way down to pydantic-core, in src/serializers/type_serializers/format.rs.

In the definition of PlainSerializer there is:

schema['serialization'] = core_schema.plain_serializer_function_ser_schema(
    ...,
    when_used=self.when_used,
    ...
)
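Because that assignment writes a single schema['serialization'] entry, stacking two PlainSerializer annotations on one type means the later one overwrites the earlier one, which is why the GtfsDate example above loses its datetime serializer in python mode. A minimal reproduction using TypeAdapter (Stacked and adapter are illustrative names):

```python
from typing import Annotated

from pydantic import PlainSerializer, TypeAdapter

# Both annotations write to the same schema['serialization'] slot,
# so the second (JSON-only) serializer replaces the first one.
Stacked = Annotated[
    int,
    PlainSerializer(lambda i: f"python:{i}", return_type=str),
    PlainSerializer(lambda i: f"json:{i}", return_type=str, when_used="json"),
]

adapter = TypeAdapter(Stacked)
print(adapter.dump_python(1))               # 1 (default int serialization, not 'python:1')
print(adapter.dump_python(1, mode="json"))  # json:1
```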

Would it be acceptable to store the newly created plain_serializer... under schema['serialization'][<name_of_mode>]? Would that happen in the Rust part?

API considerations

For PlainSerializer (and WrapSerializer), the type of when_used could be changed to something like the following, keeping it backward compatible by translating the existing literals into a set object:

when_used: Literal['always', 'unless-none', 'json', 'json-unless-none'] | set[str] = {'python', 'json'}

But a set[str] cannot express None handling the way unless-none does. Should the distinction be added with a prefix/suffix? Or should the dict key be a frozen dataclass with when: str and unless_none: bool fields? Would such a map be "easy" to implement in Rust?

And what about the key for the fallback serializer? Would it be a magic string like "always" or "default"?


Sorry if this is a lot of questions 😅 I just want to be sure I am heading in the right direction.
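The "two indicators" idea raised above (a target mode plus an unless_none flag, used as a frozen-dataclass dict key with a mode-agnostic fallback) can be sketched in plain Python, independently of pydantic-core. Everything here is hypothetical: When, LEGACY_WHEN_USED and pick_serializer are illustrative names, mode=None stands for the "always" fallback, and an unless_none entry skipped because the value is None falls through to the next, more generic candidate (an arbitrary design choice).

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass(frozen=True)
class When:
    """Hypothetical dict key: target mode plus None handling."""

    mode: Optional[str]  # None means "any mode", i.e. the 'always' fallback
    unless_none: bool = False


# Backward-compatible translation of the existing when_used literals.
LEGACY_WHEN_USED: dict[str, When] = {
    "always": When(None),
    "unless-none": When(None, unless_none=True),
    "json": When("json"),
    "json-unless-none": When("json", unless_none=True),
}

Serializer = Callable[[Any], Any]


def pick_serializer(serializers: dict[When, Serializer], mode: str, value: Any) -> Optional[Serializer]:
    """Return the serializer for `value` in `mode`, or None to use the default serialization."""
    # Mode-specific entries win over the mode-agnostic fallback.
    candidates = (When(mode), When(mode, unless_none=True), When(None), When(None, unless_none=True))
    for key in candidates:
        func = serializers.get(key)
        if func is None:
            continue
        if key.unless_none and value is None:
            # This entry opts out for None; try the next, more generic one.
            continue
        return func
    return None
```

Frozen dataclasses are hashable, so they work directly as dict keys, and arbitrary mode strings fit without new literals.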

@sydney-runkle sydney-runkle added the help wanted Pull Request welcome label Nov 15, 2023
@sshishov

I also ran into this issue today.

We allow serializing the model to Python objects and to a JSON string. Why don't we allow the same keys for PlainSerializer? Why do we allow always and json, but not python?
IMHO it is a very big oversight by the core team to assume that the DATA stored in the model should ALWAYS be serialized to Python AS IS, which is not true in a lot of cases.

I would propose two separate parameters instead of many Literals:

  • mode: python, json, always
  • unless_none: True, False

Am I missing something?

@sydney-runkle
Member

@ChillPC,

Apologies for the delay. We're excited that you're going to help implement this!

I think that @davidhewitt will be the best person to answer these questions. Specifically, DH, what do you think about these two inquiries?

Would it be acceptable to store the newly created plain_serializer... into schema['serialization'][<name_of_mode>] ? Would it be in the rust part ?

In other words, should we expand the definition of when_used, or expand the locations in which we store serialization schema as suggested above?

But the set[str] would not act on the possibility of handling None like with unless-none. Should the distinction be added with a pre/suffix? A frozen dataclass that has when: str and unless_none: bool as the key of the dict ? Would it be "easy" to implement such a map in rust ?

Good question. I think having two indicators here could be useful.

And what about the key for the serializer to fallback to ? Would it be a magic string like "always" or "default" ?

PlainSerializer and WrapSerializer default to when_used='always', so I think the answer to your question is yes.

Feel free to reach out if you have more questions. I'll be much more prompt with responses moving forward.

@davidhewitt
Contributor

davidhewitt commented Dec 5, 2023

I think there are two different feature requests here, so let's separate them. Custom modes like mode={"mode1", "mode2"} might be a lot of work, so let's keep them out of this issue and discuss them elsewhere if they're really needed.

For "python" and "json" modes, this is already supported in pydantic_core by the json_or_python schema. If needed, it could be built with the same validation schema for each mode but a different serialization schema per mode, e.g.:

schema_python = some_validation_schema()
schema_json = schema_python.copy()
schema_python['serialization'] = python_serializer
schema_json['serialization'] = json_serializer

core_schema.json_or_python_schema(json_schema=schema_json, python_schema=schema_python)

So I think this can be implemented without any Rust changes, just need to work out a desirable way to expose this in the Pydantic API and build a schema like the above.

@ChillPC
Author

ChillPC commented Dec 22, 2023

Hello @davidhewitt, json_or_python_schema seems to act weirdly. See my PR for details.


4 participants