Serialisation and Dumping #4456
Replies: 9 comments 16 replies
-
I'm wondering about the separation of concerns between input and output. The second question after "what is in the (default) output?" is "how is it generated?"
-
All this sounds good to me and I don't think I have many strong or useful opinions. I agree with @PrettyWood that being able to separate aliases for loading and dumping would probably make sense. The other thing is just that I want to make sure I understood correctly: when talking about "python" and "json", the input is always a Pydantic field/model, right? And the output is dicts or JSON str, correct? I mean, this is talking specifically about those things and not about loading data, right?
-
I would like an easy way to set …. Currently the BaseModel config options don't have that as an option, so in my project I currently do: …. And then everything else uses that new …. (And there's some other stuff in my BaseModel as well.)
-
One of the main inputs and outputs of my scripts that use pydantic is AWS' DynamoDB NoSQL database. The …. As I mentioned in my previous comment, I currently subclass ….
It would be great if I could dump my model into a dict that contains Decimals, not floats (and strings, not datetimes), e.g. …
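Until there is first-class support for this, one workaround is to post-process the dumped dict. A minimal sketch (`to_dynamo` is a hypothetical helper name, not a pydantic API):

```python
# Hypothetical post-processing helper: recursively convert a dumped dict so
# floats become Decimals and datetimes become ISO-8601 strings, matching
# what DynamoDB expects.
from datetime import datetime
from decimal import Decimal
from typing import Any


def to_dynamo(value: Any) -> Any:
    if isinstance(value, float):
        # Go via str to avoid binary-float artefacts like
        # Decimal(1.5000000000000000444...)
        return Decimal(str(value))
    if isinstance(value, datetime):
        return value.isoformat()
    if isinstance(value, dict):
        return {k: to_dynamo(v) for k, v in value.items()}
    if isinstance(value, list):
        return [to_dynamo(v) for v in value]
    return value


item = to_dynamo({"price": 1.5, "seen": datetime(2022, 11, 1), "tags": ["a"]})
# item == {"price": Decimal("1.5"), "seen": "2022-11-01T00:00:00", "tags": ["a"]}
```

You would call this on the output of `model.dict()` before handing the result to DynamoDB.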
-
I'm working on this now. Work is being tracked in #4739.
-
It's currently unclear how to apply custom serializers per-type or per-field in models for V2. All I've found is:

```python
# TODO: Add a suggested migration path once there is a way to use custom encoders
@deprecated('custom_pydantic_encoder is deprecated.')
def custom_pydantic_encoder(...): ...
```

which suggests this is still to-be-done; however, as per above, #4739 seems to be "done"? If so, what's the equivalent mechanism to activate this pseudo-code from above?

```python
def to_format(self, value: Any, format: str) -> Any: ...
```

Serialization is super relevant for https://github.com/NowanIlfideme/pydantic-yaml and https://github.com/NowanIlfideme/pydantic-kedro; thank you in advance! 😃
-
As discussed over here: #3293. It would be nice to just have to explicitly mention a parent class as the type and let Pydantic dynamically find the right child class based on a discriminator. For now this works, but Pylance gives an error as the Union visually contains just one argument: …
It would make it less messy if there are more subclasses, because you don't have to mention them explicitly in the Union.
-
Hey, not sure if this is the right place here, or if I should open my own discussion. In v1 the …
-
Hi there, I faced nearly the same problem. I am using a discriminator function with tags and want to build the union list dynamically. After playing around a bit I found a quite nice solution which I want to share with you.

I created a type registry:

```python
import logging
from abc import ABC
from typing import Annotated, Any, Union

from pydantic import BaseModel, Discriminator, Tag


def singleton(cls):
    instances = {}

    def getinstance(*args, **kwargs):
        if cls not in instances:
            instances[cls] = cls(*args, **kwargs)
        return instances[cls]

    return getinstance


class TypeRegistry(ABC):
    def __init__(self, registry_name: str, logger):
        self._registry_name = registry_name
        self._logger = logger
        self._type_map = {}

    def register(self, enum_value, registered_type):
        """Register a type for an enum value."""
        if enum_value in self._type_map:
            self._logger.debug(f"For enum-value {enum_value} the type {registered_type} will be updated (Registry: {self._registry_name}).")
        else:
            self._logger.debug(f"For enum-value {enum_value} the type {registered_type} will be registered (Registry: {self._registry_name}).")
        self._type_map[enum_value] = registered_type

    def type_is_registered(self, enum_value) -> bool:
        """Check if a type has been registered."""
        return enum_value in self._type_map

    def get_class_for_type(self, enum_value):
        """Get the registered type for an enum value."""
        if enum_value not in self._type_map:
            self._logger.warning(f"Requested type {enum_value} has not been registered yet (Registry: {self._registry_name}).")
            return None
        return self._type_map[enum_value]

    def get_type_map(self) -> dict:
        """Return a copy of the type map."""
        return self._type_map.copy()


@singleton
class DeviceTypeRegistry(TypeRegistry):
    def __init__(self):
        super().__init__("DeviceRegistry", logging.getLogger("DeviceTypeRegistry"))
```

and a mapper, which maps the TypeRegistry into a tuple:

```python
class TypeRegistryMapper:
    @classmethod
    def as_tuple(cls, type_registry: TypeRegistry):
        ret = []
        for type_name, type_class in type_registry.get_type_map().items():
            ret.append(Annotated[type_class, Tag(type_name)])
        return tuple(ret)
```

The TypeRegistry is filled by a decorator:

```python
def DeviceReg(value):
    """Add a type to the TypeRegistry."""
    def decorator(cls):
        registry = DeviceTypeRegistry()
        registry.register(value, cls)
        return cls
    return decorator
```

Discriminator function to choose tags; the value 'type' comes from the JSON:

```python
def device_discriminator(v: Any) -> str:
    device_type = v.get("type")
    reg = DeviceTypeRegistry()
    if reg.type_is_registered(device_type):
        return device_type
    return "DEVICE_BASE"
```

Sample model:

```python
class Client(BaseModel):
    homeId: str = ""
    id: str = ""
    label: str = ""
    clientType: str = ""


class ModelDeviceBase(BaseModel):
    pass


@DeviceReg("DEVICE_BASE")
class Device(ModelDeviceBase):
    availableFirmwareVersion: str = ""
    connectionType: str = ""
    firmwareVersion: str = ""
    firmwareVersionInteger: int = 0  # was `= ""`, which is not a valid int default
    type: str


@DeviceReg("RAIN_SENSOR")
class RainSensor(Device):
    type: str


@DeviceReg("DIN_RAIL_SWITCH")
class DinRailSensor(Device):
    type: str


class Base(BaseModel):
    clients: dict[str, Client] = {}
    devices: dict[
        str,
        Annotated[
            Union[
                TypeRegistryMapper.as_tuple(DeviceTypeRegistry())
            ],
            Discriminator(device_discriminator),
        ],
    ]
```

The TypeRegistryMapper creates the annotated tag list, so I just have to add a new class with the `DeviceReg` decorator. I love it! Maybe it helps someone else.
-
(A note on language: "dump" is used as the verb for the thing we're talking about, and consequently in method names;
"serialisation" is used as the noun. Thus "dump" and "serialise" are considered synonyms in this context. Both are arguably wrong, since we're often converting one Python object into another, but I can't think of better terminology.)
Related discussions
Section of the V2 blog.
Required Features
Below is a list of features I think people want; let me know if I've missed anything significant.
Models and non-models
Most past conversations about this relate primarily to models since they are the primary building block for everything in pydantic V1. But in V2 models are no longer the "quantum" of all validation, and the output type of a validation schema can be virtually anything.
We therefore need to provide ways to customise serialisation of any object as well as continuing to support models.
Customising type serialisation
Including types which `json.dumps` handles by default and therefore doesn't easily allow customisation of.
Customising field serialisation
E.g. a way to define how `model.age` is serialised as opposed to `model.id`, although they're both ints.
JSON serialisation
JSON is a special case and we want builtin support for creating JSON from models and other objects.
Aliases
This is pretty simple at dump-time since we can decide on the alias when building the schema.
We should continue to support `by_alias`.
`exclude_unset`, `exclude_defaults` and `exclude_none`
I guess these have to remain.
`include` and `exclude`
Currently, this is configurable via `dict(include=...)` and `dict(exclude=...)`, which has some very complex behaviour. (`dict()` is being renamed to `model_dump()` in V2 as per Model Namespace Cleanup.)
It would make the logic much simpler, and probably make execution significantly faster, if we could remove the `include` and `exclude` arguments and only allow them to be hard-coded on the model, but I assume this would cause a revolution. (Also, I know how useful customising these when calling `model_dump` can be.)
We currently also have `exclude` and `include` on a `Field`; I think we should definitely keep `exclude`. The precise semantics of when `include` trumps `exclude` are inherently complex; it would be good if we could remove `include` from `Field` and only allow it via `model_dump`.
Output formats
Similar to the blog post, we need the following output formats:
1. Python objects
2. Python objects containing only JSON-compatible types: `dict`, `list`, `str`, `int`, `float`, `bool`, `None`
3. JSON strings

All these formats should be customisable via functions; obviously 2 and 3 should use the same customisation logic.
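To make format 2 concrete, here is a naive stdlib sketch (illustrative only, not the pydantic-core design) of reducing arbitrary Python objects to only those JSON-compatible types:

```python
from datetime import date, datetime
from decimal import Decimal
from enum import Enum
from typing import Any


def to_jsonable(value: Any) -> Any:
    """Reduce a value to dict/list/str/int/float/bool/None only."""
    if value is None or isinstance(value, (str, int, float, bool)):
        return value
    if isinstance(value, (datetime, date)):
        return value.isoformat()
    if isinstance(value, Decimal):
        return float(value)
    if isinstance(value, Enum):
        return to_jsonable(value.value)
    if isinstance(value, dict):
        return {str(k): to_jsonable(v) for k, v in value.items()}
    if isinstance(value, (list, tuple, set)):
        return [to_jsonable(v) for v in value]
    raise TypeError(f"cannot make {type(value).__name__} JSON-compatible")


print(to_jsonable({"d": date(2022, 9, 1), "n": Decimal("2.5"), "xs": (1, 2)}))
# {'d': '2022-09-01', 'n': 2.5, 'xs': [1, 2]}
```

Note that this is exactly the `isinstance`-at-dump-time style the Implementation section below argues against; it only illustrates the target output shape.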
Implementation
The plan is to implement as much of this logic as possible in Rust within pydantic-core.
The key insight I have after thinking about this for a while is this: `json.dumps`, `JSON.stringify` and friends rely on some variation of `isinstance`, since they don't know anything about the data they're serialising until they're called, but we know the types of the data before we serialise it.
We can therefore prepare the serializers and skip expensive `isinstance` checks by building a `Serializer` that shadows the model it is built to dump.
This approach is also very close to what we do for validation, and thus should allow good symmetry between the validation and serialisation.
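As a toy illustration of that insight (illustrative Python, not the actual Rust implementation): the per-field serializers are chosen once, when the schema is built, so dumping is just lookups and direct calls with no `isinstance` checks at dump time:

```python
from datetime import datetime

# Serializer functions chosen ahead of time from the known field types,
# playing the role of a Serializer that shadows the model.
FIELD_SERIALIZERS = {
    "id": lambda v: v,                   # int passes through unchanged
    "created": lambda v: v.isoformat(),  # datetime -> str, decided up front
}


def dump(obj: dict) -> dict:
    # No isinstance checks here: the "schema" above already fixed each
    # field's serializer when it was built.
    return {name: ser(obj[name]) for name, ser in FIELD_SERIALIZERS.items()}


print(dump({"id": 1, "created": datetime(2022, 9, 6, 12, 0)}))
# {'id': 1, 'created': '2022-09-06T12:00:00'}
```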
There are two possible approaches here:
1. Extend the `Validator` trait to also support serialisation.
2. Build a separate `Serializer` trait and all required implementations.

The second approach is probably more code but would provide a cleaner separation of concerns. The real question is whether all the validators make sense as serializers, and similarly whether there are serializers that don't make sense as validators.
Regardless of which approach we take, the basic idea would be the same: we have default implementations for "python" and "json" and the option to override with a Python function.
The rough signature of the trait would be something like this (I'm using Python here for pseudo-code, but the actual implementation would be in Rust):
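The original snippet did not survive extraction here; what follows is a hedged guess at the shape of such a trait, written as Python pseudo-code with illustrative names:

```python
from typing import Any


class Serializer:
    """Pseudo-code sketch of a serializer 'trait' (illustrative only)."""

    def to_python(self, value: Any) -> Any:
        """Return a Python object, e.g. leave a datetime as a datetime."""
        raise NotImplementedError

    def to_json(self, value: Any) -> Any:
        """Return something JSON-compatible, e.g. an ISO string."""
        raise NotImplementedError


class DatetimeSerializer(Serializer):
    """Example implementation for datetime fields."""

    def to_python(self, value):
        return value

    def to_json(self, value):
        return value.isoformat()


from datetime import datetime

s = DatetimeSerializer()
print(s.to_json(datetime(2022, 9, 6)))
# 2022-09-06T00:00:00
```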
This is slightly naive; in reality there would be some more complexity in returning a Rust enum of `PyObject` or `JsonType` from `to_json` to speed up JSON generation.
We would then have two "finalisers" (better name required) which combine serialised fields into either a Python object or JSON.
cc @PrettyWood @tiangolo @hramezani
I have more to say on this, but I need to go for lunch... I'll try to add more soon.