Check if the data can be structured without actually structuring it #414

Open
albertino87 opened this issue Aug 10, 2023 · 7 comments

@albertino87

  • cattrs version: 22.2.0
  • Python version: 3.10
  • Operating System: Windows 10

Description

I have a dataclass that can be instantiated (structured) through a from_dict method, so I know all the field types are correct:

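from dataclasses import asdict, dataclass

from cattrs import Converter


def get_converter() -> Converter:
    # Assumed helper: the original post did not show its definition.
    return Converter()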

@dataclass
class MyDataClass:
    a: str
    b: int

    def validate_types(self) -> None:
        self.from_dict(asdict(self))

    @classmethod
    def from_dict(cls, data: dict, converter: Converter = get_converter()):
        return converter.structure(data, cls)

It also has a validate_types method that can be run after instantiation.

What I Did

I would like to call validate_types from __post_init__:

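from dataclasses import asdict, dataclass

from cattrs import Converter


def get_converter() -> Converter:
    # Assumed helper: the original post did not show its definition.
    return Converter()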

@dataclass
class MyDataClass:
    a: str
    b: int

    def __post_init__(self):
        self.validate_types()

    def validate_types(self) -> None:
        self.from_dict(asdict(self))

    @classmethod
    def from_dict(cls, data: dict, converter: Converter = get_converter()):
        return converter.structure(data, cls)

Unfortunately, this ends in infinite recursion: from_dict creates a new instance of the class, whose __post_init__ calls validate_types, which calls from_dict, which creates another instance, and so on.

Would it be possible to expose a function or method on the converter that only checks whether the data can be structured, without actually structuring it or creating a new instance?

@Tinche
Member

Tinche commented Aug 10, 2023

You could consider the following approach:

from dataclasses import astuple, dataclass, fields

from cattrs import Converter

c = Converter()


def get_converter() -> Converter:
    return c


@dataclass
class MyDataClass:
    a: str
    b: int

    def __post_init__(self):
        self.validate_types()

    def validate_types(self) -> None:
        c.structure(astuple(self), tuple[tuple(f.type for f in fields(self.__class__))])

    @classmethod
    def from_dict(
        cls, data: dict, converter: Converter = get_converter()
    ) -> "MyDataClass":
        return converter.structure(data, cls)


MyDataClass(a="a", b="a")  # Will raise since `b` isn't an int

The idea is you run the structuring path, but not for your class but for a tuple[str, int].

Note that this approach will be somewhat slow, especially if you use MyDataClass.from_dict (it will actually validate twice, once when structuring in from_dict and once in validate_types). Maybe that's ok for your use case!
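
To make the tuple trick above more concrete, here is a minimal stdlib-only sketch of the two values that get passed to structure. The helper name field_tuple_type is hypothetical and not part of cattrs, and the sketch assumes the field annotations are real types rather than strings (that is, no `from __future__ import annotations`).

```python
from dataclasses import astuple, dataclass, fields


@dataclass
class MyDataClass:
    a: str
    b: int


def field_tuple_type(cls):
    """Hypothetical helper: build tuple[<field types>] in declaration order."""
    return tuple[tuple(f.type for f in fields(cls))]


instance = MyDataClass(a="a", b=1)

# The type that structure() is asked to produce: equivalent to tuple[str, int].
target_type = field_tuple_type(MyDataClass)

# The data that structure() receives: the field values in declaration order.
values = astuple(instance)
```

Because the target is a plain tuple type rather than MyDataClass, structuring it never calls MyDataClass.__init__, so __post_init__ is not re-entered.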

@albertino87
Author

I guess this could also work :), thanks!

@albertino87
Author

albertino87 commented Aug 11, 2023


Unfortunately, this method doesn't work when such dataclasses are nested one inside the other, each with:

@dataclass
class GenericDataClass:


    def __post_init__(self):
        self.validate_types()

    def validate_types(self) -> None:
        c.structure(astuple(self), tuple[tuple(f.type for f in fields(self.__class__))])

I guess my example from before was too simplistic :(

@Tinche
Member

Tinche commented Aug 11, 2023

Your GenericDataClass has no fields; I assume it should have a field of type MyDataClass?

This is getting a little tricky. We can set up a separate converter with unstruct_strat=UnstructureStrategy.AS_TUPLE (that means it'll un/structure classes to tuples instead of dictionaries) and use that for this validation.

from dataclasses import astuple, dataclass, fields

from cattrs import Converter, UnstructureStrategy

c = Converter()
tuple_c = Converter(unstruct_strat=UnstructureStrategy.AS_TUPLE)


def get_converter() -> Converter:
    return c


@dataclass
class MyDataClass:
    a: str
    b: int

    def __post_init__(self):
        self.validate_types()

    def validate_types(self) -> None:
        tuple_c.structure(
            astuple(self), tuple[tuple(f.type for f in fields(self.__class__))]
        )

    @classmethod
    def from_dict(
        cls, data: dict, converter: Converter = get_converter()
    ) -> "MyDataClass":
        return converter.structure(data, cls)


@dataclass
class GenericDataClass:
    a: MyDataClass

    def __post_init__(self):
        self.validate_types()

    def validate_types(self) -> None:
        tuple_c.structure(
            astuple(self), tuple[tuple(f.type for f in fields(self.__class__))]
        )


GenericDataClass(MyDataClass("a", 1))

@albertino87
Author

albertino87 commented Aug 11, 2023

OK, sorry for the incomplete example. The full version would be something like this (with whatever works in validate_types):

from abc import ABC
from dataclasses import dataclass, astuple, fields

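from typing import Any, Callable

from cattrs import Converter

# Assumed alias: the original post did not show how StructureHookMap is defined.
StructureHookMap = dict[type, Callable[[Any, type], Any]]

DEFAULT_STRUCTURE_HOOKS: StructureHookMap = {}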

def get_converter(
    custom_hooks: StructureHookMap | None = None
) -> Converter:
    if custom_hooks is None:
        custom_hooks = {}
    structure_hooks = DEFAULT_STRUCTURE_HOOKS | custom_hooks
    converter = Converter(forbid_extra_keys=True)
    for hook_type, hook in structure_hooks.items():
        converter.register_structure_hook(hook_type, hook)
    return converter

class Entity(ABC):

    def __init_subclass__(cls, frozen=True):
        return dataclass(frozen=frozen)(cls)

    def __post_init__(self):
        self.validate_types()

    def validate_types(self, converter: Converter = get_converter()) -> None:
        converter.structure(astuple(self), tuple[tuple(f.type for f in fields(self.__class__))])

    @classmethod
    def from_dict(
        cls, data: dict, converter: Converter = get_converter()
    ):
        return converter.structure(data, cls)


class Child(Entity):

    beta: float
    population_name: str


class Parent(Entity):

    failure_mode_name: str
    project: str
    populations: list[Child]
    id: str

valid_entity = Parent(
    id="123456789",
    failure_mode_name="failure mode A",
    project="Myproj",
    populations=[
        Child(beta=1.0, population_name="pop A"),
        Child(beta=2.0, population_name="pop B"),
    ],
)

I know that from_dict will validate the types twice, but for now that's OK.

Your proposal seems to work if I change get_converter to:

def get_converter(
    custom_hooks: StructureHookMap | None = None
) -> tuple[Converter, Converter]:
    if custom_hooks is None:
        custom_hooks = {}
    structure_hooks = DEFAULT_STRUCTURE_HOOKS | custom_hooks
    converter1 = Converter()
    converter2 = Converter(unstruct_strat=UnstructureStrategy.AS_TUPLE)
    for hook_type, hook in structure_hooks.items():
        converter1.register_structure_hook(hook_type, hook)
        converter2.register_structure_hook(hook_type, hook)
    return converter1, converter2

and call the respective converters in from_dict and validate_types :)

However, it would be nice to have something like this directly in cattrs :)

@Tinche
Member

Tinche commented Aug 11, 2023

Hm, what you're really asking for is runtime validation of types in __init__, right? Maybe a way for a cattrs converter to wrap __init__ and apply checks?

@albertino87
Author

albertino87 commented Aug 11, 2023

I think what I'd like is already built into cattrs: it structures the data and checks the types while doing so. What I'd like to have is just the type check, something like:

converter.type_check(data, dataclass/tuple[...]/list[....]/or any other python structure)
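
As a rough illustration of what a check-only function could look like, here is a stdlib-only sketch. It is not a cattrs API: the name type_check is taken from the line above, and the behaviour is an assumption. It only handles plain classes, list[X], fixed-length tuple[X, Y], and dataclasses, and unlike cattrs it does no coercion (an int is rejected for a float field, while a bool is accepted for an int field because bool subclasses int).

```python
from dataclasses import dataclass, fields, is_dataclass
from typing import Any, get_args, get_origin


def type_check(value: Any, expected: Any) -> bool:
    """Hypothetical helper: True if `value` already matches `expected`."""
    origin = get_origin(expected)
    if origin is list:
        (item_type,) = get_args(expected)
        return isinstance(value, list) and all(
            type_check(item, item_type) for item in value
        )
    if origin is tuple:
        # Fixed-length tuples only, e.g. tuple[str, int].
        item_types = get_args(expected)
        return (
            isinstance(value, tuple)
            and len(value) == len(item_types)
            and all(type_check(v, t) for v, t in zip(value, item_types))
        )
    if is_dataclass(expected):
        # Accept an existing instance or a dict of field values. Nothing is
        # constructed here, so __post_init__ is never triggered.
        if isinstance(value, expected):
            data = {f.name: getattr(value, f.name) for f in fields(expected)}
        elif isinstance(value, dict):
            data = value
        else:
            return False
        expected_fields = fields(expected)
        return data.keys() == {f.name for f in expected_fields} and all(
            type_check(data[f.name], f.type) for f in expected_fields
        )
    return isinstance(value, expected)


@dataclass
class Child:
    beta: float
    population_name: str


@dataclass
class Parent:
    project: str
    populations: list[Child]


good = Parent("Myproj", [Child(1.0, "pop A")])
bad = Parent("Myproj", [Child("not a float", "pop A")])
good_is_valid = type_check(good, Parent)
bad_is_valid = type_check(bad, Parent)
```

Since the check walks existing values instead of building new instances, it could be called from __post_init__ without recursing.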

One concern: I think the method you proposed might fail if the keys of the nested dictionaries in data are in a different order from the fields of the nested dataclasses. It doesn't really affect me in this case, since the data comes from asdict(self), so the order will be the same.
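
The key-order concern can be sidestepped by building the tuple from field names rather than from dict order. The following is a minimal stdlib-only sketch; dict_to_field_tuple and _convert are hypothetical helpers, not part of cattrs, and only nested dataclasses and list[X] fields are handled.

```python
from dataclasses import dataclass, fields, is_dataclass
from typing import Any, get_args, get_origin


def dict_to_field_tuple(data: dict, cls: type) -> tuple:
    """Hypothetical helper: order dict values by the dataclass field order."""
    return tuple(_convert(data[f.name], f.type) for f in fields(cls))


def _convert(value: Any, expected: Any) -> Any:
    # Recurse into nested dataclasses given as dicts.
    if is_dataclass(expected) and isinstance(value, dict):
        return dict_to_field_tuple(value, expected)
    # Recurse into list[X] fields element by element.
    if get_origin(expected) is list:
        (item_type,) = get_args(expected)
        return [_convert(item, item_type) for item in value]
    return value


@dataclass
class Child:
    beta: float
    population_name: str


@dataclass
class Parent:
    project: str
    populations: list[Child]


# Keys deliberately listed in a different order from the field declarations.
data = {
    "populations": [{"population_name": "pop A", "beta": 1.0}],
    "project": "Myproj",
}

naive = tuple(data.values())  # follows dict insertion order, not field order
ordered = dict_to_field_tuple(data, Parent)  # follows field declaration order
```

The ordered tuple could then be handed to a tuple-based structuring step regardless of how the input dictionaries were assembled.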
