Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Better Union resolution #423

Open
DLumi opened this issue Aug 30, 2023 · 3 comments
Open

Better Union resolution #423

DLumi opened this issue Aug 30, 2023 · 3 comments

Comments

@DLumi
Copy link

DLumi commented Aug 30, 2023

  • cattrs version: 23.1.2
  • Python version: 3.10.6
  • Operating System: Win 10

Description

Right now I'm looking to migrate from dataclass + dacite to attrs + cattrs, but there's a feature in dacite that I miss in cattrs: consequtive resolution of Unions. What dacite does is that it takes each type in the Union and tries to match it to the data. If nothing is matched - the error is raised. This behavior seemed pretty intuitive for me.
However, if cattrs meets even a really basic Union like Union[str, int, float] and data like '123' it will pretend it doesn't know how to deal with this one, even if the resolution is really straightforward. Yes, you can probably write a hook for that, but manually describing each Union is really undesirable and error-prone. Maybe hook factory solves this, but I'm still figuring this stuff out, and my idea is that this should not be as complicated as it is right now.

What I Did

from attrs import define
from cattrs import structure
from enum import Enum
from typing import Union

c = structure('123', Union[str, int, float])
# Expected behavior: '123'
# Actual behavior: cattrs.errors.StructureHandlerNotFoundError: Unsupported type: typing.Union[str, int, float]. Register a structure hook for it.

@define
class C:
    foo: str

c = structure({'foo': 'bar'}, Union[C, dict])

# Expected behavior: C(foo='bar')
# Actual behavior: cattrs.errors.StructureHandlerNotFoundError: Unsupported type: typing.Union[__main__.C, dict]. Register a structure hook for it.


class Foo(Enum):
    BAR = 'Bar'

c = structure('Bar', Union[Foo, str])

# Expected behavior: Foo.BAR
# Actual behavior: cattrs.errors.StructureHandlerNotFoundError: Unsupported type: typing.Union[__main__.Foo, str]. Register a structure hook for it.
@DLumi
Copy link
Author

DLumi commented Aug 30, 2023

Yes, managed to circumvent this with factories, but I still think that should be configurable out of the box, especially when the documentation regarding hooks and factories is not very clear, and the solution was ultimately found by trial and error.

If anyone wonders what I did:

from typing import Type, Union, get_origin, get_args
from cattrs import structure, global_converter

def is_union(type_: Type) -> bool:
    return get_origin(type_) is Union


def union_structure_hook_factory(_):

    def union_hook(data, type_):

        args = get_args(type_)

        if data is None:  # we don't try to coerce None into anything
            return None

        for t in args:
            try:
                return structure(data, t)
            except Exception:
                continue
        raise ValueError(f"Could not cast {data} to {type_}")
    return union_hook

global_converter.register_structure_hook_factory(is_union, union_structure_hook_factory)

After that the Union resolution works (mostly) like a charm. Optional types would work with this one, too, however, there could be corner cases that I missed.
Of course, you'd have to put your desired types in the annotation up front for both efficiency and undesired coercion avoidance.

@Tinche
Copy link
Member

Tinche commented Aug 30, 2023

Howdy! Thanks for your interest in cattrs.

The good news is in the next release at least the first example will work out of the box if you use a preconfigured converter (like from cattrs.preconf.orjson) thanks to the union passthrough strategy. (You can also manually apply this strategy to any converter yourself.)

I'm vaguely aware of the strategy you mention; I think Pydantic used to use it (and maybe they still do, I don't know). I really dislike it for a number of reasons though.

Firstly, it makes the ordering inside of a union matter. In the Python type system, in general, A | B is equivalent to B | A, and this changes that axiom. This adds cognitive load (i.e. you have to keep this in mind) for the sake of serialization.

Secondly, it's inefficient. The logic here will be O(n) where n is the size of the union, and if your value is at the end of the union, the structuring logic will hit the sad path for all other members of the union. The sad path is generally slow since it raises exceptions.

Thirdly, cattrs generally deals in coercion, not validation. (I think other libraries like Pydantic v2 are following in this nowadays, but again, not sure.) There are good reasons for this which I won't go into here. But this means if you do converter.structure(1, str), the result will end up being the string "1" instead of an exception. I think this fact might lead to surprising results in some cases, and we don't want surprising edge cases.

And lastly, I think it just leads to loss of information. For example, if your type is datetime | str and the value is "2023-08-30T00:00:00", there's really no way to know if you're being sent a datetime or an actual string with that contents.

For these reasons I feel this approach is an attractive nuisance and I would be reluctant to have cattrs do this by default.

That said, the point of cattrs is to be flexible. If you're willing to contribute a strategy that enables this I'd be glad to provide review and feedback. (A strategy is a high-level customization that you can apply to any converter to enable some behavior.) I think you have a solid start in your second comment.

@DLumi
Copy link
Author

DLumi commented Aug 31, 2023

Thank you for detailed answer!

I can totally see your reasoning for avoiding this behavior as default. However, I still think it should be implemented in some form or other, even with this points in mind.

Speaking of the equivalency of the types, you are right that ordering will add some cognitive load, but it kind of makes sense. Let me explain.
Unless you unite types that have common properties or interfaces (like, annotating various collections, etc), they are not exactly equivalent, there's always an preferred option, right? This is because your class simply would not be able to deal with different types the same way, we need some sort of alignment first.
Given that, it would make sense to put the preferred option up front as an indication for the end user.

Here's an example just to illustrate what I mean:

@define
class A:
    date_field: Union[datetime, str]

    def __attrs_post_init__(self):  # or we could have an attrs field with a converter specified
        if isinstance(self.date_field, str): 
            self.date_field = datetime.strptime(self.date_field, '%Y-%m-%d')

    def apply_offset(self, offset: timedelta = timedelta(days=1)):
        self.date_field += offset

So for me that would mean: "This code mainly works with datetime objects, so you'd rather provide those, however, we do accept strings for your conveniency, but we convert them anyway".

Regarding the efficiency of the solution, or lack of thereof, from a user experience stand point I'd probably have some default strategy can be inefficient or has some nuances to keep in mind, but at least it works, instead forcing you to bang your head into the wall and having to provide some custom boilerplate for exactly the same thing. As you said, cattrs is designed to be flexible, so if you don't like the defaults - you just override them with your custom hooks.
Well, that is my two cents on this, at least.

As for coercion / loss of information - cannot really comment on those, but I do feel that if you have some sort of conversion in your code, you should be fine most of the time. And if you don't, why would you want to unite different types in the first place?

Anyway, if you wish to see this sort of thing added to cattrs, I can probably contribute this as a "strategy". However, it's not that clear from documentation how these strategies are supposed to be formatted. There should be some rules for this, right?
For now, I'll just update my second comment with the code I ended up using for this. It might not cover every possible corner case (I can't really think of many), but it works for my particular use case.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants