Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Documentation does not explain unstruct_collection_overrides' input #513

Open
td-anne opened this issue Feb 27, 2024 · 4 comments
Open

Documentation does not explain unstruct_collection_overrides' input #513

td-anne opened this issue Feb 27, 2024 · 4 comments

Comments

@td-anne
Copy link

td-anne commented Feb 27, 2024

  • cattrs version: 23.2.3
  • Python version: 3.11.4
  • Operating System: MacOS

Description

I am trying to unstructure sets into something that JSON can cope with and that is canonical (equal for any two equal sets; set iteration order can differ for equal sets). I tried unstruct_collection_overrides={Set: sorted} and obtained a very confusing error ("< not supported between dict and dict"). It turns out that the problem is that the functions in unstruct_collection_overrides receive a generator that emits the unstructured contents of the collection (at least for Set; I'm not sure for dicts). Since the objects in my set are frozen attrs objects, they themselves work fine in a Set (and being sorted), but once unstructured they become dictionaries, which can't be sorted.

[Edited to add: my conclusion is that this isn't really possible; that's not the bug, that's just an unfortunate limitation.]

The problem: the documentation at https://catt.rs/en/stable/unstructuring.html#customizing-collection-unstructuring never specifies what the unstructuring functions receive as input. Neither does the docstring for Converter (https://catt.rs/en/stable/cattrs.html#cattrs.Converter). A little explanation would have helped tremendously in tracking down the mysterious error.

What I Did

@frozen
class A:
    x: int = 0
    y: int = 0


def convert_set(s):
    list_s = list(s)
    print(f"{list_s=}")
    return sorted(list_s)


converter = Converter(unstruct_collection_overrides={Set: convert_set})

converter.unstructure({A(x=1), A(x=2), A(x=3)})

gives

list_s=[{'x': 3, 'y': 0}, {'x': 1, 'y': 0}, {'x': 2, 'y': 0}]
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[12], [line 15](vscode-notebook-cell:?execution_count=12&line=15)
     [10](vscode-notebook-cell:?execution_count=12&line=10)     return sorted(list_s)
     [13](vscode-notebook-cell:?execution_count=12&line=13) converter = Converter(unstruct_collection_overrides={Set: convert_set})
---> [15](vscode-notebook-cell:?execution_count=12&line=15) converter.unstructure({A(x=1), A(x=2), A(x=3)})

File [~/Library/Caches/pypoetry/virtualenvs/da-dropper-wire-data-l5CQQYIb-py3.11/lib/python3.11/site-packages/cattrs/converters.py:238](https://file+.vscode-resource.vscode-cdn.net/Users/annearchibald/projects/da-dropper-wire-data/analysis/~/Library/Caches/pypoetry/virtualenvs/da-dropper-wire-data-l5CQQYIb-py3.11/lib/python3.11/site-packages/cattrs/converters.py:238), in BaseConverter.unstructure(self, obj, unstructure_as)
    [237](https://file+.vscode-resource.vscode-cdn.net/Users/annearchibald/projects/da-dropper-wire-data/analysis/~/Library/Caches/pypoetry/virtualenvs/da-dropper-wire-data-l5CQQYIb-py3.11/lib/python3.11/site-packages/cattrs/converters.py:237) def unstructure(self, obj: Any, unstructure_as: Any = None) -> Any:
--> [238](https://file+.vscode-resource.vscode-cdn.net/Users/annearchibald/projects/da-dropper-wire-data/analysis/~/Library/Caches/pypoetry/virtualenvs/da-dropper-wire-data-l5CQQYIb-py3.11/lib/python3.11/site-packages/cattrs/converters.py:238)     return self._unstructure_func.dispatch(
    [239](https://file+.vscode-resource.vscode-cdn.net/Users/annearchibald/projects/da-dropper-wire-data/analysis/~/Library/Caches/pypoetry/virtualenvs/da-dropper-wire-data-l5CQQYIb-py3.11/lib/python3.11/site-packages/cattrs/converters.py:239)         obj.__class__ if unstructure_as is None else unstructure_as
    [240](https://file+.vscode-resource.vscode-cdn.net/Users/annearchibald/projects/da-dropper-wire-data/analysis/~/Library/Caches/pypoetry/virtualenvs/da-dropper-wire-data-l5CQQYIb-py3.11/lib/python3.11/site-packages/cattrs/converters.py:240)     )(obj)

File :2, in unstructure_iterable(iterable)

Cell In[12], [line 10](vscode-notebook-cell:?execution_count=12&line=10)
      [8](vscode-notebook-cell:?execution_count=12&line=8) list_s = list(s)
      [9](vscode-notebook-cell:?execution_count=12&line=9) print(f"{list_s=}")
---> [10](vscode-notebook-cell:?execution_count=12&line=10) return sorted(list_s)

TypeError: '<' not supported between instances of 'dict' and 'dict'

as opposed to some kind of canonically sorted list.

@Tinche
Copy link
Member

Tinche commented Mar 1, 2024

Howdy!

Sorry to hear you had issues with this. Documentation is always a work in progress. Would you care to submit a PR for the docs dealing with this, to help the next person? I'll gladly review and merge it.

my conclusion is that this isn't really possible

So what exactly are you trying to accomplish? Forcing a consistent ordering of sets of frozen attrs classes in json? I love a challenge.

If you're willing to make the attrs classes orderable, you might try something like this:

from typing import get_args

from attrs import AttrsInstance, frozen, has

from cattrs import Converter
from cattrs._compat import is_mutable_set

c = Converter()


@frozen(order=True)
class A:
    x: int = 0
    y: int = 0


def unstructure_sets(val: set[AttrsInstance]) -> list:
    return [c.unstructure(v) for v in sorted(val)]


c.register_unstructure_hook_func(
    lambda t: is_mutable_set(t) and has(get_args(t)[0]), unstructure_sets
)

print(c.unstructure({A(x=1), A(x=2), A(x=3)}, unstructure_as=set[A]))

(This can be made faster by adding some complexity.)

@td-anne
Copy link
Author

td-anne commented Mar 4, 2024

Aha! That is indeed what I was trying to do. I think the magic input here is the nearly undocumented unstructure_as argument. Dropping it results in the set unstructuring seeing the contents as dictionaries.

I don't mind writing up a doc PR, but I don't think I understand how the unstructuring procedure works. Presumably it takes the tree of objects and unstructures from the leaves up? That is, the unstructuring procedure for collections is "unstructure all the contents, put them in a collection of the same kind, and then use unstruct_collection_overrides to convert the new collection? How does this interact with unstructure_as?

@td-anne
Copy link
Author

td-anne commented Mar 4, 2024

I think the following achieves what I was aiming for:

def unstructure_set(val: set) -> list:
    vals = list(val)
    try:
        return [c.unstructure(v) for v in sorted(vals)]
    except TypeError as e:
        print(f"Whoops, can't sort: {e}")
        return [c.unstructure(v) for v in vals]


c = Converter()
c.register_unstructure_hook_func(
    lambda t: issubclass(t, set)
    or issubclass(getattr(t, "__origin__", type(None)), set),
    unstructure_set,
)

print(c.unstructure({1, 2, -1}))


@frozen(order=True)
class A:
    x: int = 0
    y: int = 0


print(c.unstructure({A(x=4), A(x=2), A(x=3)}))
print(c.unstructure({A(x=4), A(x=2), A(x=3)}, unstructure_as=set[A]))


@frozen
class B:
    x: int = 0
    y: int = 0


print(c.unstructure({B(x=4), B(x=2), B(x=3)}))
print(c.unstructure({B(x=4), B(x=2), B(x=3)}, unstructure_as=set[B]))

That is, c unstructures set as list, sorted if possible but unsorted if the elements don't support ordering.

I don't think I can do this with unstruct_collection_overrides because that gets the contents already unstructured. This solution also allows users to include unstructure_as or not, as desired; if I weren't concerned about allowing unstructure_as I could do this with a simpler register_unstructure_hook(set, unstructure_set).

Dicts are amenable to a similar approach, I think, but they can also be handled at the JSON level with sort_keys=True. And if dicts are to be unstructured into a JSON-compatible way the keys all need to be converted to unique strings anyway, so those will always be sortable post-unstructuring. Thus for dicts I think we could get away with an unstruct_collection_overrides without any additional requirement.

In practice I ran into another complexity: one of my objects has an Identifier field, which I would like to be a union of three different (frozen(order=True)) attrs classes. I would like to be able to unstructure sets of identifiers into a sorted list. Unfortunately I haven't been able to get the different classes to compare to each other, even if I try making them subclasses of a common class. Thus I had to write an Identifier.key() function that extracts the key for each; so a sorting unstructure function needs to know to call that function on them. Which I think means I need to go back to requiring users to specify unstructure_as.

@Tinche
Copy link
Member

Tinche commented Mar 5, 2024

Which I think means I need to go back to requiring users to specify unstructure_as.

Are you wrapping cattrs in a framework of sorts? I've done that a lot and have almost always been able to know exactly what I was unstructuring, although that might not be the case for you unless you lean on type hinting a lot.

I played around with your example; instead of trying to sort and then dealing with a possible exception, I tried this:

c = Converter()


def unstructure_ordered_set(val: set) -> list:
    return c.unstructure(sorted(val))


c.register_unstructure_hook_func(
    lambda t: issubclass(getattr(t, "__origin__", type(None)), set)
    and is_orderable(t.__args__[0]),
    unstructure_ordered_set,
)


def is_orderable(type: type) -> bool:
    return type.__lt__ != object.__lt__

Obviously it's not the same as yours (only works for orderable classes) but it seems to work ;)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants