[Feature Request] Deep collections.abc.Iterable type-checking 😬 #344

Open
rbroderi opened this issue Mar 16, 2024 · 4 comments

Comments

@rbroderi

Am I missing something, or is this not implemented?

from beartype.door import is_bearable
from collections.abc import Iterable

a = [1]
b = ["string"]
c = [None]
is_bearable(a, Iterable[int])  # -> True
is_bearable(b, Iterable[str])  # -> True
is_bearable(c, Iterable[int])  # -> True (unexpected: None is not an int)

But it works with list:

is_bearable(a, list[int])  # -> True
is_bearable(b, list[int])  # -> False
is_bearable(c, list[int])  # -> False
@TeamSpen210

It's not safe to inspect the contents of an iterable, because that might change the value - think of a file object, for example.

@leycec
Member

leycec commented Mar 19, 2024

...heh. @TeamSpen210 defends the indefensible. You're both right, of course. In the general case, deep type-checking of arbitrary iterables is dangerous. So-called "one-shot" iterables like generators are yet another example of iterables that cannot be safely iterated. To even look at them is to destroy them. They're like an inverse Medusa. 🐍
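
To make the "one-shot" hazard concrete, here is a minimal sketch in plain Python (no beartype involved). The helper name peek_first_item is made up for illustration; it stands in for any checker that inspects an item by iterating.

```python
from collections.abc import Iterable, Iterator


def peek_first_item(items: Iterable) -> object:
    """Hypothetical helper: return the first item by iterating ``items``."""
    return next(iter(items))


def squares() -> Iterator[int]:
    """A generator, i.e. a one-shot iterable."""
    yield from (n * n for n in range(3))


one_shot = squares()
first = peek_first_item(one_shot)  # pulls 0 out of the generator for good
remaining = list(one_shot)         # the caller now only sees [1, 4]

reusable = [0, 1, 4]
peek_first_item(reusable)          # a list hands out a fresh iterator each time
still_all_there = list(reusable)   # [0, 1, 4]
```

The generator silently loses its first item to the "check", while the list is unaffected. That difference is the whole argument for restricting deep checks to re-iterable containers.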

But @beartype was born for danger. This is why the next next release of @beartype ...so, beartype 0.19.0? will deeply type-check the proper subset of iterables that can be safely deeply type-checked. Mostly, this means:

  • Mappings (i.e., objects satisfying the collections.abc.Mapping ABC).
  • Sequences (i.e., objects satisfying the collections.abc.Sequence ABC).
  • Sets (i.e., objects satisfying the collections.abc.Set ABC).

And... that's it. I think, anyway. Technically, this feature request is a duplicate of long-standing decades-old feature request #53. But let's leave this open as a gentle reminder to myself that I should actually do this before the Heat Death of the Universe. I promise nothing and deliver even less.
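
For readers who want to see which objects fall inside that "safe" subset, a rough sketch follows. The function name and the tuple constant are hypothetical, and treating exactly these three ABCs as safe is taken from the list above rather than from beartype's source.

```python
from collections.abc import Mapping, Sequence, Set

# The three ABCs named in the list above. Treating membership in any of
# them as "safe to deep-check" is this sketch's assumption.
SAFE_CONTAINER_ABCS = (Mapping, Sequence, Set)


def is_safely_deep_checkable(obj: object) -> bool:
    """Hypothetical predicate: can ``obj`` be iterated without side effects?"""
    return isinstance(obj, SAFE_CONTAINER_ABCS)


print(is_safely_deep_checkable([1, 2, 3]))               # list is a Sequence
print(is_safely_deep_checkable({"a": 1}))                # dict is a Mapping
print(is_safely_deep_checkable(frozenset({1})))          # frozenset is a Set
print(is_safely_deep_checkable(n for n in range(3)))     # generator is none of these
```

Note that str is also a Sequence, so a predicate like this would consider strings deep-checkable too.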

@leycec changed the title from "collections.abc.Iterable" to "[Feature Request] Deep collections.abc.Iterable type-checking 😬" on Mar 19, 2024
@asford

asford commented Apr 2, 2024

It may be worth considering Collection as well...
It's a terrifyingly ambiguous case, because it can be repeat-iterated (unlike the aetherial Iterable),
however without __getitem__ it doesn't support beartype's ambition of limiting-case exhaustive type checking via a one-way random walk.

While BIG DATA certainly isn't lying about this case, it is, in the finest BIG DATA tradition, opaque as to be nearly deceptive.

That is to say... from the examples, one could blithely assume that bearcheck would (for example) say is_bearable("abcd", Collection[int]) == False.
However, one would be mistaken!

@leycec I think this issue could fork in two directions "doc bug" or "behavior bug"?

If "doc bug", we should probably float a big-fat-warning in the demos describing the current partial support for collections deep checking.

If "behavior bug", it might be good to work out the behavior for procedural checks using is_bearable.
Because we now claim type-narrowing on is_bearable; I'm afraid that the semantics of the procedural interface are different than the standard bearcheck interface. 😬

In the @bearcheck interface, we're tolerant of "false positives" on the match. If bearcheck doesn't raise an error when the collection doesn't match, it's fine. Hopefully the type mismatch will be caught later!
In the is_bearable typeguard interface, we're intolerant of false positives. Because the TypeGuard claims that a True result allows narrowing, a True result from is_bearable([1, 2, 3], Iterable[str]) breaks static analysis (and is a dangerous conditional).

Do you think adding a non-random-sampling, single element check via next(iter(obj)) for the baser forms of the collections.abc hierarchy (i.e. either Collection or (Sized & Iterable)) would be easy?
It also feels like if the target object is a Sequence beartype could just wing it anyway, even if just checking Iterable?

Perhaps the following deep-checking heuristic for is_bearable(obj, Iterable[type]):

  1. If the obj implements Sequence, deep-check a single random element for type.
  2. If the obj implements Collection (Sized, Iterable), deep-check the first element via next(iter(obj)).
    One could gesture here to the idea of doing a stochastic n-element check,
    where beartype walks n-elements into the Collection for a small value of n.
    Say, the check depth is Poisson distributed at lambda=3?
  3. If the obj doesn't implement Iterable, mismatch.
  4. If the obj implements Iterable but a deep check is provided then... warn?
    Deep checking iterables via the procedural interface should be discouraged.
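
The four steps above could be sketched roughly as follows. This is an illustration of the proposal only, not beartype code: the function name is hypothetical, item_type is restricted to a plain class so isinstance() suffices, and returning True after the warning in step 4 is an assumption (the proposal leaves that open).

```python
import random
import warnings
from collections.abc import Collection, Iterable, Sequence


def sketch_is_iterable_of(obj: object, item_type: type) -> bool:
    """Hypothetical heuristic for checking ``obj`` against Iterable[item_type]."""
    # Step 3: anything that is not iterable at all is a mismatch.
    if not isinstance(obj, Iterable):
        return False
    # Step 1: sequences support indexing, so sample one random item.
    if isinstance(obj, Sequence):
        if len(obj) == 0:
            return True
        return isinstance(obj[random.randrange(len(obj))], item_type)
    # Step 2: sized, re-iterable collections: inspect only the first item.
    if isinstance(obj, Collection):
        if len(obj) == 0:
            return True
        return isinstance(next(iter(obj)), item_type)
    # Step 4: a bare iterable may be one-shot, so leave its items alone.
    # Assumption: warn and fall back to the shallow (container-only) answer.
    warnings.warn("cannot deep-check a bare Iterable; only the container was checked")
    return True
```

Under this sketch, sketch_is_iterable_of("abcd", int) is False because str is a Sequence of str, while a generator is never touched.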

And deep-check Collection with the same logic:

  1. If the obj implements Sequence, deep-check a single random element for type.
  2. If the obj implements Collection (Sized, Iterable), deep-check the first element via next(iter(obj)).
    One could gesture here to the idea of doing a stochastic n-element check,
    where beartype walks n-elements into the Collection for a small value of n.
    Say, the check depth is Poisson distributed at lambda=3?
  3. If the obj doesn't implement Collection, mismatch.
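
The "stochastic n-element check" from step 2 might look something like the sketch below. Everything here is hypothetical: the function names, the use of Knuth's multiplication method for Poisson sampling (the standard library has no Poisson sampler), and the choice to add 1 to the sampled depth so that at least one item is always inspected.

```python
import math
import random
from collections.abc import Collection
from itertools import islice
from typing import Optional


def poisson_sample(lam: float, rng: random.Random) -> int:
    """Draw from Poisson(lam) using Knuth's multiplication method."""
    threshold = math.exp(-lam)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def sketch_walk_check(
    obj: Collection,
    item_type: type,
    lam: float = 3.0,
    rng: Optional[random.Random] = None,
) -> bool:
    """Check the first 1 + Poisson(lam) items of a re-iterable collection."""
    rng = rng or random.Random()
    depth = 1 + poisson_sample(lam, rng)  # assumption: always look at >= 1 item
    return all(isinstance(item, item_type) for item in islice(obj, depth))
```

The cost per call stays small on average (about lam + 1 items) while a bad item near the front of the collection is more likely to be caught than with a single-item check.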

Happy to pitch in a bit on docs or code if you think it'd be useful.

@leycec
Member

leycec commented Apr 6, 2024

MASSIVE COMBO! But first, big apologies for the long delay in my reply. The @beartype 0.18.0 release was a shambolic zombie that lurched about and destroyed everything including the entire Python ecosystem. Repairing that took intestinal fortitude and a willingness to roll almost everything back. This is why I'm currently clutching a teddy bear. Back to your stunning thoughts...

While BIG DATA certainly isn't lying about this case, it is, in the finest BIG DATA tradition, opaque as to be nearly deceptive.

🤣 😭

You're the Big Typing Boss, @asford. As always, I defer to your wisdom. You're right about everything, of course. Doco concerns are especially germane. Because I coded rather than did docos for a year, our docos are currently also a shambolic zombie that is rifling through your pockets while muttering, "Code... Code!" Our docos are in dire need of renovation is what I'm saying.

It would be spectacular if somebody who is not me could eventually document exactly what @beartype does for each type hint factory. It's no longer obvious to anyone (including me) what @beartype is currently doing, exactly. For example, we should document:

  • For each mapping type hint, that @beartype type-checks only the first key-value pair of the corresponding mapping object. Mapping type hints include all type hints of the form:
    • dict[{key_type}, {value_type}].
    • collections.defaultdict[{key_type}, {value_type}].
    • collections.abc.Mapping[{key_type}, {value_type}].
    • collections.abc.MutableMapping[{key_type}, {value_type}].
    • collections.OrderedDict[{key_type}, {value_type}].
    • typing.DefaultDict[{key_type}, {value_type}].
    • typing.Dict[{key_type}, {value_type}].
    • typing.Mapping[{key_type}, {value_type}].
    • typing.MutableMapping[{key_type}, {value_type}].
    • typing.OrderedDict[{key_type}, {value_type}].
  • For each sequence type hint, that @beartype type-checks only a pseudo-random item of the corresponding sequence object. Sequence type hints include all type hints of the form:
    • list[{item_type}].
    • tuple[{item_type}, ...].
    • collections.abc.MutableSequence[{item_type}].
    • collections.abc.Sequence[{item_type}].
    • typing.List[{item_type}].
    • typing.MutableSequence[{item_type}].
    • typing.Sequence[{item_type}].
    • typing.Tuple[{item_type}, ...].
  • And so on and so forth. Gah! How boring! My kidneys hurt just thinking about how boring this all is!
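
To show what "only the first key-value pair" means in practice for the mapping hints listed above, here is a rough sketch. It is not the code @beartype generates; the function name is hypothetical and both types are plain classes so that isinstance() is enough.

```python
from collections.abc import Mapping


def sketch_check_first_pair(mapping: Mapping, key_type: type, value_type: type) -> bool:
    """Check only the first key-value pair, in iteration order."""
    for key, value in mapping.items():
        return isinstance(key, key_type) and isinstance(value, value_type)
    return True  # an empty mapping has nothing to contradict the hint


# The bad value sits in the second pair, so a first-pair check misses it.
ok_by_luck = sketch_check_first_pair({"a": 1, "b": "oops"}, str, int)  # True
# The bad value sits in the first pair, so it is caught.
caught = sketch_check_first_pair({"a": "oops", "b": 2}, str, int)      # False
```

This is the kind of behaviour the proposed documentation would need to spell out: a passing check says something about one pair, not about the whole mapping.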

The ideal home for this documentation might be a new doc/src/api_hints.rst file linked to from our front page. I'm currently waving my hands wildly about.

I think this issue could fork in two directions "doc bug" or "behavior bug"?

Seriously. The "why not both?" girl is correct, as she always is. The current behaviour is mostly undocumented, which is bad. But the current behaviour also fails to deeply type-check collections annotated by Collection[...] type hints, which is even worse.

Documentation should probably take priority there – because documentation is the low-hanging fruit, right? In theory, somebody who is not me could even bang out documentation on our wiki without my permission or involvement. In practice, I would love that. I permit everything that I'm uninvolved with.

Do you think adding a non-random-sampling, single element check via next(iter(obj)) for the baser forms of the collections.abc hierarchy (i.e. either Collection or (Sized & Iterable)) would be easy?

Totally! Yes! Absolutely! ...for various definitions of "totally," "yes," and "absolutely."

This is actually my next action item. I'm currently hip-deep in unnecessary micro-optimizations for the deep dictionary type-checking shipped with @beartype 0.18.0. I belatedly realized I could eliminate one excess statement for each such type-check in the code @beartype dynamically generates to type-check dictionary key-value pairs. Nobody cares. Yet, I care. Why? It's unclear. "Asperger's, mumble-mumble, something, profit" is probably the answer here.

After that, deep type-checking for all remaining collections that are safely type-checkable by simply inspecting their first items via next(iter(obj)) is up. This includes things like:

  • Arbitrary collections.
  • Deques.
  • Sets.
  • Maybe more? Not sure. I'm not sure of anything anymore, @asford.

Perhaps the following deep-checking heuristic for is_bearable(obj, Iterable[type]):

...heh. You're really clever, huh? Yeah. You've already guessed exactly what @beartype 0.20.0 is probably going to do. Probably. How did you guess that? Are you in my brain? Are you in telepathic communication with a parasitic brain worm currently rifling through my memories? If so, please ask where I left the salt. It's a conundrum and my stomach is demanding salty stuff.

Say, the check depth is Poisson distributed at lambda=3?

😮 😨 😱

Let's pretend you didn't just suggest a new BeartypeStrategy.OPoisson strategy in which @beartype type-checks according to a Poisson distribution. Everybody's gonna start wanting that! Noooooooooo! What... have... you... done... with... my... free... time... My video game backlog is now screaming.

I'll just leave this here.
