[Docs] Add a new FAQ entry, Q: "How do I type-check types from optional dependencies that might not be installed on user machines?" A: "*Very* carefully." #329

Open
tvdboom opened this issue Feb 18, 2024 · 5 comments


tvdboom commented Feb 18, 2024

It would be nice to have a configuration option in beartype to make it check the input parameters but not check the output. I have two reasons:

  • The input is provided by the end user so I want that checked. The output is controlled by me so I trust it.
  • The output type depends on an optional dependency that the end user may or may not have installed. Correctly specifying all possible output types (including those of the optional dependency) could break the program if checked at runtime. With static type checkers I can import the dependency safely with if TYPE_CHECKING.

Example:

from typing import TYPE_CHECKING
import pandas as pd
from beartype import beartype

if TYPE_CHECKING:
    import polars as pl

@beartype
class A:
    def transform(self, X: pd.DataFrame) -> pd.DataFrame | pl.DataFrame:  # <-- beartype fails here because it doesn't recognize pl
        ...  # some logic
@leycec
Member

leycec commented Feb 19, 2024

Good news descends like @leycec onto the bed after a hard day of keyboard bashing. I love and live for typing puzzles like this. Thankfully, so does @beartype – because @beartype already does everything you want. Rejoice! Family Day in Canada (...don't ask) is your lucky day.

You have two choices depending on whether you want to require Python ≥ 3.12 or not. You probably don't. You probably shouldn't. Still, the choice (as well as the power to destroy your package) is all yours:

  • Choice 1: You hate Python < 3.12. In this case, define a PEP 695 type alias:
from typing import TYPE_CHECKING, TypeAlias
import pandas as pd
from beartype import beartype

# Define a "GenericDataFrame" type hint matching both Pandas and Polars type hints.
#
# If static type-checking, define an obsolete PEP 613 type alias. Look. Just. Do. It.
if TYPE_CHECKING:
    import polars as pl
    GenericDataFrame: TypeAlias = pd.DataFrame | pl.DataFrame
# If actually running Python, define a modern PEP 695 type alias. Note that this
# requires Python ≥ 3.12. Python < 3.12 will raise a "SyntaxError" on attempting to
# import this module. The codebase you destroy may be your own.
else:
    # If Polars is installed, extend this type alias to cover both Pandas and Polars.
    try:
        import polars as pl
        type GenericDataFrame = pd.DataFrame | pl.DataFrame
    # Else, default this type alias to just cover Pandas.
    except ImportError:
        type GenericDataFrame = pd.DataFrame

@beartype
class A:
    def transform(self, X: pd.DataFrame) -> GenericDataFrame:  # <-- @beartype now hugs and squeezes you
        ...  # some logic
  • Choice 2: You love Python < 3.12. In this case, define a PEP 484 NewType-style type alias:
from typing import TYPE_CHECKING, NewType, TypeAlias
import pandas as pd
from beartype import beartype

# Define a "GenericDataFrame" type hint matching both Pandas and Polars type hints.
#
# If static type-checking, define an obsolete PEP 613 type alias. Look. Just. Do. It.
if TYPE_CHECKING:
    import polars as pl
    GenericDataFrame: TypeAlias = pd.DataFrame | pl.DataFrame
# If actually running Python, define an archaic PEP 484 "NewType" type alias. It's
# best not to ask what's going on here. This is like the Bloodborne of typing.
else:
    # If Polars is installed, extend this type alias to cover both Pandas and Polars.
    try:
        import polars as pl
        GenericDataFrame = NewType('GenericDataFrame', pd.DataFrame | pl.DataFrame)
    # Else, default this type alias to just cover Pandas.
    except ImportError:
        GenericDataFrame = NewType('GenericDataFrame', pd.DataFrame)

@beartype
class A:
    def transform(self, X: pd.DataFrame) -> GenericDataFrame:  # <-- @beartype again hugs and squeezes you
        ...  # some logic

Similar logic either way. The devil's in the try: block of the else: condition. Pick your typing poison. There are many to choose from. For your safety, @beartype supports all possible typing poisons. Warning labels on bottles are ignorable.
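
The try: block of the else: condition can be exercised without Pandas or Polars installed. What follows is only a minimal sketch under stated assumptions: fractions.Fraction stands in for the always-installed dependency, and hypothetical_missing_numbers is an invented module name assumed not to exist on your machine, standing in for the optional one. A static type-checker would flag that import as unresolved, which is expected for a stand-in.

```python
from fractions import Fraction
from typing import TYPE_CHECKING, Union

if TYPE_CHECKING:
    # Static type-checkers only ever follow this branch.
    # "hypothetical_missing_numbers" is an invented stand-in for an optional package.
    import hypothetical_missing_numbers as hmn
    Number = Union[Fraction, hmn.Number]
else:
    try:
        # At runtime, try the optional package; here this is assumed to fail.
        import hypothetical_missing_numbers as hmn
        Number = Union[Fraction, hmn.Number]
    except ImportError:
        # Fall back to the one type that is guaranteed to be importable.
        Number = Fraction

def halve(x: Fraction) -> Number:
    # The return hint resolves to a real runtime object either way.
    return x / 2

print(halve(Fraction(1, 2)))
```

Because the fallback branch ran, the return hint on halve() is plain Fraction at runtime, so a runtime type-checker never has to resolve a name from a package that is not there.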

I Still Want @beartype to Ignore Everything

@beartype really doesn't want to ignore anything. Ignoring everything is for static type-checkers. They largely live in a hallucinatory world of make-believe, anyway. One more lie to a static type-checker doesn't mean much when you're already lying about everything.

@beartype and other runtime type-checkers like typeguard and Pydantic mostly never lie, though. It's best not to lie at runtime. Lies only defeat the purpose of type-checking, right? Might as well go cold turkey duck-typing at that point. 🦆 quack quack

I Still Want @beartype to Ignore Everything

...heh. Persistent, huh? Okay. Technically, what you want has been requested before. @beartype has been integrated into PyTorch's torch.onnx subpackage. That's good. If we recall correctly, PyTorch forces @beartype to do what you think you want to do here: unconditionally ignore all return type hints on PyTorch callables. That's bad, but good for you.

PyTorch probably has good reasons for doing this. Since PyTorch does this, this can't be bad. PyTorch is the future of humanity. Let's spec this out for the future of humanity:

# In "{your_package}.__init__":
from beartype import BeartypeConf
from beartype.claw import beartype_this_package

# Type-check this package with @beartype while ignoring *ALL* return type hints. This sucks! You know best.
beartype_this_package(conf=BeartypeConf(ignore_hints_param_names=('return',)))

Thus, we add a new ignore_hints_param_names: Iterable[str] = () parameter to our existing BeartypeConf dataclass. The API is a little obnoxious to use, but probably the most general-purpose means of achieving this. That's good. A single new parameter covers a gamut of use cases – including ignoring a combination of type hints annotating returns as well as parameters with various names.

Does "return" in the above example actually work? Theoretically, it does. "return" is Python's fake "parameter" name for the type hint annotating the return in every __annotations__ dunder dictionary storing type hints: e.g.,

>>> def muh_func() -> None: pass
>>> muh_func.__annotations__
{'return': None}

Conveniently, return is a reserved keyword in Python, so no real parameter can ever be named return and collide with this key. And that's how the QA was won. 💪 🐻
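
Until an option like that exists, the same "return" key suggests a stopgap: strip it from __annotations__ before any runtime checker sees the function. The sketch below is an assumption-laden illustration, not a beartype API. drop_return_hint is a hypothetical helper invented here, and whether a given checker honours the edited dictionary depends on it reading __annotations__ at decoration time, which is not verified here.

```python
from typing import Callable, TypeVar

F = TypeVar('F', bound=Callable)

def drop_return_hint(func: F) -> F:
    """Hypothetical helper (NOT part of beartype): forget the return hint."""
    # Rebuild the annotations dictionary without the special 'return' key.
    func.__annotations__ = {
        name: hint
        for name, hint in func.__annotations__.items()
        if name != 'return'
    }
    return func

@drop_return_hint
def double(x: int) -> 'TypeFromSomeUninstalledPackage':
    # The return hint above is a string naming a type that exists nowhere.
    return x * 2

print(double.__annotations__)
```

Only the hint on x survives, so a checker applied above @drop_return_hint would have no return hint left to resolve. Static type-checkers still see the original signature, since they never run the decorator.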

@leycec leycec changed the title [Feature Request] Configuration to stop beartype from checking output [Feature Request] Add a new BeartypeConf(ignore_hints_param_names option to ignore type hints on returns and parameters with various names Feb 19, 2024
@tvdboom
Author

tvdboom commented Feb 19, 2024

Thanks for the quick answer, but running

@beartype(conf=BeartypeConf(ignore_hints_param_names=("return",)))
class A:
   ...

returns

TypeError: BeartypeConf.__new__() got an unexpected keyword argument 'ignore_hints_param_names'

I am using beartype 0.17.2.

Ah, I think I misunderstood your message. The feature is not implemented yet, but will be?

The issue I have with your first solution is that the example I gave was somewhat simplified. In my real use case I am dealing with 5 optional dependencies with many more dataframe-like classes. I don't know how to add types to a type alias dynamically (mypy doesn't accept that), so how can I construct a series of try/except blocks where I only add those classes from the available packages?

@leycec
Member

leycec commented Feb 22, 2024

The feature is not implemented yet, but will be?

Exactamundo. Theoretically, anyway. However, this theory optimistically assumes I do something other than eat my wife's delicious spicy Thai curry and play video games. 🤔

In my real use case I am dealing with 5 optional dependencies with many more dataframe-like classes.

Oh. Now we're talking. A true typing puzzle emerges. The permutations rapidly grow out of control. Indeed, now I get it.

Thankfully, I know how to solve this one too. Doing so requires a bit more heavy lifting. Notably, we'd strongly benefit from an import_module() utility function that dynamically imports a module or package with the passed name and then returns that module or package if installed (i.e., importable) under the active Python interpreter or returns None otherwise. The implementation resembles:

from importlib import import_module as importlib_import_module
from types import ModuleType
from typing import Optional

def import_module(module_name: str) -> Optional[ModuleType]:
    try:
        return importlib_import_module(module_name)
    except ModuleNotFoundError:
        return None

Trivial, right? The above is (...more or less) exactly what @beartype itself internally does when it needs to do this sort of thing. So, you just know it's battle-hardened.
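
For a quick sanity check, here is the same helper called once with a module that certainly exists and once with one that should not. The name hypothetical_missing_package is invented for illustration and assumed not to be installed.

```python
from importlib import import_module as importlib_import_module
from types import ModuleType
from typing import Optional

def import_module(module_name: str) -> Optional[ModuleType]:
    # Return the module if importable, else None instead of raising.
    try:
        return importlib_import_module(module_name)
    except ModuleNotFoundError:
        return None

# "json" ships with Python, so this yields a real module object.
json_module = import_module('json')

# Invented name, assumed absent, so this yields None.
missing_module = import_module('hypothetical_missing_package')

print(json_module is not None, missing_module is None)
```

One design point worth noticing: only ModuleNotFoundError is swallowed. A package that is installed but fails while importing raises a plain ImportError or some other exception, and that still propagates, which is usually what you want for a broken install.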

We now need one final piece of the puzzle. It turns out it's trivial to dynamically create new PEP 484-compliant union type hints from arbitrary sequences via an inscrutable one-liner that may make you go mad. Just sayin'. It works, but it's dark stuff. Nonetheless, this is yet again exactly what @beartype itself internally does when it needs to do this sort of thing – which is all the time, surprisingly.

Let's wrap that functionality in yet another utility function:

from typing import Iterable, Union

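def make_union(iterable: Iterable) -> object:
    # Subscripting "Union" with a tuple is the same operation spelled with
    # brackets rather than an explicit dunder method call.
    return Union[tuple(iterable)]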
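
A few edge cases of that one-liner are worth seeing before relying on it. The sketch below spells the call with subscription syntax, which should be equivalent to the explicit dunder call. make_union_or_fail is a hypothetical wrapper added here for illustration, guarding the case where no candidate type was found at all.

```python
from typing import Iterable, Union, get_args

def make_union(iterable: Iterable) -> object:
    # Build a union type hint from any iterable of types.
    return Union[tuple(iterable)]

def make_union_or_fail(iterable: Iterable) -> object:
    # Hypothetical wrapper: fail with a readable message on an empty input.
    hints = tuple(iterable)
    if not hints:
        raise ValueError('No types to unite; is any supported package installed?')
    return make_union(hints)

# Two types produce a genuine union.
two = make_union([int, str])

# A single type collapses to that type itself.
one = make_union([int])

print(get_args(two), one)
```

The single-type collapse is handy here: if only Pandas is installed, the resulting hint is simply that one class.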

Okay! We're now there. All the pieces of our puzzle are now in play. Let's generalize my simplistic solution above to dynamically support n arbitrary DataFrame-like classes:

from beartype import beartype
from beartype.typing import TYPE_CHECKING, Iterable, Optional, TypeAlias, Union
from importlib import import_module as importlib_import_module
from types import ModuleType
import pandas as pd

def make_union(iterable: Iterable) -> object:
    return Union[tuple(iterable)]

def import_module(module_name: str) -> Optional[ModuleType]:
    try:
        return importlib_import_module(module_name)
    except ModuleNotFoundError:
        return None

# Define a "GenericDataFrame" type hint matching both Pandas and Polars type hints.
#
# If static type-checking, define an obsolete PEP 613 type alias. Look. Just. Do. It.
if TYPE_CHECKING:
    import polars as pl
    GenericDataFrame: TypeAlias = pd.DataFrame | pl.DataFrame
# If actually running Python, define an archaic PEP 484-style union type hint. It's
# best not to ask what's going on here. This is like the Bloodborne of typing.
else:
    # Tuple of the names of all packages providing "DataFrame"-like classes.
    DATAFRAME_PACKAGE_NAMES = ('pandas', 'polars')  # <-- *INSERT OTHER PACKAGE NAMES HERE, YO*

    # List of all "DataFrame"-like classes defined by the subset of these packages
    # that are actually installed under the active Python interpreter.
    dataframe_types = []

    # For the name of each package providing a "DataFrame"-like class...
    for dataframe_package_name in DATAFRAME_PACKAGE_NAMES:
        # If this package is installed, append the "DataFrame"-like class
        # defined by this package to the above list.
        dataframe_package = import_module(dataframe_package_name)
        if dataframe_package is not None:
            dataframe_types.append(dataframe_package.DataFrame)

    # Lastly, dynamically define a union covering these classes.
    GenericDataFrame = make_union(dataframe_types)

@beartype
class A:
    def transform(self, X: pd.DataFrame) -> GenericDataFrame:  # <-- @beartype again hugs and squeezes you
        ...  # some logic

Does that work? No idea. Maybe it does. Maybe it explodes. Maybe you are now squinting and wondering why this is so hard. Just imagine trying to do this in any language other than Python, though. But don't. Your liver might explode. 💥
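
Since the real use case involves several classes per package rather than one DataFrame attribute each, the loop probably wants a mapping from package name to class names. Here is a rough sketch of that variation using only stdlib stand-ins, so it runs anywhere. OPTIONAL_CLASSES, collect_types, and hypothetical_missing_package are names invented for illustration, and the last of these is assumed not to be installed.

```python
from decimal import Decimal
from fractions import Fraction
from importlib import import_module
from typing import Dict, List, Tuple, Union, get_args

# Hypothetical mapping: package name -> names of the classes it provides.
OPTIONAL_CLASSES: Dict[str, Tuple[str, ...]] = {
    'decimal': ('Decimal',),
    'fractions': ('Fraction',),
    'hypothetical_missing_package': ('Thing',),  # assumed NOT installed
}

def collect_types(spec: Dict[str, Tuple[str, ...]]) -> List[type]:
    found = []
    for module_name, class_names in spec.items():
        # Skip packages that are not installed.
        try:
            module = import_module(module_name)
        except ModuleNotFoundError:
            continue
        # Keep only the classes this installed version actually defines.
        for class_name in class_names:
            cls = getattr(module, class_name, None)
            if cls is not None:
                found.append(cls)
    return found

# Union over whatever was found; the missing package contributes nothing.
NumberLike = Union[tuple(collect_types(OPTIONAL_CLASSES))]

print(get_args(NumberLike))
```

Swapping the stand-ins for real package and class names is left to you. The getattr() default also quietly tolerates older package versions that lack one of the listed classes, which may or may not be the behaviour you want.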

@tvdboom
Author

tvdboom commented Feb 24, 2024

This solved it indeed. Thanks a lot!

@tvdboom tvdboom closed this as completed Feb 24, 2024
@leycec
Member

leycec commented Feb 27, 2024

You're most welcome. This is such a fascinating technique, though. Would you mind if I quietly reopened this as a gentle reminder to myself? I'd like to write this up as a new FAQ entry. Since you found this useful, somebody else surely will, too. Let's pretend you just said:

"Sure! I code from a supermassive black hole at the centre of the Milky Way Galaxy. It's anything goes around here."

🕳️ ⚫

@leycec leycec reopened this Feb 27, 2024
@leycec leycec changed the title [Feature Request] Add a new BeartypeConf(ignore_hints_param_names option to ignore type hints on returns and parameters with various names [Docs] Add a new "How do I type-check classes across optional dependencies that might not be installed on user machines?" Feb 27, 2024
@leycec leycec changed the title [Docs] Add a new "How do I type-check classes across optional dependencies that might not be installed on user machines?" [Docs] Add a new FAQ entry, Q: "How do I type-check types from optional dependencies that might not be installed on user machines?" A: "*Very* carefully." Feb 27, 2024