exponentially slow typechecking when overloaded numpy functions used in generic containers like list/dict #14718

Open
Hnasar opened this issue Feb 16, 2023 · 0 comments
Labels
bug mypy got something wrong performance

Hnasar commented Feb 16, 2023

Bug Report

If you have a list or dict whose elements are overloaded numpy functions, it takes far too long to typecheck.

We have a lot of code that uses numpy and I found that one of our 300-line modules was taking 2 hours to typecheck. Using --line-checking-stats I noticed that a single line had stats in the billions (1000x more than other lines). It was something like this:

    def get_real_df(self) -> DataFrame:
        groupby_agg = {
            "size": np.sum,
            "vol": np.sum,
            "price": np.sum,
            "last_size": np.sum,
            "last_vol": np.max,
            "is_active": np.all,
            ...
        }
        ... # <snip>
        return df.groupby("id").agg(groupby_agg)

To Reproduce

from numpy import sum
bad = [sum, sum, sum]  # try adding even more 'sum, ...' 
reveal_type(bad)

Expected Behavior

This should type check quickly.

Actual Behavior

This takes 4 seconds.

bad.py:8: note: Revealed type is "builtins.list[Overload(def [_SCT <: numpy.generic] (a: Union[numpy._typing._array_like._SupportsArray[numpy.dtype[_SCT`-1]], numpy._typing._nested_sequence._NestedSequence[numpy._typing._array_like._SupportsArray[numpy.dtype[_SCT`-1]]]], axis: None =, dtype: None =, out: None =, keepdims: builtins.bool =, initial: Union[builtins.int, builtins.float, builtins.complex, numpy.number[Any], numpy.bool_] =, where: Union[numpy._typing._array_like._SupportsArray[numpy.dtype[numpy.bool_]], numpy._typing._nested_sequence._NestedSequence[numpy._typing._array_like._SupportsArray[numpy.dtype[numpy.bool_]]], builtins.bool, numpy._typing._nested_sequence._NestedSequence[builtins.bool]] =) -> _SCT`-1, def [_SCT <: numpy.generic] (a: Union[numpy._typing._array_like._SupportsArray[numpy.dtype[_SCT`-1]], numpy._typing._nested_sequence._NestedSequence[numpy._typing._array_like._SupportsArray[numpy.dtype[_SCT`-1]]]], axis: None =, dtype: None =, out: None =, keepdims: builtins.bool =, initial: Union[builtins.int, builtins.float, builtins.complex, numpy.number[Any], numpy.bool_] =, where: Union[numpy._typing._array_like._SupportsArray[numpy.dtype[numpy.bool_]], numpy._typing._nested_sequence._NestedSequence[numpy._typing._array_like._SupportsArray[numpy.dtype[numpy.bool_]]], builtins.bool, numpy._typing._nested_sequence._NestedSequence[builtins.bool]] =) -> _SCT`-1, def [_SCT <: numpy.generic] (a: Union[numpy._typing._array_like._SupportsArray[numpy.dtype[_SCT`-1]], numpy._typing._nested_sequence._NestedSequence[numpy._typing._array_like._SupportsArray[numpy.dtype[_SCT`-1]]]], axis: None =, dtype: None =, out: None =, keepdims: builtins.bool =, initial: Union[builtins.int, builtins.float, builtins.complex, numpy.number[Any], numpy.bool_] =, where: Union[numpy._typing._array_like._SupportsArray[numpy.dtype[numpy.bool_]], numpy._typing._nested_sequence._NestedSequence[numpy._typing._array_like._SupportsArray[numpy.dtype[numpy.bool_]]], builtins.bool, 
numpy._typing._nested_sequence._NestedSequence[builtins.bool]] =) -> _SCT`-1, def [_SCT <: numpy.generic] (a: Union[numpy._typing._array_like._SupportsArray[numpy.dtype[_SCT`-1]], numpy._typing._nested_sequence._NestedSequence[numpy._typing._array_like._SupportsArray[numpy.dtype[_SCT`-1]]]], axis: None =, dtype: None =, out: None =, keepdims: builtins.bool =, initial: Union[builtins.int, builtins.float, builtins.complex, numpy.number[Any], numpy.bool_] =, where: Union[numpy._typing._array_like._SupportsArray[numpy.dtype[numpy.bool_]], numpy._typing._nested_sequence._NestedSequence[numpy._typing._array_like._SupportsArray[numpy.dtype[numpy.bool_]]], builtins.bool, numpy._typing._nested_sequence._NestedSequence[builtins.bool]] =) -> _SCT`-1, def (a: Union[numpy._typing._array_like._SupportsArray[numpy.dtype[Any]], numpy._typing._nested_sequence._NestedSequence[numpy._typing._array_like._SupportsArray[numpy.dtype[Any]]], builtins.bool, builtins.int, builtins.float, builtins.complex, builtins.str, builtins.bytes, numpy._typing._nested_sequence._NestedSequence[Union[builtins.bool, builtins.int, builtins.float, builtins.complex, builtins.str, builtins.bytes]]], axis: Union[None, Union[typing.SupportsIndex, typing.Sequence[typing.SupportsIndex]]] =, dtype: Union[numpy.dtype[Any], None, Type[Any], numpy._typing._dtype_like._SupportsDType[numpy.dtype[Any]], builtins.str, Union[Tuple[Any, builtins.int], Tuple[Any, Union[typing.SupportsIndex, typing.Sequence[typing.SupportsIndex]]], builtins.list[Any], TypedDict('numpy._typing._dtype_like._DTypeDict', {'names': typing.Sequence[builtins.str], 'formats': typing.Sequence[Any], 'offsets'?: typing.Sequence[builtins.int], 'titles'?: typing.Sequence[Any], 'itemsize'?: builtins.int, 'aligned'?: builtins.bool}), Tuple[Any, Any]]] =, out: None =, keepdims: builtins.bool =, initial: Union[builtins.int, builtins.float, builtins.complex, numpy.number[Any], numpy.bool_] =, where: 
Union[numpy._typing._array_like._SupportsArray[numpy.dtype[numpy.bool_]], numpy._typing._nested_sequence._NestedSequence[numpy._typing._array_like._SupportsArray[numpy.dtype[numpy.bool_]]], builtins.bool, numpy._typing._nested_sequence._NestedSequence[builtins.bool]] =) -> Any, def (a: Union[numpy._typing._array_like._SupportsArray[numpy.dtype[Any]], numpy._typing._nested_sequence._NestedSequence[numpy._typing._array_like._SupportsArray[numpy.dtype[Any]]], builtins.bool, builtins.int, builtins.float, builtins.complex, builtins.str, builtins.bytes, numpy._typing._nested_sequence._NestedSequence[Union[builtins.bool, builtins.int, builtins.float, builtins.complex, builtins.str, builtins.bytes]]], axis: Union[None, Union[typing.SupportsIndex, typing.Sequence[typing.SupportsIndex]]] =, dtype: Union[numpy.dtype[Any], None, Type[Any], numpy._typing._dtype_like._SupportsDType[numpy.dtype[Any]], builtins.str, Union[Tuple[Any, builtins.int], Tuple[Any, Union[typing.SupportsIndex, typing.Sequence[typing.SupportsIndex]]], builtins.list[Any], TypedDict('numpy._typing._dtype_like._DTypeDict', {'names': typing.Sequence[builtins.str], 'formats': typing.Sequence[Any], 'offsets'?: typing.Sequence[builtins.int], 'titles'?: typing.Sequence[Any], 'itemsize'?: builtins.int, 'aligned'?: builtins.bool}), Tuple[Any, Any]]] =, out: None =, keepdims: builtins.bool =, initial: Union[builtins.int, builtins.float, builtins.complex, numpy.number[Any], numpy.bool_] =, where: Union[numpy._typing._array_like._SupportsArray[numpy.dtype[numpy.bool_]], numpy._typing._nested_sequence._NestedSequence[numpy._typing._array_like._SupportsArray[numpy.dtype[numpy.bool_]]], builtins.bool, numpy._typing._nested_sequence._NestedSequence[builtins.bool]] =) -> Any, def (a: Union[numpy._typing._array_like._SupportsArray[numpy.dtype[Any]], numpy._typing._nested_sequence._NestedSequence[numpy._typing._array_like._SupportsArray[numpy.dtype[Any]]], builtins.bool, builtins.int, builtins.float, builtins.complex, 
builtins.str, builtins.bytes, numpy._typing._nested_sequence._NestedSequence[Union[builtins.bool, builtins.int, builtins.float, builtins.complex, builtins.str, builtins.bytes]]], axis: Union[None, Union[typing.SupportsIndex, typing.Sequence[typing.SupportsIndex]]] =, dtype: Union[numpy.dtype[Any], None, Type[Any], numpy._typing._dtype_like._SupportsDType[numpy.dtype[Any]], builtins.str, Union[Tuple[Any, builtins.int], Tuple[Any, Union[typing.SupportsIndex, typing.Sequence[typing.SupportsIndex]]], builtins.list[Any], TypedDict('numpy._typing._dtype_like._DTypeDict', {'names': typing.Sequence[builtins.str], 'formats': typing.Sequence[Any], 'offsets'?: typing.Sequence[builtins.int], 'titles'?: typing.Sequence[Any], 'itemsize'?: builtins.int, 'aligned'?: builtins.bool}), Tuple[Any, Any]]] =, out: None =, keepdims: builtins.bool =, initial: Union[builtins.int, builtins.float, builtins.complex, numpy.number[Any], numpy.bool_] =, where: Union[numpy._typing._array_like._SupportsArray[numpy.dtype[numpy.bool_]], numpy._typing._nested_sequence._NestedSequence[numpy._typing._array_like._SupportsArray[numpy.dtype[numpy.bool_]]], builtins.bool, numpy._typing._nested_sequence._NestedSequence[builtins.bool]] =) -> Any, def (a: Union[numpy._typing._array_like._SupportsArray[numpy.dtype[Any]], numpy._typing._nested_sequence._NestedSequence[numpy._typing._array_like._SupportsArray[numpy.dtype[Any]]], builtins.bool, builtins.int, builtins.float, builtins.complex, builtins.str, builtins.bytes, numpy._typing._nested_sequence._NestedSequence[Union[builtins.bool, builtins.int, builtins.float, builtins.complex, builtins.str, builtins.bytes]]], axis: Union[None, Union[typing.SupportsIndex, typing.Sequence[typing.SupportsIndex]]] =, dtype: Union[numpy.dtype[Any], None, Type[Any], numpy._typing._dtype_like._SupportsDType[numpy.dtype[Any]], builtins.str, Tuple[Any, builtins.int], Tuple[Any, Union[typing.SupportsIndex, typing.Sequence[typing.SupportsIndex]]], builtins.list[Any], 
TypedDict('numpy._typing._dtype_like._DTypeDict', {'names': typing.Sequence[builtins.str], 'formats': typing.Sequence[Any], 'offsets'?: typing.Sequence[builtins.int], 'titles'?: typing.Sequence[Any], 'itemsize'?: builtins.int, 'aligned'?: builtins.bool}), Tuple[Any, Any]] =, out: None =, keepdims: builtins.bool =, initial: Union[builtins.int, builtins.float, builtins.complex, numpy.number[Any], numpy.bool_] =, where: Union[numpy._typing._array_like._SupportsArray[numpy.dtype[numpy.bool_]], numpy._typing._nested_sequence._NestedSequence[numpy._typing._array_like._SupportsArray[numpy.dtype[numpy.bool_]]], builtins.bool, numpy._typing._nested_sequence._NestedSequence[builtins.bool]] =) -> Any, def [_ArrayType <: numpy.ndarray[Any, numpy.dtype[Any]]] (a: Union[numpy._typing._array_like._SupportsArray[numpy.dtype[Any]], numpy._typing._nested_sequence._NestedSequence[numpy._typing._array_like._SupportsArray[numpy.dtype[Any]]], builtins.bool, builtins.int, builtins.float, builtins.complex, builtins.str, builtins.bytes, numpy._typing._nested_sequence._NestedSequence[Union[builtins.bool, builtins.int, builtins.float, builtins.complex, builtins.str, builtins.bytes]]], axis: Union[None, Union[typing.SupportsIndex, typing.Sequence[typing.SupportsIndex]]] =, dtype: Union[numpy.dtype[Any], None, Type[Any], numpy._typing._dtype_like._SupportsDType[numpy.dtype[Any]], builtins.str, Union[Tuple[Any, builtins.int], Tuple[Any, Union[typing.SupportsIndex, typing.Sequence[typing.SupportsIndex]]], builtins.list[Any], TypedDict('numpy._typing._dtype_like._DTypeDict', {'names': typing.Sequence[builtins.str], 'formats': typing.Sequence[Any], 'offsets'?: typing.Sequence[builtins.int], 'titles'?: typing.Sequence[Any], 'itemsize'?: builtins.int, 'aligned'?: builtins.bool}), Tuple[Any, Any]]] =, out: _ArrayType`-1 =, keepdims: builtins.bool =, initial: Union[builtins.int, builtins.float, builtins.complex, numpy.number[Any], numpy.bool_] =, where: 
Union[numpy._typing._array_like._SupportsArray[numpy.dtype[numpy.bool_]], numpy._typing._nested_sequence._NestedSequence[numpy._typing._array_like._SupportsArray[numpy.dtype[numpy.bool_]]], builtins.bool, numpy._typing._nested_sequence._NestedSequence[builtins.bool]] =) -> _ArrayType`-1)]"
Success: no issues found in 1 source file

real    0m4.010s

Workaround

If you add an explicit type then it typechecks quickly:

import numpy as np
from typing import Any, Callable

okay: list[Callable[..., Any]] = [np.sum, np.sum, np.sum]
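
The same annotation can be applied to the dict from the original code. Below is a minimal sketch of that, not a confirmed fix for every case. To keep it runnable without numpy, `first` is a hypothetical overloaded stand-in for functions like `np.sum`; in the real module the values would be `np.sum`, `np.max`, `np.all`, and so on. The comment about why the annotation helps is an assumption, not something verified against mypy's internals.

```python
from typing import Any, Callable, overload


# Hypothetical stand-in for an overloaded numpy function such as np.sum,
# so that this sketch runs without numpy installed.
@overload
def first(values: list[int]) -> int: ...
@overload
def first(values: list[str]) -> str: ...
def first(values):
    return values[0]


# The explicit annotation is the workaround. Assumption: with the value type
# given up front, mypy does not have to infer a common type from the
# overloaded signatures of every entry.
groupby_agg: dict[str, Callable[..., Any]] = {
    "size": first,
    "vol": first,
    "price": first,
}

print(groupby_agg["size"]([1, 2, 3]))
```

The trade-off is that `Callable[..., Any]` discards the precise signatures, so calls made through the dict are no longer checked.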

Your Environment

  • Mypy version used: 1.0.0 (compiled: no -- because I wanted to profile with py-spy; pip install --force mypy --no-binary :all:)
  • Mypy command-line flags: none
  • Mypy configuration options from mypy.ini (and other config files): none
  • Python version used: 3.10
  • Numpy version: 1.24.2

Note: I also tried mypy 0.982 and it was about 10x slower. The performance improvements (e.g. #13821) definitely helped!

Profile

I graphed the relation between the size of the list and the time to typecheck:
(image: graph of list size vs. time to typecheck)
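
A measurement of this kind could be scripted roughly as follows. This is a sketch, not the script used for the graph: the helper names `make_source` and `time_mypy` are made up for illustration, and it assumes mypy and numpy are installed in the interpreter that runs it. Each timing includes mypy's fixed startup cost and the cost of processing the numpy stubs, so only the growth between sizes is meaningful.

```python
import subprocess
import sys
import tempfile
import time
from pathlib import Path


def make_source(n: int) -> str:
    """Return a module whose list literal holds n references to numpy.sum."""
    items = ", ".join(["sum"] * n)
    return f"from numpy import sum\nbad = [{items}]\n"


def time_mypy(n: int) -> float:
    """Write the n-element reproducer to a temp dir and time one mypy run."""
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "bad.py"
        path.write_text(make_source(n))
        start = time.perf_counter()
        # --no-incremental keeps mypy from reusing cached results between runs.
        subprocess.run(
            [sys.executable, "-m", "mypy", "--no-incremental", str(path)],
            capture_output=True,
            check=False,
        )
        return time.perf_counter() - start


if __name__ == "__main__":
    # Keep the sizes small: the reported run time grows very quickly.
    for n in (1, 2, 3):
        print(f"{n} elements: {time_mypy(n):.2f}s")
```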

and I ran it through py-spy:

py-spy record -o profile.svg -- python3.10 -m mypy bad.py

(image: py-spy flame graph, profile.svg)

@Hnasar Hnasar added the bug mypy got something wrong label Feb 16, 2023