Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Chained calls with medium/large dict/list literals are slow to typecheck #9427

Closed
huguesb opened this issue Sep 8, 2020 · 1 comment
Closed
Labels
bug mypy got something wrong performance

Comments

@huguesb
Copy link
Contributor

huguesb commented Sep 8, 2020

🐛 Bug Report

Variable assignments with chained calls converting dict/list literals are surprisingly slow to typecheck. Interestingly, breaking down the assignment into multiple steps often yields noticeable performance improvement.

To Reproduce

For instance, consider:

from decimal import Decimal
from collections import OrderedDict

GRID = OrderedDict(sorted({
    (93, 659): Decimal('0.3000'), (93, 714): Decimal('0.2950'), (93, 749): Decimal('0.2820'), (93, 799): Decimal('0.2690'), (93, 850): Decimal('0.1990'),  # noqa: E501
    (94, 659): Decimal('0.3000'), (94, 714): Decimal('0.2900'), (94, 749): Decimal('0.2745'), (94, 799): Decimal('0.2590'), (94, 850): Decimal('0.1820'),  # noqa: E501
    (95, 659): Decimal('0.3000'), (95, 714): Decimal('0.2850'), (95, 749): Decimal('0.2620'), (95, 799): Decimal('0.2390'), (95, 850): Decimal('0.1620'),  # noqa: E501
    (96, 659): Decimal('0.3000'), (96, 714): Decimal('0.2800'), (96, 749): Decimal('0.2495'), (96, 799): Decimal('0.2190'), (96, 850): Decimal('0.1570'),  # noqa: E501
    (97, 659): Decimal('0.2800'), (97, 714): Decimal('0.2750'), (97, 749): Decimal('0.2370'), (97, 799): Decimal('0.1990'), (97, 850): Decimal('0.1520'),  # noqa: E501
    (98, 659): Decimal('0.2710'), (98, 714): Decimal('0.2650'), (98, 749): Decimal('0.2220'), (98, 799): Decimal('0.1790'), (98, 850): Decimal('0.1470'),  # noqa: E501
    (99, 659): Decimal('0.2500'), (99, 714): Decimal('0.1990'), (99, 749): Decimal('0.1975'), (99, 799): Decimal('0.1670'), (99, 850): Decimal('0.1170'),  # noqa: E501
    (100, 659): Decimal('0.2500'), (100, 714): Decimal('0.1890'), (100, 749): Decimal('0.1500'), (100, 799): Decimal('0.1084'), (100, 850): Decimal('0.1000'),  # noqa: E501
}.items()))

and the same broken down into two steps:

from decimal import Decimal
from collections import OrderedDict

GRID = {
    (93, 659): Decimal('0.3000'), (93, 714): Decimal('0.2950'), (93, 749): Decimal('0.2820'), (93, 799): Decimal('0.2690'), (93, 850): Decimal('0.1990'),  # noqa: E501
    (94, 659): Decimal('0.3000'), (94, 714): Decimal('0.2900'), (94, 749): Decimal('0.2745'), (94, 799): Decimal('0.2590'), (94, 850): Decimal('0.1820'),  # noqa: E501
    (95, 659): Decimal('0.3000'), (95, 714): Decimal('0.2850'), (95, 749): Decimal('0.2620'), (95, 799): Decimal('0.2390'), (95, 850): Decimal('0.1620'),  # noqa: E501
    (96, 659): Decimal('0.3000'), (96, 714): Decimal('0.2800'), (96, 749): Decimal('0.2495'), (96, 799): Decimal('0.2190'), (96, 850): Decimal('0.1570'),  # noqa: E501
    (97, 659): Decimal('0.2800'), (97, 714): Decimal('0.2750'), (97, 749): Decimal('0.2370'), (97, 799): Decimal('0.1990'), (97, 850): Decimal('0.1520'),  # noqa: E501
    (98, 659): Decimal('0.2710'), (98, 714): Decimal('0.2650'), (98, 749): Decimal('0.2220'), (98, 799): Decimal('0.1790'), (98, 850): Decimal('0.1470'),  # noqa: E501
    (99, 659): Decimal('0.2500'), (99, 714): Decimal('0.1990'), (99, 749): Decimal('0.1975'), (99, 799): Decimal('0.1670'), (99, 850): Decimal('0.1170'),  # noqa: E501
    (100, 659): Decimal('0.2500'), (100, 714): Decimal('0.1890'), (100, 749): Decimal('0.1500'), (100, 799): Decimal('0.1084'), (100, 850): Decimal('0.1000'),  # noqa: E501
}

GG = OrderedDict(sorted(GRID.items()))

Expected Behavior

I would expect the two samples above to typecheck in roughly equivalent amounts of time.

Actual Behavior

Variant 1 is much slower to typecheck than variant 2: roughly 700ms vs 70ms (not using mypyc-compiled version)

This seems to arise from a combination of repeated computations when typechecking overloaded methods such as sorted, typechecking from root to leaf instead of leaf to root, and not caching resolved types for dict/list exprs.

Your Environment

  • Mypy version used: 0.790-dev 5906a5d
  • Mypy command-line flags: --ignore-missing-imports --show-traceback
  • Python version used: 3.7
  • Operating system and version: macOS 10.14.6
@huguesb huguesb added the bug mypy got something wrong label Sep 8, 2020
huguesb added a commit to huguesb/mypy that referenced this issue May 1, 2022
`fast_dict_type` and `fast_container_type` only allowed Instance but not Tuple of Instances
which was mostly an oversight, as opposed to an intentional omission.

For python#9427
huguesb added a commit to huguesb/mypy that referenced this issue May 1, 2022
`fast_dict_type` and `fast_container_type` only allowed Instance but not Tuple of Instances
which was mostly an oversight, as opposed to an intentional omission.

For python#9427
huguesb added a commit to huguesb/mypy that referenced this issue May 1, 2022
When a container (list, set, tuple, or dict) literal expression is
used as an argument to an overloaded function it will get repeatedly
typechecked. This becomes particularly problematic when the expression
is somewhat large, as seen in python#9427

To avoid repeated work, add a new field in the relevant AST nodes to
cache the resolved type of the expression. Right now the cache is
only used in the fast path, although it could conceivably be leveraged
for the slow path as well in a follow-up commit.

To further reduce duplicate work, when the fast-path doesn't work, we
use the cache to make a note of that, to avoid repeatedly attempting to
take the fast path.

Fixes python#9427
huguesb added a commit to huguesb/mypy that referenced this issue May 1, 2022
When a container (list, set, tuple, or dict) literal expression is
used as an argument to an overloaded function it will get repeatedly
typechecked. This becomes particularly problematic when the expression
is somewhat large, as seen in python#9427

To avoid repeated work, add a new field in the relevant AST nodes to
cache the resolved type of the expression. Right now the cache is
only used in the fast path, although it could conceivably be leveraged
for the slow path as well in a follow-up commit.

To further reduce duplicate work, when the fast-path doesn't work, we
use the cache to make a note of that, to avoid repeatedly attempting to
take the fast path.

Fixes python#9427
@huguesb
Copy link
Contributor Author

huguesb commented May 1, 2022

The combination of #12706 and #12707 fixes this particular reproducer. For a more robust fix we may consider a follow-up to #12707 that more broadly enables caching of resolved types in AST nodes

hauntsaninja pushed a commit that referenced this issue May 1, 2022
…ies (#12706)

`fast_dict_type` and `fast_container_type` only allowed Instance but not Tuple of Instances
which was mostly an oversight, as opposed to an intentional omission.

For #9427
huguesb added a commit to huguesb/mypy that referenced this issue May 3, 2022
When a container (list, set, tuple, or dict) literal expression is
used as an argument to an overloaded function it will get repeatedly
typechecked. This becomes particularly problematic when the expression
is somewhat large, as seen in python#9427

To avoid repeated work, add a new field in the relevant AST nodes to
cache the resolved type of the expression. Right now the cache is
only used in the fast path, although it could conceivably be leveraged
for the slow path as well in a follow-up commit.

To further reduce duplicate work, when the fast-path doesn't work, we
use the cache to make a note of that, to avoid repeatedly attempting to
take the fast path.

Fixes python#9427
huguesb added a commit to huguesb/mypy that referenced this issue May 16, 2022
When a container (list, set, tuple, or dict) literal expression is
used as an argument to an overloaded function it will get repeatedly
typechecked. This becomes particularly problematic when the expression
is somewhat large, as seen in python#9427

To avoid repeated work, add a new cache in ExprChecker, mapping the AST
node to the resolved type of the expression. Right now the cache is
only used in the fast path, although it could conceivably be leveraged
for the slow path as well in a follow-up commit.

To further reduce duplicate work, when the fast-path doesn't work, we
use the cache to make a note of that, to avoid repeatedly attempting to
take the fast path.

Fixes python#9427
@JukkaL JukkaL closed this as completed in 1b7e33f May 20, 2022
JukkaL pushed a commit that referenced this issue May 20, 2022
When a container (list, set, tuple, or dict) literal expression is
used as an argument to an overloaded function it will get repeatedly
typechecked. This becomes particularly problematic when the expression
is somewhat large, as seen in #9427

To avoid repeated work, add a new cache in ExprChecker, mapping the AST
node to the resolved type of the expression. Right now the cache is
only used in the fast path, although it could conceivably be leveraged
for the slow path as well in a follow-up commit.

To further reduce duplicate work, when the fast-path doesn't work, we
use the cache to make a note of that, to avoid repeatedly attempting to
take the fast path.

Fixes #9427
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug mypy got something wrong performance
Projects
None yet
Development

No branches or pull requests

2 participants