Chained calls with medium/large dict/list literals are slow to typecheck #9427

huguesb · 2020-09-08T03:24:25Z

🐛 Bug Report

Variable assignments with chained calls converting dict/list literals are surprisingly slow to typecheck. Interestingly, breaking down the assignment into multiple steps often yields noticeable performance improvement.

To Reproduce

For instance, consider:

from decimal import Decimal
from collections import OrderedDict

GRID = OrderedDict(sorted({
    (93, 659): Decimal('0.3000'), (93, 714): Decimal('0.2950'), (93, 749): Decimal('0.2820'), (93, 799): Decimal('0.2690'), (93, 850): Decimal('0.1990'),  # noqa: E501
    (94, 659): Decimal('0.3000'), (94, 714): Decimal('0.2900'), (94, 749): Decimal('0.2745'), (94, 799): Decimal('0.2590'), (94, 850): Decimal('0.1820'),  # noqa: E501
    (95, 659): Decimal('0.3000'), (95, 714): Decimal('0.2850'), (95, 749): Decimal('0.2620'), (95, 799): Decimal('0.2390'), (95, 850): Decimal('0.1620'),  # noqa: E501
    (96, 659): Decimal('0.3000'), (96, 714): Decimal('0.2800'), (96, 749): Decimal('0.2495'), (96, 799): Decimal('0.2190'), (96, 850): Decimal('0.1570'),  # noqa: E501
    (97, 659): Decimal('0.2800'), (97, 714): Decimal('0.2750'), (97, 749): Decimal('0.2370'), (97, 799): Decimal('0.1990'), (97, 850): Decimal('0.1520'),  # noqa: E501
    (98, 659): Decimal('0.2710'), (98, 714): Decimal('0.2650'), (98, 749): Decimal('0.2220'), (98, 799): Decimal('0.1790'), (98, 850): Decimal('0.1470'),  # noqa: E501
    (99, 659): Decimal('0.2500'), (99, 714): Decimal('0.1990'), (99, 749): Decimal('0.1975'), (99, 799): Decimal('0.1670'), (99, 850): Decimal('0.1170'),  # noqa: E501
    (100, 659): Decimal('0.2500'), (100, 714): Decimal('0.1890'), (100, 749): Decimal('0.1500'), (100, 799): Decimal('0.1084'), (100, 850): Decimal('0.1000'),  # noqa: E501
}.items()))

and the same broken down into two steps:

from decimal import Decimal
from collections import OrderedDict

GRID = {
    (93, 659): Decimal('0.3000'), (93, 714): Decimal('0.2950'), (93, 749): Decimal('0.2820'), (93, 799): Decimal('0.2690'), (93, 850): Decimal('0.1990'),  # noqa: E501
    (94, 659): Decimal('0.3000'), (94, 714): Decimal('0.2900'), (94, 749): Decimal('0.2745'), (94, 799): Decimal('0.2590'), (94, 850): Decimal('0.1820'),  # noqa: E501
    (95, 659): Decimal('0.3000'), (95, 714): Decimal('0.2850'), (95, 749): Decimal('0.2620'), (95, 799): Decimal('0.2390'), (95, 850): Decimal('0.1620'),  # noqa: E501
    (96, 659): Decimal('0.3000'), (96, 714): Decimal('0.2800'), (96, 749): Decimal('0.2495'), (96, 799): Decimal('0.2190'), (96, 850): Decimal('0.1570'),  # noqa: E501
    (97, 659): Decimal('0.2800'), (97, 714): Decimal('0.2750'), (97, 749): Decimal('0.2370'), (97, 799): Decimal('0.1990'), (97, 850): Decimal('0.1520'),  # noqa: E501
    (98, 659): Decimal('0.2710'), (98, 714): Decimal('0.2650'), (98, 749): Decimal('0.2220'), (98, 799): Decimal('0.1790'), (98, 850): Decimal('0.1470'),  # noqa: E501
    (99, 659): Decimal('0.2500'), (99, 714): Decimal('0.1990'), (99, 749): Decimal('0.1975'), (99, 799): Decimal('0.1670'), (99, 850): Decimal('0.1170'),  # noqa: E501
    (100, 659): Decimal('0.2500'), (100, 714): Decimal('0.1890'), (100, 749): Decimal('0.1500'), (100, 799): Decimal('0.1084'), (100, 850): Decimal('0.1000'),  # noqa: E501
}

GG = OrderedDict(sorted(GRID.items()))

Expected Behavior

I would expect the two samples above to typecheck in roughly equivalent amounts of time.

Actual Behavior

Variant 1 is much slower to typecheck than variant 2: roughly 700ms vs 70ms (not using mypyc-compiled version)

This seems to arise from a combination of repeated computations when typechecking overloaded methods such as sorted, typechecking from root to leaf instead of leaf to root, and not caching resolved types for dict/list exprs.

Your Environment

Mypy version used: 0.790-dev 5906a5d
Mypy command-line flags: --ignore-missing-imports --show-traceback
Python version used: 3.7
Operating system and version: macOS 10.14.6

The text was updated successfully, but these errors were encountered:

`fast_dict_type` and `fast_container_type` only allowed Instance but not Tuple of Instances which was mostly an oversight, as opposed to an intentional omission. For python#9427

When a container (list, set, tuple, or dict) literal expression is used as an argument to an overloaded function it will get repeatedly typechecked. This becomes particularly problematic when the expression is somewhat large, as seen in python#9427 To avoid repeated work, add a new field in the relevant AST nodes to cache the resolved type of the expression. Right now the cache is only used in the fast path, although it could conceivably be leveraged for the slow path as well in a follow-up commit. To further reduce duplicate work, when the fast-path doesn't work, we use the cache to make a note of that, to avoid repeatedly attempting to take the fast path. Fixes python#9427

huguesb · 2022-05-01T07:29:58Z

The combination of #12706 and #12707 fixes this particular reproducer. For a more robust fix we may consider a follow-up to #12707 that more broadly enables caching of resolved types in AST nodes

…ies (#12706) `fast_dict_type` and `fast_container_type` only allowed Instance but not Tuple of Instances which was mostly an oversight, as opposed to an intentional omission. For #9427

When a container (list, set, tuple, or dict) literal expression is used as an argument to an overloaded function it will get repeatedly typechecked. This becomes particularly problematic when the expression is somewhat large, as seen in python#9427 To avoid repeated work, add a new field in the relevant AST nodes to cache the resolved type of the expression. Right now the cache is only used in the fast path, although it could conceivably be leveraged for the slow path as well in a follow-up commit. To further reduce duplicate work, when the fast-path doesn't work, we use the cache to make a note of that, to avoid repeatedly attempting to take the fast path. Fixes python#9427

When a container (list, set, tuple, or dict) literal expression is used as an argument to an overloaded function it will get repeatedly typechecked. This becomes particularly problematic when the expression is somewhat large, as seen in python#9427 To avoid repeated work, add a new cache in ExprChecker, mapping the AST node to the resolved type of the expression. Right now the cache is only used in the fast path, although it could conceivably be leveraged for the slow path as well in a follow-up commit. To further reduce duplicate work, when the fast-path doesn't work, we use the cache to make a note of that, to avoid repeatedly attempting to take the fast path. Fixes python#9427

When a container (list, set, tuple, or dict) literal expression is used as an argument to an overloaded function it will get repeatedly typechecked. This becomes particularly problematic when the expression is somewhat large, as seen in #9427 To avoid repeated work, add a new cache in ExprChecker, mapping the AST node to the resolved type of the expression. Right now the cache is only used in the fast path, although it could conceivably be leveraged for the slow path as well in a follow-up commit. To further reduce duplicate work, when the fast-path doesn't work, we use the cache to make a note of that, to avoid repeatedly attempting to take the fast path. Fixes #9427

huguesb added the bug mypy got something wrong label Sep 8, 2020

AlexWaygood added the performance label Mar 27, 2022

huguesb mentioned this issue May 1, 2022

checkexpr: speedup typechecking of container literals with tuple entries #12706

Merged

huguesb mentioned this issue May 1, 2022

checkexpr: cache type of container literals when possible #12707

Merged

JukkaL closed this as completed in 1b7e33f May 20, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Chained calls with medium/large dict/list literals are slow to typecheck #9427

Chained calls with medium/large dict/list literals are slow to typecheck #9427

huguesb commented Sep 8, 2020

huguesb commented May 1, 2022

Chained calls with medium/large dict/list literals are slow to typecheck #9427

Chained calls with medium/large dict/list literals are slow to typecheck #9427

Comments

huguesb commented Sep 8, 2020

🐛 Bug Report

To Reproduce

Expected Behavior

Actual Behavior

Your Environment

huguesb commented May 1, 2022