
mypy 0.982 run 15 minutes (previous version needs only one minute) #13821

Closed
Behoston opened this issue Oct 5, 2022 · 33 comments · Fixed by #14277
Labels
bug (mypy got something wrong), performance

Comments

@Behoston commented Oct 5, 2022

Bug Report

My project is not open-source but I can give you (mypy maintainers) access to repository on gitlab.

After updating mypy to the newest version, it takes 15 minutes to complete on CI and 8 minutes on my PC. When I roll back to the previous release, it works fine (about 15 seconds on my local machine and about 45 on CI).

I have a base class with about 100 subclasses, and mypy always hangs in the same place.

LOG:  Deleting dostanesie.core.formulas.university.ur.base dostanesie/core/formulas/university/ur/base.py dostanesie/core/formulas/university/ur/base.meta.json dostanesie/core/formulas/university/ur/base.data.json
LOG:  Processing SCC singleton (dostanesie.core.formulas.university.uph.base) as inherently stale with stale deps (dostanesie.core.formulas.code)
LOG:  Deleting dostanesie.core.formulas.university.uph.base dostanesie/core/formulas/university/uph/base.py dostanesie/core/formulas/university/uph/base.meta.json dostanesie/core/formulas/university/uph/base.data.json
LOG:  Processing SCC singleton (dostanesie.core.formulas.university.up_wroclaw.base) as inherently stale with stale deps (dostanesie.core.formulas.code)
LOG:  Deleting dostanesie.core.formulas.university.up_wroclaw.base dostanesie/core/formulas/university/up_wroclaw/base.py dostanesie/core/formulas/university/up_wroclaw/base.meta.json dostanesie/core/formulas/university/up_wroclaw/base.data.json
LOG:  Processing SCC singleton (dostanesie.core.formulas.university.up_poznan.base) as inherently stale with stale deps (dostanesie.core.formulas.code)
LOG:  Deleting dostanesie.core.formulas.university.up_poznan.base dostanesie/core/formulas/university/up_poznan/base.py dostanesie/core/formulas/university/up_poznan/base.meta.json dostanesie/core/formulas/university/up_poznan/base.data.json
LOG:  Processing SCC singleton (dostanesie.core.formulas.university.up_lublin.base) as inherently stale with stale deps (dostanesie.core.formulas.code)

I tried deleting the mypy cache dir locally; on CI we build a new Docker image for each run, so there is no mypy cache.

To Reproduce

I really don't know :(

Expected Behavior

Mypy should finish in about a minute, as it does with version 0.971:

LOG:  Build finished in 14.859 seconds with 276 modules, and 0 errors

Actual Behavior

On CI it takes more than 15 minutes, on local machine 8 minutes.

Your Environment

  • Mypy version used: 0.982
  • Mypy command-line flags: none
  • Mypy configuration options from mypy.ini (and other config files):
[mypy]
ignore_missing_imports = true
  • Python version used: Python 3.10.7+

Full mypy log

@Behoston added the bug label Oct 5, 2022
@ilevkivskyi (Member)

Such things are hard to diagnose. You can try bisecting it to a specific mypy commit, then it will be much easier.

@Behoston (Author)

I will definitely try today. I will also try to extract the problematic part of our code and share it on GitHub.

@Behoston (Author) commented Oct 10, 2022

Commit 3ae19a2 takes more than 25 minutes.

I ran it in the background and killed it after 25 minutes; it takes longer than the previous runs because I was using my PC at the same time.

@JukkaL (Collaborator) commented Oct 10, 2022

I looked at the commit, and a few things seemed potentially suspicious: changes in sum(), subprocess and unittest. If the file where mypy is very slow uses some of these, could you try commenting out the relevant lines to see if it speeds things up?

@Behoston (Author) commented Oct 10, 2022

I replaced almost all sum calls with any, and now it takes 30 seconds for the commit above (not compiled) and 14 seconds for the newest version (compiled). We have between 100 and 200 sum calls in this package.
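
For context, the kind of rewrite described here looks roughly like the following (a hypothetical example, not code from the actual project):

```python
# Hypothetical example of the sum-to-any rewrite: counting matches with
# sum() versus merely testing for a match with any().
scores = [12, 0, 7, 0, 3]

# Before: sum() over a generator of booleans (True counts as 1).
passing = sum(score > 5 for score in scores)

# After: any() is enough when only "is there at least one?" matters,
# and it sidesteps the changed sum() overloads implicated above.
has_passing = any(score > 5 for score in scores)

print(passing, has_passing)  # 2 True
```

Note the two calls are only interchangeable when the count itself is unused; any() also short-circuits on the first match.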

@JukkaL (Collaborator) commented Oct 10, 2022

That would indicate that the sum change is likely the root cause of the regression, since with sum the run took over 25 minutes. Does this sound right?

@Behoston (Author)

In my case: True.

We can stick to 0.971; the newest version of mypy is not crucial for us. However, this can affect CI/CD for many other projects, especially calculation-heavy ones (our backend is one big calculator with hundreds of calculator classes and some REST API).

@JukkaL (Collaborator) commented Oct 10, 2022

Yeah, this seems pretty bad and we should try to fix the regression.

It would be very helpful if you could try to narrow this down a bit -- it's possible that there is some specific call or calls to sum that are very slow. For example, replace the first 50% of the sum calls, measure, then replace the remaining 50%, and keep bisecting until you have a small set of calls that triggers a significant performance regression. This way you may be able to produce a small code example that triggers slow type checking.

@huguesb (Contributor) commented Oct 11, 2022

@Behoston you could try adding --timing-stats <path> to your mypy invocation. It will output a simple text file with per-module timings (one module per line). That should help narrow down which files are most impacted by the change, and from there you could further refine which statements in the specific files are most impacted.
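
The per-module output can then be post-processed to surface the worst offenders. A minimal sketch in Python (this assumes the simple one-module-per-line `module time` format described above; the exact format may vary between mypy versions, so treat the parser as illustrative):

```python
from pathlib import Path

def slowest_modules(stats_path: str, top: int = 10) -> list[tuple[str, int]]:
    """Parse a --timing-stats file ('module time' per line) and
    return the `top` slowest modules, slowest first."""
    timings = []
    for line in Path(stats_path).read_text().splitlines():
        parts = line.split()
        # Skip any line that does not look like "module_name integer_time".
        if len(parts) == 2 and parts[1].isdigit():
            timings.append((parts[0], int(parts[1])))
    return sorted(timings, key=lambda mt: mt[1], reverse=True)[:top]
```

Running this over the stats file from two mypy versions and diffing the top entries quickly shows which modules regressed.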

@KelSolaar

Hello,

We are experiencing similar issues with Colour: we went from ~30 min to ~1 h 20 min on CI when updating our dependencies; one of them was mypy==0.961 --> mypy==0.982.

Cheers,

Thomas

@headtr1ck

Our project also went from ~5min to ~25min going from 0.971 to 0.981/2.

@hauntsaninja (Collaborator) commented Oct 29, 2022

It would be useful if you could post the details of your projects, so we can reproduce the slow down. As is mentioned above, it's very hard to diagnose these things in the abstract.

If your project is not open source, it would be very helpful if you could make a shareable repro of the bad behaviour. For example, try some of the strategies mentioned above:

  • Use --timing-stats to narrow down which module is slow
  • Potentially look for invocations of sum (since that's related to the OP's problem; many thanks to Behoston for narrowing that down)

@hauntsaninja (Collaborator)

We can probably just revert the change to sum, it's a somewhat weird change to have: #13961

In any case, we would love to have small repros of the performance regression.

@Behoston (Author) commented Nov 7, 2022

@hauntsaninja I published part of our code: https://github.com/dostanesie-pl/mypy-bug

@headtr1ck

Unfortunately I cannot publish our code. But looking at --timing-stats I noticed that in 0.971 all external libs are displayed with "0", while in 0.982 they are ">0".

Maybe that has nothing to do with the cause; some of our modules have slowed down by a factor of 10, while others stay the same. We do not use Python's builtin sum, though.

@Behoston (Author)

It works much better on version 0.990.

@JMMarchant

We're still seeing substantial slowdowns on version 0.990; is there an easy way to see which lines are slow? (We have already used --timing-stats to narrow down the files.)

@Behoston (Author) commented Nov 11, 2022 via email

@KelSolaar

0.990 did not really change anything for us, still 1h20+ check times here.

@JMMarchant commented Nov 14, 2022

I have bisected the commits between 0.971 and 0.981 to identify which one causes the slowdown for us. It seems to be d27bff62f71a8b914b1df239467148e81d2e88a2:

| Commit | No cache runtime | Cached runtime |
| --- | --- | --- |
| d2063d2 | 4m 55.01s | 3.846s |
| d27bff6 | Stopped after 1h 11m | N/A |

@ilevkivskyi (Member)

Hm, that PR is almost a pure refactor. Actually, I have a suspect: potentially you have some literal type with a lot of values, or some large enum. There is a function try_contracting_literals_in_union() that may be computationally intensive; previously it was (inconsistently) used only in is_subtype(), but after that PR it is also used by is_proper_subtype(). This is just a wild guess, though. If you don't want to share your code, you will likely need to run mypy under a profiler and compare before and after. Note that cProfile will only work with interpreted mypy; for the compiled version you may need to use https://pypi.org/project/py-spy/
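
For the interpreted case, a cProfile session can be wrapped around whatever entry point is being investigated and sorted by cumulative time. A generic sketch (the `workload` function is a stand-in for the slow code, not mypy itself):

```python
import cProfile
import io
import pstats

def workload() -> int:
    # Stand-in for the code under investigation (e.g. an interpreted
    # mypy run driven through its API).
    return sum(i * i for i in range(100_000))

profiler = cProfile.Profile()
profiler.enable()
result = workload()
profiler.disable()

# Sort by cumulative time so the hot call chains bubble to the top.
buf = io.StringIO()
pstats.Stats(profiler, stream=buf).sort_stats("cumulative").print_stats(10)
print(buf.getvalue())
```

Comparing the top entries of two such reports (one per mypy version) is usually enough to spot where the extra time goes.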

@JMMarchant

I ran py-spy on interpreted mypy 0.990 (mypy 0.990 (compiled: no)), since the slowdown is still present there. Unfortunately it isn't a full run, as I stopped it at 90 minutes, but it looks to capture where most of the time is spent. I've attached the profile with personal details removed, and it looks like you're right: is_subtype() and is_proper_subtype() are potentially the culprits.

[py-spy profile attachment]

@JMMarchant

For comparison, here is the mypy 0.971 (compiled: no) profile (with only minor code changes to address issues flagged by 0.990, mostly now-unused type: ignore comments):

[py-spy profile attachment for 0.971]

@ilevkivskyi (Member)

@JMMarchant thanks for the data. Unfortunately, I still can't figure out what is going on. Would it be possible to profile on the last good and first bad commits? Just to exclude other possible changes to the call graph (there were quite a lot in the last couple of releases). Also, could you please post your mypy config?

@JMMarchant

> @JMMarchant thanks for the data. Unfortunately, I still can't figure out what is going on. Would it be possible to profile on the last good and first bad commit? Just to exclude other possible changes to the call graph (there were quite a lot in recent couple releases). Also could you please post your mypy config?

| Commit | No cache runtime | Cached runtime |
| --- | --- | --- |
| d2063d2 | ~5m | ~4s |
| d27bff6 | ~2h | ~4s |

mypy.ini:

[mypy]
mypy_path = $MYPY_CONFIG_FILE_DIR/stubs
files =
    src/,
    dev-scripts/,
    docs/,
    scripts/,
    setup.py,
    tests/

; Error message config
show_error_context = True
show_column_numbers = True
show_error_codes = True
pretty = True

; Untyped definitions and calls
disallow_untyped_defs = True
disallow_incomplete_defs = True
check_untyped_defs = True
disallow_untyped_decorators = True
; None and optional
no_implicit_optional = True
; Warnings
warn_redundant_casts = True
warn_unused_ignores = True
warn_no_return = True
warn_return_any = True
warn_unreachable = True

; These are all the third-party libraries we have generated partial or full stubs
; for. We are a bit more permissive for what is allowed in these stubs/how detailed
; they need to be. They can all be found in the stubs/ folder.
;    - albumentations
;    - envyaml
;    - fire
;    - GPUtil
;    - grpc
;    - intermine
;    - lightgbm
;    - marshmallow_enum
;    - marshmallow_union
;    - matplotlib
;    - methodtools
;    - opacus
;    - pandasql
;    - pdoc
;    - pyarrow
;    - pydicom
;    - pytorch_tabnet
;    - scipy
;    - skimage
;    - sklearn
;    - snsql
;    - sphobjinv
;    - sqlparse
;    - sqlvalidator
;    - statsmodels
;    - torchvision
;    - wirerope
[mypy-albumentations.*,envyaml.*,fire.*,GPUtil.*,grpc.*,intermine.*,lightgbm.*,marshmallow_enum.*,marshmallow_union.*,matplotlib.*,methodtools.*,opacus.*,pandasql.*,pdoc.*,pyarrow.*,pydicom.*,pytorch_tabnet.*,scipy.*,skimage.*,sklearn.*,snsql.*,sphobjinv.*,sqlparse.*,sqlvalidator.*,statsmodels.*,torchvision.*,wirerope.*]
allow_incomplete_defs = True
allow_untyped_defs = True
disallow_any_unimported = True


[mypy-docs.*]
disallow_any_expr = True


; Suppress less useful errors for tests
[mypy-tests.*]
; Tests with multiple asserts may cause code to be marked as unreachable even
; though it's fine. See https://github.com/python/mypy/issues/9457
warn_unreachable = False

; Suppress errors for untyped third party test libraries:
; - chromedriver_autoinstaller
; - jupytext
; - nbclient
; - nbformat
; - pytest_flask
; - requests_toolbelt
; - testbook
[mypy-chromedriver_autoinstaller.*,jupytext.*,nbclient.*,nbformat.*,pytest_flask.*,requests_toolbelt.*,testbook.*]
ignore_missing_imports = True

ilevkivskyi added a commit that referenced this issue Nov 23, 2022
This should help with the investigation of tricky performance
regressions like #13821.

I tried to implement it in such a way that it has minimal impact when not used (since I am touching a hot method).
@ilevkivskyi (Member)

@JMMarchant After looking a bit more at these, it seems like something may be slow in union handling and/or overloads. I just merged a PR that adds an optimization to overloads. Could you please:

  • Check mypy perf on current master
  • Try the new internal flag --line-checking-stats (also merged today) to see where in your code mypy is stuck

@huguesb (Contributor) commented Dec 10, 2022

I am also seeing significant performance regression between 0.971 and master: 5min -> 20min for a mypyc-compiled run on my codebase.

The per-file time stats point to a file that has a large function with many if/else branches depending on values of a very large Enum (~13min for a single file).

However the per-line time stats seem to be inconsistent with the per-file stats, and while the same file does show up with some slow lines, the numbers don't add up to the same amount as the per-file time stats. The largest lines on the per-line stats point to overloaded methods taking the enum value as a parameter (specifically dict.get), which at that point I expect to be a gigantic Union of Literal values due to type narrowing in the if branches.

I will try to reduce this file into a more practical reproducer and profile it in the next few days.
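
The code shape described above can be sketched roughly like this (all names hypothetical, and the enum is tiny here while the real one has hundreds of members; the slowness shows up at type-checking time, not at runtime):

```python
from enum import Enum

class Color(Enum):  # stand-in; the real enum has hundreds of members
    RED = 1
    GREEN = 2
    BLUE = 3

LABELS = {Color.RED: "warm", Color.BLUE: "cool"}

def describe(c: Color) -> str:
    # Each `is` check narrows `c`; on the else side the checker is left
    # with a progressively larger Union of Literal members, against
    # which the overloaded dict.get must then be resolved.
    if c is Color.RED:
        return LABELS.get(c, "unknown")
    elif c is Color.GREEN:
        return "neutral"
    return LABELS.get(c, "unknown")

print(describe(Color.BLUE))  # cool
```

With hundreds of members and a dozen such branches, the narrowed unions become very large, which matches the slow dict.get lines in the per-line stats.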

@ilevkivskyi (Member)

Times are expected not to add up exactly because the line stats only count time spent in expressions (and only during type checking). There may be some additional time spent analyzing the enclosing statement, but it is hard to attribute a line number to that, and such time is usually small.

@huguesb (Contributor) commented Dec 10, 2022

> Times are expected to not add up exactly because line stats only count time spent in expressions (and only for type checking). There may be some additional time spent analyzing enclosing statement, but it is hard to attribute a line number to this and such time is usually small.

I did expect a small difference, but this is not even close:

per-file timing: 790953050us
sum of per-line timings: 1287098.9us

There's a >600x mismatch!

@huguesb (Contributor) commented Dec 10, 2022

Profiling results are interesting! There are ~20 if statement branches to visit, and each branch spends ~2min (pure-Python run) in the `update_from_options` method of `binder.frame_context` after processing the meat of the if statement. Pretty much the entire time is spent in subtype checks.

This also explains the mismatch in timings: the per-line timings are only collected in `ExpressionChecker`, not in `TypeChecker`, which is where all of this work is happening.

Relevant speedscope output for a single if statement branch:
[speedscope screenshot attachment, Dec 10, 2022]

@ilevkivskyi I think the merging of the subtype visitors is likely to blame for that. I have a hunch that some of the optimizations in the proper subtype check got lost along the way. For instance, `_is_proper_subtype` used to have the following fast path, which seems relevant for the kind of large Union of Literal types produced by enum narrowing:

    if isinstance(right, UnionType) and not isinstance(left, UnionType):
        return any(
            is_proper_subtype(
                orig_left,
                item,
                ignore_promotions=ignore_promotions,
                erase_instances=erase_instances,
                keep_erased_types=keep_erased_types,
            )
            for item in right.items
        )

This sort of issue/regression has come up a number of times. It seems to me it might be worth revisiting the way enum narrowing is done. Maybe instead of always exploding enums into Union types, having to special-case those unions in a number of places, and then trying to reduce them back to the original enum later on, we should always keep the enum type itself and attach extra info to it to indicate the relevant narrowing results? Similar to the way `Instance` maintains `last_known_value`.
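
The complexity argument can be illustrated with a toy model (plain Python over lists of ints standing in for unions of literal values; this is not mypy's actual implementation): checking one union against another item by item is O(N*M), while hashing one side's literal values first makes the common all-literals case linear:

```python
def union_subtype_quadratic(left: list[int], right: list[int]) -> bool:
    # O(N*M): every item of the left union scans the whole right union.
    return all(any(l == r for r in right) for l in left)

def union_subtype_fast_path(left: list[int], right: list[int]) -> bool:
    # O(N+M): hash the right-hand literal values once, then filter.
    right_set = set(right)
    return all(l in right_set for l in left)

# Both agree; only their scaling with union size differs.
big = list(range(500))
assert union_subtype_quadratic(big, big) == union_subtype_fast_path(big, big)
```

With two narrowed copies of a 500-member enum, the quadratic form performs up to 250,000 comparisons per check where the fast path needs about 1,000 operations, which is the shape of regression described above.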

@ilevkivskyi (Member)

Yep, when you mentioned the mismatch, the binder was my first guess (and it indeed uses proper subtypes). I am not sure what the best way forward is. One option is to restore the previous union logic for proper subtypes. Although it would be inconsistent (the whole point of the merge was that proper and non-proper subtype checks should go through the same code paths except when an Any type is involved), this should hopefully fix the perf issue. Having a "proper" fix for enums may take time (and may also be risky).

At least I can try to find a minimal PR that fixes the regression. I will post here when I have something.

@ilevkivskyi (Member)

@huguesb @JMMarchant (and others). Could you please try #14276 to see if it fixes performance for you?

huguesb added a commit to huguesb/mypy that referenced this issue Dec 10, 2022
Enums are exploded into Union of Literal when narrowed.

Conditional branches on enum values can result in multiple distinct narrowing
of the same enum which are later subject to subtype checks (most notably via
`is_same_type`, when exiting frame context in the binder). Such checks would
have quadratic complexity: `O(N*M)` where `N` and `M` are the number of entries
in each narrowed enum variable, and led to drastic slowdown if any of the enums
involved has a large number of values.

Implement a linear-time fast path where literals are quickly filtered, with
a fallback to the slow path for more complex values.

Fixes python#13821
@huguesb (Contributor) commented Dec 10, 2022

> @huguesb @JMMarchant (and others). Could you please try #14276 to see if it fixes performance for you?

I have not tried it, but I doubt it will help much, since the profiling clearly shows the bulk of the time being spent in the generator expr above, at line 302. I have a different fix in mind: #14277

huguesb added commits to huguesb/mypy that referenced this issue Dec 21 and Dec 22, 2022, with the same commit message as the merged commit below.
JukkaL pushed a commit that referenced this issue Dec 28, 2022
Enums are exploded into Union of Literal when narrowed.

Conditional branches on enum values can result in multiple distinct
narrowing of the same enum which are later subject to subtype checks 
(most notably via `is_same_type`, when exiting frame context in the binder). 
Such checks would have quadratic complexity: `O(N*M)` where `N` and `M` 
are the number of entries in each narrowed enum variable, and led to drastic 
slowdown if any of the enums involved has a large number of values.

Implement a linear-time fast path where literals are quickly filtered, with
a fallback to the slow path for more complex values.

In our codebase there is one method with a chain of a dozen `if`
statements operating on instances of an enum with hundreds of values. 
Prior to the regression it was typechecked in less than 1s. After the regression 
it takes over 13min to typecheck. This patch fully fixes the regression for us.

Fixes #13821.

9 participants