
mypy 0.982 run 15 minutes (previous version needs only one minute) #13821

Closed
Behoston opened this issue Oct 5, 2022 · 33 comments · Fixed by #14277
Labels
bug (mypy got something wrong), performance

Comments

@Behoston commented Oct 5, 2022

Bug Report

My project is not open-source but I can give you (mypy maintainers) access to repository on gitlab.

After updating mypy to the newest version, it takes 15 minutes to complete on CI and 8 minutes on my PC. When I roll back to the previous release, it works fine (about 15 seconds on my local machine and about 45 on CI).

I have a base class with about 100 subclasses, and mypy always hangs in the same place.

LOG:  Deleting dostanesie.core.formulas.university.ur.base dostanesie/core/formulas/university/ur/base.py dostanesie/core/formulas/university/ur/base.meta.json dostanesie/core/formulas/university/ur/base.data.json
LOG:  Processing SCC singleton (dostanesie.core.formulas.university.uph.base) as inherently stale with stale deps (dostanesie.core.formulas.code)
LOG:  Deleting dostanesie.core.formulas.university.uph.base dostanesie/core/formulas/university/uph/base.py dostanesie/core/formulas/university/uph/base.meta.json dostanesie/core/formulas/university/uph/base.data.json
LOG:  Processing SCC singleton (dostanesie.core.formulas.university.up_wroclaw.base) as inherently stale with stale deps (dostanesie.core.formulas.code)
LOG:  Deleting dostanesie.core.formulas.university.up_wroclaw.base dostanesie/core/formulas/university/up_wroclaw/base.py dostanesie/core/formulas/university/up_wroclaw/base.meta.json dostanesie/core/formulas/university/up_wroclaw/base.data.json
LOG:  Processing SCC singleton (dostanesie.core.formulas.university.up_poznan.base) as inherently stale with stale deps (dostanesie.core.formulas.code)
LOG:  Deleting dostanesie.core.formulas.university.up_poznan.base dostanesie/core/formulas/university/up_poznan/base.py dostanesie/core/formulas/university/up_poznan/base.meta.json dostanesie/core/formulas/university/up_poznan/base.data.json
LOG:  Processing SCC singleton (dostanesie.core.formulas.university.up_lublin.base) as inherently stale with stale deps (dostanesie.core.formulas.code)

I tried deleting the mypy cache dir locally; on CI we build a new Docker image for each run, so there is no mypy cache.

To Reproduce

I really don't know :(

Expected Behavior

Mypy should finish in about a minute, as it does with version 0.971:

LOG:  Build finished in 14.859 seconds with 276 modules, and 0 errors

Actual Behavior

On CI it takes more than 15 minutes, on local machine 8 minutes.

Your Environment

  • Mypy version used: 0.982
  • Mypy command-line flags: none
  • Mypy configuration options from mypy.ini (and other config files):
[mypy]
ignore_missing_imports = true
  • Python version used: Python 3.10.7+

Full mypy log

@Behoston added the bug label Oct 5, 2022
@ilevkivskyi (Member)

Such things are hard to diagnose. You can try bisecting it to a specific mypy commit, then it will be much easier.

@Behoston (Author)

I will definitely try today. I will also try to extract the problematic part of our code and share it on GitHub.

@Behoston (Author) commented Oct 10, 2022

Commit 3ae19a2 takes more than 25 minutes.

I ran it in the background and killed it after 25 minutes; it takes longer than the previous runs because I was using my PC at the same time.

@JukkaL (Collaborator) commented Oct 10, 2022

I looked at the commit, and a few things seemed potentially suspicious: changes in sum(), subprocess and unittest. If the file where mypy is very slow uses some of these, could you try commenting out the relevant lines to see if it speeds things up?

@Behoston (Author) commented Oct 10, 2022

I replaced almost all sum calls with any, and now it takes 30 seconds for the commit above (not compiled) and 14 seconds for the newest version (compiled). We have between 100 and 200 sum calls in this package.
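
For context, the kind of rewrite described here looks roughly like the following (a hypothetical example, not code from the actual project):

```python
# Hypothetical example of the sum-to-any rewrite: counting matches with
# sum() versus merely testing for a match with any().
scores = [12, 0, 7, 0, 3]

# Before: sum() over a generator of booleans (True counts as 1).
passing = sum(score > 5 for score in scores)

# After: any() is enough when only "is there at least one?" matters,
# and it sidesteps the changed sum() overloads implicated above.
has_passing = any(score > 5 for score in scores)

print(passing, has_passing)  # 2 True
```

Note the two calls are only interchangeable when the count itself is unused; any() also short-circuits on the first match.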

@JukkaL (Collaborator) commented Oct 10, 2022

That would indicate that the sum change is likely the root cause of the regression, since with sum the run took over 25 minutes. Does this sound right?

@Behoston (Author)

In my case: True.

We can stick to 0.971; the newest version of mypy is not crucial for us. However, this can affect CI/CD for many other projects, especially calculation-heavy ones (our backend is one big calculator with hundreds of calculator classes and some REST API).

@JukkaL (Collaborator) commented Oct 10, 2022

Yeah, this seems pretty bad and we should try to fix the regression.

It would be very helpful if you could try to narrow this down a bit -- it's possible that there is some specific call or calls to sum that are very slow. For example, replace the first 50% of the sum calls, measure, then replace the remaining 50%, and keep bisecting until you have a small set of calls that triggers a significant performance regression. This way you may be able to produce a small code example that triggers slow type checking.

@huguesb (Contributor) commented Oct 11, 2022

@Behoston you could try adding --timing-stats <path> to your mypy invocation. It will output a simple text file with per-module timings (one module per line). That should help narrow down which files are most impacted by the change, and from there you could further refine which statements in the specific files are most impacted.
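
The per-module output can then be post-processed to surface the worst offenders. A minimal sketch in Python (this assumes the simple one-module-per-line `module time` format described above; the exact format may vary between mypy versions, so treat the parser as illustrative):

```python
from pathlib import Path

def slowest_modules(stats_path: str, top: int = 10) -> list[tuple[str, int]]:
    """Parse a --timing-stats file ('module time' per line) and
    return the `top` slowest modules, slowest first."""
    timings = []
    for line in Path(stats_path).read_text().splitlines():
        parts = line.split()
        # Skip any line that does not look like "module_name integer_time".
        if len(parts) == 2 and parts[1].isdigit():
            timings.append((parts[0], int(parts[1])))
    return sorted(timings, key=lambda mt: mt[1], reverse=True)[:top]
```

Running this over the stats file from two mypy versions and diffing the top entries quickly shows which modules regressed.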

@KelSolaar

Hello,

We are experiencing similar issues with Colour: we went from ~30 min to ~1 h 20 min on CI when updating our dependencies; one of them was mypy==0.961 --> mypy==0.982.

Cheers,

Thomas

@headtr1ck

Our project also went from ~5min to ~25min going from 0.971 to 0.981/2.

@hauntsaninja (Collaborator) commented Oct 29, 2022

It would be useful if you could post the details of your projects, so we can reproduce the slow down. As is mentioned above, it's very hard to diagnose these things in the abstract.

If your project is not open source, it would be very helpful if you could make a shareable repro of the bad behaviour. For example, try some of the strategies mentioned above:

  • Use --timing-stats to narrow down which module is slow
  • Potentially look for invocations of sum (since that's related to the OP's problem; many thanks to Behoston for narrowing that down)

@hauntsaninja (Collaborator)

We can probably just revert the change to sum, it's a somewhat weird change to have: #13961

In any case, we would love to have small repros of the performance regression.

@Behoston (Author) commented Nov 7, 2022

@hauntsaninja I published part of our code: https://github.com/dostanesie-pl/mypy-bug

@headtr1ck

Unfortunately I cannot publish our code. But looking at --timing-stats I noticed that in 0.971 all external libs are displayed with "0", while in 0.982 they are ">0".

Maybe that has nothing to do with the cause; some of our modules have slowed down by a factor of 10, while others stay the same. We do not use Python's builtin sum, though.

@Behoston (Author)

It works much better on version 0.990.

@JMMarchant

We're still seeing substantial slowdowns on version 0.990; is there an easy way to see which lines are slow? (We have already used --timing-stats to narrow down the files.)

@Behoston (Author) commented Nov 11, 2022 via email

@KelSolaar

0.990 did not really change anything for us, still 1h20+ check times here.

@JMMarchant commented Nov 14, 2022

I have bisected the commits between 0.971 and 0.981 to identify which one causes the slowdown for us. It seems to be d27bff62f71a8b914b1df239467148e81d2e88a2:

| Commit | No cache runtime | Cached runtime |
| --- | --- | --- |
| d2063d2 | 4m 55.01s | 3.846s |
| d27bff6 | Stopped after 1h 11m | N/A |

@ilevkivskyi (Member)

Hm, that PR is almost a pure refactor. Actually, I have a suspect: potentially you have some literal type with a lot of values, or some large enum. There is a function try_contracting_literals_in_union() that may be computationally intensive; previously it was (inconsistently) used only in is_subtype(), but after that PR it is also used by is_proper_subtype(). This is just a wild guess, though. If you don't want to share your code, you will likely need to run mypy under a profiler and compare before and after. Note that cProfile will only work with interpreted mypy; for the compiled version you may need to use https://pypi.org/project/py-spy/
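
For the interpreted case, a cProfile session can be wrapped around whatever entry point is being investigated and sorted by cumulative time. A generic sketch (the `workload` function is a stand-in for the slow code, not mypy itself):

```python
import cProfile
import io
import pstats

def workload() -> int:
    # Stand-in for the code under investigation (e.g. an interpreted
    # mypy run driven through its API).
    return sum(i * i for i in range(100_000))

profiler = cProfile.Profile()
profiler.enable()
result = workload()
profiler.disable()

# Sort by cumulative time so the hot call chains bubble to the top.
buf = io.StringIO()
pstats.Stats(profiler, stream=buf).sort_stats("cumulative").print_stats(10)
print(buf.getvalue())
```

Comparing the top entries of two such reports (one per mypy version) is usually enough to spot where the extra time goes.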

@JMMarchant

I ran py-spy on interpreted mypy 0.990 (mypy 0.990 (compiled: no)), since the slowdown is still present there. Unfortunately it isn't a full run, as I stopped it at 90 minutes, but it looks to capture where most of the time is spent. I've attached the profile with personal details removed, and it looks like you're right: is_subtype() and is_proper_subtype() are potentially the culprits.

[py-spy profile attachment]

@JMMarchant

For comparison, here is the mypy 0.971 (compiled: no) profile (with only minor code changes to address issues flagged by 0.990, mostly now-unused type: ignore comments):

[py-spy profile attachment for 0.971]

@ilevkivskyi (Member)

@JMMarchant thanks for the data. Unfortunately, I still can't figure out what is going on. Would it be possible to profile on the last good and first bad commits? Just to exclude other possible changes to the call graph (there were quite a lot in the last couple of releases). Also, could you please post your mypy config?

@JMMarchant

> @JMMarchant thanks for the data. Unfortunately, I still can't figure out what is going on. Would it be possible to profile on the last good and first bad commit? Just to exclude other possible changes to the call graph (there were quite a lot in recent couple releases). Also could you please post your mypy config?

| Commit | No cache runtime | Cached runtime |
| --- | --- | --- |
| d2063d2 | ~5m | ~4s |
| d27bff6 | ~2h | ~4s |

mypy.ini:

[mypy]
mypy_path = $MYPY_CONFIG_FILE_DIR/stubs
files =
    src/,
    dev-scripts/,
    docs/,
    scripts/,
    setup.py,
    tests/

; Error message config
show_error_context = True
show_column_numbers = True
show_error_codes = True
pretty = True

; Untyped definitions and calls
disallow_untyped_defs = True
disallow_incomplete_defs = True
check_untyped_defs = True
disallow_untyped_decorators = True
; None and optional
no_implicit_optional = True
; Warnings
warn_redundant_casts = True
warn_unused_ignores = True
warn_no_return = True
warn_return_any = True
warn_unreachable = True

; These are all the third-party libraries we have generated partial or full stubs
; for. We are a bit more permissive for what is allowed in these stubs/how detailed
; they need to be. They can all be found in the stubs/ folder.
;    - albumentations
;    - envyaml
;    - fire
;    - GPUtil
;    - grpc
;    - intermine
;    - lightgbm
;    - marshmallow_enum
;    - marshmallow_union
;    - matplotlib
;    - methodtools
;    - opacus
;    - pandasql
;    - pdoc
;    - pyarrow
;    - pydicom
;    - pytorch_tabnet
;    - scipy
;    - skimage
;    - sklearn
;    - snsql
;    - sphobjinv
;    - sqlparse
;    - sqlvalidator
;    - statsmodels
;    - torchvision
;    - wirerope
[mypy-albumentations.*,envyaml.*,fire.*,GPUtil.*,grpc.*,intermine.*,lightgbm.*,marshmallow_enum.*,marshmallow_union.*,matplotlib.*,methodtools.*,opacus.*,pandasql.*,pdoc.*,pyarrow.*,pydicom.*,pytorch_tabnet.*,scipy.*,skimage.*,sklearn.*,snsql.*,sphobjinv.*,sqlparse.*,sqlvalidator.*,statsmodels.*,torchvision.*,wirerope.*]
allow_incomplete_defs = True
allow_untyped_defs = True
disallow_any_unimported = True


[mypy-docs.*]
disallow_any_expr = True


; Suppress less useful errors for tests
[mypy-tests.*]
; Tests with multiple asserts may cause code to be marked as unreachable even
; though it's fine. See https://github.com/python/mypy/issues/9457
warn_unreachable = False

; Suppress errors for untyped third party test libraries:
; - chromedriver_autoinstaller
; - jupytext
; - nbclient
; - nbformat
; - pytest_flask
; - requests_toolbelt
; - testbook
[mypy-chromedriver_autoinstaller.*,jupytext.*,nbclient.*,nbformat.*,pytest_flask.*,requests_toolbelt.*,testbook.*]
ignore_missing_imports = True

ilevkivskyi added a commit that referenced this issue Nov 23, 2022
This should help with the investigation of tricky performance
regressions like #13821.

I tried to implement it in such a way that it has minimal impact when not used (since I am touching a hot method).
@ilevkivskyi (Member)

@JMMarchant After looking a bit more at these, it seems like something may be slow in union handling and/or overloads. I just merged a PR that adds an optimization to overloads. Could you please:

  • Check mypy perf on current master
  • Try the new internal flag --line-checking-stats (also merged today) to see where in your code mypy is stuck

@huguesb (Contributor) commented Dec 10, 2022

I am also seeing significant performance regression between 0.971 and master: 5min -> 20min for a mypyc-compiled run on my codebase.

The per-file time stats point to a file that has a large function with many if/else branches depending on values of a very large Enum (~13min for a single file).

However the per-line time stats seem to be inconsistent with the per-file stats, and while the same file does show up with some slow lines, the numbers don't add up to the same amount as the per-file time stats. The largest lines on the per-line stats point to overloaded methods taking the enum value as a parameter (specifically dict.get), which at that point I expect to be a gigantic Union of Literal values due to type narrowing in the if branches.

I will try to reduce this file into a more practical reproducer and profile it in the next few days.
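
The code shape described above can be sketched roughly like this (all names hypothetical, and the enum is tiny here while the real one has hundreds of members; the slowness shows up at type-checking time, not at runtime):

```python
from enum import Enum

class Color(Enum):  # stand-in; the real enum has hundreds of members
    RED = 1
    GREEN = 2
    BLUE = 3

LABELS = {Color.RED: "warm", Color.BLUE: "cool"}

def describe(c: Color) -> str:
    # Each `is` check narrows `c`; on the else side the checker is left
    # with a progressively larger Union of Literal members, against
    # which the overloaded dict.get must then be resolved.
    if c is Color.RED:
        return LABELS.get(c, "unknown")
    elif c is Color.GREEN:
        return "neutral"
    return LABELS.get(c, "unknown")

print(describe(Color.BLUE))  # cool
```

With hundreds of members and a dozen such branches, the narrowed unions become very large, which matches the slow dict.get lines in the per-line stats.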

@ilevkivskyi (Member)

Times are expected not to add up exactly because the line stats only count time spent in expressions (and only during type checking). There may be some additional time spent analyzing the enclosing statement, but it is hard to attribute a line number to that, and such time is usually small.

@huguesb (Contributor) commented Dec 10, 2022

> Times are expected to not add up exactly because line stats only count time spent in expressions (and only for type checking). There may be some additional time spent analyzing enclosing statement, but it is hard to attribute a line number to this and such time is usually small.

I did expect a small difference, but this is not even close:

per-file timing: 790953050us
sum of per-line timings: 1287098.9us

There's a >600x mismatch!

@huguesb (Contributor) commented Dec 10, 2022

Profiling results are interesting! There are ~20 if statement branches to visit, and each branch spends ~2min (pure-Python run) in the `update_from_options` method of `binder.frame_context` after processing the meat of the if statement. Pretty much the entire time is spent in subtype checks.

This also explains the mismatch in timings: the per-line timings are only collected in `ExpressionChecker`, not in `TypeChecker`, which is where all of this work is happening.

Relevant speedscope output for a single if statement branch:
[speedscope screenshot attachment, Dec 10, 2022]

@ilevkivskyi I think the merging of the subtype visitors is likely to blame for that. I have a hunch that some of the optimizations in the proper subtype check got lost along the way. For instance, `_is_proper_subtype` used to have the following fast path, which seems relevant for the kind of large Union of Literal types produced by enum narrowing:

    if isinstance(right, UnionType) and not isinstance(left, UnionType):
        return any(
            is_proper_subtype(
                orig_left,
                item,
                ignore_promotions=ignore_promotions,
                erase_instances=erase_instances,
                keep_erased_types=keep_erased_types,
            )
            for item in right.items
        )

This sort of issue/regression has come up a number of times. It seems to me it might be worth revisiting the way enum narrowing is done. Maybe instead of always exploding enums into Union types, having to special-case those unions in a number of places, and then trying to reduce them back to the original enum later on, we should always keep the enum type itself and attach extra info to it to indicate the relevant narrowing results? Similar to the way `Instance` maintains `last_known_value`.
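
The complexity argument can be illustrated with a toy model (plain Python over lists of ints standing in for unions of literal values; this is not mypy's actual implementation): checking one union against another item by item is O(N*M), while hashing one side's literal values first makes the common all-literals case linear:

```python
def union_subtype_quadratic(left: list[int], right: list[int]) -> bool:
    # O(N*M): every item of the left union scans the whole right union.
    return all(any(l == r for r in right) for l in left)

def union_subtype_fast_path(left: list[int], right: list[int]) -> bool:
    # O(N+M): hash the right-hand literal values once, then filter.
    right_set = set(right)
    return all(l in right_set for l in left)

# Both agree; only their scaling with union size differs.
big = list(range(500))
assert union_subtype_quadratic(big, big) == union_subtype_fast_path(big, big)
```

With two narrowed copies of a 500-member enum, the quadratic form performs up to 250,000 comparisons per check where the fast path needs about 1,000 operations, which is the shape of regression described above.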

@ilevkivskyi (Member)

Yep, when you mentioned the mismatch, the binder was my first guess (and it indeed uses proper subtypes). I am not sure what the best way forward is. One option is to restore the previous union logic for proper subtypes. Although it would be inconsistent (the whole point of the merge was that proper and non-proper subtype checks should go through the same code paths except when an Any type is involved), this should hopefully fix the perf issue. Having a "proper" fix for enums may take time (and may also be risky).

At least I can try to find a minimal PR that fixes the regression. I will post here when I have something.

@ilevkivskyi (Member)

@huguesb @JMMarchant (and others). Could you please try #14276 to see if it fixes performance for you?

huguesb added a commit to huguesb/mypy that referenced this issue Dec 10, 2022
Enums are exploded into Union of Literal when narrowed.

Conditional branches on enum values can result in multiple distinct narrowing
of the same enum which are later subject to subtype checks (most notably via
`is_same_type`, when exiting frame context in the binder). Such checks would
have quadratic complexity: `O(N*M)` where `N` and `M` are the number of entries
in each narrowed enum variable, and led to drastic slowdown if any of the enums
involved has a large number of values.

Implement a linear-time fast path where literals are quickly filtered, with
a fallback to the slow path for more complex values.

Fixes python#13821
@huguesb (Contributor) commented Dec 10, 2022

> @huguesb @JMMarchant (and others). Could you please try #14276 to see if it fixes performance for you?

I have not tried it, but I doubt it will help much, since the profiling clearly shows the bulk of the time being spent in the generator expr above, at line 302. I have a different fix in mind: #14277

huguesb added commits to huguesb/mypy that referenced this issue Dec 21 and Dec 22, 2022, with the same commit message as the merged commit below.
JukkaL pushed a commit that referenced this issue Dec 28, 2022
Enums are exploded into Union of Literal when narrowed.

Conditional branches on enum values can result in multiple distinct
narrowing of the same enum which are later subject to subtype checks 
(most notably via `is_same_type`, when exiting frame context in the binder). 
Such checks would have quadratic complexity: `O(N*M)` where `N` and `M` 
are the number of entries in each narrowed enum variable, and led to drastic 
slowdown if any of the enums involved has a large number of values.

Implement a linear-time fast path where literals are quickly filtered, with
a fallback to the slow path for more complex values.

In our codebase there is one method with a chain of a dozen `if`
statements operating on instances of an enum with hundreds of values. 
Prior to the regression it was typechecked in less than 1s. After the regression 
it takes over 13min to typecheck. This patch fully fixes the regression for us.

Fixes #13821.

9 participants