Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

mypy appears to become inefficient for nested ifs (13+ nested ifs) #9591

Closed
stmlange opened this issue Oct 13, 2020 · 9 comments · Fixed by #12700
Closed

mypy appears to become inefficient for nested ifs (13+ nested ifs) #9591

stmlange opened this issue Oct 13, 2020 · 9 comments · Fixed by #12700
Labels
bug mypy got something wrong performance

Comments

@stmlange
Copy link

Bug Report

mypy appears to become inefficient for nested ifs (see example)

To Reproduce

Consider the following (bad) code example:

cat << EOF > lint_me.py
special_overrides = True
print(sorted(["c", "b", "a"], key=lambda kv: kv[0] if not special_overrides else \\
                                       'a' if kv[0] == 'a_01' else \\
                                       'b' if kv[0] == 'b_02' else \\
                                       'c' if kv[0] == 'c_03' else \\
                                       'd' if kv[0] == 'd_04' else \\
                                       'e' if kv[0] == 'e_05' else \\
                                       'f' if kv[0] == 'f_06' else \\
                                       'g' if kv[0] == 'g_07' else \\
                                       'h' if kv[0] == 'h_08' else \\
                                       'i' if kv[0] == 'i_09' else \\
                                       'j' if kv[0] == 'j_10' else \\
                                       'k' if kv[0] == 'k_11' else \\
                                       'l' if kv[0] == 'l_12' else \\
                                       'm' if kv[0] == 'm_13' else \\
                                       'n' if kv[0] == 'n_14' else \\
                                       'o' if kv[0] == 'o_15' else \\
                                       'p' if kv[0] == 'p_16' else \\
                                       'q' if kv[0] == 'q_17' else \\
                                       kv[0]))
EOF

Expected Behavior
mypy can complain about very bad code in a reasonable time

Actual Behavior
with each additional if the total runtime seems to double:

13 nested ifs (until m): real    0m33.827s
14 nested ifs (until n): real    1m10.013s
15 nested ifs (until o): real    2m3.904s
16 nested ifs (until p): real    4m3.223s
17 nested ifs (until q): real    8m27.333s

Your Environment

  • Mypy version used: 0.790
  • Mypy command-line flags: None (just python -m mypy lint_me.py)
  • Mypy configuration options from mypy.ini (and other config files): None
  • Python version used: Python 3.8.1
  • Operating system and version: Windows 10, Version 10.0.19041 Build 19041
@stmlange stmlange added the bug mypy got something wrong label Oct 13, 2020
@PurvaT
Copy link

PurvaT commented Oct 25, 2020

Hi, can you elaborate the issue a bit? I guess if i understand the problem well, I could try solving it.

@stmlange
Copy link
Author

Hello,
apologies if the description was somewhat vage. The issue is about that mypy seems to become very inefficient with an increasing nestedness of if-blocks. If you consider the example above and only have a structure of 13 nested ifs mypy takes 0m33.827s to evaluate the statement for bad-code. With 14 nested if-blocks mypy takes 1m10.013s and essentially with each additional nested if the runtime doubles. The above example has 17 nested ifs and it consumes 8m27.333s to lint the code with mypy.

In case it is not clear (because I don't know how to phrase this differently) -- here are the full-examples with the time consumed by mypy:

Example with 13 nested ifs (mypy runtime 0m33.827s)
cat << EOF > lint_me.py
special_overrides = True
print(sorted(["c", "b", "a"], key=lambda kv: kv[0] if not special_overrides else \\
					     'a' if kv[0] == 'a_01' else \\
					     'b' if kv[0] == 'b_02' else \\
					     'c' if kv[0] == 'c_03' else \\
					     'd' if kv[0] == 'd_04' else \\
					     'e' if kv[0] == 'e_05' else \\
					     'f' if kv[0] == 'f_06' else \\
					     'g' if kv[0] == 'g_07' else \\
					     'h' if kv[0] == 'h_08' else \\
					     'i' if kv[0] == 'i_09' else \\
					     'j' if kv[0] == 'j_10' else \\
					     'k' if kv[0] == 'k_11' else \\
					     'l' if kv[0] == 'l_12' else \\
					     'm' if kv[0] == 'm_13' else \\
					     kv[0]))
EOF

time python3 -m mypy lint_me.py # reports 0m33.827s
Example with 14 nested ifs (mypy runtime 1m10.013s)
cat << EOF > lint_me.py
special_overrides = True
print(sorted(["c", "b", "a"], key=lambda kv: kv[0] if not special_overrides else \\
					     'a' if kv[0] == 'a_01' else \\
					     'b' if kv[0] == 'b_02' else \\
					     'c' if kv[0] == 'c_03' else \\
					     'd' if kv[0] == 'd_04' else \\
					     'e' if kv[0] == 'e_05' else \\
					     'f' if kv[0] == 'f_06' else \\
					     'g' if kv[0] == 'g_07' else \\
					     'h' if kv[0] == 'h_08' else \\
					     'i' if kv[0] == 'i_09' else \\
					     'j' if kv[0] == 'j_10' else \\
					     'k' if kv[0] == 'k_11' else \\
					     'l' if kv[0] == 'l_12' else \\
					     'm' if kv[0] == 'm_13' else \\
					     'n' if kv[0] == 'n_14' else \\
					     kv[0]))
EOF

time python3 -m mypy lint_me.py # reports 1m10.013s
Example with 15 nested ifs (mypy runtime 2m3.904s)
cat << EOF > lint_me.py
special_overrides = True
print(sorted(["c", "b", "a"], key=lambda kv: kv[0] if not special_overrides else \\
					     'a' if kv[0] == 'a_01' else \\
					     'b' if kv[0] == 'b_02' else \\
					     'c' if kv[0] == 'c_03' else \\
					     'd' if kv[0] == 'd_04' else \\
					     'e' if kv[0] == 'e_05' else \\
					     'f' if kv[0] == 'f_06' else \\
					     'g' if kv[0] == 'g_07' else \\
					     'h' if kv[0] == 'h_08' else \\
					     'i' if kv[0] == 'i_09' else \\
					     'j' if kv[0] == 'j_10' else \\
					     'k' if kv[0] == 'k_11' else \\
					     'l' if kv[0] == 'l_12' else \\
					     'm' if kv[0] == 'm_13' else \\
					     'n' if kv[0] == 'n_14' else \\
					     'o' if kv[0] == 'o_15' else \\
					     kv[0]))
EOF

time python3 -m mypy lint_me.py # reports 2m3.904s
Example with 16 nested ifs (mypy runtime 4m3.223s)
cat << EOF > lint_me.py
special_overrides = True
print(sorted(["c", "b", "a"], key=lambda kv: kv[0] if not special_overrides else \\
					     'a' if kv[0] == 'a_01' else \\
					     'b' if kv[0] == 'b_02' else \\
					     'c' if kv[0] == 'c_03' else \\
					     'd' if kv[0] == 'd_04' else \\
					     'e' if kv[0] == 'e_05' else \\
					     'f' if kv[0] == 'f_06' else \\
					     'g' if kv[0] == 'g_07' else \\
					     'h' if kv[0] == 'h_08' else \\
					     'i' if kv[0] == 'i_09' else \\
					     'j' if kv[0] == 'j_10' else \\
					     'k' if kv[0] == 'k_11' else \\
					     'l' if kv[0] == 'l_12' else \\
					     'm' if kv[0] == 'm_13' else \\
					     'n' if kv[0] == 'n_14' else \\
					     'o' if kv[0] == 'o_15' else \\
					     'p' if kv[0] == 'p_16' else \\
					     kv[0]))
EOF

time python3 -m mypy lint_me.py # reports 4m3.223s
Example with 17 nested ifs (mypy runtime 8m27.333s)
cat << EOF > lint_me.py
special_overrides = True
print(sorted(["c", "b", "a"], key=lambda kv: kv[0] if not special_overrides else \\
					     'a' if kv[0] == 'a_01' else \\
					     'b' if kv[0] == 'b_02' else \\
					     'c' if kv[0] == 'c_03' else \\
					     'd' if kv[0] == 'd_04' else \\
					     'e' if kv[0] == 'e_05' else \\
					     'f' if kv[0] == 'f_06' else \\
					     'g' if kv[0] == 'g_07' else \\
					     'h' if kv[0] == 'h_08' else \\
					     'i' if kv[0] == 'i_09' else \\
					     'j' if kv[0] == 'j_10' else \\
					     'k' if kv[0] == 'k_11' else \\
					     'l' if kv[0] == 'l_12' else \\
					     'm' if kv[0] == 'm_13' else \\
					     'n' if kv[0] == 'n_14' else \\
					     'o' if kv[0] == 'o_15' else \\
					     'p' if kv[0] == 'p_16' else \\
					     'q' if kv[0] == 'q_17' else \\
					     kv[0]))
EOF

time python3 -m mypy lint_me.py # reports 8m27.333s

For me (or the reason why I created the issue) is that it sounds wrong that the mypy runtime doubles, while the complexity of the code only grows linear.

In the end this might even expected and considered a "won't fix".

@PurvaT
Copy link

PurvaT commented Oct 26, 2020

Thanks for the clarification. Can you assign the issue to me? I am interested in finding solution to it.

@stmlange
Copy link
Author

I don't seem to have permission to assign the issue. I'd guess everyone who stumbles over it will see your comment that you intent to work on it :-)

@ethanhs
Copy link
Collaborator

ethanhs commented Oct 26, 2020

@PurvaT we generally don't assign issues, but feel free to get started!

@dhananjay1210
Copy link

When I run this code on the Windows platform, I got the following error -

C:\Users\dhananjay\Desktop>time python3 -m mypy lint_me.py
The system cannot accept the time entered.
Enter the new time:

I am unable to run the given code.

@stmlange
Copy link
Author

time is a unix command that measures the execution time of a process. Unless you have cygwin installed (or any additional packages that provide this command) this won't work like that on windows.

On Windows you could try:

@echo off
echo %time% < nul
python3 -m mypy lint_me.py
echo %time% < nul

@PurvaT
Copy link

PurvaT commented Oct 27, 2020

I used the above lines to run the code on windows. I got the following output -

  1. if up to 'n':
      15:33:51.70
      15:33:52.03
       Total = 0.33 sec

  2. if up to 'o':
      15:39:59.07
      15:39:59.47
       Total = 0.40 sec

  3. if up to 'p':
      15:40:56.53
      15:40:56.90
       Total = 0.37 sec

  4. if up to 'q':
      15:41:52.22
      15:41:52.63
       Total = 0.41 sec

I think the running time highly depends on the system as we are measuring a program running time not the asymptotic run rime.  Program execution time highly depends on underlying hardware.

huguesb added a commit to huguesb/mypy that referenced this issue Apr 30, 2022
Deeply nested if/else expressions have a worst-case exponential behavior.

This will for instance manifest when returning literal values which cause
repeated analysis of conditional branches with subtly different type context
for each literal.

This can be optimized by observing that a simple literal context will yield
the same analysis as its fallback type, and likewise, two literals of the
same fallback type will yield the same analysis. In those case we can avoid
the repeated analysis and prevent the worst-case exponential behavior.

Fixes python#9591
@huguesb
Copy link
Contributor

huguesb commented Apr 30, 2022

This still manifests in master. The root cause is that the typechecker tries to narrow the type of the nested if/else expression and blows up into an exponential number of typechecks with subtly different type contexts. I have a tentative fix for that in #12700

huguesb added a commit to huguesb/mypy that referenced this issue May 1, 2022
Deeply nested if/else expressions have a worst-case exponential behavior.

This will for instance manifest when returning literal values which cause
repeated analysis of conditional branches with subtly different type context
for each literal.

This can be optimized by observing that a simple literal context will yield
the same analysis as its fallback type, and likewise, two literals of the
same fallback type will yield the same analysis. In those case we can avoid
the repeated analysis and prevent the worst-case exponential behavior.

Fixes python#9591
huguesb added a commit to huguesb/mypy that referenced this issue May 16, 2022
Deeply nested if/else expressions have a worst-case exponential behavior.

This will for instance manifest when returning literal values which cause
repeated analysis of conditional branches with subtly different type context
for each literal.

This can be optimized by observing that a simple literal context will yield
the same analysis as its fallback type, and likewise, two literals of the
same fallback type will yield the same analysis. In those case we can avoid
the repeated analysis and prevent the worst-case exponential behavior.

Fixes python#9591
JukkaL pushed a commit that referenced this issue May 20, 2022
Deeply nested if/else expressions have a worst-case exponential behavior.

This will for instance manifest when returning literal values which cause
repeated analysis of conditional branches with subtly different type context
for each literal.

This can be optimized by observing that a simple literal context will yield
the same analysis as its fallback type, and likewise, two literals of the
same fallback type will yield the same analysis. In those case we can avoid
the repeated analysis and prevent the worst-case exponential behavior.

Fixes #9591
JukkaL pushed a commit that referenced this issue May 20, 2022
Deeply nested if/else expressions have a worst-case exponential behavior.

This will for instance manifest when returning literal values which cause
repeated analysis of conditional branches with subtly different type context
for each literal.

This can be optimized by observing that a simple literal context will yield
the same analysis as its fallback type, and likewise, two literals of the
same fallback type will yield the same analysis. In those case we can avoid
the repeated analysis and prevent the worst-case exponential behavior.

Fixes #9591
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug mypy got something wrong performance
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants