Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

IntelliSense is not reactive with sympy #7159

Closed
denismazzucato opened this issue Jan 29, 2024 · 26 comments
Closed

IntelliSense is not reactive with sympy #7159

denismazzucato opened this issue Jan 29, 2024 · 26 comments
Labels
addressed in next version Issue is fixed and will appear in next published version bug Something isn't working

Comments

@denismazzucato
Copy link

Environment data

  • Language Server version: 2023.12.1
  • OS and version: macOS Sonoma 14.2
  • Python version: 3.12.0
  • Conda version 23.10.0
  • Sympy version: 1.12

Code Snippet

from sympy import Symbol

x = Symbol("x")

Expected behavior

Hovering over the the variable x should show that x is a Symbol in no-time, or auto-completion should appear instantly when typing anywhere in the code above.

Actual behavior

It takes about 5 seconds to show the type information when hover with the cursor on x, and about 12 seconds to show any auto-completion.

The problem is using Symbol from sympy as all the other features seems to not run any delay. For example, the code below does not halt IntelliSense:

import sympy as sym

x = sym.var("x")

Logs

pylance-sympy-trace.log

@debonte
Copy link
Collaborator

debonte commented Jan 29, 2024

Sympy perf was recently mentioned in #7130. Eric was looking for someone to provide a concrete example, so I'll transfer this issue over to Pyright.

@debonte debonte removed their assignment Jan 29, 2024
@debonte debonte transferred this issue from microsoft/pylance-release Jan 29, 2024
@erictraut
Copy link
Collaborator

The sympy library claims to be a "typed library" by virtue of the fact that it includes a "py.typed" marker file (as outlined in PEP 561). However, sympy unfortunately contains almost no type information. That means pyright will attempt to infer type information (function return types, etc.) from the code.

You would normally be able to disable this behavior in pyright by setting the useLibraryCodeForTypes option to false, but this setting applies only to libraries that are not marked as "py.typed". When a library claims to be typed, pyright honors that designation and acts accordingly.

I ran pyright --verifytypes sympy to determine the "type completeness" of the library. Most "py.typed" libraries are 50% or more "type complete". Sympy is currently just 6%. It really has no right to claim that it's a typed library.

Symbols exported by "sympy": 35039
  With known type: 2107
  With ambiguous type: 1808
  With unknown type: 31124

Type completeness score: 6%

I'm not sure why sympy maintainers decided to add a "py.typed" marker file when the library is not typed. Perhaps they didn't understand the implications of doing so.

For most libraries, pyright can infer return types relatively quickly. Sympy, however, contains many functions and methods that have complex code flow graphs and large numbers of variables, which makes type inference very costly.

If this library is important to you, I recommend reaching out to the maintainers to ask them to 1) remove the "py.typed" marker, since the library is clearly not typed, or 2) add more type annotations to the library's public interface.

In the meantime, you could manually delete the "py.typed" marker and then set useLibraryCodeForTypes to "false" within your project. This will improve performance but it will also unfortunately eliminate all language server features that rely on type information (including completion suggestions).

You might also be able to find type stub packages for the sympy library. If installed, they would override the sympy library itself. If you can't find an existing type stub package, you could attempt to create a partial set of stubs yourself.

I'm going to close this issue because I don't see anything that we can do in pyright to address this issue.

@erictraut erictraut closed this as not planned Won't fix, can't repro, duplicate, stale Jan 30, 2024
@erictraut erictraut added the as designed Not a bug, working as intended label Jan 30, 2024
@oscarbenjamin
Copy link

Most "py.typed" libraries are 50% or more "type complete". Sympy is currently just 6%. It really has no right to claim that it's a typed library

Fair enough. I think we should remove the py.typed file from sympy then. I merged the PR (see sympy/sympy#22337 (comment)) to add the py.typed file to sympy without really understanding what the implications would be. The effect on checking downstream codebases was what worried me although I was more worried about the type checkers misreporting types rather than editors locking up the CPU and consuming many gigabytes of RAM.

I do think that pyright has performance problems highlighted by this. I usually have the pyright language server disabled in my editor because of this but I have just tested what happens if I reenable it and open a file from the sympy codebase. I can see that 1 core is at 100% for many minutes. I haven't waited long enough to see how long it would take for pyright to finish what it is doing. The memory used by node seems to climb up to 2GB, then drop down and climb up again repeatedly. I'm testing this with the py.typed file removed which doesn't seem to make any difference in this case.

I don't see how I could enable a language server that can perform like this just because I opened a file in the editor.

@oscarbenjamin
Copy link

I opened a SymPy PR about a year ago (sympy/sympy#25103) and at the time I said:

Running pyright like this over the codebase takes about 5 minutes on this computer

Something seems to have become less efficient since then though. I just tested with the latest version of pyright (1.1.349) on the same computer and it not only takes longer than 5 minutes but ultimately crashes:

$ time pyright sympy

<--- Last few GCs --->

[21851:0x58bbd80]   814593 ms: Scavenge 1987.2 (2078.7) -> 1979.9 (2078.9) MB, 7.5 / 0.0 ms  (average mu = 0.795, current mu = 0.560) allocation failure; 
[21851:0x58bbd80]   814673 ms: Scavenge 1988.4 (2079.6) -> 1981.1 (2080.9) MB, 4.7 / 0.0 ms  (average mu = 0.795, current mu = 0.560) allocation failure; 
[21851:0x58bbd80]   814712 ms: Scavenge 1988.8 (2080.9) -> 1981.5 (2096.9) MB, 5.4 / 0.0 ms  (average mu = 0.795, current mu = 0.560) allocation failure; 


<--- JS stacktrace --->

FATAL ERROR: Reached heap limit Allocation failed - JavaScript heap out of memory
 1: 0xb7a940 node::Abort() [/home/oscar/.nvm/versions/node/v18.16.0/bin/node]
 2: 0xa8e823  [/home/oscar/.nvm/versions/node/v18.16.0/bin/node]
 3: 0xd5c940 v8::Utils::ReportOOMFailure(v8::internal::Isolate*, char const*, bool) [/home/oscar/.nvm/versions/node/v18.16.0/bin/node]
 4: 0xd5cce7 v8::internal::V8::FatalProcessOutOfMemory(v8::internal::Isolate*, char const*, bool) [/home/oscar/.nvm/versions/node/v18.16.0/bin/node]
 5: 0xf3a3e5  [/home/oscar/.nvm/versions/node/v18.16.0/bin/node]
 6: 0xf4c8cd v8::internal::Heap::CollectGarbage(v8::internal::AllocationSpace, v8::internal::GarbageCollectionReason, v8::GCCallbackFlags) [/home/oscar/.nvm/versions/node/v18.16.0/bin/node]
 7: 0xf26fce v8::internal::HeapAllocator::AllocateRawWithLightRetrySlowPath(int, v8::internal::AllocationType, v8::internal::AllocationOrigin, v8::internal::AllocationAlignment) [/home/oscar/.nvm/versions/node/v18.16.0/bin/node]
 8: 0xf28397 v8::internal::HeapAllocator::AllocateRawWithRetryOrFailSlowPath(int, v8::internal::AllocationType, v8::internal::AllocationOrigin, v8::internal::AllocationAlignment) [/home/oscar/.nvm/versions/node/v18.16.0/bin/node]
 9: 0xf0956a v8::internal::Factory::NewFillerObject(int, v8::internal::AllocationAlignment, v8::internal::AllocationType, v8::internal::AllocationOrigin) [/home/oscar/.nvm/versions/node/v18.16.0/bin/node]
10: 0x12ce7af v8::internal::Runtime_AllocateInYoungGeneration(int, unsigned long*, v8::internal::Isolate*) [/home/oscar/.nvm/versions/node/v18.16.0/bin/node]
11: 0x16fb6b9  [/home/oscar/.nvm/versions/node/v18.16.0/bin/node]

real	13m35.492s
user	19m44.680s
sys	0m33.144s

@oscarbenjamin
Copy link

With pyright 1.1.300 I can run the full check in 7 minutes:

$ time PYRIGHT_PYTHON_FORCE_VERSION=1.1.300 pyright sympy | tee pyright.txt 
No configuration file found.
pyproject.toml file found at /home/oscar/current/active/sympy.
Loading pyproject.toml file at /home/oscar/current/active/sympy/pyproject.toml
Assuming Python version 3.11
Assuming Python platform Linux
Auto-excluding **/node_modules
Auto-excluding **/__pycache__
Auto-excluding **/.*
Searching for source files
Found 1517 source files
Emptying type cache to avoid heap overflow. Used 1888MB out of 2096MB.
Emptying type cache to avoid heap overflow. Used 1887MB out of 2096MB.
pyright 1.1.300
...
28040 errors, 48 warnings, 0 informations 
Completed in 439.575sec
WARNING: there is a new pyright version available (v1.1.300 -> v1.1.349).
Please install the new version or set PYRIGHT_PYTHON_FORCE_VERSION to `latest`

I suppose that the messages

Emptying type cache to avoid heap overflow. Used 1887MB out of 2096MB.

explain why memory usage goes up to 2GB and drops down again. Maybe this cache emptying is not being handled as well in pyright 1.1.349 which might explain why it crashes after running out of memory.

@rchiodo
Copy link
Collaborator

rchiodo commented Feb 21, 2024

I did some bisecting to see where sympy became slow. It looks like a change between 1.1.307 and 1.1.308 caused a large perf regression:

I tested with this code (and py.typed removed from the sympy site-packages folder)

import sympy

def f():
    sympy.acos(42)

1.1.307 Results:

pyright 1.1.307
0 errors, 0 warnings, 0 informations
Completed in 2.709sec

Analysis stats
Total files parsed and bound: 223
Total files checked: 1

Timing stats
Find Source Files:    0sec
Read Source Files:    0.15sec
Tokenize:             0.18sec
Parse:                0.32sec
Resolve Imports:      0.28sec
Bind:                 0.33sec
Check:                1.05sec
Detect Cycles:        0sec

1.1.308 Results:

pyright 1.1.308
0 errors, 0 warnings, 0 informations
Completed in 12.676sec

Analysis stats
Total files parsed and bound: 223
Total files checked: 1

Timing stats
Find Source Files:    0sec
Read Source Files:    0.16sec
Tokenize:             0.18sec
Parse:                0.33sec
Resolve Imports:      0.3sec
Bind:                 0.32sec
Check:                11.01sec
Detect Cycles:        0sec

@rchiodo rchiodo reopened this Feb 21, 2024
@rchiodo rchiodo removed the as designed Not a bug, working as intended label Feb 21, 2024
@erictraut erictraut added the needs investigation Requires additional investigation to determine course of action label Feb 28, 2024
@erictraut
Copy link
Collaborator

@rchido, were you able to narrow the problem to a specific commit?

@rchiodo
Copy link
Collaborator

rchiodo commented Feb 28, 2024

Sorry but I didn't check individual commits. Just the difference between the two releases so far.

@rchiodo
Copy link
Collaborator

rchiodo commented Feb 28, 2024

It's this one:
6753a5d

Previous commit timing:

C:\Users\rchiodo\source\testing\Testing_Pyright>node c:\users\rchiodo\source\repos\pyright\packages\pyright\index.js --stats test_sympy.py
Loading configuration file at C:\Users\rchiodo\source\testing\Testing_Pyright\pyrightconfig.json
Assuming Python version 3.11
Auto-excluding **/node_modules
Auto-excluding **/__pycache__
Auto-excluding **/.*
Found 1 source file
[FG] Long operation: checking: C:\Users\rchiodo\source\testing\Testing_Pyright\test_sympy.py (2018ms)
[FG] Long operation: analyzing: C:\Users\rchiodo\source\testing\Testing_Pyright\test_sympy.py (2245ms)
pyright 1.1.307
0 errors, 0 warnings, 0 informations
Completed in 2.607sec

Analysis stats
Total files parsed and bound: 223
Total files checked: 1

Timing stats
Find Source Files:    0sec
Read Source Files:    0.05sec
Tokenize:             0.2sec
Parse:                0.3sec
Resolve Imports:      0.28sec
Bind:                 0.32sec
Check:                1.07sec
Detect Cycles:        0sec

6753a5d's timing:

C:\Users\rchiodo\source\testing\Testing_Pyright>node c:\users\rchiodo\source\repos\pyright\packages\pyright\index.js --stats test_sympy.py
Loading configuration file at C:\Users\rchiodo\source\testing\Testing_Pyright\pyrightconfig.json
Assuming Python version 3.11
Auto-excluding **/node_modules
Auto-excluding **/__pycache__
Auto-excluding **/.*
Found 1 source file
[FG] Long operation: checking: C:\Users\rchiodo\source\testing\Testing_Pyright\test_sympy.py (10976ms)
[FG] Long operation: analyzing: C:\Users\rchiodo\source\testing\Testing_Pyright\test_sympy.py (11196ms)
pyright 1.1.307
0 errors, 0 warnings, 0 informations
Completed in 11.764sec

Analysis stats
Total files parsed and bound: 223
Total files checked: 1

Timing stats
Find Source Files:    0sec
Read Source Files:    0.06sec
Tokenize:             0.2sec
Parse:                0.32sec
Resolve Imports:      0.31sec
Bind:                 0.37sec
Check:                9.9sec
Detect Cycles:        0sec

@rchiodo
Copy link
Collaborator

rchiodo commented Mar 13, 2024

@erictraut, I've been playing around with this to see what the difference was between those two changes. It seems like the newer change is just a lot more thorough in its evaluation of things. I added some instrumentation and found for the example, we:

  • Call assignTypes 50x more than we do in the previous change
  • Call validateArgs 10x more than we do in the previous change

I think this is because we go farther down the chain of subtypes to compute things. For instance, I put a filter in this function here to skip finding subtypes when the type comes from something in sympy:

https://github.com/microsoft/pyright/blob/0334a44b837cf799c134d4ac6405de3bf26978f6/packages/pyright-internal/src/analyzer/typeEvaluator.ts#L3867

That completely changes the amount of time we spend computing types, actually making it faster than the previous change.

Would that be a viable option? Have some way of making certain packages only explore subtypes to a certain level? (I actually implemented this in Pylance to see how it might work in practice). I don't think there's any obvious ways to fix the perf for every package though. It's really just computing the types for all of the untyped information in sympy that takes so long.

The alternative, I think, would be to generate type stubs for sympy and ship them with Pylance.

@rchiodo
Copy link
Collaborator

rchiodo commented Mar 13, 2024

I'll submit a PR with the proposed change to see if that's an option or not.

@rchiodo
Copy link
Collaborator

rchiodo commented Mar 13, 2024

I added a bunch of logging around validateArgType and assignTypes and found this difference:

These are number of times the function is called:

Post change

validateArgType:      3654
assignType:      1807613

Pre change:

validateArgType:      460
assignType:      36298

So then I did some investigation into when they start to differ, At __iterable from typing, the newer change returns isCompatible false the first time validate args is called:

Post change

validateArgType(["__iterable",7,96547) -> {"isCompatible":false,"argType":"ref","isTypeIncomplete":false,"skippedBareTypeVarExpectedType":false}]

Pre change

validateArgType(["__iterable",7,96547])->{"isCompatible":true,"argType":"ref","isTypeIncomplete":false,"skippedBareTypeVarExpectedType":false}]

So I started debugging the difference and it comes to this function call:

typeArgValue = transformExpectedType(typeArgValue, liveTypeVarScopes);

That's where's a code difference (I mean that makes sense given the diff in output). The older code checks to see if a TypeVar is live or not. The newer code doesn't. That's where the difference comes in.

But given my limited knowledge of what this is doing, that seemed on purpose to me, so not sure if I'm even going down the correct path to figure out why one was slower.

@rchiodo
Copy link
Collaborator

rchiodo commented Mar 13, 2024

I guess there's a little bit more information that I have. When I added logging for validateArgType, I could see that it was being called over and over for what looked like the same argument? Or at least the same name and type of argument.
test.txt

The majority of the output there is for an argument named __iterable.

Here's how I logged that output:

function validateArgType(
        argParam: ValidateArgTypeParams,
        typeVarContext: TypeVarContext,
        signatureTracker: UniqueSignatureTracker,
        typeResult: TypeResult<FunctionType> | undefined,
        options: ValidateArgTypeOptions
    ): ArgResult {
        const result = validateArgTypeImpl(argParam, typeVarContext, signatureTracker, typeResult, options);
        timingStats.addCallstack(argParam.paramName, argParam.paramType.category, argParam.argument.node?.id, result); // Function I added to console log callstacks
        return result;
    }

I was wondering if it would be possible to cache the ArgResult? Maybe that doesn't work or isn't computable.

@erictraut
Copy link
Collaborator

If you see it being evaluated over and over again, that's because there's a loop present, and some types depend on others. We need to evaluate types until all of them "settle" to their final values. Until then, they are marked as "incomplete". Sympy's code contains many such loops (some of them doubly or triply nested) and many tricky dependencies between many variables whose types are not annotated. Pyright already aggressively caches the already-computed types for all parse nodes, but if the cached value is marked "incomplete", its value must be invalidated when some other node's type is updated within the loop.

These are tricky issues to diagnose. I've spent an hour on this one already, but in the past, similar issues have required many hours of focused time to really understand the root cause and then formulate potential mitigations (or conclude that no such mitigation is required).

@rchiodo
Copy link
Collaborator

rchiodo commented Mar 20, 2024

There's a workaround we're going to ship with Pylance. I added stubs to the https://github.com/microsoft/python-type-stubs repository. If you download the https://github.com/microsoft/python-type-stubs/tree/main/stubs/sympy and put that tree into your typings/sympy folder, Pyright will run a LOT faster with the stubs.

Without stubs:

Loading configuration file at c:\Users\rchiodo\source\testing\Testing_Pyright\pyrightconfig.json
Assuming Python version 3.11
No include entries specified; assuming c:\Users\rchiodo\source\testing\Testing_Pyright
Auto-excluding **/node_modules
Auto-excluding **/__pycache__
Auto-excluding **/.*
Found 1 source file
[FG] Long operation: checking: file:///c%3A/Users/rchiodo/source/testing/Testing_Pyright/test_sympy.py (15660ms)
[FG] Long operation: analyzing: file:///c%3A/Users/rchiodo/source/testing/Testing_Pyright/test_sympy.py (15956ms)
pyright 1.1.351
0 errors, 0 warnings, 0 informations
Completed in 17.158sec

Analysis stats
Total files parsed and bound: 366
Total files checked: 1

Timing stats
Find Source Files:    0sec
Read Source Files:    1.8sec
Tokenize:             0.35sec
Parse:                0.51sec
Resolve Imports:      0.54sec
Bind:                 0.68sec
Check:                11.97sec
Detect Cycles:        0sec

With stubs:

Loading configuration file at c:\Users\rchiodo\source\testing\Testing_Pyright\pyrightconfig.json
Assuming Python version 3.11
No include entries specified; assuming c:\Users\rchiodo\source\testing\Testing_Pyright
Auto-excluding **/node_modules
Auto-excluding **/__pycache__
Auto-excluding **/.*
Found 1 source file
pyright 1.1.351
0 errors, 0 warnings, 0 informations
Completed in 1.047sec

Analysis stats
Total files parsed and bound: 115
Total files checked: 1

Timing stats
Find Source Files:    0sec
Read Source Files:    0.01sec
Tokenize:             0.07sec
Parse:                0.09sec
Resolve Imports:      0.2sec
Bind:                 0.1sec
Check:                0.06sec
Detect Cycles:        0sec

@oscarbenjamin
Copy link

oscarbenjamin commented Mar 20, 2024

Thank you all for looking at this.

I don't fully understand the suggestion from @rchiodo. Is it that pyright would add those stub files or that sympy could add them to achieve the attended effect?

There has been some friction around adding type hints within sympy. It is possible that using stub files would be a more workable. I don't understand the implications of there being stub files in sympy as well as type hints in the code or stub files for sympy in pyright as well as type hints in the sympy codebase.

@rchiodo
Copy link
Collaborator

rchiodo commented Mar 20, 2024

You would add those stub files to your workspace typings folder. That's essentially what Pylance is doing (we have a folder we add to the list of places to search for stubs in).

I believe sympy is working on adding types, but they haven't released in more than a year. These stubs should be a temporary workaround until sympy is fully typed.

@oscarbenjamin
Copy link

You would add those stub files to your workspace typings folder.

To be clear I am a sympy maintainer. I am asking if you are suggesting that we (sympy) should do something and/or to understand what the implications are of whatever pyright is planning to do in relation to what we are doing or might do in future.

@rchiodo
Copy link
Collaborator

rchiodo commented Mar 20, 2024

To be clear, Pyright isn't doing anything with relation to the stubs. The stubs won't every be as correct as the real sympy code and using them may cause Pyright to not catch errors/show completions/etc.

The stubs are really for Pylance. Pyright users can install them locally too (if perf of sympy is a concern), but the stubs are mostly there to make Pylance go faster.

Longer term, I hope sympy becomes fully typed and we can just delete these extra type stubs. As a maintainer of sympy, it would be great if you could get sympy typed enough that Pyright is fast to analyze sympy. That'd probably be the point where we don't need these stubs anymore.

@rchiodo
Copy link
Collaborator

rchiodo commented Mar 20, 2024

Sorry I didn't answer this question:

There has been some friction around adding type hints within sympy. It is possible that using stub files would be a more workable. I don't understand the implications of there being stub files in sympy as well as type hints in the code or stub files for sympy in pyright as well as type hints in the sympy codebase.

Sympy can also take ownership of the stubs too. I believe it works if you simply add these pyi files to your source tree. The ones I generated were from the 1.12 release of sympy though. They don't match the dev branch.

@rchiodo
Copy link
Collaborator

rchiodo commented Mar 20, 2024

The main thing that makes Pyright/Pylance slow is unannotated return values. That's mostly what's in the stubs. Annotating all of the return values in sympy would make Pyright/Pylance a lot faster.

@heejaechang
Copy link
Collaborator

You don't need to annotate every single symbol. You can

  1. annotate all public symbols.
  2. If that is too much, annotate most of the main symbols.
  3. If that is still too much, then you can annotate the core ones that others depend on.

If needed, we could provide a way for you to find (3), so you can strategically add annotations to symbols that matter most.

@oscarbenjamin
Copy link

As a maintainer of sympy, it would be great if you could get sympy typed enough that Pyright is fast to analyze sympy.

I agree but it is really hard to do. Symbolic computing is to some extent inherently weakly typed but also there are many historic design decisions in SymPy that make this harder. The big one is that in SymPy every different kind of symbolic expression node is a different Python class:

In [1]: e1 = sin(x)

In [2]: e2 = e1 + 1

In [3]: e1
Out[3]: sin(x)

In [4]: e2
Out[4]: sin(x) + 1

In [5]: type(e1)
Out[5]: sin

In [6]: type(e2)
Out[6]: sympy.core.add.Add

These types can also return different types from their constructors:

In [7]: e3 = Add(x, x) # We try to construct an Add but we get a Mul

In [8]: e3
Out[8]: 2x

In [9]: type(e3)
Out[9]: sympy.core.mul.Mul

More discussion of this is in sympy/sympy#25103 and python/mypy#15182.

@erictraut
Copy link
Collaborator

I had a chance to look more deeply at this issue. I've tracked down the problem to a particular expression that spans about 750 lines in the assumptions_generated.py file. It is assigned to a variable named full_implications. I'll look for ways to improve the type evaluation performance for this expression.

@oscarbenjamin
Copy link

Just in case everyone is not aware there was a SymPy PR (sympy/sympy#26528) related to this from @heejaechang that got merged. That probably means that there are differences between SymPy 1.12 vs current master relating to this issue.

erictraut added a commit that referenced this issue May 21, 2024
…ntry types if the tuple expression is embedded within another tuple, set, list, or dictionary expression. This addresses #7159.
@erictraut erictraut added bug Something isn't working and removed needs investigation Requires additional investigation to determine course of action labels May 22, 2024
erictraut added a commit that referenced this issue May 22, 2024
…ntry types if the tuple expression is embedded within another tuple, set, list, or dictionary expression. This addresses #7159. (#7970)
@erictraut erictraut added the addressed in next version Issue is fixed and will appear in next published version label May 22, 2024
@erictraut
Copy link
Collaborator

This is included in pyright 1.1.365.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
addressed in next version Issue is fixed and will appear in next published version bug Something isn't working
Projects
None yet
Development

No branches or pull requests

6 participants