Test collection is very slow for test class TestXYZ #8415

Closed
seberg opened this issue Mar 7, 2021 · 6 comments
Labels
topic: collection (related to the collection phase), type: performance (performance or memory problem/improvement)

Comments

@seberg

seberg commented Mar 7, 2021

The NumPy test collection time seems to have crept up over time, which has been bugging me for a while. Right now test collection takes 1/3 of the whole execution time for a typical test run (it was especially annoying when running in python3.9-dbg, where the collection phase takes even longer).

Here is the relevant part of the cProfile output (sorted by cumtime):

```
        1    0.000    0.000   47.352   47.352 main.py:589(perform_collect)
62851/669    0.008    0.000   46.542    0.070 {method 'extend' of 'list' objects}
93099/16003    0.050    0.000   46.539    0.003 main.py:806(genitems)
     2141    0.005    0.000   46.222    0.022 runner.py:541(collect_one_node)
     2141    0.005    0.000   46.128    0.022 runner.py:370(pytest_make_collect_report)
     2141    0.006    0.000   46.118    0.022 runner.py:317(from_call)
     2141    0.002    0.000   46.108    0.022 runner.py:371(<lambda>)
     1134    0.033    0.000   42.511    0.037 python.py:409(collect)
      161    0.001    0.000   42.299    0.263 python.py:504(collect)
     8972    0.015    0.000   39.868    0.004 {method 'sort' of 'list' objects}
    16944    0.019    0.000   39.840    0.002 python.py:440(sort_key)
    16944    0.050    0.000   39.818    0.002 python.py:324(reportinfo)
    16944    0.026    0.000   39.476    0.002 code.py:1189(getfslineno)
      974    0.945    0.001   38.952    0.040 source.py:119(findsource)
      974    0.010    0.000   37.794    0.039 inspect.py:809(findsource)
```
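
The thread doesn't say exactly how this profile was captured. A minimal way to get a similar cumulative-time report in-process is sketched below; the helper name `profile_cumtime` is hypothetical, and profiling via `pytest.main` is an assumption:

```python
import cProfile
import io
import pstats


def profile_cumtime(func, *args, limit=15, **kwargs):
    """Run func(*args, **kwargs) under cProfile (hypothetical helper).

    Returns (result, report), where report holds the `limit` most expensive
    entries sorted by cumulative time, similar to the listing above.
    """
    profiler = cProfile.Profile()
    profiler.enable()
    try:
        result = func(*args, **kwargs)
    finally:
        profiler.disable()
    buf = io.StringIO()
    # Sort by cumulative time so callers of slow functions bubble up too.
    stats = pstats.Stats(profiler, stream=buf)
    stats.sort_stats("cumulative").print_stats(limit)
    return result, buf.getvalue()
```

For example, `profile_cumtime(pytest.main, ["--collect-only", "-q", "numpy/"])` would profile only the collection phase. From a shell, `python -m cProfile -s cumtime -m pytest --collect-only` gives a comparable, unabridged listing.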

Digging into it, at least for NumPy, `findsource` is only used to get the lineno of `class TestFunction`-style test classes. If I make a silly change (this is obviously nonsense, so probably might as …):

```diff
diff --git a/src/_pytest/_code/code.py b/src/_pytest/_code/code.py
index b85217560..45d985f8e 100644
--- a/src/_pytest/_code/code.py
+++ b/src/_pytest/_code/code.py
@@ -1210,11 +1210,6 @@ def getfslineno(obj: object) -> Tuple[Union[str, Path], int]:
 
         fspath = fn and absolutepath(fn) or ""
         lineno = -1
-        if fspath:
-            try:
-                _, lineno = findsource(obj)
-            except OSError:
-                pass
         return fspath, lineno
 
     return code.path, code.firstlineno
```

(or any other logic). EDIT: I forgot to say, this gives me roughly a 6x speedup of collection, I think.

The problem could be semi-mitigated in NumPy, or blamed on the slow AST parser called by `inspect`. The core issue is that we have large test files with many `class TestFunction` classes, and the cost effectively scales quadratically: `inspect.findsource` parses the whole file again for every class it looks up, and each parse takes longer the more tests the file contains!
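
To see the quadratic effect in isolation, here is a sketch that generates a module with many `TestXYZ`-style classes and looks up each class's line number with `inspect.findsource`, the call at the bottom of the profile above. It assumes Python 3.9 behaviour, where `inspect` finds a class by `ast`-parsing the whole module on every lookup; newer Python releases may locate classes differently. All names (`make_module`, `time_class_linenos`, `measure`, `TestXYZ0`, ...) are hypothetical:

```python
import importlib.util
import inspect
import pathlib
import sys
import tempfile
import time


def make_module(directory, n_classes, methods_per_class=20):
    """Write and import a hypothetical test module with n_classes TestXYZ-style classes."""
    lines = []
    for i in range(n_classes):
        lines.append(f"class TestXYZ{i}:")
        for j in range(methods_per_class):
            lines.append(f"    def test_{j}(self):")
            lines.append(f"        assert {j} == {j}")
        lines.append("")  # blank line between classes
    name = f"_generated_{n_classes}_{methods_per_class}"
    path = pathlib.Path(directory) / f"{name}.py"
    path.write_text("\n".join(lines))
    spec = importlib.util.spec_from_file_location(name, str(path))
    module = importlib.util.module_from_spec(spec)
    # inspect resolves a class's source file through sys.modules, so register it.
    sys.modules[name] = module
    spec.loader.exec_module(module)
    return module


def time_class_linenos(module):
    """Find every class's 1-based line number via inspect.findsource and time it."""
    classes = [obj for obj in vars(module).values() if inspect.isclass(obj)]
    start = time.perf_counter()
    # On Python 3.9 (the version in this thread), each call re-parses the module with ast.
    linenos = [inspect.findsource(cls)[1] + 1 for cls in classes]
    return linenos, time.perf_counter() - start


def measure(n_classes, methods_per_class=20):
    """Generate a module in a temporary directory and time the class lookups."""
    with tempfile.TemporaryDirectory() as tmp:
        return time_class_linenos(make_module(tmp, n_classes, methods_per_class))
```

Running `measure(n)` for, say, n = 25, 50 and 100 on Python 3.9 should show the elapsed time growing roughly fourfold each time `n` doubles: there are `n` lookups, and each one parses a file whose length also grows with `n`.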

Anyway, this might just be a duplicate of gh-2206, but I wonder whether there is a trivial solution: maybe the lineno lookup can be avoided entirely, since `module.__dict__.keys()` is probably ordered, at least on newer Python versions?
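
The dict-ordering idea can be sketched without any source lookups: since Python 3.7, dicts (and therefore module namespaces) preserve insertion order, so iterating `vars(module)` yields names in definition order. The helper `names_in_definition_order` and the plain `test`/`Test` prefix matching are hypothetical simplifications; pytest's real collection uses the configurable `python_functions` and `python_classes` patterns:

```python
import types


def names_in_definition_order(module):
    """Return test-like names from module in the order they were defined.

    Relies only on dict insertion order, so no findsource/lineno is needed.
    """
    names = []
    for name, obj in vars(module).items():
        if isinstance(obj, type):
            if name.startswith("Test"):
                names.append(name)
        elif callable(obj) and name.startswith("test"):
            names.append(name)
    return names


# Build a throwaway module whose definition order is deliberately not alphabetical.
SOURCE = """
def test_b():
    pass

class TestZ:
    pass

def test_a():
    pass

class TestA:
    pass
"""
example = types.ModuleType("hypothetical_test_module")
exec(SOURCE, example.__dict__)
print(names_in_definition_order(example))  # ['test_b', 'TestZ', 'test_a', 'TestA']
```

This gives the same order a line-number sort would, but never touches the source file.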

@RonnyPfannschmidt
Member

this could be an easy win

@RonnyPfannschmidt
Member

it's using the wrong findsource; it should use the `inspect` one instead of the `source` one

@seberg
Author

seberg commented Mar 7, 2021

But it seems the `source.py` one (in pytest) just calls the `inspect` module one here? If it's an easy win, that would be great! I admit that while it seems to make a big difference in NumPy, by far the worst file does have 9000 lines of code and 70 test classes (or close to that)... So it might just be a back-burner issue that doesn't matter much...

@Zac-HD added the labels topic: collection (related to the collection phase) and type: performance (performance or memory problem/improvement) Mar 7, 2021
@RonnyPfannschmidt
Member

hmm, I took another glance; I think it's fair to just skip it for classes, but I won't get to try that and work through the ramifications this week

@asottile
Member

#7848 (incomplete) plans to use dict ordering to solve this same problem; it just needs to handle the inheritance case, which I've been too busy to take another look at
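
The inheritance case is where plain `__dict__` ordering falls short: a test class's own `__dict__` doesn't contain inherited test methods. One possible approach, sketched below under the assumption that base-class tests should come first and an override should be collected only once (at its most-derived definition), is to walk the class's MRO. This is an illustration, not the code from #7848; the helper and class names are hypothetical:

```python
def methods_in_mro_order(cls, prefix="test"):
    """Return test method names of cls without source lookups (hypothetical helper).

    Walks cls.__mro__ from the most-derived class so each name is taken from
    its most-derived definition (an override shadows the base version), then
    emits the per-class groups in reverse MRO order so inherited tests come
    before the subclass's own.
    """
    seen = set()
    groups = []
    for klass in cls.__mro__:
        group = []
        for name, value in vars(klass).items():
            if name in seen:
                continue
            # Mark every name as seen, so a subclass attribute (even a
            # non-callable one like `test_x = None`) hides the base method.
            seen.add(name)
            if name.startswith(prefix) and callable(value):
                group.append(name)
        groups.append(group)
    result = []
    for group in reversed(groups):
        result.extend(group)
    return result


class Base:
    def test_base_one(self):
        pass

    def test_shared(self):
        pass


class Derived(Base):
    def test_derived(self):
        pass

    def test_shared(self):  # overrides Base.test_shared
        pass


print(methods_in_mro_order(Derived))  # ['test_base_one', 'test_derived', 'test_shared']
```

Note the design choice: `test_shared` is listed among `Derived`'s own tests, because the override is what actually runs.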

@bluetech
Member

bluetech commented Oct 9, 2021

Using Python 3.9.7, running `pytest --co -k xxxxxx numpy/`:

  • pytest main (upcoming 6.3.0): 5.66s
  • pytest 6.2.5: 54.23s (!)

The main culprit was indeed #7848. Closing as fixed by #9144.

@bluetech bluetech closed this as completed Oct 9, 2021