Performance fixes #4237

boxed · 2018-10-25T14:12:53Z

TODO:

merge _getconftestmodules: use functools.lru_cache #4247
merge collection: _recurse: skip __pycache__ #4250
rebase on features

RonnyPfannschmidt · 2018-10-25T15:02:31Z

src/_pytest/pathlib.py

@@ -305,3 +305,8 @@ def fnmatch_ex(pattern, path):
    else:
        name = six.text_type(path)
    return fnmatch.fnmatch(name, pattern)
+
+
+def parts(s):


Please check whether Path.parents or Path.parts would work here as well; this looks like a good potential match.

Using p = Path(s); list(p.parents) + [p] makes it slower.
This would only make sense if it would be a Path already I think.

No surprise there really. Path.parents is a pretty complex chain of stuff that is hard to follow. Would be amazing if it wasn't slower than the simple string operations in this function.
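
For reference, the two approaches compared above can be sketched side by side. This is an illustration only and not the code from the diff: `parts_str` and `parts_pathlib` are hypothetical names, `/` is assumed as the separator, and only absolute paths are considered.

```python
from pathlib import PurePosixPath

SEP = "/"  # assumption: POSIX-style separator, hard-coded for the example


def parts_str(s):
    """Return the path and all of its ancestors using plain string operations."""
    pieces = s.split(SEP)
    # Joining the first i+1 pieces rebuilds each ancestor; the empty join is the root.
    return {SEP.join(pieces[: i + 1]) or SEP for i in range(len(pieces))}


def parts_pathlib(s):
    """Same result via pathlib, which builds a path object for every ancestor."""
    p = PurePosixPath(s)
    return {str(x) for x in [p, *p.parents]}


print(sorted(parts_str("/a/b/c")))  # ['/', '/a', '/a/b', '/a/b/c']
```

Both functions return the same set for an absolute path; the string version avoids constructing intermediate path objects, which is presumably where the extra time goes.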

RonnyPfannschmidt · 2018-10-25T15:09:53Z

I believe a more critical element is actually turning the paths variable into a set. At first glance it doesn't seem to be used anywhere else, and using issubset instead of any should be a massive boost, since it turns a linear search loop at the Python level into a hash membership test loop at the C level.

boxed · 2018-10-25T15:27:59Z

I don't think a set operator is a big win because we're talking about small numbers here, but I'll certainly give it a try tomorrow.

There's another issue with this code in that it searches all the way up to / which makes no sense. It should stop at the base directory of tests (current working directory?). Otherwise pytest is needlessly slower the deeper in the directory tree you have your code.

blueyed · 2018-10-25T15:34:01Z

should stop at the base directory of tests (current working directory?)

That would be rootdir, but in this case the argpath could/should be used probably.

issubset

Currently it checks if any of the parts are in already handled paths, which appears to be different from using issubset, no?
That would rather be isdisjoint then (https://docs.python.org/3.7/library/stdtypes.html?highlight=issubset#frozenset.isdisjoint).
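
The difference between the three checks can be seen in a small sketch. The values of `paths` and `parts` below are made up for illustration.

```python
# Hypothetical data: directories already handled, and the ancestors of a new path.
paths = {"/repo/tests", "/repo/tests/unit"}
parts = {"/", "/repo", "/repo/tests", "/repo/tests/sub"}

# The check described above: is any ancestor already handled?
any_seen = any(p in paths for p in parts)

# isdisjoint expresses the same question as a single set operation.
overlap = not parts.isdisjoint(paths)

# issubset asks something different: are all ancestors already handled?
all_seen = parts.issubset(paths)

print(any_seen, overlap, all_seen)  # True True False
```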

blueyed · 2018-10-25T15:38:17Z

I'm looking into this now also - @boxed, is it OK to push things here, or should I rather post diffs/patches?

boxed · 2018-10-25T15:52:39Z

Push away

boxed · 2018-10-25T15:53:57Z

I tried to find out why for 1000 files there were 2 million calls to stat but I had trouble figuring this out. Maybe a lot of micro-optimizations are what's needed for that issue.

codecov · 2018-10-25T16:06:28Z

Codecov Report

Merging #4237 into features will increase coverage by 0.09%.
The diff coverage is 100%.

@@             Coverage Diff              @@
##           features    #4237      +/-   ##
============================================
+ Coverage     95.75%   95.84%   +0.09%     
============================================
  Files           111      111              
  Lines         24794    24804      +10     
  Branches       2420     2422       +2     
============================================
+ Hits          23741    23773      +32     
+ Misses          751      735      -16     
+ Partials        302      296       -6

| Flag | Coverage Δ | |
|---|---|---|
| #docs | `28.95% <91.3%> (?)` | |
| #doctesting | `28.95% <91.3%> (?)` | |
| #linting | `28.95% <91.3%> (?)` | |
| #linux | `95.62% <100%> (ø)` | ⬆️ |
| #nobyte | `91.84% <100%> (+0.63%)` | ⬆️ |
| #numpy | `93% <100%> (+51.34%)` | ⬆️ |
| #pexpect | `41.69% <91.3%> (+0.02%)` | ⬆️ |
| #py27 | `94.01% <100%> (ø)` | ⬆️ |
| #py34 | `92.17% <100%> (+0.19%)` | ⬆️ |
| #py35 | `92.19% <100%> (+0.2%)` | ⬆️ |
| #py36 | `93.93% <100%> (+0.17%)` | ⬆️ |
| #py37 | `92.24% <100%> (+0.06%)` | ⬆️ |
| #trial | `93% <100%> (+51.34%)` | ⬆️ |
| #windows | `93.98% <100%> (+2.99%)` | ⬆️ |
| #xdist | `93.73% <100%> (+0.15%)` | ⬆️ |

| Impacted Files | Coverage Δ | |
|---|---|---|
| src/_pytest/pathlib.py | `90.34% <100%> (+0.16%)` | ⬆️ |
| src/_pytest/python.py | `95.58% <100%> (ø)` | ⬆️ |
| src/_pytest/main.py | `96.85% <100%> (+0.04%)` | ⬆️ |
| testing/acceptance_test.py | `98.27% <0%> (+1.07%)` | ⬆️ |
| src/_pytest/capture.py | `93.87% <0%> (+1.36%)` | ⬆️ |
| src/_pytest/assertion/util.py | `98.13% <0%> (+5.14%)` | ⬆️ |

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update f2cebce...023e1c7.

blueyed · 2018-10-25T16:06:45Z

I tried to find out why for 1000 files there were 2 million calls to stat

Some of this might also be due to .pyc files being considered (which I've changed by skipping __pycache__).

blueyed · 2018-10-25T16:10:12Z

Re stopping at the base dir: this appears to be the case already, no? parts is "only" used for checking if it is disjoint with already collected paths, but it could be only the relevant parts then, i.e. up to rootdir - but that is likely not a big deal I would assume.

RonnyPfannschmidt · 2018-10-25T16:18:44Z

how much improvement did the set usage give?

blueyed · 2018-10-25T16:22:45Z

It's in the commit message: 4.45s => 3.55s
Would likely be bigger without the previous change to handle dirs only once.

boxed · 2018-10-25T16:46:52Z

Re skipping pycache: we should explicitly skip pyc files for py2.7 too right? I did that change and it gave a modest speed boost so I skipped it while looking for bigger fish. Should have kept it in hindsight.

boxed · 2018-10-25T16:48:04Z

Re stopping at the base dir: I mean the paths function should stop at the base path. Currently it returns "/" as the first element always which is rather silly.

boxed · 2018-10-25T16:48:50Z

Actually, why aren't we skipping non-py-files explicitly?

blueyed · 2018-10-25T16:49:02Z

I also noticed that between 3.7.0 (which I tested my performance fix on) and master, more performance regressions have landed, resulting in my test going from 7 seconds (in 3.7 with my patch) to 9.3 seconds.

My times (using https://github.com/blueyed/dotfiles/blob/abe59a331eb0aeebccba55516ffbf885c577e8c6/usr/bin/timeit-shell),
and the script from #2206 (comment):

3.6.4: 2.69
3.7.0: 7.18
features branch (f466105): 7.78
master (ccdb248): 7.96
Performance fixes #4237 (initial): 4.9
Performance fixes #4237 (2eab084): 3.58
Performance fixes #4237 (2eab084, rebased on features): 3.58

@RonnyPfannschmidt
With #4237 (2eab084) and 1000x10 files pre-collection is not really noticeable anymore and times change from
351.50s user 10.07s system 70% cpu 8:31.79 total
to
72.00s user 4.34s system 98% cpu 1:17.18 total
(I might have had a cold cache there initially though)

boxed · 2018-10-25T16:53:46Z

Cool! Can't wait to try this when I get in tomorrow at work!

blueyed · 2018-10-25T16:55:04Z

Re skipping pycache: we should explicitly skip pyc files for py2.7 too right?

Yes, although IIRC it was easier / better to just check the dir upfront, but could also be done in a special way for py27, of course.

Re stopping at the base dir: I mean the paths function should stop at the base path. Currently it returns "/" as the first element always which is rather silly.

You mean the "parts" function, right?
This is only string manipulation currently, and it might be more overhead to handle stopping after all, but I have not looked too closely. AFAIK this should not cause any disk I/O though.

Actually, why aren't we skipping non-py-files explicitly?

pytest_collect_file handles this (via python_files setting) currently. I guess there are valid use cases for non-.py files (e.g. doctests).

boxed · 2018-10-25T17:04:51Z

Re paths function: yep. It could be made fast with just s[len(basepath):] before splitting, which shouldn't add too much complexity either. Maybe a bit premature though.
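
A rough sketch of that idea, assuming the path is known to live under `basepath` and that `basepath` has no trailing separator; `parts_below` is a hypothetical helper, not something from the PR.

```python
SEP = "/"  # assumption: POSIX-style separator


def parts_below(s, basepath):
    """Return s and its ancestors, stopping at basepath instead of the root."""
    if s != basepath and not s.startswith(basepath + SEP):
        raise ValueError("path is not below basepath")
    # Strip the common prefix first so only the relative pieces are split.
    rel = s[len(basepath):].strip(SEP)
    result = {basepath}
    current = basepath
    for piece in rel.split(SEP) if rel else []:
        current = current + SEP + piece
        result.add(current)
    return result


print(sorted(parts_below("/repo/tests/unit", "/repo")))
# ['/repo', '/repo/tests', '/repo/tests/unit']
```

The result no longer grows with the depth of the checkout in the filesystem, only with the depth below the base path.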

blueyed · 2018-10-25T18:23:36Z

Re ignoring .pyc files: _recurse gets called with directories only, so .pyc cannot be skipped there, but I've added it through a filter.
Currently investigating why there is _recurse twice (for session and python). It seems that python's collect could be improved in this regard altogether (#4241 came out of it already).
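
As an illustration of pruning cache directories and filtering compiled files during a directory walk, here is a sketch that uses `os.walk` rather than pytest's own collection code. The name `iter_candidate_files` is hypothetical.

```python
import os


def iter_candidate_files(root):
    """Yield files under root, skipping __pycache__ dirs and .pyc files."""
    for dirpath, dirnames, filenames in os.walk(root):
        # Editing dirnames in place stops os.walk from descending into them.
        dirnames[:] = [d for d in dirnames if d != "__pycache__"]
        for name in filenames:
            if name.endswith(".pyc"):
                continue  # e.g. Python 2.7 writes .pyc next to the source
            yield os.path.join(dirpath, name)
```

Pruning the directory avoids visiting its contents at all, while the suffix check covers compiled files that sit next to their sources.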

boxed · 2018-10-26T06:29:46Z

Yep, this brings time almost back to the 3.4 level on my machine at work for the example script.

Oooh, but it brings the time down to 4.5 on the test at work, where 6s is the benchmark (although 3.6.0 was 3s). This is good stuff!

RonnyPfannschmidt

this looks great so far 👍 , would you like to investigate some more of the loose ends or would you like to get this merged to proceed on a new pr?

boxed · 2018-10-26T06:54:36Z

The test suite at TriOptima now has 104k calls to posix.stat (compared to millions before). Huge improvement! But it's still 0.8 seconds just for all the posix.stat calls. There are 2906 files and folders (after I deleted all __pycache__ dirs, and not counting things excluded by norecursedirs). That's 104273/2906 ~= 35.88 posix.stat calls per file. There are 1767 .py files, so it's 59.0 stat calls per file if you count only those.

RonnyPfannschmidt · 2018-10-26T07:05:38Z

@boxed lovely stats - I believe we may need to create a little trace helper to trace the stat calls and assign them context. I believe the import system of Python generates dozens of them when trying different locations in sys.path, for example. I believe there is room for improvement there, but it's not yet clear how to materialize it.

right now we can't even clearly measure ^^
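
One possible shape for such a trace helper is sketched below. It is not part of pytest, and `count_stat_calls` is a hypothetical name. It only sees calls that look up `os.stat` at call time; code that holds its own reference to the low-level function, such as the import machinery, is not counted, so this undercounts compared to a profiler.

```python
import collections
import contextlib
import os
import sys


@contextlib.contextmanager
def count_stat_calls():
    """Count os.stat calls per calling function while the block runs."""
    counts = collections.Counter()
    original = os.stat

    def traced_stat(*args, **kwargs):
        # Attribute the call to whoever invoked os.stat.
        caller = sys._getframe(1).f_code
        counts[(caller.co_filename, caller.co_name)] += 1
        return original(*args, **kwargs)

    os.stat = traced_stat
    try:
        yield counts
    finally:
        os.stat = original  # always restore the real function
```

Usage would be along the lines of `with count_stat_calls() as counts: ...` followed by `counts.most_common(10)` to see which callers dominate.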

boxed · 2018-10-26T07:14:04Z

I guess some of those stat calls are for python to just check if it needs to rebuild a pyc file. I made a small script that basically just imports the entire app and I get 11586 stat calls for that, so that doesn't change the math much.

boxed · 2018-10-26T07:23:38Z

Hmm... interestingly we get 284957 calls to posix.stat for my toy example script, which is drastically more than for our production code base where the directory has a huge venv. I don't think this should be too hard to figure out. I'll give it a shot today.

boxed · 2018-10-26T07:25:27Z

Oh, and to answer the question on merging this.. I am not in a huge hurry to merge no. I'd like to spend today on this at least and if I can't find anything more then I guess it makes sense to merge and then start with a new PR.

boxed · 2018-10-26T13:14:50Z

Not at this time. But it might be worth a look if someone can get to the bottom of the assert rewrite import hook performance problems.

Also renames `_path2confmods` to `_dirpath2confmods` for clarity (it is expected to be a dirpath in `_importconftest`). Uses an explicit maxsize, since it appears to be only relevant for a short period [1]. 1: pytest-dev#4237 (comment)

Also renames `_path2confmods` to `_dirpath2confmods` for clarity (it is expected to be a dirpath in `_importconftest`). Uses an explicit maxsize, since it appears to be only relevant for a short period [1]. Removes the lru_cache on _getconftest_pathlist, which makes no difference when caching _getconftestmodules, at least with the performance test of 100x10 files (pytest-dev#4237). 1: pytest-dev#4237 (comment)
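
A minimal sketch of caching a per-directory lookup with an explicit `maxsize`, as the commit message describes. The function body and the value 128 are placeholders chosen for the example; the real `_getconftestmodules` is a method on pytest's plugin manager and does considerably more.

```python
import functools

calls = []  # records cache misses, for demonstration only


@functools.lru_cache(maxsize=128)  # assumption: 128 is an arbitrary example size
def conftest_dirs_for(dirpath):
    """Stand-in for an expensive per-directory lookup (hypothetical)."""
    calls.append(dirpath)
    pieces = dirpath.strip("/").split("/")
    # Return a tuple: cached values are shared, so they should be immutable.
    return tuple("/" + "/".join(pieces[: i + 1]) for i in range(len(pieces)))


conftest_dirs_for("/repo/tests")
conftest_dirs_for("/repo/tests")  # served from the cache, no second call
print(len(calls), conftest_dirs_for.cache_info().hits)  # 1 1
```

A bounded cache keeps memory use predictable when entries are only useful for a short stretch of the collection.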

nicoddemus · 2018-10-30T11:29:28Z

After the PRs related to performance are merged, I think we are in good shape for a 3.10 release. 👍

blueyed · 2018-11-01T15:25:25Z

Rebased (old HEAD: 5850de6).

I've run the timing on each commit again.
Looks like we might have some performance degradation on features in between already, since the times are slower in general (see commit msgs) - might be worth investigating?!

Some data points:

master (56e6bb0): 11.4s (later: raw times: 11.6 sec, 11.7 sec, 11.8 sec, 11.9 sec, 12.2 sec)
features (currently f2cebce): 8.97s
f466105 (features~10): 12.3s (later: raw times: 11.8 sec, 12.5 sec, 12 sec, 12 sec, 11.8 sec)
(master and features~10) look similar after all.

blueyed · 2018-11-01T21:00:40Z

src/_pytest/main.py

@@ -469,7 +470,7 @@ def _perform_collect(self, args, genitems):
            return items

    def collect(self):
-        for parts in self._initialparts:
+        for parts in self._initialparts:  # noqa: F402


I do not really like this (it is needed because of the imported parts being shadowed).
Suggestions?

for initialpart in self._initialparts?

boxed · 2018-11-02T08:43:19Z

@blueyed So the performance on features is ok with this patch? I am confused by your last comment :P

boxed · 2018-11-02T13:52:24Z

Looking a bit at the performance again, using my test script with many dirs again and just running the collect, I get ~3.8s (with the profiler on), and 0.1 of that is spent in normpath (22k calls!) which seems weird to me. Looking at posixpath.py, normpath looks like this:

def normpath(path):
    """Normalize path, eliminating double slashes, etc."""
    path = os.fspath(path)
    if isinstance(path, bytes):
        sep = b'/'
        empty = b''
        dot = b'.'
        dotdot = b'..'
    else:
        sep = '/'
        empty = ''
        dot = '.'
        dotdot = '..'
    if path == empty:
        return dot
    initial_slashes = path.startswith(sep)
    # POSIX allows one or two initial slashes, but treats three or more
    # as single slash.
    if (initial_slashes and
        path.startswith(sep*2) and not path.startswith(sep*3)):
        initial_slashes = 2
    comps = path.split(sep)
    new_comps = []
    for comp in comps:
        if comp in (empty, dot):
            continue
        if (comp != dotdot or (not initial_slashes and not new_comps) or
             (new_comps and new_comps[-1] == dotdot)):
            new_comps.append(comp)
        elif new_comps:
            new_comps.pop()
    comps = new_comps
    path = sep.join(comps)
    if initial_slashes:
        path = sep*initial_slashes + path
    return path or dot

seems to be doing an awful lot! Trying it in ipython:

In [10]: %timeit normpath('/foo/bar/baz')                                                                                     
2.84 µs ± 49.9 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

hmm... I try this:

In [11]: %paste 
def fast_normpath(path):
    if '.' in path or '//' in path:
        return normpath(path)
    else:
        return path
In [12]: %timeit fast_normpath('/foo/bar/baz')                                                                                
177 ns ± 0.635 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)

So this simple optimization seems to make it about 16 times faster (2.84 µs vs. 177 ns) in the (normal?) case of the paths not needing normalization. It does slow down the normalizing case:

In [15]: %timeit normpath('/foo/bar/baz//asd/../123')                                                                         
4 µs ± 59.9 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

In [16]: %timeit fast_normpath('/foo/bar/baz//asd/../123')                                                                    
4.3 µs ± 85.8 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

but not by much. Is this stuff worth sending a PR to CPython you think?
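
One caveat with the shortcut above: it returns the input unchanged for an empty string and for paths with a trailing slash, while `normpath` rewrites both. A slightly more conservative sketch could look like this (str paths only, POSIX semantics assumed; `fast_normpath_checked` is a hypothetical name):

```python
import posixpath


def fast_normpath_checked(path):
    """Skip normpath only when the string is already in normalized form."""
    needs_work = (
        not path                                 # '' normalizes to '.'
        or "." in path                           # may contain '.' or '..' parts
        or "//" in path                          # repeated separators
        or (path != "/" and path.endswith("/"))  # trailing slash is dropped
    )
    return posixpath.normpath(path) if needs_work else path
```

The extra checks cost a little of the speedup, and any file name containing a dot still takes the slow path, so the benefit depends on how many plain directory paths are normalized.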

blueyed · 2018-11-02T16:46:49Z

So the performance on features is ok with this patch? I am confused by your last comment :P

Fixed the formatting.
It seemed like (initial) times were slower compared to when we started working on it (and that might still be the case). I've tried to see if there was some performance regression in between on features, but could not find it.
Initially this was also done against master (which I forgot about), so that might have added to my confusion.

Is this stuff worth sending a PR to CPython you think?

Sure.
OTOH 0.1 out of 3.8 is not very much for us, and 22k calls is really too much for the 1.3k files in the test case IMHO.

I've removed the noqa, and think we're good to merge this already.
normpath and other optimizations can be done as a followup.

nicoddemus · 2018-11-02T19:52:23Z

Hmm, AppVeyor is having trouble cloning the repository...

blueyed · 2018-11-02T22:49:05Z

Re-started.. likely due to GitHub issues.

boxed · 2018-11-02T23:53:31Z

What does AppVeyor do that travis does not?

blueyed · 2018-11-03T02:51:52Z

Nothing likely, but it is usually behind, i.e. falls into other time windows.
Works by now: https://ci.appveyor.com/project/pytestbot/pytest/builds/20019105

btw: I still think it would be enough to only test a smaller subset on AppVeyor, given that it often takes hours to get results / finally green PRs.

boxed · 2018-11-03T13:45:28Z

🎉

boxed mentioned this pull request Oct 25, 2018

rough performance analysis of the current internals #2206

Open

RonnyPfannschmidt reviewed Oct 25, 2018

blueyed mentioned this pull request Oct 25, 2018

Move handling of duplicate files #4241

Merged

blueyed force-pushed the master branch from e609bf1 to 5850de6 on October 25, 2018 18:16

RonnyPfannschmidt approved these changes Oct 26, 2018

blueyed mentioned this pull request Oct 26, 2018

collection: _recurse: skip __pycache__ #4250

Merged

nicoddemus changed the title ~~Performance fix~~ WIP Performance fix (rebase on features) Oct 27, 2018

blueyed mentioned this pull request Oct 30, 2018

[WIP] Track visited files and directories when collecting #4203

Closed

blueyed force-pushed the master branch from 5850de6 to 6edcefa on November 1, 2018 15:24

blueyed changed the base branch from master to features November 1, 2018 15:52

blueyed changed the title ~~WIP Performance fix (rebase on features)~~ Performance fixes Nov 1, 2018

blueyed reviewed Nov 1, 2018

boxed and others added 4 commits November 2, 2018 17:42

collection: performance: use optimized parts function

a41820f

Time: 8.53s => 5.73s

Minor refactor for readability

2b50911

Time: 5.73s => 5.88s/5.82s

Handle dirs only once

6ffa347

Time: 5.73s/5.88s => 5.36s (Before rebase: 4.86s => 4.45s)

paths: use set and isdisjoint

023e1c7

Time: 5.36s => 4.85s (before rebase: 4.45s => 3.55s)

blueyed force-pushed the master branch from 6edcefa to 023e1c7 on November 2, 2018 16:46

nicoddemus merged commit 4cb838d into pytest-dev:features Nov 3, 2018
