Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Performance fixes #4237

Merged
merged 4 commits into from Nov 3, 2018
Merged

Performance fixes #4237

merged 4 commits into from Nov 3, 2018

Conversation

boxed
Copy link
Contributor

@boxed boxed commented Oct 25, 2018

@@ -305,3 +305,8 @@ def fnmatch_ex(pattern, path):
else:
name = six.text_type(path)
return fnmatch.fnmatch(name, pattern)


def parts(s):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please take a look if a Path.parents or Path.parts works out here as well as this looks like a nice potential match

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using p = Path(s); list(p.parents) + [p] makes it slower.
This would only make sense if it would be a Path already I think.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No surprise there really. Path.parents is a pretty complex chain of stuff that is hard to follow. Would be amazing if it wasn't slower than the simple string operations in this function.

@RonnyPfannschmidt
Copy link
Member

i believe a more critical element is actually turning the paths variable into a set, at first glance it doesn't seem to be used anywhere else, and using issubset instead of any should be a massive boost since it turns a linear search loop at the python level into a hash membership test loop on the c level

@boxed
Copy link
Contributor Author

boxed commented Oct 25, 2018

I don't think a set operator is a big win because we're talking about small numbers here, but I'll certainly give it a try tomorrow.

There's another issue with this code in that it searches all the way up to / which makes no sense. It should stop at the base directory of tests (current working directory?). Otherwise pytest is needlessly slower the deeper in the directory tree you have your code.

@blueyed
Copy link
Contributor

blueyed commented Oct 25, 2018

should stop at the base directory of tests (current working directory?)

That would be rootdir, but in this case the argpath could/should be used probably.

issubset

Currently it checks if any of the parts are in already handled paths, which appears to be different from using issubset, no?
That would rather be isdisjoint then (https://docs.python.org/3.7/library/stdtypes.html?highlight=issubset#frozenset.isdisjoint).

@blueyed
Copy link
Contributor

blueyed commented Oct 25, 2018

I'm looking into this now also - @boxed, is it OK to push things here, or should I rather post diffs/patches?

@boxed
Copy link
Contributor Author

boxed commented Oct 25, 2018

Push away

@boxed
Copy link
Contributor Author

boxed commented Oct 25, 2018

I tried to find out why for 1000 files there were 2 million calls to stat but I had trouble figuring this out. Maybe a lot of micro optimizations is what's needed for that issue.

@codecov
Copy link

codecov bot commented Oct 25, 2018

Codecov Report

Merging #4237 into features will increase coverage by 0.09%.
The diff coverage is 100%.

Impacted file tree graph

@@             Coverage Diff              @@
##           features    #4237      +/-   ##
============================================
+ Coverage     95.75%   95.84%   +0.09%     
============================================
  Files           111      111              
  Lines         24794    24804      +10     
  Branches       2420     2422       +2     
============================================
+ Hits          23741    23773      +32     
+ Misses          751      735      -16     
+ Partials        302      296       -6
Flag Coverage Δ
#docs 28.95% <91.3%> (?)
#doctesting 28.95% <91.3%> (?)
#linting 28.95% <91.3%> (?)
#linux 95.62% <100%> (ø) ⬆️
#nobyte 91.84% <100%> (+0.63%) ⬆️
#numpy 93% <100%> (+51.34%) ⬆️
#pexpect 41.69% <91.3%> (+0.02%) ⬆️
#py27 94.01% <100%> (ø) ⬆️
#py34 92.17% <100%> (+0.19%) ⬆️
#py35 92.19% <100%> (+0.2%) ⬆️
#py36 93.93% <100%> (+0.17%) ⬆️
#py37 92.24% <100%> (+0.06%) ⬆️
#trial 93% <100%> (+51.34%) ⬆️
#windows 93.98% <100%> (+2.99%) ⬆️
#xdist 93.73% <100%> (+0.15%) ⬆️
Impacted Files Coverage Δ
src/_pytest/pathlib.py 90.34% <100%> (+0.16%) ⬆️
src/_pytest/python.py 95.58% <100%> (ø) ⬆️
src/_pytest/main.py 96.85% <100%> (+0.04%) ⬆️
testing/acceptance_test.py 98.27% <0%> (+1.07%) ⬆️
src/_pytest/capture.py 93.87% <0%> (+1.36%) ⬆️
src/_pytest/assertion/util.py 98.13% <0%> (+5.14%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update f2cebce...023e1c7. Read the comment docs.

@blueyed
Copy link
Contributor

blueyed commented Oct 25, 2018

I tried to find out why for 1000 files there were 2 million calls to stat

Some of this might also be due to .pyc files being considered (which I've changed by skipping __pycache__).

@blueyed
Copy link
Contributor

blueyed commented Oct 25, 2018

Re stopping at the base dir: this appears to be the case already, no? parts is "only" used for checking if it is disjoint with already collected paths, but it could be only the relevant parts then, i.e. up to rootdir - but that is likely not a big deal I would assume.

@RonnyPfannschmidt
Copy link
Member

how much improvement did the set usage give?

@blueyed
Copy link
Contributor

blueyed commented Oct 25, 2018

It's in the commit message: 4.45s => 3.55s
Would likely be bigger without the previous change to handle dirs only once.

@boxed
Copy link
Contributor Author

boxed commented Oct 25, 2018

Re skipping pycache: we should explicitly skip pyc files for py2.7 too right? I did that change and it gave a modest speed boost so I skipped it while looking for bigger fish. Should have kept it in hindsight.

@boxed
Copy link
Contributor Author

boxed commented Oct 25, 2018

Re stopping at the base die: I mean the paths function should stop at the base path. Currently it returns "/" as the first element always which is rather silly.

@boxed
Copy link
Contributor Author

boxed commented Oct 25, 2018

Actually, why aren't we skipping non-py-files explicitly?

@blueyed
Copy link
Contributor

blueyed commented Oct 25, 2018

I also noticed that between 3.7.0 (which I tested my performance fix on) and master, more performance regressions have landed, resulting in my test going from 7 seconds (in 3.7 with my patch) to 9.3 seconds.

My times (using https://github.com/blueyed/dotfiles/blob/abe59a331eb0aeebccba55516ffbf885c577e8c6/usr/bin/timeit-shell),
and the script from #2206 (comment):

@RonnyPfannschmidt
With #4237 (2eab084) and 1000x10 files pre-collection is not really noticeable anymore and times change from
351.50s user 10.07s system 70% cpu 8:31.79 total
to
72.00s user 4.34s system 98% cpu 1:17.18 total
(I might have had a cold cache there initially though)

@boxed
Copy link
Contributor Author

boxed commented Oct 25, 2018

Cool! Can't wait to try this when I get in tomorrow at work!

@blueyed
Copy link
Contributor

blueyed commented Oct 25, 2018

Re skipping pycache: we should explicitly skip pyc files for py2.7 too right?

Yes, although IIRC it was easier / better to just check the dir upfront, but could also be done in a special way for py27, of course.

Re stopping at the base die: I mean the paths function should stop at the base path. Currently it returns "/" as the first element always which is rather silly.

You mean the "parts" function, right?
This is only string manipulation currently, and it might be more overhead to handle stopping after all, but have not looked to closely. AFAIK this should not cause any disk I/O though.

Actually, why aren't we skipping non-py-files explicitly?

pytest_collect_file handles this (via python_files setting) currently. I guess there are valid use cases for non-.py files (e.g. doctests).

@boxed
Copy link
Contributor Author

boxed commented Oct 25, 2018

Re paths function: yep. Could be make fast with just s[len(basepath):] before splitting shouldn't add too much complexity either. Maybe a bit premature though.

@blueyed
Copy link
Contributor

blueyed commented Oct 25, 2018

Re ignoring .pyc files: _recurse gets called with directories only, so .pyc cannot be skipped there, but I've added it through a filter.
Currently investigating why there is _recurse twice (for session and python). It seems that python's collect could be improved in this regard altogether (#4241 came out of it already).

@boxed
Copy link
Contributor Author

boxed commented Oct 26, 2018

Yep, this brings time almost back to the 3.4 level on my machine at work for the example script.

Oooh, but it brings the time down to 4.5 on the test at work, where 6s is the benchmark (although 3.6.0 was 3s). This is good stuff!

Copy link
Member

@RonnyPfannschmidt RonnyPfannschmidt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this looks great so far 👍 , would you like to investigate some more of the loose ends or would you like to get this merged to proceed on a new pr?

@boxed
Copy link
Contributor Author

boxed commented Oct 26, 2018

The test suite at TriOptima now has 104k calls to posix.stat (compared to millions before). Huge improvement! But it's still 0.8 seconds just for all the posix.stat calls. 2906 files and folders (after I delete all pychache dirs and not counting things excluded by norecusedirs). That's 104273/2906 ~= 35.88 posix.stat per file. There are 1767 .py files, so it's 59.0 stat calls per file if you just count that.

@RonnyPfannschmidt
Copy link
Member

@boxed lovely stats - i believe we may need to create a little trace helper to trace the stats/assign them context - i believe the import system of python generates dozens of them for trying different locations in sys.path for example - i believe there is room for improvement there but its not yet clear how to materialize it yet

right now we cant even clearly measure ^^

@boxed
Copy link
Contributor Author

boxed commented Oct 26, 2018

I guess some of those stat calls are for python to just check if it needs to rebuild a pyc file. I made a small script that basically just imports the entire app and I get 11586 stat calls for that, so that doesn't change the math much.

@boxed
Copy link
Contributor Author

boxed commented Oct 26, 2018

Hmm... interestingly we get 284957 calls to posix.stat for my toy example script, which is drastically more than for our production code base where the directory has a huge venv. I don't think this should be too hard to figure out. I'll give it a shot today.

@boxed
Copy link
Contributor Author

boxed commented Oct 26, 2018

Oh, and to answer the question on merging this.. I am not in a huge hurry to merge no. I'd like to spend today on this at least and if I can't find anything more then I guess it makes sense to merge and then start with a new PR.

@boxed
Copy link
Contributor Author

boxed commented Oct 26, 2018

Not at this time. But it might be worth a look if someone can get to the bottom of the assert rewrite import hook performance problems.

blueyed added a commit to blueyed/pytest that referenced this pull request Oct 26, 2018
Also renames `_path2confmods` to `_dirpath2confmods` for clarity (it is
expected to be a dirpath in `_importconftest`).

Uses an explicit maxsize, since it appears to be only relevant for a
short period [1].

1: pytest-dev#4237 (comment)
blueyed added a commit to blueyed/pytest that referenced this pull request Oct 26, 2018
Also renames `_path2confmods` to `_dirpath2confmods` for clarity (it is
expected to be a dirpath in `_importconftest`).

Uses an explicit maxsize, since it appears to be only relevant for a
short period [1].

Removes the lru_cache on _getconftest_pathlist, which makes no
difference when caching _getconftestmodules, at least with the
performance test of 100x10 files (pytest-dev#4237).

1: pytest-dev#4237 (comment)
@nicoddemus nicoddemus changed the title Performance fix WIP Performance fix (rebase on features) Oct 27, 2018
blueyed added a commit to blueyed/pytest that referenced this pull request Oct 27, 2018
Also renames `_path2confmods` to `_dirpath2confmods` for clarity (it is
expected to be a dirpath in `_importconftest`).

Uses an explicit maxsize, since it appears to be only relevant for a
short period [1].

Removes the lru_cache on _getconftest_pathlist, which makes no
difference when caching _getconftestmodules, at least with the
performance test of 100x10 files (pytest-dev#4237).

1: pytest-dev#4237 (comment)
blueyed added a commit to blueyed/pytest that referenced this pull request Oct 30, 2018
Also renames `_path2confmods` to `_dirpath2confmods` for clarity (it is
expected to be a dirpath in `_importconftest`).

Uses an explicit maxsize, since it appears to be only relevant for a
short period [1].

Removes the lru_cache on _getconftest_pathlist, which makes no
difference when caching _getconftestmodules, at least with the
performance test of 100x10 files (pytest-dev#4237).

1: pytest-dev#4237 (comment)
@nicoddemus
Copy link
Member

After the PRs related to performance are merged, I think we are in good shape for a 3.10 release. 👍

blueyed added a commit to blueyed/pytest that referenced this pull request Oct 30, 2018
Also renames `_path2confmods` to `_dirpath2confmods` for clarity (it is
expected to be a dirpath in `_importconftest`).

Uses an explicit maxsize, since it appears to be only relevant for a
short period [1].

Removes the lru_cache on _getconftest_pathlist, which makes no
difference when caching _getconftestmodules, at least with the
performance test of 100x10 files (pytest-dev#4237).

1: pytest-dev#4237 (comment)
blueyed added a commit to blueyed/pytest that referenced this pull request Oct 31, 2018
Also renames `_path2confmods` to `_dirpath2confmods` for clarity (it is
expected to be a dirpath in `_importconftest`).

Uses an explicit maxsize, since it appears to be only relevant for a
short period [1].

Removes the lru_cache on _getconftest_pathlist, which makes no
difference when caching _getconftestmodules, at least with the
performance test of 100x10 files (pytest-dev#4237).

1: pytest-dev#4237 (comment)
@blueyed
Copy link
Contributor

blueyed commented Nov 1, 2018

Rebased (old HEAD: 5850de6).

I've run the timing on each commit again.
Looks like we might have some performance degration on features inbetween already, since the times are slower in general (see commit msgs) - might be worth investigating?!

Some data points:

master (56e6bb0): 11.4s (later: raw times: 11.6 sec, 11.7 sec, 11.8 sec, 11.9 sec, 12.2 sec)
features (currently f2cebce): 8.97s
f466105 (features~10): 12.3s (later: raw times: 11.8 sec, 12.5 sec, 12 sec, 12 sec, 11.8 sec)
(master and features~10) looks similar after all.

@blueyed blueyed changed the base branch from master to features November 1, 2018 15:52
@blueyed blueyed changed the title WIP Performance fix (rebase on features) Performance fixes Nov 1, 2018
@@ -469,7 +470,7 @@ def _perform_collect(self, args, genitems):
return items

def collect(self):
for parts in self._initialparts:
for parts in self._initialparts: # noqa: F402
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not really like this (it is needed because of the imported path being shadowed).
Suggestions?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for initialpart in self._initialparts?

@boxed
Copy link
Contributor Author

boxed commented Nov 2, 2018

@blueyed So the performance on features is ok with this patch? I am confused by your last comment :P

@boxed
Copy link
Contributor Author

boxed commented Nov 2, 2018

Looking a bit at the performance again, using my test script with many dirs again and just running the collect, I get ~3.8s (with the profiler on), and 0.1 of that is spent in normpath (22k calls!) which seems weird to me. Looking at posixpath.py, normpath looks like this:

def normpath(path):
    """Normalize path, eliminating double slashes, etc."""
    path = os.fspath(path)
    if isinstance(path, bytes):
        sep = b'/'
        empty = b''
        dot = b'.'
        dotdot = b'..'
    else:
        sep = '/'
        empty = ''
        dot = '.'
        dotdot = '..'
    if path == empty:
        return dot
    initial_slashes = path.startswith(sep)
    # POSIX allows one or two initial slashes, but treats three or more
    # as single slash.
    if (initial_slashes and
        path.startswith(sep*2) and not path.startswith(sep*3)):
        initial_slashes = 2
    comps = path.split(sep)
    new_comps = []
    for comp in comps:
        if comp in (empty, dot):
            continue
        if (comp != dotdot or (not initial_slashes and not new_comps) or
             (new_comps and new_comps[-1] == dotdot)):
            new_comps.append(comp)
        elif new_comps:
            new_comps.pop()
    comps = new_comps
    path = sep.join(comps)
    if initial_slashes:
        path = sep*initial_slashes + path
    return path or dot

seems to be doing an awful lot! Trying it in ipython:

In [10]: %timeit normpath('/foo/bar/baz')                                                                                     
2.84 µs ± 49.9 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

hmm... I try this:

In [11]: %paste 
def fast_normpath(path):
    if '.' in path or '//' in path:
        return normpath(path)
    else:
        return path
In [12]: %timeit fast_normpath('/foo/bar/baz')                                                                                
177 ns ± 0.635 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)

So this simple optimization seems to make it 83 times faster in the (normal?) case of the paths not needing normalization. It does slow down the normalizing case:

In [15]: %timeit normpath('/foo/bar/baz//asd/../123')                                                                         
4 µs ± 59.9 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

In [16]: %timeit fast_normpath('/foo/bar/baz//asd/../123')                                                                    
4.3 µs ± 85.8 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

but not by much. Is this stuff worth sending a PR to CPython you think?

boxed and others added 4 commits November 2, 2018 17:42
Time: 5.73s => 5.88s/5.82s
Time: 5.73s/5.88s => 5.36s

(Before rebase: 4.86s => 4.45s)
Time: 5.36s => 4.85s (before rebase: 4.45s => 3.55s)
@blueyed
Copy link
Contributor

blueyed commented Nov 2, 2018

So the performance on features is ok with this patch? I am confused by your last comment :P

Fixed the formatting.
It seemed like (initial) times were slower compared to when we started working on it (and that might still be the case). I've tried to see if there was some performance regression inbetween on features, but could not find it.
Initially this was also done against master (which I forgot about), so that might have added to my confusion.

Is this stuff worth sending a PR to CPython you think?

Sure.
OTOH 0.1 out of 3.8 is not very much for us, and 22k calls is really too much for the 1.3k files in the test case IMHO.

I've removed the noqa, and think we're good to merge this already.
normpath and other optimizations can be done as a followup.

@nicoddemus
Copy link
Member

Hmm AppVeyor is having trouble to clone the repository...

@blueyed
Copy link
Contributor

blueyed commented Nov 2, 2018

Re-started.. likely due to GitHub issues.

@boxed
Copy link
Contributor Author

boxed commented Nov 2, 2018

What does AppVeyor do that travis does not?

@blueyed
Copy link
Contributor

blueyed commented Nov 3, 2018

Nothing likely, but it is usually behind, i.e. falls into other time windows.
Works by now: https://ci.appveyor.com/project/pytestbot/pytest/builds/20019105

btw: I still think it would be enough to only test a smaller subset on AppVeyor, given that it often takes hours to get results / finally green PRs.

@nicoddemus nicoddemus merged commit 4cb838d into pytest-dev:features Nov 3, 2018
@boxed
Copy link
Contributor Author

boxed commented Nov 3, 2018

🎉

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants