Speedup startup time #4768

Open
pradyunsg opened this issue Oct 5, 2017 · 44 comments
Labels
type: enhancement Improvements to functionality type: performance Commands take too long to run

Comments

@pradyunsg
Member

There's a lot to gain from speeding up pip's startup time.

For one, pip takes around 600ms just to print the completion text, which feels laggy (as mentioned in #4755). A faster startup might also help with the test-suite situation.

@pradyunsg pradyunsg added the type: maintenance Related to Development and Maintenance Processes label Oct 5, 2017
@floam

floam commented Oct 5, 2017

I did notice one can save about 80ms by using the option to disable the version check - it'd probably make sense not to do that at all for completions by default. Still, a lot remains to be improved.

@pradyunsg pradyunsg mentioned this issue Oct 5, 2017
12 tasks
@dstufft
Member

dstufft commented Oct 5, 2017

I wonder how much of this time is taken up by importing stuff.

@pradyunsg
Member Author

$ time .tox/py36/bin/python -c "from pip._internal import main"

A rudimentary test of running the above 5 times gives me an average of 0.377s.
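
A minimal sketch of how such a repeated measurement could be scripted, assuming a fresh interpreter per run so nothing is cached in sys.modules. The helper name time_import is hypothetical, and "import json" is only a stand-in for the statement measured above.

```python
import statistics
import subprocess
import sys
import time


def time_import(statement, runs=5):
    """Return the mean wall-clock seconds to run `statement` in a fresh interpreter."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        # A new process per run guarantees the import is not already cached.
        subprocess.run([sys.executable, "-c", statement], check=True)
        timings.append(time.perf_counter() - start)
    return statistics.mean(timings)


if __name__ == "__main__":
    # Interpreter startup alone, as a baseline to compare against.
    baseline = time_import("pass")
    # Stand-in statement; the thread measured "from pip._internal import main".
    with_import = time_import("import json")
    print(f"baseline: {baseline:.3f}s, with import: {with_import:.3f}s")
```

Comparing against the "pass" baseline separates interpreter startup from the cost of the import itself.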

@benoit-pierre
Member

Importing pip._vendor.pkg_resources is what takes most of the time on my machine. There are a few changes in setuptools 36.4 and above that will help a little: https://github.com/pypa/setuptools/blob/master/CHANGES.rst#v3640

@pradyunsg
Member Author

master is vendoring setuptools 36.4.0 currently...

setuptools==36.4.0

@benoit-pierre
Member

Yes, and it's faster than pip 9.0, but there are some further changes in setuptools 36.5 that might help: https://github.com/pypa/setuptools/blob/master/CHANGES.rst#v3650.

@benoit-pierre
Member

  • pip 9.0: python -c 'import pip._vendor.pkg_resources' 0.35s user 0.03s system 100% cpu 0.385 total
  • master: python -c "from pip._vendor import pkg_resources" 0.21s user 0.02s system 99% cpu 0.233 total
  • master+updated pkg_resources: python -c "from pip._vendor import pkg_resources" 0.19s user 0.00s system 99% cpu 0.192 total

@pradyunsg
Member Author

pradyunsg commented Oct 5, 2017

Oh, nice. That means the next round of vendor updates would bring some speedup. :)

I, personally, am waiting on a new distlib release before giving the vendored libraries another round of updates.

@pradyunsg
Member Author

pradyunsg commented Oct 5, 2017

I fired up the profiler and ran pip completion --fish. Here's what I got (all percentages in terms of total time):

  • initial import of pip._internal.__init__: 85%
    • pip._internal.cmdoptions.__init__: 79%
      • pip._internal.index: 78% (basically all of the above time is spent in this import)
        • This is where it gets interesting
        • pip._vendor.html5lib: 15.8%
        • pip._vendor.requests: 9.0%
        • pip._vendor.distlib: 5.2%
        • pip._vendor.packaging: 7.1%
        • pip._internal.download: 35.4%
          • pip._internal.utils.logging: 30.4%
            • pip._internal.utils.misc: 29.6%
              • pip._vendor.pkg_resources: 28.8%
  • pip._internal.__init__.main(): 15%
    • parseopts(): 0.19%
    • command.main(): 14.7%

@pradyunsg
Member Author

PS: Need better profiling tools.

@dstufft
Member

dstufft commented Oct 5, 2017

Lazy importing will probably solve some of those.
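
For illustration, one way lazy importing can be done with the standard library is importlib.util.LazyLoader, which postpones executing a module until one of its attributes is first accessed. This is a sketch based on the recipe in the importlib documentation, not something pip is stated to use; the helper name lazy_import is hypothetical.

```python
import importlib.util
import sys


def lazy_import(name):
    """Return a module object whose code runs only on first attribute access."""
    if name in sys.modules:
        # Already imported; there is nothing left to defer.
        return sys.modules[name]
    spec = importlib.util.find_spec(name)
    if spec is None or spec.loader is None:
        raise ImportError(f"cannot find module {name!r}")
    # Wrap the real loader so that execution is postponed.
    loader = importlib.util.LazyLoader(spec.loader)
    spec.loader = loader
    module = importlib.util.module_from_spec(spec)
    sys.modules[name] = module
    loader.exec_module(module)  # Sets up the lazy module; does not run its code yet.
    return module


if __name__ == "__main__":
    colorsys = lazy_import("colorsys")  # Cheap: module body not executed yet.
    print(colorsys.rgb_to_hsv(1.0, 0.0, 0.0))  # First attribute access triggers the real import.
```

The trade-off is that import errors surface at first use rather than at startup, which can make failures harder to trace.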

@pradyunsg pradyunsg added type: enhancement Improvements to functionality and removed type: maintenance Related to Development and Maintenance Processes labels Oct 5, 2017
@brettcannon
Member

There are plans to (hopefully) make lazy importing easy to switch on for CLI apps like pip in Python 3.7. CPython 3.7 also adds a -X importtime argument, as well as dtrace/systemtap support, to help profile where import time is going.

@boxed

boxed commented Nov 25, 2017

@benoit-pierre

master+updated pkg_resources: python -c "from pip._vendor import pkg_resources" 0.19s user 0.00s system 99% cpu 0.192 total

What does "updated pkg_resources" mean? On my machine the initial import is still quite slow even though I have setuptools 38.1.0, so your comment seems very interesting to me! :P

@benoit-pierre
Member

@boxed: it means with pip's vendored version of setuptools updated.

@boxed

boxed commented Nov 26, 2017

Aha. I tried copying over pkg_resources from my main install over the one inside pip/_vendor, but I didn't see any difference in speed :/

@pradyunsg
Member Author

Using CPython 3.7.0's -X importtime.

import time: self [us] | cumulative | imported package
[snip]
import time:       654 |     506881 | pip._internal
[snip]
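
Output in this format is long, so it can help to sort it by cumulative time. The following is a rough sketch, assuming the three-column layout shown above (self microseconds, cumulative microseconds, module name); the function names parse_importtime and slowest_imports are hypothetical, and "import json" is a stand-in statement.

```python
import subprocess
import sys

PREFIX = "import time:"


def parse_importtime(stderr_text):
    """Parse -X importtime output into (module, self_us, cumulative_us) tuples."""
    rows = []
    for line in stderr_text.splitlines():
        if not line.startswith(PREFIX):
            continue
        fields = line[len(PREFIX):].split("|")
        if len(fields) != 3:
            continue
        self_us, cumulative_us, name = (field.strip() for field in fields)
        if not (self_us.isdigit() and cumulative_us.isdigit()):
            continue  # Skips the header row.
        rows.append((name, int(self_us), int(cumulative_us)))
    return rows


def slowest_imports(statement, top=10):
    """Run `statement` under -X importtime and return the costliest imports."""
    proc = subprocess.run(
        [sys.executable, "-X", "importtime", "-c", statement],
        capture_output=True,
        text=True,
        check=True,
    )
    # CPython writes the import timings to stderr.
    rows = parse_importtime(proc.stderr)
    return sorted(rows, key=lambda row: row[2], reverse=True)[:top]


if __name__ == "__main__":
    for name, self_us, cumulative_us in slowest_imports("import json"):
        print(f"{cumulative_us:>10} us cumulative  {self_us:>8} us self  {name}")
```

Cumulative time includes everything a module imports in turn, so the top entries usually point at the import chains worth deferring.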

@lorencarvalho
Contributor

Just in case y'all are unaware, pkg_resources is tracking the slowness in pypa/setuptools#510. I didn't see it linked in this issue yet.

@CSDUMMI

CSDUMMI commented May 19, 2019

Could you not move the imports from the top of the file into the functions that need them?

@boxed

boxed commented May 19, 2019

@CSDUMMI I have a PR that does this. It helps somewhat.
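
As a generic illustration of the idea (not code from the PR), a function-level import defers the cost of loading a module until the function is first called. The function name format_requirements is hypothetical and json is only a stand-in for a heavier dependency.

```python
import sys


def format_requirements(requirements):
    """Serialize requirement names; json is a stand-in for a heavier dependency."""
    # Deferred import: the module is loaded on the first call, not at startup.
    # Later calls only pay for a cheap sys.modules lookup.
    import json

    return json.dumps(sorted(requirements))


if __name__ == "__main__":
    print("json loaded before call:", "json" in sys.modules)
    print(format_requirements(["requests", "packaging"]))
    print("json loaded after call:", "json" in sys.modules)
```

Code paths that never call the function, such as printing completion text, never pay for the import.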

@CSDUMMI

CSDUMMI commented May 20, 2019

Could I have a link?

@boxed

boxed commented May 20, 2019

#6346

@cjerdonek
Member

Here is a PR that improves the import situation for the vcs imports: #6545. It removes the pip._internal.vcs imports from pip/_internal/__init__.py. This will make it easy to remove vcs imports from the common case, if desired, which can be done in a subsequent commit.

@boxed

boxed commented Jun 5, 2019

Let's move the discussion to boxed/p#4

@cjerdonek
Member

FYI, PR #6694 ("Only import a Command class when needed") was recently merged, which helps with this.
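
To illustrate the general idea of importing a command class only when needed, here is a rough sketch of a registry that stores module paths as strings and resolves them on demand. It is not taken from the PR: the names CommandInfo, COMMANDS, create_command and list_commands are assumptions for this example, and standard-library classes stand in for real command classes.

```python
import importlib
from collections import namedtuple

# Hypothetical registry entry: where the class lives, plus help text that
# can be shown without importing anything.
CommandInfo = namedtuple("CommandInfo", "module_path class_name summary")

# Stand-ins: stdlib classes play the role of command classes here.
COMMANDS = {
    "decode": CommandInfo("json.decoder", "JSONDecoder", "Decode JSON text."),
    "template": CommandInfo("string", "Template", "Substitute placeholders."),
}


def list_commands():
    """Return (name, summary) pairs without importing any command module."""
    return [(name, info.summary) for name, info in COMMANDS.items()]


def create_command(name):
    """Import and return only the class for the requested command."""
    info = COMMANDS[name]
    module = importlib.import_module(info.module_path)
    return getattr(module, info.class_name)


if __name__ == "__main__":
    print(list_commands())
    template_class = create_command("template")
    print(template_class("hello $name").substitute(name="pip"))
```

With this layout, listing commands or showing help touches only the registry, and each invocation imports a single command module.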

@cjerdonek
Member

I posted PR #6835 to help with this.

@cjerdonek
Member

I just posted PR #6843 to continue the work in PR #6835. The PR trims unneeded imports by making it so that commands not requiring downloading / PackageFinder will no longer import that machinery.

@asottile
Contributor

asottile commented Feb 5, 2022

I noticed a pretty significant slowdown in the latest released version -- I've tracked it down to here -- might be worth bumping pyparsing once that gets resolved: pyparsing/pyparsing#362

@bluetech
Contributor

I also looked into pip startup time a bit. First thing I noticed is tenacity's import of asyncio but seems like @ichard26 already took care of it (thanks!).

Another one I noticed is the chardet import. requests supports either chardet or charset_normalizer. In a quick experiment replacing the vendored chardet import with a non-vendored charset_normalizer import, I get ~27ms for chardet vs. ~7ms for charset_normalizer, using -X importtime on my admittedly ~10-year-old laptop.

If there is interest I can try to prepare a PR to replace the vendored chardet with a vendored charset_normalizer.

@pfmoore
Member

pfmoore commented Apr 17, 2024

Note that we can only vendor pure Python libraries. charset_normalizer would at the very least be tricky to vendor because we'd need to find a way to make our vendoring tools ignore the platform-specific wheels. Also, are your benchmarks using the pure Python version? If not, they would need to be re-done to be meaningful.

I don't have any strong opinions on whether we should switch, I'm just noting these points as things to consider if we do.

@bluetech
Contributor

Hmm, the charset_normalizer GitHub repo tagline says "in pure python" but I guess it's not :)
I just tried again with charset_normalizer-3.3.2-py3-none-any.whl and still get ~7ms so the pure python version looks good as well.

@notatallshaw
Contributor

notatallshaw commented Apr 17, 2024

charset_normalizer was added as an optional dependency to requests when the apache-airflow team found the license for chardet wasn't suitable for them.

The developer did a lot to make it more acceptable to the requests maintainers, such as significantly reducing the number of dependencies. My understanding is that charset_normalizer only has binaries based on compiling pure Python code with mypyc, and they aren't shipped by default.

If the developer is still as accommodating, I'd imagine pip would benefit in performance and ease of maintenance with charset_normalizer, but the first thing I would do is check with the developer that they are happy being vendored by pip.

@bluetech
Contributor

OK, submitted PR #12638.

BTW, another big import-time hit is packaging -> pyparsing package. I see that packaging has already replaced pyparsing with a hand-rolled parser, and there is a PR #12300 to update pip to use it. I checked the import time with #12300 and it's indeed a nice improvement.

With pyparsing, asyncio and chardet gone it will be a decent improvement to startup time. The remaining big ones are rich and requests/urllib3 but I don't think there is much to do about these for pip install.

bluetech added a commit to bluetech/pip that referenced this issue Apr 18, 2024
@ichard26 ichard26 added the type: performance Commands take too long to run label Apr 19, 2024