Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Speedup startup of Python by importing less #3009

Merged
merged 7 commits into from Jan 8, 2022

Conversation

tiran
Copy link
Contributor

@tiran tiran commented Jan 6, 2022

_distutils_hack is imported by a .pth file at every start of a
Python interpreter. The import of costly modules like re and
contextlib almost doubles the initial startup time of an
interpreter.

  • replace contextlib with simple context manager and try/except
  • replace re with simple string match
  • move import of importlib into function body
  • remove warnings.filterwarnings(), which imports re, too.

Fixes: #3006
Signed-off-by: Christian Heimes christian@python.org

Summary of changes

Closes

Pull Request Checklist

@tiran tiran marked this pull request as ready for review January 7, 2022 10:04
Copy link
Contributor

@gvanrossum gvanrossum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Christian!

@jaraco
Copy link
Member

jaraco commented Jan 8, 2022

I've merged #3014. I'd like to rebase these changes on those in order to measure the changes. I'll work on that.

@jaraco
Copy link
Member

jaraco commented Jan 8, 2022

I've rebased the commits and re-ordered the last two because I'm interested in seeing the performance difference due to cddc4e8.

@jaraco
Copy link
Member

jaraco commented Jan 8, 2022

Annoyingly, I'm not seeing the performance tests run. I'll need to investigate why.

``_distutils_hack`` is imported by a ``.pth`` file at every start of a
Python interpreter. The import of costly modules like ``re`` and
``contextlib`` almost doubles the initial startup time of an
interpreter.

- replace ``contextlib`` with simple context manager and try/except
- replace ``re`` with simple string match
- move import of ``importlib`` into function body
- remove ``warnings.filterwarnings()``, which imports ``re``, too.

Fixes: pypa#3006
Signed-off-by: Christian Heimes <christian@python.org>
@jaraco
Copy link
Member

jaraco commented Jan 8, 2022

Annoyingly, I'm not seeing the performance tests run. I'll need to investigate why.

Seems I hadn't done the rebase correctly. Oh, I see what I did. I rebased, then when re-ordering the commits, undid the rebase. Re-doing the rebase, I can confirm that exercises.py is present, so I can expect the performance tests to appear in CI.

@jaraco
Copy link
Member

jaraco commented Jan 8, 2022

For some reason, I'm still not seeing the exercises run. I see now the issue is that xdist interacts poorly perf, causing the output to be suppressed. Disabling xdist allows the output to be emitted.

More importantly, though, I'm not confident the performance tests are reliable enough, as they're only showing performance gains in the 1ms range:

setuptools issue3006_startup_speedup $ $SETUPTOOLS_USE_DISTUTILS='local' tox -- -k exercises -p no:cov -p no:xdist
python develop-inst-noop: /Users/jaraco/code/public/pypa/setuptools
python installed: alabaster==0.7.12,attrs==21.4.0,autocommand==2.2.1,Babel==2.9.1,black==21.12b0,build==0.7.0,certifi==2021.10.8,charset-normalizer==2.0.10,click==8.0.3,coverage==6.2,distlib==0.3.4,docutils==0.17.1,execnet==1.9.0,filelock==3.4.2,flake8==4.0.1,flake8-2020==1.6.1,foo==0.0.0,idna==3.3,imagesize==1.3.0,importlib-metadata==4.10.0,iniconfig==1.1.1,jaraco.context==4.1.1,jaraco.envs==2.2.0,jaraco.functools==3.5.0,jaraco.path==3.3.1,Jinja2==3.0.3,MarkupSafe==2.0.1,mccabe==0.6.1,mock==4.0.3,more-itertools==8.12.0,mypy==0.931,mypy-extensions==0.4.3,packaging==21.3,path==16.3.0,pathspec==0.9.0,Paver==1.3.4,pep517==0.12.0,pip-run==8.7.2,platformdirs==2.4.1,pluggy==1.0.0,py==1.11.0,pycodestyle==2.8.0,pyflakes==2.4.0,Pygments==2.11.2,pyobjc==8.1,pyobjc-core==8.1,pyobjc-framework-Accessibility==8.1,pyobjc-framework-Accounts==8.1,pyobjc-framework-AddressBook==8.1,pyobjc-framework-AdServices==8.1,pyobjc-framework-AdSupport==8.1,pyobjc-framework-AppleScriptKit==8.1,pyobjc-framework-AppleScriptObjC==8.1,pyobjc-framework-ApplicationServices==8.1,pyobjc-framework-AppTrackingTransparency==8.1,pyobjc-framework-AudioVideoBridging==8.1,pyobjc-framework-AuthenticationServices==8.1,pyobjc-framework-AutomaticAssessmentConfiguration==8.1,pyobjc-framework-Automator==8.1,pyobjc-framework-AVFoundation==8.1,pyobjc-framework-AVKit==8.1,pyobjc-framework-BusinessChat==8.1,pyobjc-framework-CalendarStore==8.1,pyobjc-framework-CallKit==8.1,pyobjc-framework-CFNetwork==8.1,pyobjc-framework-ClassKit==8.1,pyobjc-framework-CloudKit==8.1,pyobjc-framework-Cocoa==8.1,pyobjc-framework-Collaboration==8.1,pyobjc-framework-ColorSync==8.1,pyobjc-framework-Contacts==8.1,pyobjc-framework-ContactsUI==8.1,pyobjc-framework-CoreAudio==8.1,pyobjc-framework-CoreAudioKit==8.1,pyobjc-framework-CoreBluetooth==8.1,pyobjc-framework-CoreData==8.1,pyobjc-framework-CoreHaptics==8.1,pyobjc-framework-CoreLocation==8.1,pyobjc-framework-CoreMedia==8.1,pyobjc-framework-CoreMediaIO==8.1,pyobjc-framework-CoreMIDI==8.1,pyobjc-framework-CoreML==8.1,pyobjc-framework-CoreMotion==8.1,pyobjc-framework-CoreServices==8.1,pyobjc-framework-CoreSpotlight==8.1,pyobjc-framework-CoreText==8.1,pyobjc-framework-CoreWLAN==8.1,pyobjc-framework-CryptoTokenKit==8.1,pyobjc-framework-DataDetection==8.1,pyobjc-framework-DeviceCheck==8.1,pyobjc-framework-DictionaryServices==8.1,pyobjc-framework-DiscRecording==8.1,pyobjc-framework-DiscRecordingUI==8.1,pyobjc-framework-DiskArbitration==8.1,pyobjc-framework-DVDPlayback==8.1,pyobjc-framework-EventKit==8.1,pyobjc-framework-ExceptionHandling==8.1,pyobjc-framework-ExecutionPolicy==8.1,pyobjc-framework-ExternalAccessory==8.1,pyobjc-framework-FileProvider==8.1,pyobjc-framework-FileProviderUI==8.1,pyobjc-framework-FinderSync==8.1,pyobjc-framework-FSEvents==8.1,pyobjc-framework-GameCenter==8.1,pyobjc-framework-GameController==8.1,pyobjc-framework-GameKit==8.1,pyobjc-framework-GameplayKit==8.1,pyobjc-framework-ImageCaptureCore==8.1,pyobjc-framework-IMServicePlugIn==8.1,pyobjc-framework-InputMethodKit==8.1,pyobjc-framework-InstallerPlugins==8.1,pyobjc-framework-InstantMessage==8.1,pyobjc-framework-Intents==8.1,pyobjc-framework-IntentsUI==8.1,pyobjc-framework-IOSurface==8.1,pyobjc-framework-iTunesLibrary==8.1,pyobjc-framework-KernelManagement==8.1,pyobjc-framework-LatentSemanticMapping==8.1,pyobjc-framework-LaunchServices==8.1,pyobjc-framework-libdispatch==8.1,pyobjc-framework-LinkPresentation==8.1,pyobjc-framework-LocalAuthentication==8.1,pyobjc-framework-LocalAuthenticationEmbeddedUI==8.1,pyobjc-framework-MailKit==8.1,pyobjc-framework-MapKit==8.1,pyobjc-framework-MediaAccessibility==8.1,pyobjc-framework-MediaLibrary==8.1,pyobjc-framework-MediaPlayer==8.1,pyobjc-framework-MediaToolbox==8.1,pyobjc-framework-Metal==8.1,pyobjc-framework-MetalKit==8.1,pyobjc-framework-MetalPerformanceShaders==8.1,pyobjc-framework-MetalPerformanceShadersGraph==8.1,pyobjc-framework-MetricKit==8.1,pyobjc-framework-MLCompute==8.1,pyobjc-framework-ModelIO==8.1,pyobjc-framework-MultipeerConnectivity==8.1,pyobjc-framework-NaturalLanguage==8.1,pyobjc-framework-NetFS==8.1,pyobjc-framework-Network==8.1,pyobjc-framework-NetworkExtension==8.1,pyobjc-framework-NotificationCenter==8.1,pyobjc-framework-OpenDirectory==8.1,pyobjc-framework-OSAKit==8.1,pyobjc-framework-OSLog==8.1,pyobjc-framework-PassKit==8.1,pyobjc-framework-PencilKit==8.1,pyobjc-framework-Photos==8.1,pyobjc-framework-PhotosUI==8.1,pyobjc-framework-PreferencePanes==8.1,pyobjc-framework-PushKit==8.1,pyobjc-framework-Quartz==8.1,pyobjc-framework-QuickLookThumbnailing==8.1,pyobjc-framework-ReplayKit==8.1,pyobjc-framework-SafariServices==8.1,pyobjc-framework-SceneKit==8.1,pyobjc-framework-ScreenSaver==8.1,pyobjc-framework-ScreenTime==8.1,pyobjc-framework-ScriptingBridge==8.1,pyobjc-framework-SearchKit==8.1,pyobjc-framework-Security==8.1,pyobjc-framework-SecurityFoundation==8.1,pyobjc-framework-SecurityInterface==8.1,pyobjc-framework-ServiceManagement==8.1,pyobjc-framework-ShazamKit==8.1,pyobjc-framework-Social==8.1,pyobjc-framework-SoundAnalysis==8.1,pyobjc-framework-Speech==8.1,pyobjc-framework-SpriteKit==8.1,pyobjc-framework-StoreKit==8.1,pyobjc-framework-SyncServices==8.1,pyobjc-framework-SystemConfiguration==8.1,pyobjc-framework-SystemExtensions==8.1,pyobjc-framework-UniformTypeIdentifiers==8.1,pyobjc-framework-UserNotifications==8.1,pyobjc-framework-UserNotificationsUI==8.1,pyobjc-framework-VideoSubscriberAccount==8.1,pyobjc-framework-VideoToolbox==8.1,pyobjc-framework-Virtualization==8.1,pyobjc-framework-Vision==8.1,pyobjc-framework-WebKit==8.1,pyparsing==3.0.6,pytest==6.2.5,pytest-black==0.3.12,pytest-checkdocs==2.7.1,pytest-cov==3.0.0,pytest-enabler==1.2.1,pytest-flake8==1.0.7,pytest-forked==1.4.0,pytest-mypy==0.8.1,pytest-perf==0.11.0,pytest-xdist==2.5.0,pytz==2021.3,requests==2.27.1,setuptools-bootstrap==1.0,singledispatch==3.7.0,six==1.16.0,snowballstemmer==2.2.0,Sphinx==4.3.2,sphinxcontrib-applehelp==1.0.2,sphinxcontrib-devhelp==1.0.2,sphinxcontrib-htmlhelp==2.0.0,sphinxcontrib-jsmath==1.0.1,sphinxcontrib-qthelp==1.0.3,sphinxcontrib-serializinghtml==1.1.5,tempora==4.1.2,toml==0.10.2,tomli==1.2.3,tox==3.24.5,typing_extensions==4.0.1,urllib3==1.26.8,virtualenv==20.13.0,zipp==3.7.0
python run-test-pre: PYTHONHASHSEED='2480643102'
python run-test: commands[0] | pytest -k exercises -p no:cov -p no:xdist
============================================================ test session starts ============================================================
platform darwin -- Python 3.10.1, pytest-6.2.5, py-1.11.0, pluggy-1.0.0
cachedir: .tox/python/.pytest_cache
rootdir: /Users/jaraco/code/public/pypa/setuptools, configfile: pytest.ini
plugins: flake8-1.0.7, enabler-1.2.1, xdist-2.5.0, forked-1.4.0, checkdocs-2.7.1, black-0.3.12, perf-0.11.0, mypy-0.8.1
collected 859 items / 855 deselected / 2 skipped / 2 selected                                                                               

exercises.py .s                                                                                                                       [100%]
=================================================================== perf ====================================================================

exercises.py:measure_startup_perf: 0:00:00.077200 (+-1 day, 23:59:59.999000, -1%)
========================================================== short test summary info ==========================================================
SKIPPED [2] setuptools/tests/test_msvc.py:17: could not import 'distutils.msvc9compiler': No module named 'winreg'
SKIPPED [1] .tox/python/lib/python3.10/site-packages/pytest_flake8.py:116: file(s) previously passed FLAKE8 checks
=============================================== 1 passed, 3 skipped, 855 deselected in 28.84s ===============================================
__________________________________________________________________ summary __________________________________________________________________
  python: commands succeeded
  congratulations :)

@gvanrossum
Copy link
Contributor

I'm not sure what performance you're trying to measure or in what environment. If you look at #3006 you'll see that we measured the overall time of python -c pass with the hack enabled vs. disabled. Most of the cost is not in the _distutils_hack package but in loading various stdlib modules, which are imported by that hack.

@jaraco
Copy link
Member

jaraco commented Jan 8, 2022

I'm not sure what performance you're trying to measure or in what environment.

The performance test as implemented attempts to measure the startup time of the interpreter, and to do it in a way that runs in continuous integration and thus is available to be measured and audited for future changes. Unfortunately, that test also necessarily includes the overhead of setting up the subprocess, which is likely adding noise to the measurement.

I was able to use tiran's technique to verify some numbers on my machine as well:

~ $ $SETUPTOOLS_USE_DISTUTILS='local' pip-run -q setuptools==60.3.1 -- -X importtime -c pass 2>&1 | grep _distutils_hack | cut -f 5 -w
2541
~ $ $SETUPTOOLS_USE_DISTUTILS='stdlib' pip-run -q setuptools==60.3.1 -- -X importtime -c pass 2>&1 | grep _distutils_hack | cut -f 5 -w
~ $ $SETUPTOOLS_USE_DISTUTILS='local' pip-run -q git+https://github.com/tiran/setuptools@7619852 -- -X importtime -c pass 2>&1 | grep _distutils_hack | cut -f 5 -w
1050
~ $ $SETUPTOOLS_USE_DISTUTILS='local' pip-run -q git+https://github.com/tiran/setuptools@issue3006_startup_speedup -- -X importtime -c pass 2>&1 | grep _distutils_hack | cut -f 5 -w
826
~ $ $SETUPTOOLS_USE_DISTUTILS='local' pip-run -q git+https://github.com/tiran/setuptools@7619852 -- -X importtime -c pass 2>&1 | grep _distutils_hack | cut -f 5 -w
506
~ $ $SETUPTOOLS_USE_DISTUTILS='local' pip-run -q git+https://github.com/tiran/setuptools@issue3006_startup_speedup -- -X importtime -c pass 2>&1 | grep _distutils_hack | cut -f 5 -w
605

It's clear from the output that there's a lot of jitter even in these runs (between 506µs and 1050µs). Still, that's noticeably better than the 2541µs without the improvements. I guess that also corroborates the numbers I was seeing from pytest-perf (~1-2 ms improvement on my macbook pro).

I still don't have any good evidence that the last commit (2b07b8e) is valuable or a premature optimization. I'll examine that next.

@gvanrossum
Copy link
Contributor

Sorry, I have no idea what pip-run does. Could it import some of the modules that the hack depends on, so they aren’t counted? In #3006 I measured ~10 msec, and I doubt your hardware is 4x as fast as mine.

Agreed that the perf impact of that commit should be minimal. Personally I find it easier to follow though.

@gvanrossum
Copy link
Contributor

I recommend manual inspection of the full output of -X importtime to see what’s what.

@jaraco
Copy link
Member

jaraco commented Jan 8, 2022

Looking at that last commit (2b07b8e), I notice that it's only going to affect the performance when a module is imported, so I want to devise a different test to measure its performance:

~ $ $SETUPTOOLS_USE_DISTUTILS='local' pip-run -q git+https://github.com/tiran/setuptools@2b07b8e -- -m timeit -s "import sys; mf = sys.meta_path[0]; assert mf.__class__.__name__ == 'DistutilsMetaFinder'" -c "mf.find_spec('anything', None)"
  WARNING: Did not find branch or tag '2b07b8e', assuming revision or ref.
1000000 loops, best of 5: 213 nsec per loop
~ $ $SETUPTOOLS_USE_DISTUTILS='local' pip-run -q git+https://github.com/tiran/setuptools@7619852 -- -m timeit -s "import sys; mf = sys.meta_path[0]; assert mf.__class__.__name__ == 'DistutilsMetaFinder'" -c "mf.find_spec('anything', None)"
  WARNING: Did not find branch or tag '7619852', assuming revision or ref.
200000 loops, best of 5: 1.26 usec per loop

Looks like the string formatting and method lookup and lambda call add ~1µs to each import. I suspect that's an acceptable cost, so I'd like to retain the existing implementation, which follows more of a functional paradigm and avoids repetition.

I have no idea what pip-run does

See pip-run for details and maybe give it a try. It's a nice way to on a single command run Python with specific requirements installed (in this case, specific versions of setuptools). I personally keep pip-run installed in my system python environments to rapidly test things like above.

I recommend manual inspection of the full output of -X importtime to see what’s what.

Good idea. Here are the full comparisons (released version versus this PR), I see only a few ms difference:

~ $ $SETUPTOOLS_USE_DISTUTILS='local' pip-run -q setuptools==60.3.1 -- -X importtime -c pass
import time: self [us] | cumulative | imported package
import time:       262 |        262 |   _io
import time:        60 |         60 |   marshal
import time:       449 |        449 |   posix
import time:      1129 |       1898 | _frozen_importlib_external
import time:       957 |        957 |   time
import time:       633 |       1589 | zipimport
import time:       275 |        275 |     _codecs
import time:      1686 |       1961 |   codecs
import time:      1221 |       1221 |   encodings.aliases
import time:      2009 |       5190 | encodings
import time:       561 |        561 | encodings.utf_8
import time:       147 |        147 | _signal
import time:        71 |         71 |     _abc
import time:       587 |        658 |   abc
import time:       661 |       1318 | io
import time:       602 |        602 |       types
import time:      1599 |       2200 |     enum
import time:       173 |        173 |       _sre
import time:       972 |        972 |         sre_constants
import time:      2378 |       3350 |       sre_parse
import time:      1606 |       5127 |     sre_compile
import time:      1666 |       1666 |         _collections_abc
import time:       288 |        288 |         itertools
import time:      1359 |       1359 |         keyword
import time:       125 |        125 |           _operator
import time:      1917 |       2042 |         operator
import time:       968 |        968 |         reprlib
import time:       133 |        133 |         _collections
import time:      2156 |       8610 |       collections
import time:       207 |        207 |       _functools
import time:      2227 |      11043 |     functools
import time:       395 |        395 |     _locale
import time:       683 |        683 |     copyreg
import time:      1365 |      20811 |   re
import time:      1123 |      21934 | warnings
import time:       131 |        131 |       _stat
import time:       917 |       1047 |     stat
import time:       688 |        688 |       genericpath
import time:      1108 |       1796 |     posixpath
import time:      3060 |       5902 |   os
import time:       839 |        839 |   _sitebuiltins
import time:       739 |        739 |     importlib
import time:      1614 |       1614 |     contextlib
import time:      1423 |       3775 |   _distutils_hack
import time:      6566 |       6566 |   sitecustomize
import time:      1099 |       1099 |   usercustomize
import time:      2796 |      20975 | site
~ $ $SETUPTOOLS_USE_DISTUTILS='local' pip-run -q git+https://github.com/tiran/setuptools@issue3006_startup_speedup -- -X importtime -c pass
import time: self [us] | cumulative | imported package
import time:       655 |        655 |   _io
import time:       304 |        304 |   marshal
import time:      1104 |       1104 |   posix
import time:      2151 |       4213 | _frozen_importlib_external
import time:       869 |        869 |   time
import time:       534 |       1402 | zipimport
import time:       365 |        365 |     _codecs
import time:      1721 |       2085 |   codecs
import time:      1736 |       1736 |   encodings.aliases
import time:      1916 |       5735 | encodings
import time:       774 |        774 | encodings.utf_8
import time:       206 |        206 | _signal
import time:       128 |        128 |     _abc
import time:      1014 |       1142 |   abc
import time:      1183 |       2324 | io
import time:      1675 |       1675 |       types
import time:      1736 |       3410 |     enum
import time:       123 |        123 |       _sre
import time:       707 |        707 |         sre_constants
import time:       886 |       1592 |       sre_parse
import time:       944 |       2659 |     sre_compile
import time:      1900 |       1900 |         _collections_abc
import time:       145 |        145 |         itertools
import time:       433 |        433 |         keyword
import time:       192 |        192 |           _operator
import time:      3106 |       3298 |         operator
import time:       940 |        940 |         reprlib
import time:       166 |        166 |         _collections
import time:      1786 |       8665 |       collections
import time:       221 |        221 |       _functools
import time:      1565 |      10450 |     functools
import time:       193 |        193 |     _locale
import time:      2093 |       2093 |     copyreg
import time:      1524 |      20326 |   re
import time:      1541 |      21867 | warnings
import time:       144 |        144 |       _stat
import time:      1174 |       1317 |     stat
import time:       472 |        472 |       genericpath
import time:       916 |       1387 |     posixpath
import time:      2090 |       4792 |   os
import time:       537 |        537 |   _sitebuiltins
import time:      1486 |       1486 |   _distutils_hack
import time:      7310 |       7310 |   sitecustomize
import time:      1983 |       1983 |   usercustomize
import time:      3630 |      19737 | site

But that's probably because warnings is still getting imported somehow. I don't have a pristine environment, so that's probably a factor.

Even when I create a pristine virtualenv with no pip, Python still encounters the warnings module on startup, consuming 26ms:

draft $ python -m venv env --without-pip
draft $ env/bin/python -X importtime -c pass 2>&1 | grep warnings
import time:      1504 |      26376 | warnings

So that begins explains why the performance benefit of this change isn't as great on my machine as on others. The biggest benefit was from not importing warnings/re, but that's happening anyway, presumably due to something with the homebrew install of Python I happen to be using.

I'm going to proceed to merge this change without 2b07b8e, and we can revisit that change if needed.

@jaraco jaraco merged commit fd966a3 into pypa:main Jan 8, 2022
@jaraco
Copy link
Member

jaraco commented Jan 8, 2022

Releasing as 60.4.0

@gvanrossum
Copy link
Contributor

Thanks! Tomorrow we should check speed.python.org and see what this did for the python startup benchmark in the timeline.

@gvanrossum
Copy link
Contributor

@jaraco
Copy link
Member

jaraco commented Jan 9, 2022 via email

jaraco added a commit that referenced this pull request Jan 16, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[BUG] The _distutils_hack pth file causes significant startup slowdown
3 participants