Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] The _distutils_hack pth file causes significant startup slowdown #3006

Closed
1 task done
pablogsal opened this issue Jan 6, 2022 · 10 comments · Fixed by #3009
Closed
1 task done

[BUG] The _distutils_hack pth file causes significant startup slowdown #3006

pablogsal opened this issue Jan 6, 2022 · 10 comments · Fixed by #3009
Labels
enhancement Needs Investigation Issues which are likely in scope but need investigation to figure out the cause

Comments

@pablogsal
Copy link
Contributor

pablogsal commented Jan 6, 2022

setuptools version

60.0.0

Python version

ALL

OS

ALL

Additional environment information

No response

Description

Since setuptools 60.0.0 has been released, Python startup has been impacted noticeably:

https://speed.python.org/timeline/#/?exe=12&ben=python_startup&env=1&revs=50&equid=off&quarts=on&extr=on

This is due to the new pth file that setuptools installs for injecting the distutils hack. The pth imports the _distutils_hack module and the ramifications of this import has a noticeable impact on startup time as you can see in the benchmark server.

Here is an estimation of the cost using -X importtime on my machine:

import time:       278 |        278 |         types
import time:       112 |        112 |           _operator
import time:       419 |        531 |         operator
import time:       129 |        129 |             itertools
import time:       325 |        325 |             keyword
import time:       468 |        468 |             reprlib
import time:       258 |        258 |             _collections
import time:       978 |       2156 |           collections
import time:        78 |         78 |           _functools
import time:       835 |       3068 |         functools
import time:      1359 |       5235 |       enum
import time:       138 |        138 |         _sre
import time:       497 |        497 |           sre_constants
import time:       528 |       1025 |         sre_parse
import time:       512 |       1674 |       sre_compile
import time:       109 |        109 |       _locale
import time:       886 |        886 |       copyreg
import time:       671 |       8574 |     re
import time:       471 |        471 |       warnings
import time:       330 |        801 |     importlib
import time:       906 |      10279 |   _distutils_hack

This almost doubles the startup time (from 13501 us to 24093 us) and unfortunately, as almost all Python installations will have the pth file installed by default, all Python installations will double its startup time.

We have been working very hard to reduce startup time with things like frozen modules (https://bugs.python.org/issue45020) and other optimizations and is unfortunate that this gains are being lost due to this.

See: https://bugs.python.org/issue46110 for more information and discussion

Expected behavior

The distutils hack doesn't impact startup time.

How to Reproduce

Measure the time spent in python -c pass. You can use python -Ximporttime -c pass to generate the import time table for convenience and debugging.

Output

Relevant output of python -X importtime -c pass:

...
import time:       278 |        278 |         types
import time:       112 |        112 |           _operator
import time:       419 |        531 |         operator
import time:       129 |        129 |             itertools
import time:       325 |        325 |             keyword
import time:       468 |        468 |             reprlib
import time:       258 |        258 |             _collections
import time:       978 |       2156 |           collections
import time:        78 |         78 |           _functools
import time:       835 |       3068 |         functools
import time:      1359 |       5235 |       enum
import time:       138 |        138 |         _sre
import time:       497 |        497 |           sre_constants
import time:       528 |       1025 |         sre_parse
import time:       512 |       1674 |       sre_compile
import time:       109 |        109 |       _locale
import time:       886 |        886 |       copyreg
import time:       671 |       8574 |     re
import time:       471 |        471 |       warnings
import time:       330 |        801 |     importlib
import time:       906 |      10279 |   _distutils_hack
...

Code of Conduct

  • I agree to follow the PSF Code of Conduct
@pablogsal pablogsal added bug Needs Triage Issues that need to be evaluated for severity and status. labels Jan 6, 2022
@pablogsal
Copy link
Contributor Author

@jaraco
Copy link
Member

jaraco commented Jan 6, 2022

Thanks for reporting. I was aware this was a risk.

I do believe the riskslowdown is justifiable, even at the reported level of 2x (11ms), given that:

  • This hack is meant to be temporary. As soon as is achievable, Setuptools aims to deprecate the use of import distutils and obviate the need for the hack.
  • Setuptools desires not to be installed at run time. It is now possible for most if not all environments to move away from having Setuptools installed except during builds. Many environments today can reclaim startup speed by uninstalling Setuptools (either early and relying on PIP_USE_PEP517 or after installing packages into the environment).
  • This hack provides enormous value as an essential transition to a world without distutils in the stdlib. It's enabling the consolidation of the dozens of forks of distutils that exist as monkeypatches of the stdlib.
  • The hack also provides a low-cost opt-out with SETUPTOOLS_USE_DISTUTILS=stdlib.

That said, there may be some performance optimizations that can be made, and probably should be made if they don't encumber the implementation too much.

@jaraco jaraco added enhancement Needs Investigation Issues which are likely in scope but need investigation to figure out the cause and removed bug Needs Triage Issues that need to be evaluated for severity and status. labels Jan 6, 2022
@jaraco
Copy link
Member

jaraco commented Jan 6, 2022

For importlib_metadata, I developed pytest-perf to exercise performance tests and prevent regressions. That could be employed here to (a) measure the performance and (b) prevent regressions as performance is improved.

@pablogsal pablogsal changed the title [BUG] The _distutils_hack pth file casuses significant startup slowdown [BUG] The _distutils_hack pth file causes significant startup slowdown Jan 6, 2022
@tiran
Copy link
Contributor

tiran commented Jan 6, 2022

You could reduce the impact on startup performance by not using contextlib and moving the import of re into the function body.

@gvanrossum
Copy link
Contributor

In order to avoid importing re you would also have to remove the warnings.filterwarnings() call -- that function uses re. What does that call do anyway?

How temporary is the hack? Will it be gone by the time 3.11 final is released (coming October)?

@gvanrossum
Copy link
Contributor

If you wanted to make us really happy, you could get rid of the filterwarnings call, and delay importing of warnings, re, importlib and contextlib. That avoids any imports in the default python -c pass case that aren't already frozen.

@gvanrossum
Copy link
Contributor

In order to avoid importing re you would also have to remove the warnings.filterwarnings() call -- that function uses re. What does that call do anyway?

Answering my own question, this seems to get rid of the warning printed by importing distutils whenever _distutils_hack is imported. This could probably also be postponed, e.g. put this line in spec_for_distutils().

(@jaraco, would it be helpful if I submitted a PR?)

tiran added a commit to tiran/setuptools that referenced this issue Jan 6, 2022
``_distutils_hack`` is imported by a ``.pth`` file at every start of a
Python interpreter. The import of costly modules like ``re`` and
``contextlib`` almost doubles the initial startup time of an
interpreter.

- replace ``contextlib`` with simple context manager and try/except
- replace ``re`` with simple string match
- move import of ``importlib`` into function body
- remove ``warnings.filterwarnings()``, which imports ``re``, too.

Fixes: pypa#3006
Signed-off-by: Christian Heimes <christian@python.org>
@tiran
Copy link
Contributor

tiran commented Jan 6, 2022

The warnings filter is problematic for two reasons. It not only slows down startup. It also hides Python core's deprecation warning from all users that have a recent version of setuptools. Since the distutils-precedence.pth file loads the _distutils_hack package always, the filter gets installed every time, too. Virtually everybody has setuptools installed, too. This makes core's deprecation warning in the stdlib pointless.

@gvanrossum
Copy link
Contributor

The warnings filter is problematic for two reasons. It not only slows down startup. It also hides Python core's deprecation warning from all users that have a recent version of setuptools. Since the distutils-precedence.pth file loads the _distutils_hack package always, the filter gets installed every time, too. Virtually everybody has setuptools installed, too. This makes core's deprecation warning in the stdlib pointless.

Not quite. If you set SETUPTOOLS_USE_DISTUTILS=stdlib, _distutils_hack is not imported, and you will see the warning. However, I think your point remains -- a user who has old code that imports distutils will (likely) not get the warning when they have setuptools installed, even if they're not using setuptools. This seems to defeat the point of having the warning.

@tiran
Copy link
Contributor

tiran commented Jan 7, 2022

setuptools main branch

$ venv/bin/python3 -X importtime -c pass
import time: self [us] | cumulative | imported package
import time:       652 |        652 |   _io
import time:       142 |        142 |   marshal
import time:      1390 |       1390 |   posix
import time:      2513 |       4695 | _frozen_importlib_external
import time:       541 |        541 |   time
import time:       824 |       1364 | zipimport
import time:       612 |        612 |     _codecs
import time:      1788 |       2400 |   codecs
import time:      1677 |       1677 |   encodings.aliases
import time:      2176 |       6252 | encodings
import time:       925 |        925 | encodings.utf_8
import time:       402 |        402 | _signal
import time:       222 |        222 |     _abc
import time:       955 |       1177 |   abc
import time:      1119 |       2296 | io
import time:       213 |        213 |       _stat
import time:       824 |       1036 |     stat
import time:      2834 |       2834 |     _collections_abc
import time:       381 |        381 |       genericpath
import time:       576 |        957 |     posixpath
import time:      2629 |       7454 |   os
import time:       445 |        445 |   _sitebuiltins
import time:       479 |        479 |         types
import time:      1361 |       1839 |       enum
import time:       243 |        243 |         _sre
import time:       498 |        498 |           sre_constants
import time:       617 |       1115 |         sre_parse
import time:       501 |       1858 |       sre_compile
import time:       157 |        157 |           itertools
import time:       235 |        235 |           keyword
import time:       103 |        103 |             _operator
import time:       527 |        630 |           operator
import time:       278 |        278 |           reprlib
import time:       101 |        101 |           _collections
import time:      1841 |       3239 |         collections
import time:       197 |        197 |         _functools
import time:       804 |       4239 |       functools
import time:       109 |        109 |       _locale
import time:       204 |        204 |       copyreg
import time:       977 |       9225 |     re
import time:       352 |        352 |       warnings
import time:       223 |        575 |     importlib
import time:       679 |        679 |     contextlib
import time:       977 |      11455 |   _distutils_hack
import time:       167 |        167 |   sitecustomize
import time:      2255 |      21774 | site

with my fix

$ venv/bin/python3 -X importtime -c pass
import time: self [us] | cumulative | imported package
import time:       627 |        627 |   _io
import time:       156 |        156 |   marshal
import time:      1573 |       1573 |   posix
import time:      2355 |       4709 | _frozen_importlib_external
import time:       418 |        418 |   time
import time:       876 |       1293 | zipimport
import time:       520 |        520 |     _codecs
import time:      1872 |       2392 |   codecs
import time:      1831 |       1831 |   encodings.aliases
import time:      2339 |       6560 | encodings
import time:       704 |        704 | encodings.utf_8
import time:       289 |        289 | _signal
import time:       113 |        113 |     _abc
import time:       709 |        821 |   abc
import time:       730 |       1551 | io
import time:       142 |        142 |       _stat
import time:       472 |        614 |     stat
import time:      1925 |       1925 |     _collections_abc
import time:       282 |        282 |       genericpath
import time:       408 |        690 |     posixpath
import time:      1777 |       5005 |   os
import time:       358 |        358 |   _sitebuiltins
import time:       913 |        913 |   _distutils_hack
import time:       191 |        191 |   sitecustomize
import time:      2005 |       8471 | site

import of site went from 21774 to 8471.

jaraco pushed a commit to tiran/setuptools that referenced this issue Jan 8, 2022
``_distutils_hack`` is imported by a ``.pth`` file at every start of a
Python interpreter. The import of costly modules like ``re`` and
``contextlib`` almost doubles the initial startup time of an
interpreter.

- replace ``contextlib`` with simple context manager and try/except
- replace ``re`` with simple string match
- move import of ``importlib`` into function body
- remove ``warnings.filterwarnings()``, which imports ``re``, too.

Fixes: pypa#3006
Signed-off-by: Christian Heimes <christian@python.org>
nijel added a commit to nijel/docker that referenced this issue Jan 10, 2022
It should not be needed since
WeblateOrg/weblate@7949fb5
and can cause performance regressions (see
pypa/setuptools#3006).
nijel added a commit to nijel/docker that referenced this issue Jan 10, 2022
It should not be needed since
WeblateOrg/weblate@7949fb5
and can cause performance regressions (see
pypa/setuptools#3006).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement Needs Investigation Issues which are likely in scope but need investigation to figure out the cause
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants