Plugins: Investigate official importlib.metadata for entrypoint discovery #3458

Closed
dev-zero opened this issue Oct 23, 2019 · 18 comments · Fixed by #5058

Python 3.8 introduced a built-in module for entrypoint discovery https://docs.python.org/3.8/library/importlib.metadata.html with a backport for <3.8 and 2.7 here: https://importlib-metadata.readthedocs.io/en/latest/

While there's nothing wrong with pkg_resources other than that we need reentry to make it scale, we should test importlib.metadata to see whether it would be fast enough for us without reentry.

Release notes from pluggy (pytest) and tox suggest that it performs better than setuptools pkg_resources.
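For reference, discovery via the stdlib module looks roughly like this. A minimal sketch (mine, not AiiDA code; the group name "console_scripts" is used only because it exists in most environments), with a fallback for the dict-based return type of Python 3.8/3.9:

```python
from importlib import metadata

def discover(group):
    """Return the entry points registered under *group*.

    Handles both the plain-dict return type of Python 3.8/3.9 and the
    selectable type of Python >= 3.10.
    """
    eps = metadata.entry_points()
    if hasattr(eps, "select"):              # Python >= 3.10
        return list(eps.select(group=group))
    return list(eps.get(group, []))         # Python 3.8/3.9: dict of group -> list

scripts = discover("console_scripts")
print(f"found {len(scripts)} entry points in 'console_scripts'")
```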

@dev-zero dev-zero added the type/feature request status undecided label Oct 23, 2019
@dev-zero dev-zero added this to the v1.1.0 milestone Oct 23, 2019
@dev-zero dev-zero self-assigned this Oct 23, 2019

ltalirz commented Oct 23, 2019

Sounds great, I'll have a look at this.

P.S. Ah, I see you've already assigned yourself ;-) I'll wait a bit then to see what you discover.

@ltalirz ltalirz self-assigned this Oct 23, 2019

ltalirz commented Oct 29, 2019

As part of this, we should also check whether the dependency on fastentrypoints.py can be removed (see utils/fastentrypoints.py).


ltalirz commented Nov 8, 2019

I started looking into this here:
https://github.com/ltalirz/aiida_core/tree/issue_3458_importlib_metadata

I'm using a basic in-memory cache to make sure we load the entry points just once, but the CLI is still a bit less responsive than with reentry - not terribly, but noticeably.
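Such an in-memory cache can be as simple as a memoized zero-argument function. A sketch of the idea (my illustration using functools.lru_cache, not the branch's actual code) that also times the cached vs uncached call:

```python
import functools
import timeit
from importlib import metadata

@functools.lru_cache(maxsize=1)
def cached_entry_points():
    """Scan the installed distributions once per process; later calls hit the cache."""
    return metadata.entry_points()

cold = timeit.timeit(metadata.entry_points, number=1)      # one full metadata scan
cached_entry_points()                                      # prime the cache
warm = timeit.timeit(cached_entry_points, number=1000) / 1000
print(f"uncached: {cold * 1e3:.2f} ms, cached: {warm * 1e6:.2f} us")
```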

Timings with importlib (on Macbook Pro Retina Mid-2015):

$ python -mtimeit -s 'import importlib_metadata as im' 'im.entry_points()'
5 loops, best of 5: 57 msec per loop

Note: It seems that with importlib_metadata you always have to load the full set of entry points.

Timings with reentry (on Macbook Pro Retina Mid-2015):

$ python -mtimeit  -s 'from reentry.default_manager import PluginManager as p' 'p().get_entry_map()'
500 loops, best of 5: 635 usec per loop

P.S. Importing importlib_metadata takes another ~40ms, compared to 25ms for reentry and 180ms for pkg_resources.

So for importlib_metadata, we're likely looking at a total of ~60 ms + 40 ms = ~100 ms, vs something close to 25 ms for reentry.


ltalirz commented Nov 9, 2019

@dev-zero Do you happen to have an AiiDA installation on a magnetic hard disk, where you could give the two tests above a try?


ltalirz commented Nov 11, 2019

@giovannipizzi @sphuber
So, it seems in my environment, the difference between using reentry and using the built-in entry-point mechanism is down from roughly an order of magnitude to roughly a factor of 4.

Would you mind giving the PR a try and let me know how the autocompletion speed feels to you?
https://github.com/ltalirz/aiida_core/tree/issue_3458_importlib_metadata

It's certainly a bit slower, but at some point we need to consider whether this still warrants the hassle of maintaining reentry and asking people to run reentry scan.


sphuber commented Nov 11, 2019

So, it seems in my environment, the difference between using reentry and using the built-in entry-point mechanism is down from roughly an order of magnitude to roughly a factor of 4.

For me personally the responsiveness of the CLI is very important. I use it all the time, and the slowness of auto-completion really annoys me. A factor of 4 is huge in that context, I feel, and for me it would definitely warrant the continued development of reentry and the annoyance of reentry scan.

However, where does that number come from? I installed your branch and tested time verdi. For your branch I get on the order of 0.380 s; the current develop gives 0.280 s. A 0.1 s difference is not ideal, but nowhere near as bad as what I expected when you said factor of 4.

dev-zero (issue author) commented:

My results on an SSD (the only setup I've tested so far):

develop:

(develop=) $ hyperfine --warmup 3 verdi "verdi data gaussian.basisset"
Benchmark #1: verdi
  Time (mean ± σ):     159.9 ms ±   7.2 ms    [User: 135.9 ms, System: 15.1 ms]
  Range (min … max):   149.3 ms … 171.0 ms    19 runs
 
Benchmark #2: verdi data gaussian.basisset
  Time (mean ± σ):     171.4 ms ±   7.9 ms    [User: 145.9 ms, System: 15.8 ms]
  Range (min … max):   159.3 ms … 186.6 ms    17 runs
 
Summary
  'verdi' ran
    1.07 ± 0.07 times faster than 'verdi data gaussian.basisset'

the importlib branch:

(issue_3458_importlib_metadata=) $ hyperfine --warmup 3 verdi "verdi data" "verdi data gaussian.basisset"
Benchmark #1: verdi
  Time (mean ± σ):     204.2 ms ±   2.8 ms    [User: 175.6 ms, System: 18.3 ms]
  Range (min … max):   201.2 ms … 210.5 ms    14 runs
 
Benchmark #2: verdi data
  Time (mean ± σ):     216.0 ms ±   2.5 ms    [User: 186.5 ms, System: 18.7 ms]
  Range (min … max):   212.0 ms … 220.5 ms    13 runs
 
Benchmark #3: verdi data gaussian.basisset
  Time (mean ± σ):     228.4 ms ±  13.0 ms    [User: 196.6 ms, System: 20.6 ms]
  Range (min … max):   208.9 ms … 246.9 ms    13 runs
 
Summary
  'verdi' ran
    1.06 ± 0.02 times faster than 'verdi data'
    1.12 ± 0.07 times faster than 'verdi data gaussian.basisset'


ltalirz commented Nov 11, 2019

However, where does that number come from.

As described above, this comes from adding up the timings for importing the package (reentry vs importlib_metadata) and the call to load the entry points: 4 × 25 ms = 100 ms, and a 75 ms difference is close to the difference you are seeing as well.

Is the result of time verdi also the time needed for autocompletion?

dev-zero (issue author) commented:

@ltalirz yes, pretty much it seems. Here's a benchmark with the simulated completion for verdi data g<TAB>:

develop:

$ COMP_CWORD="2" COMP_WORDS="verdi data g" _VERDI_COMPLETE=complete-bash hyperfine --warmup 3 verdi
Benchmark #1: verdi
  Time (mean ± σ):     159.4 ms ±   2.5 ms    [User: 137.2 ms, System: 13.7 ms]
  Range (min … max):   156.9 ms … 166.0 ms    18 runs
 

importlib:

$ COMP_CWORD="2" COMP_WORDS="verdi data g" _VERDI_COMPLETE=complete-bash hyperfine --warmup 3 verdi
Benchmark #1: verdi
  Time (mean ± σ):     211.9 ms ±   6.8 ms    [User: 184.3 ms, System: 16.9 ms]
  Range (min … max):   206.9 ms … 232.8 ms    13 runs

dev-zero (issue author) commented:

@ltalirz but it seems there's still a bug in your branch:

COMP_CWORD="2" COMP_WORDS="verdi data g" _VERDI_COMPLETE=complete-bash python -m cProfile -o program.importlib.prof /scratch/tiziano/virtualenvs/aiida/bin/verdi
Traceback (most recent call last):
  File "/scratch/tiziano/work/aiida/aiida_core/aiida/cmdline/params/types/identifier.py", line 65, in __init__
    entry_point = get_entry_point_from_string(entry_point_string)
  File "/scratch/tiziano/work/aiida/aiida_core/aiida/plugins/entry_point.py", line 193, in get_entry_point_from_string
    return get_entry_point(group, name)
  File "/scratch/tiziano/work/aiida/aiida_core/aiida/plugins/entry_point.py", line 298, in get_entry_point
    raise MultipleEntryPointError("Multiple entry points '{}' found in group".format(name, group))
aiida.common.exceptions.MultipleEntryPointError: Multiple entry points 'process.calculation.calcjob' found in group

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/users/tiziano/.pyenv/versions/3.7.5/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/users/tiziano/.pyenv/versions/3.7.5/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/users/tiziano/.pyenv/versions/3.7.5/lib/python3.7/cProfile.py", line 185, in <module>
    main()
  File "/users/tiziano/.pyenv/versions/3.7.5/lib/python3.7/cProfile.py", line 178, in main
    runctx(code, globs, None, options.outfile, options.sort)
  File "/users/tiziano/.pyenv/versions/3.7.5/lib/python3.7/cProfile.py", line 20, in runctx
    filename, sort)
  File "/users/tiziano/.pyenv/versions/3.7.5/lib/python3.7/profile.py", line 62, in runctx
    prof.runctx(statement, globals, locals)
  File "/users/tiziano/.pyenv/versions/3.7.5/lib/python3.7/cProfile.py", line 100, in runctx
    exec(cmd, globals, locals)
  File "/scratch/tiziano/virtualenvs/aiida/bin/verdi", line 6, in <module>
    from aiida.cmdline.commands.cmd_verdi import verdi
  File "/scratch/tiziano/work/aiida/aiida_core/aiida/cmdline/commands/__init__.py", line 21, in <module>
    from aiida.cmdline.commands import (
  File "/scratch/tiziano/work/aiida/aiida_core/aiida/cmdline/commands/cmd_calcjob.py", line 31, in <module>
    @arguments.CALCULATION('calcjob', type=CalculationParamType(sub_classes=('aiida.node:process.calculation.calcjob',)))
  File "/scratch/tiziano/work/aiida/aiida_core/aiida/cmdline/params/types/identifier.py", line 67, in __init__
    raise ValueError('{} is not a valid entry point string: {}'.format(entry_point_string, exception))
ValueError: aiida.node:process.calculation.calcjob is not a valid entry point string: Multiple entry points 'process.calculation.calcjob' found in group

... works in develop.


ltalirz commented Nov 11, 2019

Ok, here are the corresponding timings for my laptop:

importlib_metadata

$ COMP_CWORD="2" COMP_WORDS="verdi data g" _VERDI_COMPLETE=complete-bash hyperfine --warmup 3 verdi
Benchmark #1: verdi
  Time (mean ± σ):     304.1 ms ±  33.8 ms    [User: 254.9 ms, System: 45.2 ms]
  Range (min … max):   283.4 ms … 394.6 ms    10 runs

reentry

$ COMP_CWORD="2" COMP_WORDS="verdi data g" _VERDI_COMPLETE=complete-bash hyperfine --warmup 3 verdi
Benchmark #1: verdi
  Time (mean ± σ):     207.8 ms ±   4.2 ms    [User: 170.5 ms, System: 33.2 ms]
  Range (min … max):   202.0 ms … 219.3 ms    14 runs

Not a negligible difference... On the other hand, most of the time is already spent outside entry-point discovery, so one could look at whether some other package imports could be avoided.

it seems there's still a bug in your branch:

Yeah, verdi data isn't working properly yet - the entry-point interface of importlib_metadata is slightly different; I need to check what to fix there.
I first wanted to get an idea of whether we are going to use it.
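That interface difference is essentially that importlib.metadata has no direct equivalent of the {group: {name: entry_point}} map that pkg_resources/reentry expose, so it has to be rebuilt. A minimal compatibility shim illustrating the idea (my sketch against the stdlib API, not the actual code in the branch):

```python
from collections import defaultdict
from importlib import metadata

def get_entry_map():
    """Build a {group: {name: EntryPoint}} mapping, the shape that
    reentry's get_entry_map() returns."""
    eps = metadata.entry_points()
    # Python >= 3.10 exposes .groups/.select; 3.8/3.9 return a plain dict
    groups = eps.groups if hasattr(eps, "groups") else eps.keys()
    entry_map = defaultdict(dict)
    for group in groups:
        selected = eps.select(group=group) if hasattr(eps, "select") else eps[group]
        for ep in selected:
            entry_map[group][ep.name] = ep
    return dict(entry_map)

emap = get_entry_map()
print(f"{len(emap)} entry-point groups discovered")
```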

giovannipizzi (Member) commented:

Just to report my results.

Ubuntu workstation, develop:

Benchmark #1: verdi
  Time (mean ± σ):     152.0 ms ±   6.6 ms    [User: 115.4 ms, System: 14.0 ms]
  Range (min … max):   144.4 ms … 162.9 ms    20 runs

Ubuntu workstation, branch:

Benchmark #1: verdi
  Time (mean ± σ):     194.6 ms ±   6.4 ms    [User: 174.6 ms, System: 15.7 ms]
  Range (min … max):   187.4 ms … 207.8 ms    14 runs

On my mac: on develop I get ~0.25s and on the branch: ~0.31s

Note: I tested with py3.

I think this is more or less consistent with the other results.

My feeling: while this is a small loss for a very important feature, dropping reentry is probably still a big win, so I would be in favour of going in this direction - especially if we can gain something by optimising other parts of the code called by verdi.

@sphuber sphuber removed this from the v1.1.0 milestone Feb 28, 2020

ltalirz commented May 21, 2021

Since @chrisjsewell brought this up again, I quickly ported the changes to the current develop and gave it a spin on the M1 Macbook Air, python 3.9.1:

reentry (bare)

$ python -mtimeit -n1 -r1 'import reentry'
1 loop, best of 1: 19.6 msec per loop
$ python -mtimeit  -s 'from reentry.default_manager import PluginManager as p' 'p().get_entry_map()'
200 loops, best of 5: 1.07 msec per loop

Still takes about 21ms altogether (compared to 25ms for the previous test on my old MacBook)

importlib (bare)

$ python -mtimeit -n1 -r1 'from importlib import metadata'
1 loop, best of 1: 16.7 msec per loop
$ python -mtimeit -s 'import importlib.metadata as im' 'im.entry_points()'
20 loops, best of 5: 18.8 msec per loop

Now at 17 + 19 = 36 ms (down from ~100 ms on my old MacBook - which, however, used the importlib-metadata backport, not the module in the standard lib).

This is a big difference from my previous benchmark; it leaves only a ~15 ms advantage for reentry.

Note: Since this difference is so large, I tried the same using the importlib-metadata backport package, which can also be installed under python 3.9. This implementation is significantly slower than the one from the standard lib:

$ python -mtimeit -n1 -r1 'import importlib_metadata'
1 loop, best of 1: 31.3 msec per loop
$ python -mtimeit -s 'import importlib_metadata as im' 'im.entry_points()'
5 loops, best of 5: 53.5 msec per loop

Total of 31+54=85ms

verdi autocompletion (reentry, develop branch)

$ COMP_CWORD="2" COMP_WORDS="verdi data g" _VERDI_COMPLETE=complete-bash hyperfine --warmup 3 verdi
Benchmark #1: verdi
  Time (mean ± σ):     209.9 ms ±   5.2 ms    [User: 153.6 ms, System: 34.9 ms]
  Range (min … max):   205.5 ms … 224.0 ms    13 runs

verdi autocompletion (importlib, https://github.com/ltalirz/aiida-core/tree/issue_3458_importlib_metadata_2 )

$ COMP_CWORD="2" COMP_WORDS="verdi data g" _VERDI_COMPLETE=complete-bash hyperfine --warmup 3 verdi
Benchmark #1: verdi
  Time (mean ± σ):     225.9 ms ±   2.6 ms    [User: 169.3 ms, System: 36.1 ms]
  Range (min … max):   222.9 ms … 231.3 ms    12 runs

As expected from the bare tests, the difference in the mean is ~15ms.

With this, at least on the latest MacBooks and with Python >= 3.8, I think there is no longer a reason to use reentry.
While we promise to support Python 3.7 until 26-Dec-2021, I think it's OK to accept a little slowdown there - it won't break anything, and people who want better speed can upgrade to Python 3.8.

Let's see whether we find similar behavior also on other hardware - if yes, I would say we can move ahead with dropping the reentry dependency.

P.S. I'm a little surprised that the absolute time of the autocompletion is not lower (still 200+ ms). We may want to check again at some point whether there are any imports that could be avoided...


ltalirz commented May 21, 2021

P.P.S. I notice that the timings do depend on the number of packages installed in the environment (which makes sense - this will add a linear cost to importlib, while the cost for reentry remains effectively constant).

For reference, in my tests:

$ pip list | wc -l
219
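That linear scaling is easy to probe directly. A quick sketch (my illustration, assuming the stdlib importlib.metadata) that counts the installed distributions and times a full entry-point scan:

```python
import timeit
from importlib import metadata

# entry_points() reads the metadata of every installed distribution,
# so its cost grows roughly linearly with the size of the environment
n_dists = sum(1 for _ in metadata.distributions())
per_call = timeit.timeit(metadata.entry_points, number=5) / 5
print(f"{n_dists} distributions; {per_call * 1e3:.1f} ms per entry_points() call")
```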


ltalirz commented May 21, 2021

Now, here are the same bare tests on the AiiDA lab (Python 3.7; NFS file system):

$ pip list | wc -l
253

reentry (bare)

(base) aiida@168c1dbd1cfe:~$  python -mtimeit -n1 -r1 'import reentry'
1 loop, best of 1: 25.1 msec per loop
(base) aiida@168c1dbd1cfe:~$ python -mtimeit  -s 'from reentry.default_manager import PluginManager as p' 'p().get_entry_map()'
100 loops, best of 5: 2.8 msec per loop

importlib-metadata (bare)

(base) aiida@168c1dbd1cfe:~$ python -mtimeit -n1 -r1 'import importlib_metadata'
1 loop, best of 1: 22.8 msec per loop
$ python -mtimeit -s 'import importlib_metadata as im' 'im.entry_points()'
5 loops, best of 5: 56.4 msec per loop

I guess here one could see a significant speed benefit from upgrading to python 3.8


chrisjsewell commented Aug 10, 2021

As discussed in #5058, I think importlib-metadata >= 4.3 may now be faster than importlib.metadata. For me on Python 3.9, with importlib-metadata 4.6.3 and reentry 1.3.2:

$ python -mtimeit -n1 -r1 'import reentry'
1 loop, best of 1: 36.8 msec per loop
$ python -mtimeit  -s 'from reentry.default_manager import PluginManager as p' 'p().get_entry_map()'
1 loop, best of 5: 1.53 msec per loop
$ python -mtimeit -n1 -r1 'from importlib import metadata'
1 loop, best of 1: 39.1 msec per loop
$ python -mtimeit -s 'import importlib.metadata as im' 'im.entry_points()'
10 loops, best of 5: 30.3 msec per loop
$ python -mtimeit -n1 -r1 'import importlib_metadata'
1 loop, best of 1: 51 msec per loop
$ python -mtimeit -s 'import importlib_metadata as im' 'im.entry_points()'
20 loops, best of 5: 10.7 msec per loop


ltalirz commented Aug 10, 2021

I confirm that there seems to have been a massive improvement from importlib_metadata v4.0.1 (which I was testing before) to v4.6.3:

v4.0.1

$ python -mtimeit -s 'import importlib_metadata as im' 'im.entry_points()'
5 loops, best of 5: 59.3 msec per loop

v4.6.3

$ python -mtimeit -s 'import importlib_metadata as im' 'im.entry_points()'
50 loops, best of 5: 5.94 msec per loop

Both tests under python 3.9.1


ltalirz commented Aug 12, 2021

Final note: I've archived reentry with a detailed explanation & archival note aiidateam/reentry@577e5bb
