New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Plugins: Investigate official importlib.metadata for entrypoint discovery #3458
Comments
Sounds great, I'll have a look at this. P.S. Ah, I see you've already assigned yourself ;-) I'll wait a bit then to see what you discover. |
as part of this, one should also check whether the dependency on |
I started looking into this here: I'm using a basic in-memory cache to make sure we're loading the entry points just once, but still the cli is a bit less responsive than with reentry - not terribly, but noticeably. Timings with importlib (on Macbook Pro Retina Mid-2015):
Note: It seems that with importlib_metadata you always have to load the full set of entry points. Timings with reentry (on Macbook Pro Retina Mid-2015):
P.S. Importing So for |
@dev-zero Do you happen to have an AiiDA installation on a magnetic hard disk, where you could give the two tests above a try? |
@giovannipizzi @sphuber Would you mind giving the PR a try and let me know how the autocompletion speed feels to you? It's certainly a bit slower, but at some point we need to think whether this still warrants the hassle of maintaining reentry, and asking people to |
For me personally the responsiveness of the CLI is very important. I use it all the time and the slowness of auto-completion really annoys me. A factor of 4 is huge in that context I feel and for me would definitely warrant the continued development of reentry and the annoyance of However, where does that number come from. I installed your branch and tested |
results on SSD (only thing so far) are: develop: (develop=) $ hyperfine --warmup 3 verdi "verdi data gaussian.basisset"
Benchmark #1: verdi
Time (mean ± σ): 159.9 ms ± 7.2 ms [User: 135.9 ms, System: 15.1 ms]
Range (min … max): 149.3 ms … 171.0 ms 19 runs
Benchmark #2: verdi data gaussian.basisset
Time (mean ± σ): 171.4 ms ± 7.9 ms [User: 145.9 ms, System: 15.8 ms]
Range (min … max): 159.3 ms … 186.6 ms 17 runs
Summary
'verdi' ran
1.07 ± 0.07 times faster than 'verdi data gaussian.basisset' the importlib branch: (issue_3458_importlib_metadata=) $ hyperfine --warmup 3 verdi "verdi data" "verdi data gaussian.basisset"
Benchmark #1: verdi
Time (mean ± σ): 204.2 ms ± 2.8 ms [User: 175.6 ms, System: 18.3 ms]
Range (min … max): 201.2 ms … 210.5 ms 14 runs
Benchmark #2: verdi data
Time (mean ± σ): 216.0 ms ± 2.5 ms [User: 186.5 ms, System: 18.7 ms]
Range (min … max): 212.0 ms … 220.5 ms 13 runs
Benchmark #3: verdi data gaussian.basisset
Time (mean ± σ): 228.4 ms ± 13.0 ms [User: 196.6 ms, System: 20.6 ms]
Range (min … max): 208.9 ms … 246.9 ms 13 runs
Summary
'verdi' ran
1.06 ± 0.02 times faster than 'verdi data'
1.12 ± 0.07 times faster than 'verdi data gaussian.basisset' |
As described above, this comes from adding up the timings for importing the package (reentry vs importlib_metadata) and the call to load the entry points. 4 *25ms = 100ms, and 75ms of difference is close to the difference you are seeing as well. Is the result of |
@ltalirz yes, pretty much it seems. Here's a benchmark with the simulated completion for develop: $ COMP_CWORD="2" COMP_WORDS="verdi data g" _VERDI_COMPLETE=complete-bash hyperfine --warmup 3 verdi
Benchmark #1: verdi
Time (mean ± σ): 159.4 ms ± 2.5 ms [User: 137.2 ms, System: 13.7 ms]
Range (min … max): 156.9 ms … 166.0 ms 18 runs
importlib: $ COMP_CWORD="2" COMP_WORDS="verdi data g" _VERDI_COMPLETE=complete-bash hyperfine --warmup 3 verdi
Benchmark #1: verdi
Time (mean ± σ): 211.9 ms ± 6.8 ms [User: 184.3 ms, System: 16.9 ms]
Range (min … max): 206.9 ms … 232.8 ms 13 runs
|
@ltalirz but it seems there's still a bug in your branch: COMP_CWORD="2" COMP_WORDS="verdi data g" _VERDI_COMPLETE=complete-bash python -m cProfile -o program.importlib.prof /scratch/tiziano/virtualenvs/aiida/bin/verdi
Traceback (most recent call last):
File "/scratch/tiziano/work/aiida/aiida_core/aiida/cmdline/params/types/identifier.py", line 65, in __init__
entry_point = get_entry_point_from_string(entry_point_string)
File "/scratch/tiziano/work/aiida/aiida_core/aiida/plugins/entry_point.py", line 193, in get_entry_point_from_string
return get_entry_point(group, name)
File "/scratch/tiziano/work/aiida/aiida_core/aiida/plugins/entry_point.py", line 298, in get_entry_point
raise MultipleEntryPointError("Multiple entry points '{}' found in group".format(name, group))
aiida.common.exceptions.MultipleEntryPointError: Multiple entry points 'process.calculation.calcjob' found in group
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/users/tiziano/.pyenv/versions/3.7.5/lib/python3.7/runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "/users/tiziano/.pyenv/versions/3.7.5/lib/python3.7/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/users/tiziano/.pyenv/versions/3.7.5/lib/python3.7/cProfile.py", line 185, in <module>
main()
File "/users/tiziano/.pyenv/versions/3.7.5/lib/python3.7/cProfile.py", line 178, in main
runctx(code, globs, None, options.outfile, options.sort)
File "/users/tiziano/.pyenv/versions/3.7.5/lib/python3.7/cProfile.py", line 20, in runctx
filename, sort)
File "/users/tiziano/.pyenv/versions/3.7.5/lib/python3.7/profile.py", line 62, in runctx
prof.runctx(statement, globals, locals)
File "/users/tiziano/.pyenv/versions/3.7.5/lib/python3.7/cProfile.py", line 100, in runctx
exec(cmd, globals, locals)
File "/scratch/tiziano/virtualenvs/aiida/bin/verdi", line 6, in <module>
from aiida.cmdline.commands.cmd_verdi import verdi
File "/scratch/tiziano/work/aiida/aiida_core/aiida/cmdline/commands/__init__.py", line 21, in <module>
from aiida.cmdline.commands import (
File "/scratch/tiziano/work/aiida/aiida_core/aiida/cmdline/commands/cmd_calcjob.py", line 31, in <module>
@arguments.CALCULATION('calcjob', type=CalculationParamType(sub_classes=('aiida.node:process.calculation.calcjob',)))
File "/scratch/tiziano/work/aiida/aiida_core/aiida/cmdline/params/types/identifier.py", line 67, in __init__
raise ValueError('{} is not a valid entry point string: {}'.format(entry_point_string, exception))
ValueError: aiida.node:process.calculation.calcjob is not a valid entry point string: Multiple entry points 'process.calculation.calcjob' found in group ... works in |
Ok, here are the corresponding timings for my laptop: importlib_metadata
reentry
Not a negligible difference... on the other hand, most of the time is already spent outside entry-point discovery, so one could look whether we could avoid some other package imports.
Yeah, the |
Just to report my results. Ubuntu workstation, develop:
Ubuntu workstation, branch:
On my mac: on develop I get ~0.25s and on the branch: ~0.31s Note: I tested with py3. I think this is more or less consistent with the other results. My feeling: while this is a small loss for a very important feature, probably dropping reentry is still a big win so I would be in favour of going in this direction, especially if we can try to gain something by optimising other parts of the code called by |
Since @chrisjsewell brought this up again, I quickly ported the changes to the current develop and gave it a spin on the M1 Macbook Air, python 3.9.1: reentry (bare)
Still takes about 21ms altogether (compared to 25ms for the previous test on my old MacBook) importlib (bare)
Now at 17+19=36ms (down from ~100ms on my old MacBook which also used This is a big difference to my previous benchmark; it leaves only a ~15ms bonus for reentry. Note: Since this difference is so large, I tried the same using the
Total of 31+54=85ms verdi autocompletion (reentry, develop branch)
verdi autocompletion (importlib, https://github.com/ltalirz/aiida-core/tree/issue_3458_importlib_metadata_2 )
As expected from the bare tests, the difference in the mean is ~15ms. With this, at least on the latest macbooks and with python>=3.8 I think there is no longer a reason to use reentry. Let's see whether we find similar behavior also on other hardware - if yes, I would say we can move ahead with dropping the reentry dependency. P.S. I'm a little surprised that the absolute time of the autocompletion is not lower (still 200+ms). We may want to check at some point again whether there are any imports that could be avoided... |
P.P.S. I notice that the timings do depend on the number of packages installed in the environment (which makes sense - this will add a linear cost to For reference, in my tests:
|
Now, here come the same bare tests on the AiiDA lab (python 3.7; NFS file system)
reentry (bare)
importlib-metadata (bare)
I guess here one could see a significant speed benefit from upgrading to python 3.8 |
As discussed in #5058, I think $ python -mtimeit -n1 -r1 'import reentry'
1 loop, best of 1: 36.8 msec per loop
$ python -mtimeit -s 'from reentry.default_manager import PluginManager as p' 'p().get_entry_map()'
1 loop, best of 5: 1.53 msec per loop
$ python -mtimeit -n1 -r1 'from importlib import metadata'
1 loop, best of 1: 39.1 msec per loop
$ python -mtimeit -s 'import importlib.metadata as im' 'im.entry_points()'
10 loops, best of 5: 30.3 msec per loop
$ python -mtimeit -n1 -r1 'import importlib_metadata'
1 loop, best of 1: 51 msec per loop
$ python -mtimeit -s 'import importlib_metadata as im' 'im.entry_points()'
20 loops, best of 5: 10.7 msec per loop |
I confirm that there seems to have been a massive improvement from v4.0.1
v4.6.3
Both tests under python 3.9.1 |
Final note: I've archived |
Python 3.8 introduced a built-in module for entrypoint discovery https://docs.python.org/3.8/library/importlib.metadata.html with a backport for <3.8 and 2.7 here: https://importlib-metadata.readthedocs.io/en/latest/
While there's nothing wrong with
pkg_resources
other than we needreentry
to make it scale, we should do a test withimportlib.metadata
to see whether it would be fast enough for us withoutreentry
.Release notes from
pluggy
(pytest
) andtox
suggest that it performs better thansetuptool
spkg_resources
.The text was updated successfully, but these errors were encountered: