hooks: pkg_resources: implement support for package content listing #5284
It shows the quality of your work that the only problems I can find are typos! However just for safety's sake, I'll ask @bwoodsend to review this as well.
I'd feel slightly more comfortable if those resource files weren't empty. When I do:
>>> pkg_resources.resource_string("pyi_pkgres_testpkg", "subpkg1/data/extra/extra_entry1.txt")
b''
I question if something went wrong - even though I know you get a
I've been code.interact() messing about with this and it's very satisfyingly consistent with unfrozen Python.
Good point - I'll populate the data files in the test package.
Impressive work! You've been so brave going down into the catacombs of pkg_resources. Wow!
Thanks for implementing this; it solves a major pain point we have had for a long time. And I like the idea of transparently including the files in the file-system.
Besides some nitpicking (see below and line comments), I have two relevant points:
1) Which takes precedence: PYZ or filesystem? This should be the same behavior as if a module is both in the PYZ and in the file-system (due to collect_data_files, including .pyc). I don't have the answer at hand, so please have a look at the code or test it.
Of course this needs to be documented, at least in the release notes and at the top of the run-time hook.
2) AFAIU, the TocFilesystem structure is a dict of dicts, containing pathlib.PurePath objects as leaves. IMHO building this is far too much overhead:
- Over the run-time of a program, resource access is rare and typically not done in an inner loop. Thus our provider should not spend too much time, memory and electrical power on speeding up the lookup of a resource. Chances are high that only a handful of resources will be requested even for a huge program.
This is especially true since AFAIS, this is done for each call of get_provider (which is called again by each resource_*() call), and not cached:
>>> p1 = get_provider("ansible")
>>> p2 = get_provider("ansible")
>>> id(p1)
140475825572560
>>> id(p2)
140475825511888
- While using pathlib here is elegant and simple, the overhead is huge, especially since results do not seem to be cached and the current code touches many strings and files.
- Thus I suggest you have a look at ZipProvider and see what can be imitated from there.
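To illustrate the caching concern above, here is a minimal sketch of memoizing a per-package lookup so that repeated requests reuse one result. It is not the PR's code: TOC and entries_for() are hypothetical names, and the tuple of strings merely stands in for the archive's table of contents.

```python
import functools

# Hypothetical stand-in for the archive's table of contents.
TOC = ("pkg/data/a.txt", "pkg/data/extra/b.txt", "other/c.txt")


@functools.lru_cache(maxsize=None)
def entries_for(package):
    """Return the TOC entries below `package`; computed once per package name."""
    prefix = package + "/"
    return tuple(name for name in TOC if name.startswith(prefix))


first = entries_for("pkg")
second = entries_for("pkg")
# The second call is served from the cache, so both names refer to one object.
print(first is second)
print(entries_for.cache_info())
```

Whether a cache like this belongs in the provider or in the code that builds the tree is a design decision for the PR; the sketch only shows that the repeated work is avoidable.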
And now here are the minor remarks.
- Please put at least a # (and line-ending) into the empty files. Empty files are easily deleted accidentally.
- Please choose different extensions for the data files, to avoid a case where only .txt files are collected ;-) If you want to stick with text-like file extensions, .rst and .md are good choices.
- Please provide a Makefile for recreating the .egg. Some distributions like Debian want to have pure source code, and a zip file is a binary.
- Please use present tense in the commit messages.
Thanks for the detailed review!
Perhaps it's worth pointing out that from the issues I've seen, we could probably get away with implementing only the on-filesystem part of the behavior here. If we aimed for that, it would probably be enough to just subclass the That's the reason why the initial attempt to use
The PYZ-embedded resources take precedence. This is to keep behavior consistent with
Indeed,
The prefix tree is there not so much to speed up the resource look-up, but more due to the fact that if we want to emulate PYZ-embedded filesystem, we need to somehow reconstruct the directory structure from filenames in the TOC. I could not come up with a better idea to do so - perhaps you have one? But I agree with your performance concerns. Would you consider the following two modifications sufficient to alleviate those?
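As a rough illustration of reconstructing a directory structure from flat TOC filenames, the sketch below builds a dict-of-dicts tree with path objects as leaves. It is not the PR's TocFilesystem: build_tree() and lookup() are hypothetical names, and the input is assumed to be a list of "/"-separated names.

```python
from pathlib import PurePosixPath


def build_tree(filenames):
    """Build a nested dict: directories are dicts, files map to their path."""
    tree = {}
    for name in filenames:
        path = PurePosixPath(name)
        node = tree
        # Create (or reuse) one dict per intermediate directory.
        for part in path.parts[:-1]:
            node = node.setdefault(part, {})
        node[path.name] = path
    return tree


def lookup(tree, relpath):
    """Return the dict (directory) or path (file) at relpath, or None."""
    node = tree
    for part in PurePosixPath(relpath).parts:
        if not isinstance(node, dict) or part not in node:
            return None
        node = node[part]
    return node


tree = build_tree(["pkg/data/a.txt", "pkg/data/extra/b.txt"])
print(sorted(lookup(tree, "pkg/data")))
```

With such a tree, an existence check is a successful lookup, a directory check is whether the result is a dict, and a listing is the keys of that dict.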
OK, I will take a closer look at
Aha, good point. I filled in the datafiles as per an earlier comment, but the .py files are indeed blank. Will add a comment line in them.
Sure, we can diversify the extensions and contents of datafiles a bit.
Good point. I thought zipped eggs were allowed, since we already have But on the other hand, keeping the "source" and zipped egg in sync is somewhat of a hassle, so it might actually be better to have the test build the egg from the source and use it...
Yeah, the commit messages are a bit of a mess right now, and reflect the actual incremental development history... I was hoping they would be squashed together with a summary message from the PR. Otherwise, I will squash/reorganize them a bit once all other concerns are addressed and the PR is ready for merge...
Besides some minor nit-picking, this looks good to me. Okay to merge after this has been done and the commits are cleaned up.
(I would have applied these nit-picking changes myself, but the commits are not yet cleaned up.)
Alright, here's the updated PR, rebased on top of I've squashed all test-related commits into the first commit, and most initial incremental work on the
The test script tests for behavior of resource_exists(), resource_isdir(), and resource_listdir() functions from pkg_resources package (which in turn call the methods with same name in the provider class). The idea is to run the test script twice, once as unfrozen python script and once as a frozen program. In both cases, the test package is once present as a plain package directory and once as a zipped egg (generated on-the-fly from the source directory). This way, we test behavior of the original provider (DefaultProvider or ZipProvider) and the provider used within the frozen application (which we will need to implement to replace the currently used NullProvider).
Implement PyiFrozenProvider that subclasses NullProvider from pkg_resources and provides _has(), _isdir(), and _listdir() methods. The implementation of these methods supports both PYZ-embedded and on-filesystem resources, in that order of precedence.
Because a directory may exist as both an embedded and on-filesystem resource, we need to de-duplicate the results when listing the filesystem in addition to the embedded tree.
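The precedence and de-duplication described in these commit messages can be sketched roughly as follows. This is a self-contained stand-in, not the PR's PyiFrozenProvider: it does not subclass NullProvider, and FrozenResourceLookup, its nested-dict "embedded" tree (None marks a file) and demo() are all hypothetical.

```python
import os
import tempfile


class FrozenResourceLookup:
    """Hypothetical stand-in: embedded entries first, then the filesystem."""

    def __init__(self, embedded, root):
        self.embedded = embedded  # nested dicts; None marks an embedded file
        self.root = root          # directory holding on-filesystem resources

    def _embedded_node(self, relpath):
        node = self.embedded
        for part in filter(None, relpath.split("/")):
            if not isinstance(node, dict) or part not in node:
                return False, None
            node = node[part]
        return True, node

    def has(self, relpath):
        found, _ = self._embedded_node(relpath)
        return found or os.path.exists(os.path.join(self.root, relpath))

    def isdir(self, relpath):
        found, node = self._embedded_node(relpath)
        if found:  # the embedded entry takes precedence
            return isinstance(node, dict)
        return os.path.isdir(os.path.join(self.root, relpath))

    def listdir(self, relpath):
        found, node = self._embedded_node(relpath)
        names = list(node) if found and isinstance(node, dict) else []
        fs_path = os.path.join(self.root, relpath)
        if os.path.isdir(fs_path):
            # Skip names that the embedded tree already contributed.
            names += [n for n in sorted(os.listdir(fs_path)) if n not in names]
        return names


def demo():
    """List a directory that exists both embedded and on the filesystem."""
    with tempfile.TemporaryDirectory() as root:
        os.makedirs(os.path.join(root, "data"))
        for name in ("on_disk.txt", "shared.txt"):
            open(os.path.join(root, "data", name), "w").close()
        embedded = {"data": {"embedded.txt": None, "shared.txt": None}}
        return FrozenResourceLookup(embedded, root).listdir("data")


print(demo())  # ['embedded.txt', 'shared.txt', 'on_disk.txt']
```

Here "shared.txt" is present in both places but is reported once, with the embedded entries listed first.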
Add a block describing basic behavior of PyiFrozenProvider w.r.t. PYZ-embedded and on-filesystem resources.
Implement a custom pkg_resources provider for our FrozenImporter, by subclassing pkg_resources.NullProvider and overriding its _has(), _isdir(), and _listdir() methods. These are required/used by:
- resource_exists() function (or provider's has_resource() method)
- resource_isdir() function (or provider's resource_isdir() method)
- resource_listdir() function (or provider's resource_listdir() method)
and the default implementations provided by NullProvider raise NotImplementedError("Can't perform this operation for unregistered loader type").
The implementation supports both external data files (the ones collected into the frozen application's directory, e.g. from wheels) and embedded data files (collected into PYZ archive, e.g. from eggs). The former should suffice to fix the pkg_resources problems with metpy (#5247), branca (#5256) and folium (#5262).
Fixes the "Providers" part of #4881.
The behavior of the new PyiFrozenProvider is modelled after DefaultProvider and ZipProvider. Wherever the behavior between the two is inconsistent, we implement the one that makes more sense for our case. The one deviation in behavior is that in contrast to native providers, we cannot list the source .py files, as they do not exist in the frozen application.
The two newly added tests, test_pkg_resources_provider_source and test_pkg_resources_provider_frozen, verify the behavior of providers in the original (source) and frozen application. They both use the same script with several behavior checks, with the test package in either package or egg form. The "source" test can be seen as a sort of sanity check for the "frozen" one; by design of the test script and transitivity of its results, both tests passing indicates that PyiFrozenProvider implements behavior that is in line with the behavior of the two reference providers.