Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

pkgutil.get_data() with in-memory resources #457

Open
dae opened this issue Oct 14, 2021 · 5 comments
Open

pkgutil.get_data() with in-memory resources #457

dae opened this issue Oct 14, 2021 · 5 comments
Labels
bug Something isn't working enhancement New feature or request

Comments

@dae
Copy link
Contributor

dae commented Oct 14, 2021

Third-party libraries like jsonschema, that call pkgutil.get_data() to read their bundled resources, currently fail when the sources are in memory, as they lack a __file__ attribute. PyOxidizer's ResourceReader interface does not require __file__, and can successfully retrieve the resources. By dynamically replacing get_data() with a call to open_resource(), it appears to be possible to get jsonschema to work when in memory (mostly - it still has a dependency on an extension module):

def make_exe():
    dist = default_python_distribution()

    policy = dist.make_python_packaging_policy()
    policy.resources_location = "in-memory"
    policy.resources_location_fallback = "filesystem-relative:lib"

    python_config = dist.make_python_interpreter_config()
    python_config.run_command = """
import sys
import pkgutil
import importlib

def get_data_custom(package, resource):
    try:
        module = importlib.import_module(package)
        with module.__loader__.get_resource_reader(package).open_resource(resource) as f:
            return f.read()
    except:
        return None
if getattr(sys, "oxidized", False):
    pkgutil.get_data = get_data_custom

# quick jsonschema test
schema = {"type" : "object", "properties" : {"price" : {"type" : "number"}} }

import jsonschema
jsonschema.validate(instance=dict(price="test"), schema=schema)
"""

    exe = dist.to_python_executable(
        name = "pyapp",
        packaging_policy = policy,
        config = python_config,
    )
    for resource in exe.pip_install(["jsonschema"]):
        exe.add_python_resource(resource)
    return exe

def make_embedded_resources(exe):
    return exe.to_embedded_resources()

def make_install(exe):
    files = FileManifest()
    files.add_python_resource(".", exe)
    return files

register_target("exe", make_exe)
register_target("resources", make_embedded_resources, depends = ["exe"], default_build_script = True)
register_target("install", make_install, depends = ["exe"], default = True)

resolve_targets()

The sys.oxidized gate is necessary to ensure things run correctly outside PyOxidizer, as the default Python ResourceReader can not handle subdirs in the resource path.

I have not tested this with any other libraries yet, but presume this approach is not limited to jsonschema. It's a hack, and I'm not suggesting this be included in PyOxidizer itself, but I thought it might be worth mentioning in case anyone finds it useful.

Partially related to #436

@indygreg
Copy link
Owner

Here's the implementation of pkgutil.read_data() from the Python stdlib:

def get_data(package, resource):
    spec = importlib.util.find_spec(package)
    if spec is None:
        return None
    loader = spec.loader
    if loader is None or not hasattr(loader, 'get_data'):
        return None
    # XXX needs test
    mod = (sys.modules.get(package) or
           importlib._bootstrap._load(spec))
    if mod is None or not hasattr(mod, '__file__'):
        return None

    # Modify the resource name to be compatible with the loader.get_data
    # signature - an os.path format "filename" starting with the dirname of
    # the package's __file__
    parts = resource.split('/')
    parts.insert(0, os.path.dirname(mod.__file__))
    resource_name = os.path.join(*parts)
    return loader.get_data(resource_name)

That if mod is None or not hasattr(mode, '__file__') check is preventing pkgutil.get_data() from working with PyOxidizer, even though OxidizedFinder implements get_data(). Sadness.

I'm really tempted to give up on __file__ being optional and set it to a virtual path under the current executable's path. This is what zipimport does and as long as a sys.path_hooks entry is registered, things should just work.

@indygreg indygreg added bug Something isn't working enhancement New feature or request labels Oct 24, 2021
@sevein
Copy link

sevein commented Nov 3, 2021

FWIW jsonschema is now using importlib.resources (https://github.com/Julian/jsonschema/pull/873/files) - v4.2.0.

@dae
Copy link
Contributor Author

dae commented Nov 5, 2021

Hmm, the new code doesn't seem to work for me on a pyoxidized Python 3.9:

  File "jsonschema", line 29, in <module>
  File "jsonschema.validators", line 349, in <module>
  File "jsonschema._utils", line 60, in load_schema
  File "importlib.resources", line 147, in files
  File "importlib._common", line 14, in from_package
  File "importlib._common", line 18, in fallback_resources
  File "pathlib", line 1082, in __new__
  File "pathlib", line 707, in _from_parts
  File "pathlib", line 691, in _parse_args
TypeError: expected str, bytes or os.PathLike object, not NoneType

@sevein
Copy link

sevein commented Nov 5, 2021

You're right @dae, I can reproduce when the location is in-memory.
It does work with jsonschema v4.2.0 when using filesystem-relative.

dae added a commit to ankitects/anki that referenced this issue Nov 5, 2021
jsonschema 4.2 introduced a change that broke our current workaround
for in-memory support.

python-jsonschema/jsonschema#873
indygreg/PyOxidizer#457
@pquentin
Copy link

The new code breaks due to #237. It's doing exactly that:

It is possible some code somewhere (including in the Python standard library) is buggy and assumes presence of a 3.9 API (e.g. importlib.resources.files() implies support for these APIs on individual meta path importers. I could easily see how someone could test for hasattr(importlib.resources, "files") and assume all meta path importers support .files(). This behavior would be naive about the possibility of 3rd party meta path importers, such as OxidizedFinder.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

4 participants