Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use resources loader to handle non-filesystem situations #120

Merged
merged 2 commits into from Mar 18, 2020

Conversation

adferrand
Copy link
Contributor

@adferrand adferrand commented Mar 13, 2020

While playing around with PyOxidizer, I realized that certifi was not working properly in self-contained executables generated by this tool. The typical traceback is the following:

Traceback (most recent call last):
  File ...  # Some code that relies on requests library
  File "requests", line 112, in <module>
  File "requests.utils", line 39, in <module>
  File "certifi.core", line 13, in where
NameError: name '__file__' is not defined

PyOxidizer maintainers made a complete explanation in this page of what is going on: https://pyoxidizer.readthedocs.io/en/stable/packaging_pitfalls.html. Short answer is: one cannot rely on __file__ existence. Indeed the module may be exposed to Python runtime through an archive file, or even no file at all and directly from the RAM. In these cases __file__ is irrelevant or does not exist.

However, the certifi.core.where() method is built on the assumption that __file__ exists and can be used to get the actual path of cacert.pem.

In the same page about the Packaging Pitfalls (https://pyoxidizer.readthedocs.io/en/stable/packaging_pitfalls.html), PyOxidizer maintainers provide some decent alternatives to __file__ in order to handle these situations, and these alternatives are using high level resources loaders APIs that have been added progressively to Python.

Functional examples are provided, and I used them to reimplement the certifi.core module, in order to consume the available resource loaders if possible. Thanks to it, in non-filesystem situations:

  • where() will extract the file content if pkg_resources is available and put it as a temporary file in the current Python cache, then return its path.
  • contents() will directly return the file content if importlib.resources is available (and so handling the path manipulations internally), or rely on where() to return this content.

As a general fallback, if no resources loader is available, the original behavior of certifi.core will be triggered: relying on __file__ existence. This fallback may happen on Python <3.7, and never on Python 3.7+, making certifi (and so requests) work in any situation with this recent version of Python.

Fixes #66

@Lukasa
Copy link
Member

Lukasa commented Mar 13, 2020

Does #116 not address this?

@adferrand
Copy link
Contributor Author

Well, not totally in fact.

Indeed contents() is better handled with #116, because if importlib.resources is available, then it will return the content of cacert.pem in all situations. However if importlib.resources is not there, you fallback to the capacity of where() to provide a path.

And in this case, and in the case of a direct call to where() (like in requests.utils L40), things will break in non-filesystem situations because where() relies directly on __file__ without trying a resources loader. This is what is described in the inline comment for read_text in fact.

@sigmavirus24
Copy link
Member

We discussed on #116 that if there was no resources loader (Python < 3.7.0) then we were fine with that breaking because it would be documented by PyOxidizer as a requirement. This seems to add an incredible amount more of complexity in order to achieve something which we already decided we didn't care about

@adferrand
Copy link
Contributor Author

Is it so? This PR adds 10 lines of code, and covers a code that is not compliant to Python specifications.

It is really not for PyOxidizer, it is more that it shows why things could be broken here because the code is not compliant.

In term of complexity, this PR does not add new patterns (the try/except ImportError is already used), and obviously does not add any external dependencies. Since this library is very low level and is used by basically everyone, I think it is a decent improvement with no cost.

@adferrand
Copy link
Contributor Author

adferrand commented Mar 13, 2020

Independently from this, the current code of requests still uses where() and so is broken for any Python version when the library certifi is not in the filesystem. Is there an ongoing work to take advantage of #116 and use contents()?

Alternatively this PR, if I am not mistaken, fixes requests on that matter without modification to requests itself when a resources loader is available.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

"__file__" variable cannot be used when running with frozen program
3 participants