Feedback on new files() and Traversable API #90

Closed
jaraco opened this issue Oct 21, 2020 · 18 comments
jaraco commented Oct 21, 2020

In GitLab by @gregory.szorc on Mar 15, 2020, 18:04

I'm attempting to implement the new files() and Traversable APIs in PyOxidizer and am struggling to figure them out.

First, the new classes in abc.py don't have type annotations. I think I can work them out. But being explicit would be much appreciated. Having type annotations would also match the rest of the file.

Second, the overloading of Traversable to be both a resource loader and an actual resource is a bit confusing. Elsewhere in the importlib world, there is a separation between the interface for loading things and the things returned by that interface. It is weird to me that a single Traversable both represents the returned value from files() (a logical collection of resources or a mechanism to obtain resources) as well as a handle on an individual resource. I would like for these interfaces to be teased apart, if possible. More on this later.

Third - and this is related to the last point - I just don't understand how you are supposed to handle directories in all cases. Because Traversable is both a loader and a handle on an individual resource, what are we supposed to do for operations like calling Traversable.open() on an object that represents a directory? Are we supposed to return None? Raise an OSError or IOError or whatever the OS emits when you try to open a directory? Do we assume the caller calls is_dir() / is_file() and isn't dumb enough to call open() on a directory? But what if they do?

Fourth, I don't understand why open, read_text, and read_bytes all exist on Traversable. read_text and read_bytes can be implemented in terms of open(). So why are they a) abstract and b) part of the interface in the first place? If we're going to provide open(), I think it makes sense to expose helper functions for read_text and read_bytes that are implemented in terms of open().

Fifth, to be honest, I'm not a huge fan of open() because it puts a lot of burden on custom implementations. What subset of Path.open() needs to be implemented? Path.open() is supposed to accept mode, buffering, encoding, errors, and newline arguments. Since these are all *args, **kwargs-awayed in the interface and the docs only call out mode, what is supposed to be supported? Is it really reasonable for custom implementations (which may not be able to simply call out to builtins.open() because they aren't using a traditional filesystem) to have to implement all this functionality? I would highly encourage reducing the scope of open() to binary file reading. Remove buffering, encoding, errors, and newline support. If people really want to buffer or read resources as text, they can use io.BufferedReader and/or io.TextIOWrapper. And of course, helper APIs that wrap low-level binary-only file-like objects can exist to make read_text() a trivial one-liner. Or consider making the text reading APIs optional and having the default resource reading code go through a polyfill that uses io.* wrappers accordingly.
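For illustration, here is a minimal sketch of the wrapper approach suggested above, assuming a hypothetical traversable whose open('rb') yields a binary stream; these helpers are not part of any existing API:

import io

def read_bytes(traversable):
    # Hedged sketch: the provider only has to supply bytes.
    with traversable.open('rb') as binary:
        return binary.read()

def read_text(traversable, encoding='utf-8', errors='strict'):
    # Text decoding is layered on with io.TextIOWrapper rather than pushed
    # into the provider's open() implementation.
    with traversable.open('rb') as binary:
        return io.TextIOWrapper(binary, encoding=encoding, errors=errors).read()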

I want to say the new files() + Traversable API is a net improvement, and I thank everybody responsible. But at this point in time, I feel like the pile of warts I described in #58 has more or less been replaced by a different pile of warts :/ I failed to notice the new API being designed and feel like I missed an opportunity to influence its design towards something more reasonable for advanced use cases like PyOxidizer. For that I feel bad and apologize. If it isn't too late to make backwards-incompatible changes to the interface, I would strongly vote for reducing the scope of open() and removing read_text and read_bytes. If we do that and clarify the interface and semantics a bit more, I think this new API will be clearly better than what came before.


jaraco commented Oct 21, 2020

In GitLab by @jaraco on Mar 17, 2020, 23:06

Thanks for the feedback. I don't think it's too late to change the implementation. It'll be harder once it lands on Python 3.9.

Let me quickly answer your feedback.

First, the new classes in abc.py don't have type annotations.

Feel free to add them. I'm not very good at these annotations, so someone more experienced with them would be better suited to implement them.

Second, the overloading of Traversable to be both a resource loader and an actual resource is a bit confusing.

I can see how it would be a bit jarring when you're coming in with that prior expectation, but based on my experience with importlib_metadata and the pattern already presented by pathlib.Path, this approach seemed to be the most accessible. If the implementation isn't based on pathlib.Path, then the user would need to have a different model to reason about resources in subdirectories in packages versus files on a file system. With this approach, there's an underlying subset of behavior that's shared in both cases. Do you have a different interface in mind?

what are we supposed to do for operations like calling Traversable.open() on an object that represents a directory?

It's not strictly defined, but I'd say throw some reasonable exception, and consider raising the same exception that pathlib.Path does when opening a directory.
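To illustrate the behavior being referenced (not prescribed by the library), opening a directory through pathlib on POSIX fails with IsADirectoryError, which a custom implementation could standardize on:

import pathlib

try:
    pathlib.Path('/tmp').open()   # '/tmp' stands in for any directory
except IsADirectoryError as exc:
    print(exc)                    # e.g. [Errno 21] Is a directory: '/tmp'
# On Windows, opening a directory surfaces as PermissionError instead, so a
# custom implementation that wants one well-defined behavior could raise
# IsADirectoryError itself.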

I'm not a huge fan of open()

I'm not either. It came late to the party because I found it was necessary to implement open_resource. It's also necessary to get a handle to a potentially large resource (that wouldn't fit in memory performantly). If we wish to drop support for those use-cases, then open could be removed.

What subset of Path.open() needs to be implemented?

I tried to document this, but even pathlib.Path.open doesn't do a good job of documenting it. Basically, it must support modes r and rb, and optionally w and wb, and it must support passing the encoding params through to io.TextIOWrapper. I do all of these things so that the interface is as close as possible to what pathlib.Path supports.
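As a rough illustration of that contract (the names here are hypothetical, not part of the library), an open() backed by in-memory bytes could satisfy it like this:

import io

def open_resource(data: bytes, mode='r', *args, **kwargs):
    # Required modes: 'r' and 'rb'; text mode forwards encoding/errors/newline
    # straight through to io.TextIOWrapper.
    if mode == 'rb':
        return io.BytesIO(data)
    if mode == 'r':
        return io.TextIOWrapper(io.BytesIO(data), *args, **kwargs)
    raise ValueError(f'unsupported mode: {mode!r}')

# open_resource(b'caf\xc3\xa9', 'r', encoding='utf-8').read() -> 'café'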

I would strongly vote for reducing the scope of open() and removing read_text and read_bytes.

In the original design without open(), read_text and read_bytes were the only way to access content. Now that open() is a required part of the interface, read_text and read_bytes could have default implementations on the Traversable object.

I'm not opposed to limiting the scope of open(), but I am opposed to returning something that conflicts with the behavior of a pathlib.Path object. That is, I'd like to be able to return a pathlib.Path instance for file-system resources... and pathlib.Path.open() opens the file in text mode by default. We could make the minimum interface support open(mode='rb') and make all of the other modes (including encoding params) optional.

I may not have answered all your questions, but I think this gives some food for thought. How would you like to proceed? Would you like to draft a proposed change?


jaraco commented Oct 21, 2020

In GitLab by @gregory.szorc on Mar 18, 2020, 21:40

If the implementation isn't based on pathlib.Path, then the user would need to have a different model to reason about resources in subdirectories in packages versus files on a file system. With this approach, there's an underlying subset of behavior that's shared in both cases. Do you have a different interface in mind?

I would propose something that distinguishes between "directories" (which are containers) and files/resources, where the latter cannot cross boundaries from the former.

Essentially, you would have a hierarchy of virtual directories/packages. Each of these would map to an iterable of children virtual directories/packages and files/resources. If traversing a filesystem, you could resolve the available children directories/packages by iterating a directory and classifying each entry as a file or directory.

In API form:

from typing import BinaryIO, Iterable


class MetaPathFinder:
    def get_resource_reader(self, fullname: str) -> "ResourceReader":
        """Obtain an item that can load resources for a fully qualified Python package."""


class ResourceReader:
    # Holds the string name of the virtual Python package this is a resource loader for.
    # For filesystems, this is effectively a reader for a directory at `fullname.replace('.', '/')`.
    fullname = None

    def child_readers(self) -> Iterable["ResourceReader"]:
        """Obtain an iterable of ResourceReader for the available child virtual packages of this one.

        On filesystems, this essentially returns instances corresponding to immediate child directories.
        """

    def resources(self) -> Iterable[str]:
        """Obtain the available named resources for this virtual package.

        On filesystems, this essentially returns the files in the current directory.
        TODO: consider returning a special type that exposes an `open()`, etc.
        """

    def open_binary(self, resource: str) -> BinaryIO:
        """Obtain a file-like object for a named resource.

        On filesystems, this attempts to open os.path.join(self, resource).

        Attempting to open a non-resource entity (such as a subdirectory) or a missing
        resource raises NotAResourceError.
        """
There are plenty of details to bikeshed (such as whether we need a special type to represent the names of resources to abstract over filesystem encoding oddities). But something like this, where there is a clear line between collections of resources and the resources themselves, is where I would start.
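To make the sketch concrete, here is a hypothetical filesystem-backed implementation of the proposed ResourceReader (illustrative only; FilesystemResourceReader and NotAResourceError are invented names):

import os


class NotAResourceError(Exception):
    pass


class FilesystemResourceReader:
    # Reads resources for the virtual package rooted at a directory.

    def __init__(self, fullname, path):
        self.fullname = fullname
        self._path = path

    def child_readers(self):
        for entry in os.scandir(self._path):
            if entry.is_dir():
                yield FilesystemResourceReader(
                    f'{self.fullname}.{entry.name}', entry.path)

    def resources(self):
        return [entry.name for entry in os.scandir(self._path) if entry.is_file()]

    def open_binary(self, resource):
        target = os.path.join(self._path, resource)
        if not os.path.isfile(target):
            raise NotAResourceError(resource)
        return open(target, 'rb')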

It's not strictly defined, but I'd say throw some reasonable exception, and consider raising the same exception that pathlib.Path does when opening a directory.

For resource loaders that don't perform filesystem I/O, this possibly is a leaky abstraction, since IOError and OSError are not appropriate here. However, modern pathlib does seem to use exceptions like IsADirectoryError. If we standardized on these special exception types and avoided use of IOError and OSError, I think this would be acceptable to me.

I'm not either. It came late to the party because I found it was necessary to implement open_resource. It's also necessary to get a handle to a potentially large resource (that wouldn't fit in memory performantly). If we wish to drop support for those use-cases, then open could be removed.

I would strongly advocate for the existence of an API that allows incremental resource reading via a File-like interface. Having an open() or something like it is critical to support large files. My issues with open() stem from the complexity of exposing additional features. My argument is that very few people will use the ResourceReader and Traversable APIs directly, instead opting to go through a wrapper in importlib. I argue that we only need to expose the equivalent of open('rb') and wrappers can build the remaining functionality, including text conversion.

I tried to document this, but even pathlib.Path.open doesn't do a good job of documenting it. Basically, it must support modes r and rb, and optionally w and wb, and it must support passing the encoding params through to io.TextIOWrapper. I do all of these things so that the interface is as close as possible to what pathlib.Path supports.

New concern: allowing writable handles. Why? The interface we used to obtain a handle is named ResourceReader. Why does it support writing?

I concede that allowing text mode operations is convenient. And io.TextIOWrapper does make it somewhat trivial to wrap a binary stream. If the API were documented to accept mode and the exact arguments that io.TextIOWrapper does along with the same conventions, I wouldn't complain as much about the existence of an open() that supports text. It's the undefined nature of the behavior today that concerns me.

I'm not opposed to limiting the scope of open(), but I am opposed to returning something that conflicts with the behavior of a pathlib.Path object. That is, I'd like to be able to return a pathlib.Path instance for file-system resources... and pathlib.Path.open() opens the file in text mode by default. We could make the minimum interface support open(mode='rb') and make all of the other modes (including encoding params) optional.

I think defining the interface in terms of pathlib.Path compatibility is wrong from an API design perspective because it is a leaky abstraction. It leaks that the resources you are describing are backed by files when in fact they may not be. This means that importers not backed by a traditional filesystem (such as PyOxidizer's or zipimporter) are left with a lot of questions around which parts of the pathlib APIs they need to implement and how to implement them when semantics can't be preserved exactly. If you expose the full breadth of pathlib.Path APIs, inevitably some code will start relying on those APIs being present and as soon as that code runs in PyOxidizer or via zipimporter, things will blow up.

I think it is fine to model the interface of the returned object against what pathlib.Path exposes. And implementations could arguably return a pathlib.Path if they wanted. But the documented interface should be the minimal subset of pathlib.Path necessary to provide resource loading functionality and the documentation should clearly say that code relying on returned objects having attributes not in this interface results in undefined behavior. I think that CPython should prevent misuse by returning a custom type and not a pathlib.Path, otherwise code in the real world will unknowingly violate the API contract. Leaky abstractions like pathlib.Path are one of the reasons why non-filesystem importers have such a difficult time achieving compatibility with many packages in the Python ecosystem.

How would you like to proceed? Would you like to draft a proposed change?

I provided a suggested API in this comment that I'd like to bikeshed.


jaraco commented Oct 21, 2020

In GitLab by @jaraco on Mar 19, 2020, 21:52

I would propose something that distinguishes between "directories" (which are containers) and files/resources, where the latter cannot cross boundaries from the former.

This approach conflicts with the goal to limit the number of interfaces a Python programmer needs to reason about. If you s/ResourceReader/Traversable/ and s/child_readers/iterdir/ and s/open_binary/open/, you almost have Traversable. You're right that a Traversable combines the aspects of a directory and a file into a single construct, but this approach has value. It allows both directories and files to have a first-class representation. For example, with a Traversable file, you can get its siblings through file.parent.iterdir(). ResourceReader provides no such way to relate its contents with just a handle to a resource.

More importantly, a Traversable provides a familiar, shared interface to traverse a tree of resources in a package. Thinking about a simple case of getting a/b/c.txt as text from a package, with ResourceReader, one would need to:

io.TextIOWrapper(
    next(sub for sub in
         next(sub for sub in reader.child_readers()
              if sub.fullname.split('.')[-1] == 'a').child_readers()
         if sub.fullname.split('.')[-1] == 'b').open_binary('c.txt')
).read()

Compare that with:

(reader.files() / 'a' / 'b' / 'c.txt').read_text()

or

reader.files().joinpath('a').joinpath('b').joinpath('c.txt').read_text()

I'm far from convinced there is value in separating directories from resources, especially given that most users have a mental model and reference implementation in pathlib.Path that conflates the two.

New concern: allowing writable handles. Why? The interface we used to obtain a handle is named ResourceReader. Why does it support writing?

You're right, writing is not a concern. That was an early consideration that's no longer a consideration.

Defining the interface in terms of pathlib.Path compatibility is wrong from an API design perspective because it is a leaky abstraction. It leaks that the resources you are describing are backed by files when in fact they may not be.

The directory tree abstraction works very well for what resources are trying to expose. Previously, importlib demanded that all resources be "files", and the main deficiency was that it didn't support "subdirectories". That's the abstraction the users are looking for. There's no expectation that the resources are backed by a file system, but they should be represented by directories of those resources. Why not call them files? zipp.Path is one proven implementation that isn't backed by a filesystem (example of accessing contents over the network).

Can you conceive of a resource provider that would provide the resources in directories where the Traversable interface is a poor match?

I do sympathize. While I was working on zipp.Path, I struggled a bit with the logical mismatch between zip files and a file system (such as how to reliably represent a directory).
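For reference, a short self-contained sketch of that duality with zipp.Path (zipfile.Path in the stdlib); the archive here is built in memory purely for illustration:

import io
import zipfile

import zipp  # standalone backport of zipfile.Path

buffer = io.BytesIO()
with zipfile.ZipFile(buffer, 'w') as zf:
    zf.writestr('pkg/data/c.txt', 'hello')

root = zipp.Path(zipfile.ZipFile(buffer))
print((root / 'pkg' / 'data' / 'c.txt').read_text())        # -> hello
print([child.name for child in (root / 'pkg').iterdir()])   # -> ['data']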

the documented interface should be the minimal subset of pathlib.Path necessary to provide resource loading functionality

This is the intent of Traversable, and until I realized .open() was necessary, read_text and read_bytes seemed the most basic interface needed to access resource contents. I now see that those methods should probably be concrete implementations built on the abstract open(), maybe something like this:

index 28596a4..ec7cc2a 100644
--- a/importlib_resources/abc.py
+++ b/importlib_resources/abc.py
@@ -69,17 +69,19 @@ class Traversable(ABC):
         Yield Traversable objects in self
         """
 
-    @abc.abstractmethod
     def read_bytes(self):
         """
         Read contents of self as bytes
         """
+        with self.open('rb') as strm:
+            return strm.read()
 
-    @abc.abstractmethod
     def read_text(self, encoding=None):
         """
         Read contents of self as bytes
         """
+        with self.open('r', encoding=encoding) as strm:
+            return strm.read()
 
     @abc.abstractmethod
     def is_dir(self):
@@ -99,11 +101,11 @@ class Traversable(ABC):
         Return Traversable child in self
         """
 
-    @abc.abstractmethod
     def __truediv__(self, child):
         """
         Return Traversable child in self
         """
+        return self.joinpath(child)
 
     @abc.abstractmethod
     def open(self, mode='r', *args, **kwargs):

If this were the case, then the methods that an implementer would need to implement would be is_dir, is_file, iterdir, joinpath, and open.
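To check that this reduced method set is workable for a store like PyOxidizer's (a plain mapping of path to bytes), here is a hedged sketch; DictTraversable is an invented name and this is not how either library actually implements it:

import io


class DictTraversable:
    # Traversable-style handle over a flat {'a/b/c.txt': b'...'} mapping.

    def __init__(self, store, at=''):
        self._store = store
        self._at = at                      # '' is the root

    def is_file(self):
        return self._at in self._store

    def is_dir(self):
        prefix = self._at + '/' if self._at else ''
        return not self.is_file() and any(key.startswith(prefix) for key in self._store)

    def iterdir(self):
        prefix = self._at + '/' if self._at else ''
        seen = set()
        for key in self._store:
            if key.startswith(prefix):
                child = key[len(prefix):].split('/', 1)[0]
                if child not in seen:
                    seen.add(child)
                    yield DictTraversable(self._store, prefix + child)

    def joinpath(self, child):
        return DictTraversable(self._store, f'{self._at}/{child}' if self._at else child)

    __truediv__ = joinpath

    def open(self, mode='rb', *args, **kwargs):
        # Only binary reading is implemented; text could be layered on with
        # io.TextIOWrapper as discussed earlier.
        if mode != 'rb':
            raise ValueError(f'unsupported mode: {mode!r}')
        return io.BytesIO(self._store[self._at])


root = DictTraversable({'a/b/c.txt': b'hello'})
print((root / 'a' / 'b' / 'c.txt').open('rb').read())   # -> b'hello'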

Does that help? Can we focus on refining the interface definition for Traversable and ensuring it suits the needs of PyOxidizer?


jaraco commented Oct 21, 2020

In GitLab by @gregory.szorc on Mar 22, 2020, 14:21

I'm not arguing that having a Path-like API for manipulating resources isn't a user-friendly API: it is. Instead, the crux of my argument is that a) actually using Path (or any type in pathlib) is a bad idea because it is a leaky abstraction, and b) failing to distinguish between containers (directories) and items (files) makes implementation more difficult and computationally complex at run time the farther away you get from a data store backed by a traditional filesystem (as is the case with PyOxidizer).

Path and other types from pathlib should absolutely not be exposed from this interface because their surface area is too large for implementations that aren't backed by a traditional filesystem. If you want to define an interface that quacks like Path and exposes methods like joinpath(), open(), and __truediv__, that's perfectly fine. Just as long as the interface is as tightly defined/constrained as possible.

With all due respect, custom implementations that use zip files are still effectively backed by filesystems because zip files (or tar archives) contain entry metadata similar to what a regular filesystem would expose. So you just aren't going to see the kinds of abstract problems I'm worried about if zip files are your test subject.

I think a better test subject is something like PyOxidizer, which wants to back resources in memory by a Rust HashMap<String, Vec<u8>> - a dict({str: bytes}) in Python speak. The following concrete problems arise with this:

  1. Resource name encoding. What exactly is the allowed text encoding of resource names? Is any sequence of bytes allowed? Must resource names be UTF-8? If resources are discovered from the filesystem, filesystem encoding is at play, greatly complicating what can happen here. How exactly should PyOxidizer discover resources on the filesystem and translate those filenames to bytes (for storage in the binary) and then re-expose them to Python's resources interface later? How do we ensure compatibility with Python's default implementation? What if the default filesystem path encoding is different between machines? e.g. if you produce a binary containing resources on a Windows machine using English and run that binary on a Windows machine using Japanese where the filesystem encoding is different?

  2. How do we efficiently index resources? In a filesystem (or zip file), you don't maintain an index explicitly because the filesystem is the index (or the zip file already has metadata). If you are loading resources from memory, what exactly is your index? Do you maintain a global HashMap<String, Vec<u8>> of all resources? Do you maintain a per-package/loader HashMap<String, Vec<u8>> of resources anchored to a Python package root? Different approaches have different trade-offs. A global HashMap would require a full key scan and str.starts_with() lookup on each key in order to resolve children resources. This would be grossly inefficient if we had say 100,000 resources. If we maintained separate HashMap for each package/loader, we would have to potentially scan multiple HashMap to locate resources. This is because the resources API does not forbid crossing directory boundaries. For example, if you are trying to resolve the resource foo/bar/baz/resource.txt, you could access the resource via foo as bar/baz/resource.txt, via foo.bar as baz/resource.txt, or via foo.bar.baz as resource.txt. If you wanted to resolve children resources of foo, we would need to iterate through HashMap for foo, foo.bar, and foo.bar.baz. And this would introduce complexity just to discover what child packages exist! (Do you traverse all known package keys and do a starts_with() or do you maintain a tree-like data structure?)

From the perspective of a resource reader that isn't backed by anything resembling a traditional filesystem, to keep things simple we need:

  1. Clear restrictions on allowed encoding for resource names and/or how encodings should be normalized when files on the filesystem are turned into abstract strings/paths.
  2. Restrictions on crossing boundaries between resource readers. e.g. remove the feature that N resource readers for different Python packages can obtain a handle on the same resource. e.g. you can't use foo's resource reader to access foo/bar/resource.txt when there exists a foo.bar package. i.e. set clear guidance on ownership of resources. The current importlib.resources APIs in Python 3.8 get this correct by stating that resource names must not contain path separators. Unfortunately, the new proposed Traversable interface regresses this by exposing the full power of paths.

I would encourage you to implement a custom resource reader as follows:

  1. The set of known resources is initially discovered by walking the filesystem. Discovered files are turned into Python data structures (e.g. a dict mapping strings/paths to bytes representing each resource).
  2. I/O cannot be performed as part of servicing any ResourceReader API: you must use your custom index to implement the APIs.
  3. Care about performance and implementation complexity.

I think if you attempt to do this with the new Traversable interface, you will find it more complex than with the resources interfaces as defined by Python 3.8. This is mainly because the power of Path exposes too much complexity and ambiguity. Don't get me wrong, I like the semantics of Path from a user perspective. But behavior needs to be tightly constrained to keep implementation complexity and performance in check.


jaraco commented Oct 21, 2020

In GitLab by @brettcannon on Mar 23, 2020, 19:27

In regards to resource name encoding, import itself doesn't take an opinion on that, so in general that doesn't have an answer in Python.


jaraco commented Oct 21, 2020

In GitLab by @gregory.szorc on Mar 23, 2020, 21:31

In regards to resource name encoding, import itself doesn't take an opinion on that, so in general that doesn't have an answer in Python.

To be nuanced, it is undefined behavior. And that is arguably an opinion in and of itself.

In reality, the importer has imposed restrictions on the use of encodings:

  • PyImport_Inittab defines module names in terms of char * and the builtin importer C code appears to .encode('utf-8') the Python str defining the module name.
  • importlib._bootstrap_external appears to use string manipulation in the domain of str to define paths to module files. This goes through an implicit fsencode() when I/O is performed. Different machines can experience different results when attempting to import str module values that map to different byte sequences when hitting an I/O API.
  • And more!

And in case you are wondering, Python is capable of importing non-ascii module names from the filesystem:

$ cat こんにちは.py
print('こんにちは from the python file')
$ python3.8 -m こんにちは
こんにちは from the python file

This is a cool feature. But the engineer in me wants module and resource names to be limited to a subset of ASCII because any code point that encodes differently depending on the filesystem encoding is potential for non-determinism and non-portable packages. But the human in me empathizes with non-English speakers (I wouldn't be surprised if non-English speakers used language/locale native module names for some projects) and there can be legitimate reasons for having non-ASCII names. e.g. even as an English speaker I have encountered many test files with filenames outside ASCII that exist to facilitate encoding testing.

But this encoding ship has sailed a long time ago and probably can't be reined in. PyOxidizer will likely normalize resource names to UTF-8 and this seems reasonable to me. Any additional restrictions imposed by Python would be desirable from a simplicity and compatibility perspective, but aren't strictly required.


jaraco commented Oct 21, 2020

In GitLab by @jaraco on Mar 24, 2020, 10:43

All good considerations. Thanks for writing them up. I do think that the zipfile defies your expectation, as it presents only a flat list of entries for the entire zip file. There's no tree structure and there's no index, only a pile of files. I believe it falls prey to all of the implementation concerns you raise, including performance concerns, many of which were addressed after the release of Python 3.8 (example).
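As a small self-contained illustration of that flatness (built in memory purely for the example):

import io
import zipfile

buffer = io.BytesIO()
with zipfile.ZipFile(buffer, 'w') as zf:
    zf.writestr('pkg/__init__.py', '')
    zf.writestr('pkg/data/c.txt', 'hello')

# No tree and no directory entries here, just a flat list of names; any
# structure has to be inferred from the strings, which is what
# zipp/zipfile.Path does.
print(zipfile.ZipFile(buffer).namelist())
# -> ['pkg/__init__.py', 'pkg/data/c.txt']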

I'm warming up to the idea that importlib resources could solicit a lower-level interface from resource readers and also provide a wrapper for that interface that implements Traversable over that lower-level interface (as opposed to the status quo, in which each resource provider must implement the full Traversable interface).

Needed: Clear restrictions on allowed encoding for resource names and/or how encodings should be normalized when files on the filesystem are turned into abstract strings/paths.... Engineer in me wants module and resource names to be limited to a subset of ASCII.

Let's defer this concern for later. It's an anti-goal for me to overly constrain resource names. In particular, I want to aim to support Unicode text wherever possible, even if that means there are edge cases where non-ASCII won't be supported. I went through a similar experience with the path package. I acknowledge that some operating systems and container formats allow creating files or file-like objects with names that don't reliably decode to Unicode text. I consider those use-cases unsupported, and I'm certainly unwilling to compromise the functionality here to accommodate those environments without an extremely compelling use-case. Practicality beats purity here.

Needed: Restrictions on crossing boundaries between resource readers.

I'm not sure this restriction is needed. It's already the case that python module files are resources of its containing package, so why would it be necessary or desirable to restrict access to packages?

I suggest we start with a modest proposal to create the lower-level interface similar to your proposal above and we can deal with constraining that interface in a separate discussion/proposal.


jaraco commented Oct 21, 2020

In GitLab by @jaraco on Mar 24, 2020, 11:25

mentioned in commit 1394980


jaraco commented Oct 21, 2020

In GitLab by @jaraco on Mar 24, 2020, 11:36

mentioned in merge request !95


jaraco commented Oct 21, 2020

In GitLab by @jaraco on Mar 24, 2020, 11:37

In the feature/90-low-level-reader branch, I've taken Gregory's proposed low-level interface, implemented it as SimpleReader, and drafted ResourceHandle, ResourceContainer, and TraversableReader built on it. In !95, I've created a WIP MR to discuss the concept. I'm eager to hear your feedback.


jaraco commented Oct 21, 2020

In GitLab by @brettcannon on Mar 24, 2020, 12:53

(I wouldn't be surprised if non-English speakers used language/locale native module names for some projects)

I can solve that potential lack of surprise by telling you people do. 😄 I brought this up because there's a bug filed against importlib about how we don't attempt to normalize names to NFKD or any other specific, normalized form for comparison. So I didn't mean to imply there weren't any restrictions (sorry about that if my comment read that way), just that it isn't fully defined to the point that there's no potential for conflicts, and I didn't want you caught off-guard with PyOxidizer by that.


jaraco commented Oct 21, 2020

In GitLab by @jaraco on Mar 30, 2020, 16:47

I'm eager to get feedback on !95 if you have time to review.


jaraco commented Oct 21, 2020

In GitLab by @gregory.szorc on Apr 5, 2020, 11:56

I left some feedback on !95. Overall it is looking good!

FWIW I recently made a number of tweaks to PyOxidizer's resource loading code. You may find the following documentation useful to peruse to better understand some of the challenges PyOxidizer faces:


jaraco commented Oct 21, 2020

In GitLab by @jaraco on Jun 13, 2020, 11:29

mentioned in merge request !103


jaraco commented Jan 19, 2021

@indygreg The 5.1.0 release includes the importlib_resources.simple module, which contains the adapters from #183. My thinking on this is the adapter should be something that could be employed by any provider that wishes to implement the SimpleReader interface on their reader. My expectation is that most providers won't bother with it, so I'm unsure if it should even be ported to CPython (or remain in a third-party package). Please give it a try and let me know how it works. If it's going to stick around, I'll want to write some tests for it.


jaraco commented May 4, 2021

I do want to say, with respect to the leaky abstraction in the files() protocol, that I see it as a tradeoff between simplicity and correctness. A correct implementation would indeed wrap or transform existing zipfile.Path and pathlib.Path objects into something that implements only the required protocol, but that added complexity is likely to introduce bugs and additional challenges. Moreover, the increasing adoption of type checkers should help protect against the leaky abstraction.
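For what it's worth, the kind of wrapping described above could look roughly like this delegating shim (purely illustrative; the library does not do this, and ProtocolOnlyTraversable is an invented name):

import pathlib


class ProtocolOnlyTraversable:
    # Exposes only the documented Traversable surface of a wrapped object.
    # (A fuller shim would also re-wrap the results of iterdir()/joinpath().)

    _allowed = ('name', 'iterdir', 'is_dir', 'is_file', 'joinpath', 'open',
                'read_bytes', 'read_text')

    def __init__(self, inner):
        self._inner = inner   # e.g. a pathlib.Path or zipfile.Path

    def __getattr__(self, attr):
        if attr not in ProtocolOnlyTraversable._allowed:
            raise AttributeError(f'{attr!r} is not part of the Traversable protocol')
        return getattr(self._inner, attr)

    def __truediv__(self, child):
        return ProtocolOnlyTraversable(self._inner / child)


resource = ProtocolOnlyTraversable(pathlib.Path('.')) / 'README.md'
resource.read_text        # fine: part of the protocol
# resource.resolve()      # AttributeError: not part of the Traversable protocol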

Lacking engagement on this issue, I'm going to close it without prejudice. Happy to revisit later.

@jaraco jaraco closed this as completed May 4, 2021
@septatrix

Is this API supposed to supersede the existing {open,read}_{binary,text} methods? It feels like they cover the exact same surface, or are there places where one is able to do more or less than the other¹? Also, in the CPython docs, the Traversable(Reader) classes are not documented as to which subset of pathlib methods they support.

¹ Based on some comments on gitlab I suspect that the files() API is able to grant access to resources in directories without an __init__.py file but that does not seem to be explicitly stated or documented anywhere...


jaraco commented May 11, 2021

My intention/expectation is for the files() API to supersede the other API functions (see #80). I haven't yet invested the time in deprecating the legacy API. The changelog for 1.1 advertises some of these concerns.
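For a concrete side-by-side (the package and resource names are placeholders), both of these are supported today in importlib_resources:

import importlib_resources

# Legacy helper, expected to be superseded:
legacy = importlib_resources.read_text('mypkg', 'data.txt')

# files() API:
modern = importlib_resources.files('mypkg').joinpath('data.txt').read_text()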

Agreed, if docs are unclear or could be enhanced, I'd welcome those changes. One challenge is that the docs for importlib_resources and importlib.resources are not synced, so any changes should ideally be applied to both.

Regarding Traversable, I'd recommend linking to the source rather than trying to duplicate the interface definition in documentation.
