Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add ability to reference files in the local filesystem with file:// s… #11569

Merged

Conversation

jcar87
Copy link
Contributor

@jcar87 jcar87 commented Jul 4, 2022

Changelog: Feature: Add ability to download files in the local filesystem using file:// URIs.
Docs: conan-io/docs#2635
Close #8846

Description

It has been requested be able to download local files with tools.download() and tools.get() (#8846 and conan-io/conan-center-index#10623).

This draft PR gives conan.tools.files.download() (which is used by .get()) the ability to resolve URLs that start with file:// to files in the local filesystem. In that scenario, the URL provided will be converted to a filesystem path, and shutil.copy() will be used to copy the file into the destination.

The underling implementation uses urllib functionality to perform this conversion, rather than any internal logic. It should be noted that while the file:// URI scheme is specified by RFC 8089, different platforms, applications and implementations may behave slightly differently (see discussion here).

Note that it is possible for a file:// uri to point to resources on a remote system - this implementation makes an assumption that the path returned by the urllib library can be accessed without any additional handling.
On windows this may not cover all cases, see: https://docs.microsoft.com/en-us/dotnet/api/system.uri.isunc?view=net-6.0

Note that with this proposed implementation the downloads cache is skipped - under the assumption that the files to retrieve are already in the local filesystem. This may not always be true, since it's possible to locally mount network resources - happy to try and look into making this interact with the downloads cache if necessary!

pyfakefs allows mocking a filesystem for python functions

@jcar87 jcar87 requested review from lasote and czoido July 4, 2022 14:24
auth=auth, headers=headers, md5=md5, sha1=sha1, sha256=sha256)
if file_url.startswith('file:'):
filepath = _path_from_file_uri(file_url)
shutil.copyfile(filepath, filename)
Copy link
Contributor Author

@jcar87 jcar87 Jul 4, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still wonder if this is the best place to do it - we'd have to ignore retry, retry_wait, auth and headers, but still have to check the checksums and overwrite! (working on it!)

The best alternative would be: conans/client/downloaders/file_downloader.py

@@ -1,6 +1,7 @@
pytest>=6.1.1, <7.0.0; python_version > '3.0'
pytest>=4.6.11; python_version < '3.0'
pytest-xdist # To launch in N cores with pytest -n
pyfakefs>=4.5.6
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd say definitely not worth the complexities and problems for using more third parties, for this very specific use case.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I didn't realize it was requirements_dev.txt. No problem with this one, if it really helps for the test (if there is an easy way to do the test without it, better, of course)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup, just used for testing!

I exposes an in-memory filesystem to tests such that python functions that operate on the filesystem read from this instead.

For the purposes of this test, writing a temp file on the actual file system and reading back is also an option. I wanted to a be a little more explicit making sure that paths like file:///C:/path/to/file.txt are resolved properly on Windows - but I think if we used the existing temp_folder() functionality in the test, the paths would be implicitly platform-appropriate anyway, so we would achieve the same testing. Let me know if this is preferred!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd say yes, we already do a ton of temp_folder() in our tests suite, and I think this could perfectly work, please give it a try.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done! Still does the job :D

def download(self, url, file_path, md5=None, sha1=None, sha256=None, **kwargs):

file_origin = self._path_from_file_uri(url)
copyfile(file_origin, file_path)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note here: shutil.copyfile() will overwrite the file if it already exists - I've not added any logic to handle that case. From what I can see, in the newer implementation we are defaulting to overwrite=True always, and the overwrite argument is no longer exposed.
See here: https://github.com/conan-io/conan/blob/develop/conan/tools/files/files.py#L151

But I can see the file downloader does interpret this flag:
https://github.com/conan-io/conan/blob/develop/conans/client/downloaders/file_downloader.py#L46-L53

If there are plans to make overwrites configurable (via conf like with the other options), then it's worth implementing now

@jcar87 jcar87 marked this pull request as ready for review July 6, 2022 12:19
@memsharded memsharded added this to the 1.51 milestone Jul 6, 2022
Copy link
Member

@memsharded memsharded left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good

Comment on lines 166 to 167
with open(file_location, 'w') as textfile:
textfile.write('this is some content\n')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
with open(file_location, 'w') as textfile:
textfile.write('this is some content\n')
save(file_location, "this is some content\n")

We have one line helpers for this.

dest = os.path.join(temp_folder(), "downloaded_file.txt")
download(conanfile, file_url, dest, md5=file_md5)
content = load(dest)
assert "this is some content" == content.rstrip()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
assert "this is some content" == content.rstrip()
assert "this is some content" == content

The rstrip() is probably unnecessary, and risks hiding a bug somewhere. The file must be identical

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed it by changing the md5 that is checked above (as it stands, given the md5 check, this is probably redundant).

Issue was that most command line text editors and the shell itself will append a newline character, so all of my attempts to get the md5 sum of a string will give me the md5 sum of a string ending with the newline character. Given that a few lines above we have control of the exact byte-wise contents of the file, I think it's safe to remove the rstrip()

@memsharded memsharded removed the request for review from lasote July 7, 2022 10:38
@czoido czoido merged commit b20b72b into conan-io:develop Jul 7, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[feature] Support local paths in conans.tools.get
3 participants