Skip to content
This repository has been archived by the owner on Apr 6, 2023. It is now read-only.

Add code and tests + supporting files #1

Merged
merged 15 commits into from Oct 11, 2022
Merged

Add code and tests + supporting files #1

merged 15 commits into from Oct 11, 2022

Conversation

mariosasko
Copy link
Contributor

Add the filesystem implementation with additional tweaks (cp_file, rm_file, etc.) needed for the datasets integration.

Also, add/update supporting files to align the repo with the existing HF projects.

@mariosasko mariosasko requested a review from lhoestq July 29, 2022 12:15
Copy link
Member

@lhoestq lhoestq left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Woohoo love it !

src/hffs/spec.py Outdated Show resolved Hide resolved
tests/test_spec.py Outdated Show resolved Hide resolved
README.md Show resolved Hide resolved
@lhoestq lhoestq mentioned this pull request Sep 15, 2022
src/hffs/spec.py Outdated Show resolved Hide resolved
@mariosasko
Copy link
Contributor Author

I've addressed the comments and made additional improvements to make the code cleaner/more performant. Let me know what you think!

I'm still trying to figure out if I can avoid the temp file creation in write mode in HfFile because this goes against fsspec's "philosophy". For instance, simplecache already uses a temp file for caching remote writes, so in our case, this would mean creating 2 temp files and occupying 2x space.

Having thought about it, I agree it's OK to depend on huggingface_hub for simplicity. With this in mind, some things that would be cool to have in huggingface_hub (cc @Wauplin) to simplify the code further:

  • the endpoint param in hf_hub_url to support custom endpoints
  • an op for efficient copying of repo files in the commit API (advanced use-case)

@Wauplin
Copy link
Contributor

Wauplin commented Sep 26, 2022

@mariosasko I did not have the time to review the PR yet but about your two last points:

the endpoint param in hf_hub_url to support custom endpoints

Yes, would definitely be good to have that. Could you create an issue for that please ?

an op for efficient copying of repo files in the commit API (advanced use-case)

Hmm, I see the purpose in hffs since we need to comply to the specification. However as you said it seems a very niche use-case. You can still create an issue for it but since it would require some changes in moon-landing and that it's not really a priority, it is most likely that we won't implement it anytime soon. I would say the sub-optimal solution where the file is copied locally twice is not too bad, hoping that no-one uses it for large files.

Copy link
Contributor

@Wauplin Wauplin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good, thanks for working on it @mariosasko !

I have added comments, most of them been about stuff that can be leverage from huggingface_hub. If would require huggingface_hub>=0.10.0 instead of huggingface_hub>=0.9.0 if that's fine.

Makefile Outdated Show resolved Hide resolved
src/hffs/fs.py Outdated Show resolved Hide resolved
src/hffs/fs.py Outdated Show resolved Hide resolved
src/hffs/fs.py Outdated Show resolved Hide resolved
src/hffs/fs.py Outdated Show resolved Hide resolved
src/hffs/fs.py Show resolved Hide resolved
src/hffs/fs.py Outdated Show resolved Hide resolved
src/hffs/fs.py Outdated Show resolved Hide resolved
src/hffs/fs.py Outdated Show resolved Hide resolved
src/hffs/fs.py Outdated Show resolved Hide resolved
@lhoestq
Copy link
Member

lhoestq commented Oct 7, 2022

Hi ! Any news on this ? @severo and I would like to use it to store datasets in Parquet format on the Hub under git references (for the datasets-server project and to showcase how to get a arbitrary slices of data)

Happy to help if I can :)

src/hffs/fs.py Outdated Show resolved Hide resolved
src/hffs/fs.py Outdated Show resolved Hide resolved
@mariosasko
Copy link
Contributor Author

@Wauplin @lhoestq Let me know if everything looks good now

Copy link
Member

@lhoestq lhoestq left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks ! looks all good :)

src/hffs/fs.py Outdated Show resolved Hide resolved
mariosasko and others added 2 commits October 11, 2022 16:16
Co-authored-by: Quentin Lhoest <42851186+lhoestq@users.noreply.github.com>
Add newline to pyproject.toml
Copy link
Contributor

@Wauplin Wauplin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good ! Thanks for making the changes :)

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants