This repository has been archived by the owner on Apr 6, 2023. It is now read-only.
-
Notifications
You must be signed in to change notification settings - Fork 6
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Add code and tests + supporting files (#1)
* Finish implementation (for now) * Add basic readme * Minor changes in setup.py * Add setup.cfg * Add tests * Add Makefile * Minor README fix * Code improvements * Better tests * Comment out some coode * add zarr example to readme * Lucain's comments * Quentin's comments * Update src/hffs/fs.py Co-authored-by: Quentin Lhoest <42851186+lhoestq@users.noreply.github.com> * Update pyproject.toml Add newline to pyproject.toml Co-authored-by: Quentin Lhoest <42851186+lhoestq@users.noreply.github.com>
- Loading branch information
1 parent
d00546d
commit 0c61ed2
Showing
9 changed files
with
627 additions
and
14 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,21 @@ | ||
.PHONY: quality style test | ||
|
||
check_dirs := tests src setup.py | ||
|
||
# Check that source code meets quality standards | ||
|
||
quality: | ||
black --check $(check_dirs) | ||
isort --check-only $(check_dirs) | ||
flake8 $(check_dirs) | ||
|
||
# Format source code automatically | ||
|
||
style: | ||
black $(check_dirs) | ||
isort $(check_dirs) | ||
|
||
# Run tests for the library | ||
|
||
test: | ||
python -m pytest -sv ./tests/ |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,3 +1,106 @@ | ||
# HFfs | ||
# `hffs` | ||
|
||
HFfs builds on huggingface_hub to provide a convenient Python filesystem interface for huggingface.co repositories. | ||
`hffs` builds on [`huggingface_hub`](https://github.com/huggingface/huggingface_hub) and [`fsspec`](https://github.com/huggingface/huggingface_hub) to provide a convenient Python filesystem interface to 🤗 Hub. | ||
|
||
## Examples | ||
|
||
Locate and read a file from a 🤗 Hub repo: | ||
|
||
```python | ||
>>> import hffs | ||
>>> fs = hffs.HfFileSystem("my-username/my-dataset-repo", repo_type="dataset") | ||
>>> fs.ls("") | ||
['.gitattributes', 'my-file.txt'] | ||
>>> with fs.open("my-file.txt", "r") as f: | ||
... f.read() | ||
'Hello, world' | ||
``` | ||
|
||
Write a file to the repo: | ||
|
||
```python | ||
>>> with fs.open("my-file-new.txt", "w") as f: | ||
... f.write("Hello, world1") | ||
... f.write("Hello, world2") | ||
>>> fs.exists("my-file-new.txt") | ||
True | ||
>>> fs.du("my-file-new.txt") | ||
26 | ||
``` | ||
|
||
Instantiation via `fsspec`: | ||
|
||
```python | ||
>>> import fsspec | ||
|
||
>>> # Instantiate a `hffs.HfFileSystem` object | ||
>>> fs = fsspec.filesystem("hf://my-username/my-model-repo", repo_type="model") | ||
>>> fs.ls("") | ||
['.gitattributes', 'config.json', 'pytorch_model.bin'] | ||
|
||
>>> # Instantiate a `hffs.HfFileSystem` object and write a file to it | ||
>>> with fsspec.open("hf://my-username/my-dataset-repo:/my-file-new.txt", repo_type="dataset"): | ||
... f.write("Hello, world1") | ||
... f.write("Hello, world2") | ||
``` | ||
|
||
> **Note**: To be recognized as a `hffs` URL, the URL path passed to [`fsspec.open`](https://filesystem-spec.readthedocs.io/en/latest/api.html?highlight=open#fsspec.open) must adhere to the following scheme: | ||
> ``` | ||
> hf://<repo_id>[@<revision>]:/<path/in/repo> | ||
> ``` | ||
# Installation | ||
|
||
```bash | ||
pip install hffs | ||
``` | ||
|
||
## Integrations | ||
|
||
* [`pandas`](https://pandas.pydata.org/pandas-docs/stable/user_guide/io.html#reading-writing-remote-files)/[`dask`](https://docs.dask.org/en/stable/how-to/connect-to-remote-data.html) | ||
|
||
```python | ||
>>> import pandas as pd | ||
|
||
>>> # Read a remote CSV file into a dataframe | ||
>>> df = pd.read_csv("hf://my-username/my-dataset-repo:/train.csv", storage_options={"repo_type": "dataset"}) | ||
|
||
>>> # Write a dataframe to a remote CSV file | ||
>>> df.to_csv("hf://my-username/my-dataset-repo:/test.csv", storage_options={"repo_type": "dataset"}) | ||
``` | ||
|
||
* [`datasets`](https://huggingface.co/docs/datasets/filesystems#load-and-save-your-datasets-using-your-cloud-storage-filesystem) | ||
|
||
```python | ||
>>> import datasets | ||
|
||
>>> # Export a (large) dataset to a repo | ||
>>> cache_dir = "hf://my-username/my-dataset-repo" | ||
>>> builder = datasets.load_dataset_builder("path/to/local/loading_script/loading_script.py", cache_dir=cache_dir, storage_options={"repo_type": "dataset"}) | ||
>>> builder.download_and_prepare(file_format="parquet") | ||
|
||
>>> # Stream the dataset from the repo | ||
>>> dset = datasets.load_dataset("my-username/my-dataset-repo", split="train") | ||
>>> # Process the examples | ||
>>> for ex in dset: | ||
... ... | ||
``` | ||
|
||
* [`zarr`](https://zarr.readthedocs.io/en/stable/tutorial.html#io-with-fsspec) | ||
|
||
```python | ||
>>> import numpy as np | ||
>>> import zarr | ||
|
||
>>> embeddings = np.random.randn(50000, 1000).astype("float32") | ||
|
||
>>> # Write an array to a repo acting as a remote zarr store | ||
>>> with zarr.open_group("hf://my-username/my-model-repo:/array-store", mode="w", storage_options={"repo_type": "model"}) as root: | ||
... foo = root.create_group("embeddings") | ||
... foobar = foo.zeros('experiment_0', shape=(50000, 1000), chunks=(10000, 1000), dtype='f4') | ||
... foobar[:] = embeddings | ||
|
||
>>> # Read from a remote zarr store | ||
>>> with zarr.open_group("hf://my-username/my-model-repo:/array-store", mode="r", storage_options={"repo_type": "model"}) as root: | ||
... first_row = root["embeddings/experiment_0"][0] | ||
``` |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,4 @@ | ||
[tool.black] | ||
line-length = 119 | ||
target_version = ['py37', 'py38', 'py39', 'py310'] | ||
preview = true |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,15 @@ | ||
[metadata] | ||
license_file = LICENSE | ||
|
||
[isort] | ||
ensure_newline_before_comments = True | ||
force_grid_wrap = 0 | ||
include_trailing_comma = True | ||
line_length = 119 | ||
lines_after_imports = 2 | ||
multi_line_output = 3 | ||
use_parentheses = True | ||
|
||
[flake8] | ||
ignore = E203, E501, W503 | ||
max-line-length = 119 |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,6 +1,5 @@ | ||
from fsspec import AbstractFileSystem | ||
# flake8: noqa | ||
|
||
__version__ = "0.0.1.dev0" | ||
|
||
class HfFileSystem(AbstractFileSystem): | ||
pass | ||
from .fs import HfFileSystem |
Oops, something went wrong.