Update how to guides #840

Merged
merged 10 commits into from Apr 27, 2022
10 changes: 7 additions & 3 deletions docs/source/_toctree.yml
@@ -1,14 +1,16 @@
- sections:
- local: index
title: 🤗 Hugging Face Hub Python Wrapper
- local: how-to-manage
title: Create and manage repositories
- local: how-to-downstream
title: How to download files from the hub
title: Download files from the Hub
- local: how-to-upstream
title: How to upload files to the hub
title: Upload files to the Hub
- local: searching-the-hub
title: Searching the Hub
- local: how-to-inference
title: How to programmatically access the Inference API
title: Access the Inference API
title: "Guides"
- sections:
- local: package_reference/repository
@@ -19,6 +21,8 @@
title: Downloading files
- local: package_reference/mixins
title: Mixins & serialization methods
- local: package_reference/inference_api
title: Inference API
- local: package_reference/logging
title: Logging
title: "Reference"
46 changes: 20 additions & 26 deletions docs/source/how-to-downstream.mdx
@@ -1,30 +1,24 @@
---
title: How to download files from the Hub
---
# Download files from the Hub

# How to integrate downstream utilities in your library
The `huggingface_hub` library provides functions to download files from the repositories stored on the Hub. You can use these functions independently or integrate them into your own library so it is more convenient for your users to interact with the Hub. This guide will show you how to:

Utilities that allow your library to download files from the Hub are referred to as *downstream* utilities. This guide introduces additional downstream utilities you can integrate with your library, or use separately on their own. You will learn how to:

* Retrieve a URL to download.
* Download a file and cache it on your disk.
* Specify a file to download from the Hub.
* Download and cache a file on your disk.
* Download all the files in a repository.

## hf_hub_url

Use [`hf_hub_url`] to retrieve the URL of a specific file to download by providing a `filename`.
## Choose a file to download

![/docs/assets/hub/repo.png](/docs/assets/hub/repo.png)
Use the `filename` parameter in the [`hf_hub_url`] function to retrieve the URL of a specific file to download:

```python
>>> from huggingface_hub import hf_hub_url
>>> hf_hub_url(repo_id="lysandre/arxiv-nlp", filename="config.json")
'https://huggingface.co/lysandre/arxiv-nlp/resolve/main/config.json'
```

Specify a particular file version by providing the file revision. The file revision can be a branch, a tag, or a commit hash.
![/docs/assets/hub/repo.png](/docs/assets/hub/repo.png)

When using the commit hash, it must be the full-length hash instead of a 7-character commit hash:
Specify a particular file version by providing the file revision, which can be a branch, a tag, or a commit hash. When using the commit hash, it must be the full-length hash instead of a 7-character commit hash:

```python
>>> hf_hub_url(repo_id="lysandre/arxiv-nlp", filename="config.json", revision="877b84a8f93f2d619faa2a6e514a32beef88ab0a")
@@ -34,18 +28,18 @@ When using the commit hash, it must be the full-length hash instead of a 7-chara
[`hf_hub_url`] can also use the branch name to specify a file revision:

```python
hf_hub_url(repo_id="lysandre/arxiv-nlp", filename="config.json", revision="main")
>>> hf_hub_url(repo_id="lysandre/arxiv-nlp", filename="config.json", revision="main")
```

Specify a file revision with a tag identifier. For example, if you want `v1.0` of the `config.json` file:
You can also specify a file revision with a tag identifier. For example, if you want `v1.0` of the `config.json` file:

```python
hf_hub_url(repo_id="lysandre/arxiv-nlp", filename="config.json", revision="v1.0")
>>> hf_hub_url(repo_id="lysandre/arxiv-nlp", filename="config.json", revision="v1.0")
```
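
The URLs returned above follow a visible pattern: the repository ID, then `resolve`, then the revision, then the file name. The sketch below rebuilds that pattern by hand to show where the `revision` value ends up. The helper name `build_resolve_url` is hypothetical and not part of `huggingface_hub`; it only mirrors the model-repository URL shown in this guide and does no escaping or validation, which the real function may handle differently.

```python
def build_resolve_url(repo_id, filename, revision="main"):
    # Hypothetical helper: mirrors the URL pattern shown in the output above.
    # It is an illustration only, not the library's implementation.
    return f"https://huggingface.co/{repo_id}/resolve/{revision}/{filename}"


# Default branch, then a tag used as the revision.
url_main = build_resolve_url("lysandre/arxiv-nlp", "config.json")
url_tag = build_resolve_url("lysandre/arxiv-nlp", "config.json", revision="v1.0")

print(url_main)
print(url_tag)
```

In practice, prefer [`hf_hub_url`] itself; the sketch is only meant to make the role of each argument easier to see.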

## cached_download
## Download and store a file

[`cached_download`] is useful for downloading and caching a file on your local disk. Once stored in your cache, you don't have to redownload the file the next time you use it. [`cached_download`] is a hands-free solution for staying up to date with new file versions. When a downloaded file is updated in the remote repository, [`cached_download`] will automatically download and store it for you.
[`cached_download`] is used to download and cache a file on your local disk. Once a file is stored in your cache, you don't have to redownload the file the next time you use it. [`cached_download`] is a hands-free solution for staying up to date with new file versions. When a downloaded file is updated in the remote repository, [`cached_download`] will automatically download and store it.

Begin by retrieving your file URL with [`hf_hub_url`], and then pass the specified URL to [`cached_download`] to download the file:

@@ -56,14 +50,14 @@ Begin by retrieving your file URL with [`hf_hub_url`], and then pass the specifi
'/home/lysandre/.cache/huggingface/hub/bc0e8cc2f8271b322304e8bb84b3b7580701d53a335ab2d75da19c249e2eeebb.066dae6fdb1e2b8cce60c35cc0f78ed1451d9b341c78de19f3ad469d10a8cbb1'
```
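
The cached path in the output above looks like two hashes joined by a dot. One way a cache could produce such names, and pick up new file versions automatically, is to key each entry on both the file URL and a version marker such as the HTTP ETag. The sketch below illustrates that idea only: the helper name `cache_key`, the use of SHA-256, and the ETag values are assumptions for illustration, not a description of what [`cached_download`] actually does internally.

```python
import hashlib


def cache_key(url, etag=None):
    # Hypothetical helper: hash the URL, then append a hash of the ETag
    # (when one is known) so a changed remote file gets a new cache entry.
    key = hashlib.sha256(url.encode("utf-8")).hexdigest()
    if etag:
        key += "." + hashlib.sha256(etag.encode("utf-8")).hexdigest()
    return key


url = "https://huggingface.co/lysandre/arxiv-nlp/resolve/main/config.json"

# Made-up ETag values standing in for two versions of the same remote file.
old_key = cache_key(url, etag='"abc"')
new_key = cache_key(url, etag='"def"')

# Same URL prefix, different suffix: the updated file is stored separately.
print(old_key.split(".")[0] == new_key.split(".")[0])
print(old_key != new_key)
```

With a scheme like this, an unchanged file maps to an existing cache entry and is not downloaded again, while an updated file maps to a new entry.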

[`hf_hub_url`] and [`cached_download`] work hand in hand to download a file. This is precisely how [`hf_hub_download`] from the tutorial works! [`hf_hub_download`] is simply a wrapper that calls both [`hf_hub_url`] and [`cached_download`].
[`hf_hub_url`] and [`cached_download`] work hand-in-hand to download a file. This is precisely how `hf_hub_download` from the tutorial works! `hf_hub_download` is simply a wrapper that calls both `hf_hub_url` and `cached_download`.

```python
>>> from huggingface_hub import hf_hub_download
>>> hf_hub_download(repo_id="lysandre/arxiv-nlp", filename="config.json")
```

## snapshot_download
## Download an entire repository

[`snapshot_download`] downloads an entire repository at a given revision. Like [`cached_download`], all downloaded files are cached on your local disk. However, even if only a single file is updated, the entire repository will be redownloaded.

@@ -75,19 +69,20 @@ Download a whole repository as shown in the following:
'/home/lysandre/.cache/huggingface/hub/lysandre__arxiv-nlp.894a9adde21d9a3e3843e6d5aeaaf01875c7fade'
```

[`snapshot_download`] downloads the latest revision by default. If you want a specific repository revision, use the `revision` parameter as shown with [`hf_hub_url`].
[`snapshot_download`] downloads the latest revision by default. If you want a specific repository revision, use the `revision` parameter:

```python
>>> from huggingface_hub import snapshot_download
>>> snapshot_download(repo_id="lysandre/arxiv-nlp", revision="main")
```

In general, it is usually better to manually download files with [`hf_hub_download`] (if you already know the file name) to avoid re-downloading an entire repository. [`snapshot_download`] is helpful when your library's downloading utility is a helper, and unaware of which files need to be downloaded.
It is usually better to download files with [`hf_hub_download`] if you already know the file name, because this avoids re-downloading an entire repository. [`snapshot_download`] is helpful when you do not know in advance which files need to be downloaded.

However, you don't want to always download the contents of an entire repository with [`snapshot_download`]. Even if you don't know the file name and only know the file type, you can download specific files with `allow_regex` and `ignore_regex`.
However, you don't always want to download the contents of an entire repository with [`snapshot_download`]. Even if you don't know the file name, you can download specific files by file type with `allow_regex` and `ignore_regex`.
Use the `allow_regex` and `ignore_regex` arguments to specify
which files to download.
`allow_regex` and `ignore_regex` accept either a single regex or a list of regexes.
These parameters accept either a single regex or a list of regexes.

The regex matching is based on [`fnmatch`](https://docs.python.org/3/library/fnmatch.html) which means it provides support for Unix shell-style wildcards.
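
To get a feel for how Unix shell-style wildcards select files, the sketch below applies allow and ignore patterns to a list of file names with the standard library's `fnmatch`. The helper `select_files` and the file list are hypothetical and exist only for illustration; this is not how [`snapshot_download`] is implemented, just a way to try out patterns locally before passing them as `allow_regex` or `ignore_regex`.

```python
from fnmatch import fnmatch


def select_files(filenames, allow=None, ignore=None):
    # Hypothetical helper: accept a single pattern or a list of patterns.
    if isinstance(allow, str):
        allow = [allow]
    if isinstance(ignore, str):
        ignore = [ignore]

    selected = []
    for name in filenames:
        # Keep a file only if it matches at least one allow pattern (when given)...
        if allow is not None and not any(fnmatch(name, p) for p in allow):
            continue
        # ...and matches none of the ignore patterns (when given).
        if ignore is not None and any(fnmatch(name, p) for p in ignore):
            continue
        selected.append(name)
    return selected


# Made-up repository contents.
files = ["config.json", "vocab.json", "pytorch_model.bin", "flax_model.msgpack", "tf_model.h5"]

json_only = select_files(files, allow="*.json")
no_flax_tf = select_files(files, ignore=["*.msgpack", "*.h5"])

print(json_only)
print(no_flax_tf)
```

Note that these are wildcard patterns rather than full regular expressions, so `*.json` is written with a `*` instead of `.*\.json`.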

For example, you can use `allow_regex` to only download JSON configuration files:
@@ -98,7 +93,6 @@ For example, you can use `allow_regex` to only download JSON configuration files
```

On the other hand, `ignore_regex` can be used to exclude certain files from being downloaded. The following example ignores the `.msgpack` and `.h5` file extensions:
or `.h5` extensions, you could make use of `ignore_regex`:

```python
>>> from huggingface_hub import snapshot_download
8 changes: 2 additions & 6 deletions docs/source/how-to-inference.mdx
@@ -1,16 +1,12 @@
---
title: How to programmatically access the Inference API
---

# How to programmatically access the Inference API
# Access the Inference API

The Inference API provides fast inference for your hosted models. The Inference API can be accessed via regular HTTP requests from your favorite programming language, but the `huggingface_hub` library has a client wrapper to access the Inference API programmatically. This guide will show you how to make calls to the Inference API with the `huggingface_hub` library.

**If you want to make the HTTP calls directly, please refer to [Accelerated Inference API Documentation](https://api-inference.huggingface.co/docs/python/html/index.html) or to the sample snippets visible on every supported model page.**

![Snippet of code to make calls to the Inference API](/docs/assets/hub/inference_api_snippet.png)

Begin by creating an instance of the `InferenceApi` with a specific model repository ID. You can find your `API_TOKEN` under Settings from your Hugging Face account. The `API_TOKEN` will allow you to send requests to the Inference API.
Begin by creating an instance of the [`InferenceApi`] with the model repository ID of the model you want to use. You can find your `API_TOKEN` under Settings from your Hugging Face account. The `API_TOKEN` will allow you to send requests to the Inference API.

```python
>>> from huggingface_hub.inference_api import InferenceApi
146 changes: 146 additions & 0 deletions docs/source/how-to-manage.mdx
@@ -0,0 +1,146 @@
# Create and manage a repository

A repository is a space for you to store your model or dataset files. This guide will show you how to:

* Create and delete a repository.
* Adjust repository visibility.
* Use the [`Repository`] class for common Git operations like clone, branch, push, etc.

If you want to create a repository on the Hub, you need to log in to your Hugging Face account:

1. Log in to your Hugging Face account with the following command:

```bash
huggingface-cli login
```

2. Alternatively, if you prefer working from a Jupyter or Colaboratory notebook, log in with [`notebook_login`]:

```python
>>> from huggingface_hub import notebook_login
>>> notebook_login()
```

[`notebook_login`] will launch a widget in your notebook from which you can enter your Hugging Face credentials.

## Create a repository

Create an empty repository with [`create_repo`] and give it a name with the `repo_id` parameter. The `repo_id` is your namespace followed by the repository name: `{username_or_org}/{repo_name}`.

```python
>>> from huggingface_hub import create_repo
>>> create_repo("lysandre/test-model")
'https://huggingface.co/lysandre/test-model'
```
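
As a small illustration of the `{username_or_org}/{repo_name}` format, the sketch below splits a `repo_id` into its two parts. The helper `split_repo_id` is hypothetical and not part of `huggingface_hub`; it assumes that a `repo_id` without a slash simply carries no explicit namespace.

```python
def split_repo_id(repo_id):
    # Hypothetical helper: separate the namespace from the repository name.
    namespace, sep, name = repo_id.partition("/")
    if not sep:
        # No slash: treat the whole string as the repository name.
        return None, repo_id
    return namespace, name


namespace, name = split_repo_id("lysandre/test-model")
print(namespace)
print(name)
```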

By default, [`create_repo`] creates a model repository. But you can use the `repo_type` parameter to specify another repository type. For example, if you want to create a dataset repository:

```py
>>> from huggingface_hub import create_repo
>>> create_repo("lysandre/test-dataset", repo_type="dataset")
'https://huggingface.co/lysandre/test-dataset'
```

When creating a repository, you also have the option to set your repository visibility with the `private` parameter. For example, if you want to create a private repository:

```py
>>> from huggingface_hub import create_repo
>>> create_repo("lysandre/test-private", private=True)
```

If you want to change the repository visibility at a later time, you can use the [`update_repo_visibility`] function.

## Delete a repository

Delete a repository with [`delete_repo`]. Make sure you are certain you want to delete a repository because this is an irreversible process!

Specify the `repo_id` of the repository you want to delete:

```py
>>> from huggingface_hub import delete_repo
>>> delete_repo(repo_id="lysandre/test-model")
```

You can also specify the repository type to delete by adding the `repo_type` parameter:

```python
>>> delete_repo(repo_id=REPO_NAME, repo_type="dataset")
```

## Change repository visibility

A repository can be public or private. A private repository is only visible to you or members of the organization in which the repository is located. Change a repository to private as shown in the following:

```python
>>> from huggingface_hub import update_repo_visibility
>>> update_repo_visibility(repo_id=REPO_NAME, private=True)
```

## The Repository class

The [`Repository`] class allows you to interact with files and repositories on the Hub with functions similar to `git` commands. [`Repository`] is a wrapper over Git and Git-LFS methods, so make sure you have Git-LFS installed (see [here](https://git-lfs.github.com/) for installation instructions) and set up before you begin. With [`Repository`], you can use the Git commands you already know and love.

### Use a local repository

Instantiate a [`Repository`] object with a path to a local repository:

```python
>>> from huggingface_hub import Repository
>>> repo = Repository(local_dir="<path>/<to>/<folder>")
```

### Clone

The `clone_from` parameter clones a repository from a Hugging Face repository ID to a local directory specified by the `local_dir` argument:

```python
>>> repo = Repository(local_dir="w2v2", clone_from="facebook/wav2vec2-large-960h-lv60")
```

`clone_from` also accepts the full URL of a repository instead of a repository ID (if you are working offline, this parameter should be `None`):

```python
>>> repo = Repository(local_dir="huggingface-hub", clone_from="https://huggingface.co/facebook/wav2vec2-large-960h-lv60")
```

You can combine the `clone_from` parameter with [`create_repo`] to create and clone a repository:

```python
>>> repo_url = create_repo(repo_id="repo_name")
>>> repo = Repository(local_dir="repo_local_path", clone_from=repo_url)
```

When you clone a repository, you can also attribute a Git username and email to a cloned repository by specifying the `git_user` and `git_email` parameters. When users commit to that repository, Git will be aware of the commit author.

```python
>>> repo = Repository(
... "my-dataset",
... clone_from="<user>/<dataset_id>",
... use_auth_token=True,
... repo_type="dataset",
... git_user="MyName",
... git_email="me@cool.mail"
... )
```

### Branch

Branches are important for collaboration and experimentation without impacting your current files and code. Switch between branches with [`git_checkout`]. For example, if you want to switch from `branch1` to `branch2`:

```python
>>> repo = Repository(local_dir="huggingface-hub", clone_from="<user>/<dataset_id>", revision='branch1')
>>> repo.git_checkout("branch2")
```

### Pull

Pull allows you to update a current local branch with changes from a remote repository:

```python
>>> repo.git_pull()
```

Set `rebase=True` if you want your local commits to be reapplied on top of the new commits pulled from the remote:

```python
>>> repo.git_pull(rebase=True)
```