Update how to guides (#840)
* πŸ“ clarify how to download and upload files

* πŸ– finish section on create/manage a repo

* πŸ– add sign-in section to repo manage guide

* ✨ update inference api section

* πŸ– apply omar review

* πŸ– format links to functions

* πŸ– more review

* πŸ– fix toctree
stevhliu committed Apr 27, 2022
1 parent cab4152 commit dabcc83
Showing 6 changed files with 279 additions and 258 deletions.
33 changes: 18 additions & 15 deletions docs/source/_toctree.yml
@@ -5,25 +5,28 @@
title: Quick start
title: "Get started"
- sections:
- local: how-to-manage
title: Create and manage repositories
- local: how-to-downstream
title: How to download files from the hub
title: Download files from the Hub
- local: how-to-upstream
title: How to upload files to the hub
title: Upload files to the Hub
- local: searching-the-hub
title: Searching the Hub
- local: how-to-inference
title: How to programmatically access the Inference API
title: Access the Inference API
title: "Guides"
- sections:
- local: package_reference/repository
title: Managing local and online repositories
- local: package_reference/hf_api
title: Hugging Face Hub API
- local: package_reference/file_download
title: Downloading files
- local: package_reference/mixins
title: Mixins & serialization methods
- local: package_reference/logging
title: Logging
title: "Reference"

- local: package_reference/repository
title: Managing local and online repositories
- local: package_reference/hf_api
title: Hugging Face Hub API
- local: package_reference/file_download
title: Downloading files
- local: package_reference/mixins
title: Mixins & serialization methods
- local: package_reference/inference_api
title: Inference API
- local: package_reference/logging
title: Logging
title: "Reference"
98 changes: 59 additions & 39 deletions docs/source/how-to-downstream.mdx
@@ -1,53 +1,62 @@
---
title: How to download files from the Hub
---
# Download files from the Hub

# How to integrate downstream utilities in your library
The `huggingface_hub` library provides functions to download files from the repositories
stored on the Hub. You can use these functions independently or integrate them into your
own library, making it more convenient for your users to interact with the Hub. This
guide will show you how to:

Utilities that allow your library to download files from the Hub are referred to as *downstream* utilities. This guide introduces additional downstream utilities you can integrate with your library, or use separately on their own. You will learn how to:

* Retrieve a URL to download.
* Download a file and cache it on your disk.
* Specify a file to download from the Hub.
* Download and cache a file on your disk.
* Download all the files in a repository.

## hf_hub_url

Use [`hf_hub_url`] to retrieve the URL of a specific file to download by providing a `filename`.
## Choose a file to download

![/docs/assets/hub/repo.png](/docs/assets/hub/repo.png)
Use the `filename` parameter in the [`hf_hub_url`] function to retrieve the URL of a
specific file to download:

```python
>>> from huggingface_hub import hf_hub_url
>>> hf_hub_url(repo_id="lysandre/arxiv-nlp", filename="config.json")
'https://huggingface.co/lysandre/arxiv-nlp/resolve/main/config.json'
```

Specify a particular file version by providing the file revision. The file revision can be a branch, a tag, or a commit hash.
![/docs/assets/hub/repo.png](/docs/assets/hub/repo.png)

When using the commit hash, it must be the full-length hash instead of a 7-character commit hash:
Specify a particular file version by providing the file revision, which can be the
branch name, a tag, or a commit hash. When using the commit hash, it must be the
full-length hash instead of a 7-character commit hash:

```python
>>> hf_hub_url(repo_id="lysandre/arxiv-nlp", filename="config.json", revision="877b84a8f93f2d619faa2a6e514a32beef88ab0a")
>>> hf_hub_url(repo_id="lysandre/arxiv-nlp",
... filename="config.json",
... revision="877b84a8f93f2d619faa2a6e514a32beef88ab0a",
... )
'https://huggingface.co/lysandre/arxiv-nlp/resolve/877b84a8f93f2d619faa2a6e514a32beef88ab0a/config.json'
```

[`hf_hub_url`] can also use the branch name to specify a file revision:
To specify a file revision with the branch name:

```python
hf_hub_url(repo_id="lysandre/arxiv-nlp", filename="config.json", revision="main")
>>> hf_hub_url(repo_id="lysandre/arxiv-nlp", filename="config.json", revision="main")
```

Specify a file revision with a tag identifier. For example, if you want `v1.0` of the `config.json` file:
To specify a file revision with a tag identifier, for example if you want `v1.0` of the
`config.json` file:

```python
hf_hub_url(repo_id="lysandre/arxiv-nlp", filename="config.json", revision="v1.0")
>>> hf_hub_url(repo_id="lysandre/arxiv-nlp", filename="config.json", revision="v1.0")
```

## cached_download
## Download and store a file

[`cached_download`] is useful for downloading and caching a file on your local disk. Once stored in your cache, you don't have to redownload the file the next time you use it. [`cached_download`] is a hands-free solution for staying up to date with new file versions. When a downloaded file is updated in the remote repository, [`cached_download`] will automatically download and store it for you.
[`cached_download`] is used to download and cache a file on your local disk. Once a file
is stored in your cache, you don't have to redownload it the next time you use it.
[`cached_download`] is a hands-free solution for staying up to date with new file
versions. When a downloaded file is updated in the remote repository,
[`cached_download`] will automatically download and store it.

Begin by retrieving your file URL with [`hf_hub_url`], and then pass the specified URL to [`cached_download`] to download the file:
Begin by retrieving the file URL with [`hf_hub_url`], and then pass the specified URL to
[`cached_download`] to download the file:

```python
>>> from huggingface_hub import hf_hub_url, cached_download
@@ -56,16 +65,20 @@ Begin by retrieving your file URL with [`hf_hub_url`], and then pass the specifi
'/home/lysandre/.cache/huggingface/hub/bc0e8cc2f8271b322304e8bb84b3b7580701d53a335ab2d75da19c249e2eeebb.066dae6fdb1e2b8cce60c35cc0f78ed1451d9b341c78de19f3ad469d10a8cbb1'
```
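
As a rough sketch of this two-step flow (the returned cache path will differ on your machine):

```python
>>> from huggingface_hub import hf_hub_url, cached_download
>>> config_url = hf_hub_url(repo_id="lysandre/arxiv-nlp", filename="config.json")
>>> cached_download(config_url)  # downloads on the first call, then reuses the cached copy
```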

[`hf_hub_url`] and [`cached_download`] work hand in hand to download a file. This is precisely how [`hf_hub_download`] from the tutorial works! [`hf_hub_download`] is simply a wrapper that calls both [`hf_hub_url`] and [`cached_download`].
[`hf_hub_url`] and [`cached_download`] work hand-in-hand to download a file. This is
such a standard workflow that [`hf_hub_download`] is a wrapper that calls both of these
functions.

```python
>>> from huggingface_hub import hf_hub_download
>>> hf_hub_download(repo_id="lysandre/arxiv-nlp", filename="config.json")
```
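
A minimal sketch, assuming [`hf_hub_download`] forwards the same `revision` parameter described above for [`hf_hub_url`]:

```python
>>> from huggingface_hub import hf_hub_download
>>> hf_hub_download(repo_id="lysandre/arxiv-nlp", filename="config.json", revision="main")
```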

## snapshot_download
## Download an entire repository

[`snapshot_download`] downloads an entire repository at a given revision. Like [`cached_download`], all downloaded files are cached on your local disk. However, even if only a single file is updated, the entire repository will be redownloaded.
[`snapshot_download`] downloads an entire repository at a given revision. Like
[`cached_download`], all downloaded files are cached on your local disk. However, even
if only a single file is updated, the entire repository will be redownloaded.

Download a whole repository as shown in the following:

@@ -75,20 +88,27 @@ Download a whole repository as shown in the following:
'/home/lysandre/.cache/huggingface/hub/lysandre__arxiv-nlp.894a9adde21d9a3e3843e6d5aeaaf01875c7fade'
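
A minimal sketch of the basic call that produces a cached snapshot path like the one above (the exact path will differ on your machine):

```python
>>> from huggingface_hub import snapshot_download
>>> snapshot_download(repo_id="lysandre/arxiv-nlp")  # downloads every file in the repository
```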
```

[`snapshot_download`] downloads the latest revision by default. If you want a specific repository revision, use the `revision` parameter as shown with [`hf_hub_url`].
[`snapshot_download`] downloads the latest revision by default. If you want a specific
repository revision, use the `revision` parameter:

```python
>>> from huggingface_hub import snapshot_download
>>> snapshot_download(repo_id="lysandre/arxiv-nlp", revision="main")
```

In general, it is usually better to manually download files with [`hf_hub_download`] (if you already know the file name) to avoid re-downloading an entire repository. [`snapshot_download`] is helpful when your library's downloading utility is a helper, and unaware of which files need to be downloaded.
In general, it is usually better to download files with [`hf_hub_download`] - if you
already know the file name - to avoid redownloading an entire repository.
[`snapshot_download`] is helpful when you are unaware of which files to download.

However, you don't always want to download the contents of an entire repository with
[`snapshot_download`]. Even if you don't know the file name, you can download specific
files if you know the file type with `allow_regex` and `ignore_regex`. Use the
`allow_regex` and `ignore_regex` arguments to specify which files to download. These
parameters accept either a single regex or a list of regexes.

However, you don't want to always download the contents of an entire repository with [`snapshot_download`]. Even if you don't know the file name and only know the file type, you can download specific files with `allow_regex` and `ignore_regex`.
Use the `allow_regex` and `ignore_regex` arguments to specify
which files to download.
`allow_regex` and `ignore_regex` accept either a single regex or a list of regexes.
The regex matching is based on [`fnmatch`](https://docs.python.org/3/library/fnmatch.html) which means it provides support for Unix shell-style wildcards.
The regex matching is based on
[`fnmatch`](https://docs.python.org/3/library/fnmatch.html), which provides support for
Unix shell-style wildcards.

For example, you can use `allow_regex` to only download JSON configuration files:

@@ -97,17 +117,17 @@ For example, you can use `allow_regex` to only download JSON configuration files
>>> snapshot_download(repo_id="lysandre/arxiv-nlp", allow_regex="*.json")
```

On the other hand, `ignore_regex` can be used to exclude certain files from being downloaded. The following example ignores the `.msgpack` and `.h5` file extensions:
or `.h5` extensions, you could make use of `ignore_regex`:
On the other hand, `ignore_regex` can exclude certain files from being downloaded. The
following example ignores the `.msgpack` and `.h5` file extensions:

```python
>>> from huggingface_hub import snapshot_download
>>> snapshot_download(repo_id="lysandre/arxiv-nlp", ignore_regex=["*.msgpack", "*.h5"])
```

Passing a regex can be especially useful when repositories contain files that
are never expected to be downloaded by [`snapshot_download`].
Passing a regex can be especially useful when repositories contain files that are never
expected to be downloaded by [`snapshot_download`].

Note that passing `allow_regex` or `ignore_regex` does **not** prevent
[`snapshot_download`] from re-downloading the entire model repository if an ignored
file is changed.
Note that passing `allow_regex` or `ignore_regex` does **not** prevent
[`snapshot_download`] from redownloading the entire model repository if an ignored file
is changed.
21 changes: 12 additions & 9 deletions docs/source/how-to-inference.mdx
@@ -1,23 +1,23 @@
---
title: How to programmatically access the Inference API
---
# Access the Inference API

# How to programmatically access the Inference API
The Inference API provides fast inference for your hosted models. The Inference API can be accessed via usual HTTP requests with your favorite programming language, but the `huggingface_hub` library has a client wrapper to access the Inference API programmatically. This guide will show you how to make calls to the Inference API with the `huggingface_hub` library.

The Inference API provides fast inference for your hosted models. The Inference API can be accessed via usual HTTP requests with your favorite programming languages, but the `huggingface_hub` library has a client wrapper to access the Inference API programmatically. This guide will show you how to make calls to the Inference API with the `huggingface_hub` library.
<Tip>

**If you want to make the HTTP calls directly, please refer to [Accelerated Inference API Documentation](https://api-inference.huggingface.co/docs/python/html/index.html) or to the sample snippets visible on every supported model page.**
If you want to make the HTTP calls directly, please refer to [Accelerated Inference API Documentation](https://api-inference.huggingface.co/docs/python/html/index.html) or to the sample snippets visible on every supported model page.

</Tip>

![Snippet of code to make calls to the Inference API](/docs/assets/hub/inference_api_snippet.png)

Begin by creating an instance of the `InferenceApi` with a specific model repository ID. You can find your `API_TOKEN` under Settings from your Hugging Face account. The `API_TOKEN` will allow you to send requests to the Inference API.
Begin by creating an instance of the [`InferenceApi`] with the model repository ID of the model you want to use. You can find your `API_TOKEN` under Settings from your Hugging Face account. The `API_TOKEN` will allow you to send requests to the Inference API.

```python
>>> from huggingface_hub.inference_api import InferenceApi
>>> inference = InferenceApi(repo_id="bert-base-uncased", token=API_TOKEN)
```

The pipeline is determined from the metadata in the model card and configuration files (see [here](https://huggingface.co/docs/hub/main#how-is-a-models-type-of-inference-api-and-widget-determined) for more details). For example, when using the [bert-base-uncased](https://huggingface.co/bert-base-uncased) model, the Inference API can automatically infer that this model should be used for a `fill-mask` task.
The metadata in the model card and configuration files (see [here](https://huggingface.co/docs/hub/main#how-is-a-models-type-of-inference-api-and-widget-determined) for more details) determines the pipeline type. For example, when using the [bert-base-uncased](https://huggingface.co/bert-base-uncased) model, the Inference API can automatically infer that this model should be used for a `fill-mask` task.
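
As a rough sketch of what such a fill-mask call might look like (the masked input sentence here is illustrative):

```python
>>> from huggingface_hub.inference_api import InferenceApi
>>> inference = InferenceApi(repo_id="bert-base-uncased", token=API_TOKEN)
>>> inference(inputs="The goal of life is [MASK].")  # expected to return candidate fills with scores
```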

```python
>>> from huggingface_hub.inference_api import InferenceApi
@@ -48,5 +48,8 @@ Some tasks may require additional parameters (see [here](https://api-inference.h
Some models may support multiple tasks. The `sentence-transformers` models can complete both `sentence-similarity` and `feature-extraction` tasks. Specify which task you want to perform with the `task` parameter:

```python
>>> inference = InferenceApi(repo_id="paraphrase-xlm-r-multilingual-v1", task="feature-extraction", token=API_TOKEN)
>>> inference = InferenceApi(repo_id="paraphrase-xlm-r-multilingual-v1",
... task="feature-extraction",
... token=API_TOKEN,
... )
```
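
A minimal usage sketch, assuming the instance is called the same way as in the fill-mask example (the input sentence is illustrative):

```python
>>> from huggingface_hub.inference_api import InferenceApi
>>> inference = InferenceApi(repo_id="paraphrase-xlm-r-multilingual-v1",
...                          task="feature-extraction",
...                          token=API_TOKEN)
>>> inference(inputs="This is a sentence to embed.")  # expected: a nested list of floats (the embedding)
```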
