Update how to guides (#840)
* πŸ“ clarify how to download and upload files

* πŸ– finish section on create/manage a repo

* πŸ– add sign-in section to repo manage guide

* ✨ update inference api section

* πŸ– apply omar review

* πŸ– format links to functions

* πŸ– more review

* πŸ– fix toctree
stevhliu committed Apr 27, 2022
1 parent cab4152 commit dabcc83
Showing 6 changed files with 279 additions and 258 deletions.
33 changes: 18 additions & 15 deletions docs/source/_toctree.yml
@@ -5,25 +5,28 @@
title: Quick start
title: "Get started"
- sections:
- local: how-to-manage
title: Create and manage repositories
- local: how-to-downstream
title: How to download files from the hub
title: Download files from the Hub
- local: how-to-upstream
title: How to upload files to the hub
title: Upload files to the Hub
- local: searching-the-hub
title: Searching the Hub
- local: how-to-inference
title: How to programmatically access the Inference API
title: Access the Inference API
title: "Guides"
- sections:
- local: package_reference/repository
title: Managing local and online repositories
- local: package_reference/hf_api
title: Hugging Face Hub API
- local: package_reference/file_download
title: Downloading files
- local: package_reference/mixins
title: Mixins & serialization methods
- local: package_reference/logging
title: Logging
title: "Reference"

- local: package_reference/repository
title: Managing local and online repositories
- local: package_reference/hf_api
title: Hugging Face Hub API
- local: package_reference/file_download
title: Downloading files
- local: package_reference/mixins
title: Mixins & serialization methods
- local: package_reference/inference_api
title: Inference API
- local: package_reference/logging
title: Logging
title: "Reference"
98 changes: 59 additions & 39 deletions docs/source/how-to-downstream.mdx
@@ -1,53 +1,62 @@
---
title: How to download files from the Hub
---
# Download files from the Hub

# How to integrate downstream utilities in your library
The `huggingface_hub` library provides functions to download files from the repositories
stored on the Hub. You can use these functions independently or integrate them into your
own library, making it more convenient for your users to interact with the Hub. This
guide will show you how to:

Utilities that allow your library to download files from the Hub are referred to as *downstream* utilities. This guide introduces additional downstream utilities you can integrate with your library, or use separately on their own. You will learn how to:

* Retrieve a URL to download.
* Download a file and cache it on your disk.
* Specify a file to download from the Hub.
* Download and cache a file on your disk.
* Download all the files in a repository.

## hf_hub_url

Use [`hf_hub_url`] to retrieve the URL of a specific file to download by providing a `filename`.
## Choose a file to download

![/docs/assets/hub/repo.png](/docs/assets/hub/repo.png)
Use the `filename` parameter in the [`hf_hub_url`] function to retrieve the URL of a
specific file to download:

```python
>>> from huggingface_hub import hf_hub_url
>>> hf_hub_url(repo_id="lysandre/arxiv-nlp", filename="config.json")
'https://huggingface.co/lysandre/arxiv-nlp/resolve/main/config.json'
```

Specify a particular file version by providing the file revision. The file revision can be a branch, a tag, or a commit hash.
![/docs/assets/hub/repo.png](/docs/assets/hub/repo.png)

When using the commit hash, it must be the full-length hash instead of a 7-character commit hash:
Specify a particular file version by providing the file revision, which can be the
branch name, a tag, or a commit hash. When using the commit hash, it must be the
full-length hash instead of a 7-character commit hash:

```python
>>> hf_hub_url(repo_id="lysandre/arxiv-nlp", filename="config.json", revision="877b84a8f93f2d619faa2a6e514a32beef88ab0a")
>>> hf_hub_url(repo_id="lysandre/arxiv-nlp",
... filename="config.json",
... revision="877b84a8f93f2d619faa2a6e514a32beef88ab0a",
... )
'https://huggingface.co/lysandre/arxiv-nlp/resolve/877b84a8f93f2d619faa2a6e514a32beef88ab0a/config.json'
```

[`hf_hub_url`] can also use the branch name to specify a file revision:
To specify a file revision with the branch name:

```python
hf_hub_url(repo_id="lysandre/arxiv-nlp", filename="config.json", revision="main")
>>> hf_hub_url(repo_id="lysandre/arxiv-nlp", filename="config.json", revision="main")
```

Specify a file revision with a tag identifier. For example, if you want `v1.0` of the `config.json` file:
To specify a file revision with a tag identifier, for example if you want `v1.0` of the
`config.json` file:

```python
hf_hub_url(repo_id="lysandre/arxiv-nlp", filename="config.json", revision="v1.0")
>>> hf_hub_url(repo_id="lysandre/arxiv-nlp", filename="config.json", revision="v1.0")
```

## cached_download
## Download and store a file

[`cached_download`] is useful for downloading and caching a file on your local disk. Once stored in your cache, you don't have to redownload the file the next time you use it. [`cached_download`] is a hands-free solution for staying up to date with new file versions. When a downloaded file is updated in the remote repository, [`cached_download`] will automatically download and store it for you.
[`cached_download`] is used to download and cache a file on your local disk. Once a file
is stored in your cache, you don't have to redownload it the next time you use it.
[`cached_download`] is a hands-free solution for staying up to date with new file
versions. When a downloaded file is updated in the remote repository,
[`cached_download`] will automatically download and store it.

Begin by retrieving your file URL with [`hf_hub_url`], and then pass the specified URL to [`cached_download`] to download the file:
Begin by retrieving the file URL with [`hf_hub_url`], and then pass the specified URL to
[`cached_download`] to download the file:

```python
>>> from huggingface_hub import hf_hub_url, cached_download
@@ -56,16 +65,20 @@ Begin by retrieving your file URL with [`hf_hub_url`], and then pass the specifi
'/home/lysandre/.cache/huggingface/hub/bc0e8cc2f8271b322304e8bb84b3b7580701d53a335ab2d75da19c249e2eeebb.066dae6fdb1e2b8cce60c35cc0f78ed1451d9b341c78de19f3ad469d10a8cbb1'
```
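
As a rough sketch of this two-step flow (the returned cache path will differ on your machine):

```python
>>> from huggingface_hub import hf_hub_url, cached_download
>>> config_url = hf_hub_url(repo_id="lysandre/arxiv-nlp", filename="config.json")
>>> cached_download(config_url)  # downloads on the first call, then reuses the cached copy
```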

[`hf_hub_url`] and [`cached_download`] work hand in hand to download a file. This is precisely how [`hf_hub_download`] from the tutorial works! [`hf_hub_download`] is simply a wrapper that calls both [`hf_hub_url`] and [`cached_download`].
[`hf_hub_url`] and [`cached_download`] work hand-in-hand to download a file. This is
such a standard workflow that [`hf_hub_download`] is a wrapper that calls both of these
functions.

```python
>>> from huggingface_hub import hf_hub_download
>>> hf_hub_download(repo_id="lysandre/arxiv-nlp", filename="config.json")
```
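
A minimal sketch, assuming [`hf_hub_download`] forwards the same `revision` parameter described above for [`hf_hub_url`]:

```python
>>> from huggingface_hub import hf_hub_download
>>> hf_hub_download(repo_id="lysandre/arxiv-nlp", filename="config.json", revision="main")
```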

## snapshot_download
## Download an entire repository

[`snapshot_download`] downloads an entire repository at a given revision. Like [`cached_download`], all downloaded files are cached on your local disk. However, even if only a single file is updated, the entire repository will be redownloaded.
[`snapshot_download`] downloads an entire repository at a given revision. Like
[`cached_download`], all downloaded files are cached on your local disk. However, even
if only a single file is updated, the entire repository will be redownloaded.

Download a whole repository as shown in the following:

@@ -75,20 +88,27 @@ Download a whole repository as shown in the following:
'/home/lysandre/.cache/huggingface/hub/lysandre__arxiv-nlp.894a9adde21d9a3e3843e6d5aeaaf01875c7fade'
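
A minimal sketch of the basic call that produces a cached snapshot path like the one above (the exact path will differ on your machine):

```python
>>> from huggingface_hub import snapshot_download
>>> snapshot_download(repo_id="lysandre/arxiv-nlp")  # downloads every file in the repository
```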
```

[`snapshot_download`] downloads the latest revision by default. If you want a specific repository revision, use the `revision` parameter as shown with [`hf_hub_url`].
[`snapshot_download`] downloads the latest revision by default. If you want a specific
repository revision, use the `revision` parameter:

```python
>>> from huggingface_hub import snapshot_download
>>> snapshot_download(repo_id="lysandre/arxiv-nlp", revision="main")
```

In general, it is usually better to manually download files with [`hf_hub_download`] (if you already know the file name) to avoid re-downloading an entire repository. [`snapshot_download`] is helpful when your library's downloading utility is a helper, and unaware of which files need to be downloaded.
In general, it is usually better to download files with [`hf_hub_download`] - if you
already know the file name - to avoid redownloading an entire repository.
[`snapshot_download`] is helpful when you are unaware of which files to download.

However, you don't always want to download the contents of an entire repository with
[`snapshot_download`]. Even if you don't know the file name, you can download specific
files if you know the file type with `allow_regex` and `ignore_regex`. Use the
`allow_regex` and `ignore_regex` arguments to specify which files to download. These
parameters accept either a single regex or a list of regexes.

However, you don't want to always download the contents of an entire repository with [`snapshot_download`]. Even if you don't know the file name and only know the file type, you can download specific files with `allow_regex` and `ignore_regex`.
Use the `allow_regex` and `ignore_regex` arguments to specify
which files to download.
`allow_regex` and `ignore_regex` accept either a single regex or a list of regexes.
The regex matching is based on [`fnmatch`](https://docs.python.org/3/library/fnmatch.html) which means it provides support for Unix shell-style wildcards.
The regex matching is based on
[`fnmatch`](https://docs.python.org/3/library/fnmatch.html), which provides support for
Unix shell-style wildcards.

For example, you can use `allow_regex` to only download JSON configuration files:

@@ -97,17 +117,17 @@ For example, you can use `allow_regex` to only download JSON configuration files
>>> snapshot_download(repo_id="lysandre/arxiv-nlp", allow_regex="*.json")
```

On the other hand, `ignore_regex` can be used to exclude certain files from being downloaded. The following example ignores the `.msgpack` and `.h5` file extensions:
or `.h5` extensions, you could make use of `ignore_regex`:
On the other hand, `ignore_regex` can exclude certain files from being downloaded. The
following example ignores the `.msgpack` and `.h5` file extensions:

```python
>>> from huggingface_hub import snapshot_download
>>> snapshot_download(repo_id="lysandre/arxiv-nlp", ignore_regex=["*.msgpack", "*.h5"])
```

Passing a regex can be especially useful when repositories contain files that
are never expected to be downloaded by [`snapshot_download`].
Passing a regex can be especially useful when repositories contain files that are never
expected to be downloaded by [`snapshot_download`].

Note that passing `allow_regex` or `ignore_regex` does **not** prevent
[`snapshot_download`] from re-downloading the entire model repository if an ignored
file is changed.
Note that passing `allow_regex` or `ignore_regex` does **not** prevent
[`snapshot_download`] from redownloading the entire model repository if an ignored file
is changed.
21 changes: 12 additions & 9 deletions docs/source/how-to-inference.mdx
@@ -1,23 +1,23 @@
---
title: How to programmatically access the Inference API
---
# Access the Inference API

# How to programmatically access the Inference API
The Inference API provides fast inference for your hosted models. The Inference API can be accessed via usual HTTP requests with your favorite programming language, but the `huggingface_hub` library has a client wrapper to access the Inference API programmatically. This guide will show you how to make calls to the Inference API with the `huggingface_hub` library.

The Inference API provides fast inference for your hosted models. The Inference API can be accessed via usual HTTP requests with your favorite programming languages, but the `huggingface_hub` library has a client wrapper to access the Inference API programmatically. This guide will show you how to make calls to the Inference API with the `huggingface_hub` library.
<Tip>

**If you want to make the HTTP calls directly, please refer to [Accelerated Inference API Documentation](https://api-inference.huggingface.co/docs/python/html/index.html) or to the sample snippets visible on every supported model page.**
If you want to make the HTTP calls directly, please refer to [Accelerated Inference API Documentation](https://api-inference.huggingface.co/docs/python/html/index.html) or to the sample snippets visible on every supported model page.

</Tip>

![Snippet of code to make calls to the Inference API](/docs/assets/hub/inference_api_snippet.png)

Begin by creating an instance of the `InferenceApi` with a specific model repository ID. You can find your `API_TOKEN` under Settings from your Hugging Face account. The `API_TOKEN` will allow you to send requests to the Inference API.
Begin by creating an instance of the [`InferenceApi`] with the model repository ID of the model you want to use. You can find your `API_TOKEN` under Settings from your Hugging Face account. The `API_TOKEN` will allow you to send requests to the Inference API.

```python
>>> from huggingface_hub.inference_api import InferenceApi
>>> inference = InferenceApi(repo_id="bert-base-uncased", token=API_TOKEN)
```

The pipeline is determined from the metadata in the model card and configuration files (see [here](https://huggingface.co/docs/hub/main#how-is-a-models-type-of-inference-api-and-widget-determined) for more details). For example, when using the [bert-base-uncased](https://huggingface.co/bert-base-uncased) model, the Inference API can automatically infer that this model should be used for a `fill-mask` task.
The metadata in the model card and configuration files (see [here](https://huggingface.co/docs/hub/main#how-is-a-models-type-of-inference-api-and-widget-determined) for more details) determines the pipeline type. For example, when using the [bert-base-uncased](https://huggingface.co/bert-base-uncased) model, the Inference API can automatically infer that this model should be used for a `fill-mask` task.
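
As a rough sketch of what such a fill-mask call might look like (the masked input sentence here is illustrative):

```python
>>> from huggingface_hub.inference_api import InferenceApi
>>> inference = InferenceApi(repo_id="bert-base-uncased", token=API_TOKEN)
>>> inference(inputs="The goal of life is [MASK].")  # expected to return candidate fills with scores
```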

```python
>>> from huggingface_hub.inference_api import InferenceApi
@@ -48,5 +48,8 @@ Some tasks may require additional parameters (see [here](https://api-inference.h
Some models may support multiple tasks. The `sentence-transformers` models can complete both `sentence-similarity` and `feature-extraction` tasks. Specify which task you want to perform with the `task` parameter:

```python
>>> inference = InferenceApi(repo_id="paraphrase-xlm-r-multilingual-v1", task="feature-extraction", token=API_TOKEN)
>>> inference = InferenceApi(repo_id="paraphrase-xlm-r-multilingual-v1",
... task="feature-extraction",
... token=API_TOKEN,
... )
```
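
A minimal usage sketch, assuming the instance is called the same way as in the fill-mask example (the input sentence is illustrative):

```python
>>> from huggingface_hub.inference_api import InferenceApi
>>> inference = InferenceApi(repo_id="paraphrase-xlm-r-multilingual-v1",
...                          task="feature-extraction",
...                          token=API_TOKEN)
>>> inference(inputs="This is a sentence to embed.")  # expected: a nested list of floats (the embedding)
```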
