
Better SEO for 10 Minutes to Dask (#9182)
scharlottej13 committed Jun 24, 2022
1 parent aa801de commit 3c87d4e
Showing 4 changed files with 78 additions and 43 deletions.
1 change: 1 addition & 0 deletions docs/requirements-docs.txt
@@ -7,6 +7,7 @@ sphinx-remove-toctrees
sphinx_autosummary_accessors
sphinx-tabs
sphinx-design
jupyter_sphinx
toolz
cloudpickle>=1.5.0
pandas>=1.4.0
101 changes: 59 additions & 42 deletions docs/source/10-minutes-to-dask.rst
@@ -1,10 +1,19 @@
10 Minutes to Dask
==================

This is a short overview of what you can do with Dask. It is geared towards new users.
.. meta::
:description: This is a short overview of Dask geared towards new users. Additional Dask information can be found in the rest of the Dask documentation.

This is a short overview of Dask geared towards new users.
There is much more information contained in the rest of the documentation.

We normally import dask as follows:
.. figure:: images/dask-overview.svg
:alt: Dask overview. Dask is composed of three parts: collections, task graphs, and schedulers.
:align: center

High-level collections are used to generate task graphs which can be executed by schedulers on a single machine or a cluster.

We normally import Dask as follows:

.. code-block:: python
@@ -17,16 +26,18 @@ We normally import dask as follows:
Based on the type of data you are working with, you might not need all of these.
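
The import lines themselves are collapsed in this hunk. Under the conventional aliases they look roughly like this (a sketch of common Dask usage, not the exact collapsed lines):

.. code-block:: python

>>> import numpy as np
>>> import pandas as pd
>>> import dask.array as da        # Dask's NumPy-like arrays
>>> import dask.bag as db          # Dask's parallel lists
>>> import dask.dataframe as dd    # Dask's pandas-like DataFrames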

Create a High-Level Collection
------------------------------
Creating a Dask Object
----------------------

You can make a Dask collection from scratch by supplying existing data and optionally
You can create a Dask object from scratch by supplying existing data and optionally
including information about how the chunks should be structured.

.. tabs::

.. group-tab:: DataFrame

See :doc:`dataframe`.

.. code-block:: python
>>> index = pd.date_range("2021-09-01", periods=2400, freq="1H")
@@ -43,7 +54,7 @@ including information about how the chunks should be structured.
2021-12-09 23:00:00 ... ...
Dask Name: from_pandas, 10 tasks
Now we have a DataFrame with 2 columns and 2400 rows composed of 10 partitions where
Now we have a Dask DataFrame with 2 columns and 2400 rows composed of 10 partitions where
each partition has 240 rows. Each partition represents a piece of the data.
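
The construction of the underlying pandas DataFrame is collapsed in this hunk. A minimal sketch that reproduces the shape described above (the columns here are illustrative placeholders, not the ones in the doc):

.. code-block:: python

>>> import pandas as pd
... import dask.dataframe as dd
... index = pd.date_range("2021-09-01", periods=2400, freq="1H")
... # hypothetical columns; the real example's columns are collapsed in this diff
... df = pd.DataFrame({"a": range(2400), "b": 1.0}, index=index)
... ddf = dd.from_pandas(df, npartitions=10)  # 10 partitions of 240 rows each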

Here are some key properties of a DataFrame:
@@ -75,30 +86,36 @@ including information about how the chunks should be structured.
.. group-tab:: Array

.. code-block:: python
See :doc:`array`.

>>> data = np.arange(100_000).reshape(200, 500)
... a = da.from_array(data, chunks=(100, 100))
... a
dask.array<array, shape=(200, 500), dtype=int64, chunksize=(100, 100), chunktype=numpy.ndarray>
.. jupyter-execute::

import numpy as np
import dask.array as da

data = np.arange(100_000).reshape(200, 500)
a = da.from_array(data, chunks=(100, 100))
a

Now we have a 2D array with the shape (200, 500) composed of 10 chunks where
each chunk has the shape (100, 100). Each chunk represents a piece of the data.
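
The chunk count follows from the shapes: 200 / 100 = 2 chunks along the first axis and 500 / 100 = 5 along the second, so 10 chunks in total, which ``a.numblocks`` confirms:

.. jupyter-execute::

# 2 blocks along axis 0, 5 along axis 1
a.numblocks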

Here are some key properties of an Array:
Here are some key properties of a Dask Array:

.. code-block:: python
.. jupyter-execute::

>>> # inspect the chunks
... a.chunks
((100, 100), (100, 100, 100, 100, 100))
# inspect the chunks
a.chunks

>>> # access a particular block of data
... a.blocks[1, 3]
dask.array<blocks, shape=(100, 100), dtype=int64, chunksize=(100, 100), chunktype=numpy.ndarray>
.. jupyter-execute::

# access a particular block of data
a.blocks[1, 3]
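
A block is itself a small lazy Dask array; calling ``compute`` on it materializes just that piece as a concrete NumPy array:

.. jupyter-execute::

# pull one 100x100 block into memory as a NumPy array
a.blocks[1, 3].compute()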

.. group-tab:: Bag

See :doc:`bag`.

.. code-block:: python
>>> b = db.from_sequence([1, 2, 3, 4, 5, 6, 2, 1], npartitions=2)
@@ -112,7 +129,7 @@ including information about how the chunks should be structured.
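
The Bag's repr and the rest of its example are collapsed above. A short sketch of standard Bag operations on the ``b`` created there:

.. code-block:: python

>>> b.take(3)                 # peek at the first elements
(1, 2, 3)
>>> b.distinct().compute()    # order of the result may vary
[1, 2, 3, 4, 5, 6]
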
Indexing
--------

Indexing Dask collections feels just like slicing numpy arrays or pandas dataframes.
Indexing Dask collections feels just like slicing NumPy arrays or pandas DataFrames.

.. tabs::

@@ -141,10 +158,9 @@ Indexing Dask collections feels just like slicing numpy arrays or pandas dataframes
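
The DataFrame tab is collapsed in this hunk; indexing a Dask DataFrame looks roughly like this (a sketch reusing the ``ddf`` from earlier):

.. code-block:: python

>>> ddf.b                            # column access returns a lazy Dask Series
>>> ddf["2021-10-01": "2021-10-09"]  # label-based slicing on the datetime index
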
.. group-tab:: Array

.. code-block:: python
.. jupyter-execute::

>>> a[:50, 200]
dask.array<getitem, shape=(50,), dtype=int64, chunksize=(50,), chunktype=numpy.ndarray>
a[:50, 200]

.. group-tab:: Bag

@@ -362,13 +378,13 @@ triggering computation, we can inspect the task graph to figure out what's going
>>> result.dask
HighLevelGraph with 7 layers.
<dask.highlevelgraph.HighLevelGraph object at 0x7f129df7a9d0>
0. from_pandas-0b850a81e4dfe2d272df4dc718065116
1. loc-fb7ada1e5ba8f343678fdc54a36e9b3e
2. getitem-55d10498f88fc709e600e2c6054a0625
3. series-cumsum-map-131dc242aeba09a82fea94e5442f3da9
4. series-cumsum-take-last-9ebf1cce482a441d819d8199eac0f721
5. series-cumsum-d51d7003e20bd5d2f767cd554bdd5299
6. sub-fed3e4af52ad0bd9c3cc3bf800544f57
1. from_pandas-0b850a81e4dfe2d272df4dc718065116
2. loc-fb7ada1e5ba8f343678fdc54a36e9b3e
3. getitem-55d10498f88fc709e600e2c6054a0625
4. series-cumsum-map-131dc242aeba09a82fea94e5442f3da9
5. series-cumsum-take-last-9ebf1cce482a441d819d8199eac0f721
6. series-cumsum-d51d7003e20bd5d2f767cd554bdd5299
7. sub-fed3e4af52ad0bd9c3cc3bf800544f57
>>> result.visualize()
@@ -382,12 +398,12 @@ triggering computation, we can inspect the task graph to figure out what's going
>>> b.dask
HighLevelGraph with 6 layers.
<dask.highlevelgraph.HighLevelGraph object at 0x7fd33a4aa400>
0. array-ef3148ecc2e8957c6abe629e08306680
1. amax-b9b637c165d9bf139f7b93458cd68ec3
2. amax-partial-aaf8028d4a4785f579b8d03ffc1ec615
3. amax-aggregate-07b2f92aee59691afaf1680569ee4a63
4. getitem-f9e225a2fd32b3d2f5681070d2c3d767
5. add-f54f3a929c7efca76a23d6c42cdbbe84
1. array-ef3148ecc2e8957c6abe629e08306680
2. amax-b9b637c165d9bf139f7b93458cd68ec3
3. amax-partial-aaf8028d4a4785f579b8d03ffc1ec615
4. amax-aggregate-07b2f92aee59691afaf1680569ee4a63
5. getitem-f9e225a2fd32b3d2f5681070d2c3d767
6. add-f54f3a929c7efca76a23d6c42cdbbe84
>>> b.visualize()
@@ -401,9 +417,9 @@ triggering computation, we can inspect the task graph to figure out what's going
>>> c.dask
HighLevelGraph with 3 layers.
<dask.highlevelgraph.HighLevelGraph object at 0x7f96d0814fd0>
0. from_sequence-cca2a33ba6e12645a0c9bc0fd3fe6c88
1. lambda-93a7a982c4231fea874e07f71b4bcd7d
2. zip-474300792cc4f502f1c1f632d50e0272
1. from_sequence-cca2a33ba6e12645a0c9bc0fd3fe6c88
2. lambda-93a7a982c4231fea874e07f71b4bcd7d
3. zip-474300792cc4f502f1c1f632d50e0272
>>> c.visualize()
@@ -419,7 +435,7 @@ run into code that is parallelizable, but isn't just a big DataFrame or array.

.. group-tab:: Delayed: Lazy

Dask Delayed let you to wrap individual function calls into a lazily constructed task graph:
:doc:`delayed` lets you wrap individual function calls into a lazily constructed task graph:

.. code-block:: python
@@ -442,7 +458,7 @@ run into code that is parallelizable, but isn't just a big DataFrame or array.
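
The body of the Delayed example is collapsed above; the pattern looks roughly like this (the ``inc`` and ``add`` functions are illustrative):

.. code-block:: python

>>> import dask
>>> @dask.delayed
... def inc(x):
...     return x + 1
>>> @dask.delayed
... def add(x, y):
...     return x + y
>>> a = inc(1)     # no work happens yet
>>> b = inc(2)
>>> c = add(a, b)  # builds up a task graph
>>> c.compute()    # executes the graph
5
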
.. group-tab:: Futures: Immediate

Unlike the interfaces described so far, Futures are eager. Computation starts as soon
as the function is submitted.
as the function is submitted (see :doc:`futures`).

.. code-block:: python
@@ -471,7 +487,8 @@ run into code that is parallelizable, but isn't just a big DataFrame or array.
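
The Futures example body is likewise collapsed; a minimal sketch of the eager interface, assuming a local ``dask.distributed`` client:

.. code-block:: python

>>> from dask.distributed import Client
>>> client = Client()                # start a local cluster
>>> def inc(x):
...     return x + 1
>>> future = client.submit(inc, 10)  # computation starts immediately
>>> future.result()                  # block until the result arrives
11
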
Scheduling
----------

After you have generated a task graph, it is the scheduler's job to execute it.
After you have generated a task graph, it is the scheduler's job to execute it
(see :doc:`scheduling`).

By default when you call ``compute`` on a Dask object, Dask uses the thread
pool on your computer to run computations in parallel.
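
You can also pick a scheduler per call. A brief sketch using the standard scheduler names Dask accepts:

.. code-block:: python

>>> import dask.array as da
>>> x = da.ones((1000, 1000), chunks=(100, 100)).sum()
>>> x.compute(scheduler="threads")      # local thread pool (the default here)
1000000.0
>>> x.compute(scheduler="processes")    # local process pool
>>> x.compute(scheduler="synchronous")  # single thread, handy for debugging
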
18 changes: 17 additions & 1 deletion docs/source/_static/style.css
@@ -6,4 +6,20 @@
.classifier::before {
content: ": ";
}


/* options for jupyter-sphinx extension */
div.jupyter_container {
box-shadow: none;
font-family: var(--pst-font-family-monospace);
border-radius: 0.4em;
}

.jupyter_container div.code_cell {
padding: 10px;
max-width: none !important;
}

.jupyter_container .output {
font-size: 16px;
padding: 10px;
}
1 change: 1 addition & 0 deletions docs/source/conf.py
@@ -47,6 +47,7 @@
"sphinx_remove_toctrees",
"IPython.sphinxext.ipython_console_highlighting",
"IPython.sphinxext.ipython_directive",
"jupyter_sphinx",
"sphinx_copybutton",
"sphinx_design",
]
