cmd-ref: plots: flexible plots docs
pared committed Jul 1, 2022
1 parent e4aa6bc commit ac87c80
Showing 17 changed files with 272 additions and 70 deletions.
4 changes: 2 additions & 2 deletions content/docs/command-reference/plots/diff.md
@@ -122,11 +122,11 @@ file:///Users/usr/src/dvc_plots/index.html
Compare two specific versions (commit hashes, tags, or branches):

```cli
$ dvc plots diff HEAD 0135527 --targets logs.csv
$ dvc plots diff HEAD^ 0135527 --targets logs.csv
file:///Users/usr/src/dvc_plots/index.html
```

![](/img/plots_diff.svg)
![](/img/plots_diff_two_revs.svg)

## Example: Confusion matrix

258 changes: 216 additions & 42 deletions content/docs/command-reference/plots/index.md
@@ -1,6 +1,6 @@
# plots

A set of commands to visualize and compare _plot metrics_:
A set of commands to visualize and compare _plot data_:
[show](/doc/command-reference/plots/show),
[diff](/doc/command-reference/plots/diff), and
[modify](/doc/command-reference/plots/modify).
@@ -12,46 +12,37 @@ usage: dvc plots [-h] [-q | -v] {show,diff,modify} ...
positional arguments:
COMMAND
show Generate plot from a metrics file.
diff Plot differences in metrics between commits.
modify Modify display properties of data-series plots (has no effect on image-type plots).
show Generate plots from target files or plots definitions from `dvc.yaml` file.
diff Show multiple versions of plot data by plotting them in a single image.
modify Modify display properties of data-series plot outputs (has no effect on image-type plots).
```

## Types of metrics

DVC has two concepts for metrics, that represent different results of machine
learning training or data processing:

1. `dvc metrics` represent **scalar numbers** such as AUC, _true positive rate_,
etc.
2. `dvc plots` can be used to visualize **data series** such as AUC curves, loss
functions, confusion matrices, etc.

## Description

DVC provides a set of commands to visualize certain metrics of machine learning
experiments as plots. Usual plot examples are AUC curves, loss functions,
confusion matrices, among others.
DVC provides a set of commands to visualize data produced by machine learning
experiments. Usual plot examples are AUC curves, loss functions, confusion
matrices, among others.

This type of metrics files are created by users, or generated by user data
processing code, and can be defined in `dvc.yaml` (`plots` field) for tracking
(optional).
This type of data is created by users, or generated by user data processing
code.

DVC can work with two types of plots files:
## Types of plots

`dvc plots` is able to visualize two types of data:

1. Data series files, which can be JSON, YAML, CSV or TSV.
2. Image files in JPEG, GIF, or PNG format.
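
As a rough illustration of the two categories, the sketch below sorts file names by extension. The `plot_type` helper and the extension sets are assumptions made for this example (including treating `.yml` and `.jpg` as aliases); it is not a description of how DVC itself decides.

```python
from pathlib import Path

# Assumed extension sets, derived from the formats listed above.
DATA_SERIES_EXTENSIONS = {".json", ".yaml", ".yml", ".csv", ".tsv"}
IMAGE_EXTENSIONS = {".jpeg", ".jpg", ".gif", ".png"}


def plot_type(path):
    """Return "data-series", "image", or None for an unrecognized extension."""
    suffix = Path(path).suffix.lower()
    if suffix in DATA_SERIES_EXTENSIONS:
        return "data-series"
    if suffix in IMAGE_EXTENSIONS:
        return "image"
    return None


if __name__ == "__main__":
    for name in ["logs.csv", "confusion.png", "notes.txt"]:
        print(name, plot_type(name))
```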

DVC generates plots as static HTML webpages that can be opened with a web browser.
They can also be saved as SVG or PNG image files from the browser.
DVC generates visualizations as static HTML webpages that can be opened with a web
browser. They can also be saved as SVG or PNG image files from the browser.

Data-series plots utilize [Vega-Lite](https://vega.github.io/vega-lite/) for
rendering (declarative JSON grammar for defining graphics). Image-type plots are
rendered using `<img>` tags directly.
rendering (declarative JSON grammar for defining graphics). Images are rendered
using `<img>` tags directly.

## Supported file formats

Image-type plots are included in HTML as-is, without additional processing.
Images are included in HTML as-is, without additional processing.

> We recommend tracking these source image files with DVC instead of Git, to
> prevent the repository from bloating.
@@ -96,7 +87,44 @@ names in the `train` array below:
}
```

## Plot templates (data series only)
## Defining a plot

To create visualizations, users need to provide the data and, optionally, a
configuration that customizes the plot. DVC provides two ways to configure
visualizations: users can mark specific <abbr>stage</abbr>
<abbr>outputs</abbr> as plots, or define a plot configuration in `dvc.yaml`
under the `plots` key.

## Plots definitions

Plots defined in `dvc.yaml` are especially useful when users want to compare
data from different data sources residing on the same version of the project.
For example, comparing training versus test results on the current branch.

### Syntax

To define a plot, users need to provide the data and a configuration for it.
Plots are defined in the `dvc.yaml` file under the `plots` key. Refer to
the [examples](/doc/command-reference/plots#example-simple-plot-definition) for
more details on the syntax.

```yaml
# dvc.yaml
stages: ...

plots: ...
```

## Plot outputs

When using `dvc run` or `dvc stage add`, particular outputs can be marked with
`--plots/--plots-no-cache` instead of `--outs/--outs-no-cache`. This tells DVC
that they are intended for visualization. This special type of output comes in
handy when users want to visually compare experiment results across versions,
for example a new experiment against the baseline version of the project.

## Plot templates (data-series only)

Users have the ability to change the way data-series plots are displayed by
modifying the [Vega-Lite specification](https://vega.github.io/vega-lite/), thus
@@ -165,7 +193,7 @@ header (first row) are equivalent to field names.

- `<DVC_METRIC_Y_LABEL>` (optional) - field name to display as the Y axis label

## HTML templates
## Custom HTML templates

It's possible to supply an HTML file to `dvc plots show` and `dvc plots diff` by
using the `--html-template` option. This allows you to customize the
@@ -189,18 +217,14 @@ this feature to render DVC plots without an Internet connection, below.

## Example: Tabular data

We'll use the tabular metrics file `logs.csv` for this example:
We'll use the tabular data file `logs.csv` for this example:

```
epoch,accuracy,loss,val_accuracy,val_loss
0,0.9418667,0.19958884770199656,0.9679,0.10217399864746257
1,0.9763333,0.07896138601688048,0.9768,0.07310650711813942
2,0.98375,0.05241111190887168,0.9788,0.06665669009438716
3,0.98801666,0.03681169906261687,0.9781,0.06697812260198989
4,0.99111664,0.027362171787042946,0.978,0.07385754839298315
5,0.9932333,0.02069501801203781,0.9771,0.08009233058886166
6,0.9945,0.017702101902437668,0.9803,0.07830339228538505
7,0.9954,0.01396906608727198,0.9802,0.07247738889862157
epoch,loss,accuracy
1,0.19,0.81
2,0.11,0.89
3,0.07,0.93
4,0.04,0.96
```
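
A file like this would typically be written by the training code. The following is a minimal sketch of that step; the `write_logs` helper and the hard-coded `HISTORY` values are hypothetical stand-ins for whatever a real training loop records.

```python
import csv
from pathlib import Path

# Hypothetical per-epoch results; a real training loop would compute these.
HISTORY = [
    {"epoch": 1, "loss": 0.19, "accuracy": 0.81},
    {"epoch": 2, "loss": 0.11, "accuracy": 0.89},
    {"epoch": 3, "loss": 0.07, "accuracy": 0.93},
    {"epoch": 4, "loss": 0.04, "accuracy": 0.96},
]


def write_logs(path, rows):
    """Write one CSV row per epoch, preceded by a header row of field names."""
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["epoch", "loss", "accuracy"])
        writer.writeheader()
        writer.writerows(rows)


if __name__ == "__main__":
    write_logs(Path("logs.csv"), HISTORY)
```

The header row matters here because, for tabular files, the header names are what the `-x` and `-y` options refer to.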

Let's plot the last column (default behavior):
@@ -222,10 +246,10 @@ file:///Users/usr/src/dvc_plots/index.html

![](/img/plots_diff.svg)

Visualize a specific field:
Visualize a specific field (`loss`) as y. Use `epoch` as x:

```dvc
$ dvc plots show -y loss logs.csv
$ dvc plots show logs.csv -y loss -x epoch
file:///Users/usr/src/dvc_plots/index.html
```

@@ -234,7 +258,7 @@ file:///Users/usr/src/dvc_plots/index.html
## Example: Smooth plot

In some cases we would like to smooth our plot. In this example we will use a
plot with 1000 data points:
noisy plot with 100 data points:

```dvc
$ dvc plots show data.csv
@@ -280,7 +304,157 @@ file:///Users/usr/src/dvc_plots/index.html
![](/img/plots_show_confusion.svg)

> A confusion matrix [template](/doc/command-reference/plots#plot-templates) is
> predefined in DVC (found in `.dvc/plots/confusion.json`).
> predefined in DVC.

We can use the `confusion_normalized` template to normalize the results:

```dvc
$ dvc plots show classes.csv --template confusion_normalized -x actual -y predicted
file:///Users/usr/src/dvc_plots/index.html
```

![](/img/plots_show_confusion_normalized.svg)

## Example: Simple plot definition

Let's get back to the `logs.csv` data:

```
# logs.csv
epoch,loss,accuracy
1,0.19,0.81
2,0.11,0.89
3,0.07,0.93
4,0.04,0.96
```

The minimal plot definition we can put in `dvc.yaml` is simply the path to the
data source, relative to the `dvc.yaml` file:

```yaml
# dvc.yaml
stages:
  train:
    cmd: echo "Training the model..."

plots:
  logs.csv:
```

```dvc
$ dvc plots show
file:///Users/usr/src/dvc_plots/index.html
```

![](/img/plots_show_spec_default.svg)

We can customize it:

```yaml
# dvc.yaml
stages:
  train:
    cmd: echo "Training the model..."

plots:
  logs.csv:
    x: epoch
    y: accuracy
    title: Displaying accuracy
    x_label: This is epoch
    y_label: This is accuracy
```

```dvc
$ dvc plots show
file:///Users/usr/src/dvc_plots/index.html
```

![](/img/plots_show_spec_simple_custom.svg)

## Example: Multiple data-series plot definition

Data in `training_data.csv`:

```csv
epoch,train_loss,test_loss
1,0.33,0.4
2,0.3,0.28
3,0.2,0.25
4,0.1,0.23
```

```yaml
# dvc.yaml
stages:
  train:
    cmd: echo "Training the model..."

plots:
  test_vs_train_loss:
    x: epoch
    y:
      training_data.csv: [test_loss, train_loss]
    title: Compare loss training versus test
```

![](/img/plots_show_spec_multiple_columns.svg)
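
One way to picture what this definition asks for: each column listed under `y` becomes its own series, plotted against the shared `x` column. The sketch below reproduces that reshaping in plain Python; the `to_series` helper is hypothetical and only illustrates the idea, it is not part of DVC.

```python
import csv
import io

# Contents of training_data.csv, copied from the example above.
TRAINING_DATA = """\
epoch,train_loss,test_loss
1,0.33,0.4
2,0.3,0.28
3,0.2,0.25
4,0.1,0.23
"""


def to_series(text, x, ys):
    """Return {column: [(x_value, y_value), ...]} for each requested y column."""
    series = {name: [] for name in ys}
    for row in csv.DictReader(io.StringIO(text)):
        for name in ys:
            series[name].append((float(row[x]), float(row[name])))
    return series


series = to_series(TRAINING_DATA, "epoch", ["test_loss", "train_loss"])
print(series["train_loss"][0])  # (1.0, 0.33)
print(series["test_loss"][-1])  # (4.0, 0.23)
```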

## Example: Sourcing data from different files

Let's prepare a comparison of confusion matrix data between the test set and
the training set:

```csv
# train_classes.csv
actual_class,predicted_class
dog,dog
dog,dog
dog,dog
dog,bird
cat,cat
cat,cat
cat,cat
cat,dog
bird,bird
bird,bird
bird,bird
bird,dog
```

```csv
# test_classes.csv
actual_class,predicted_class
dog,dog
dog,dog
dog,cat
bird,bird
bird,bird
bird,cat
cat,cat
cat,cat
cat,bird
```

```yaml
# dvc.yaml
stages:
  train:
    cmd: echo "Training the model..."

plots:
  test_vs_train_confusion:
    x: actual_class
    y:
      train_classes.csv: predicted_class
      test_classes.csv: predicted_class
    title: Compare test vs train confusion matrix
    template: confusion
    x_label: Actual class
    y_label: Predicted class
```

![](/img/plots_show_spec_conf_train_test.svg)
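
To make the input data more concrete, the sketch below counts how often each (actual, predicted) combination occurs in `train_classes.csv`, which is the aggregation a confusion matrix displays. The helper names are hypothetical, and the per-actual-class normalization is one common convention assumed for illustration; whether DVC's `confusion_normalized` template normalizes the same way is not stated here.

```python
from collections import Counter

# (actual_class, predicted_class) pairs copied from train_classes.csv above.
TRAIN_PAIRS = [
    ("dog", "dog"), ("dog", "dog"), ("dog", "dog"), ("dog", "bird"),
    ("cat", "cat"), ("cat", "cat"), ("cat", "cat"), ("cat", "dog"),
    ("bird", "bird"), ("bird", "bird"), ("bird", "bird"), ("bird", "dog"),
]


def confusion_counts(pairs):
    """Count how often each (actual, predicted) combination occurs."""
    return Counter(pairs)


def normalize_by_actual(counts):
    """Divide each cell by the total for its actual class (assumed convention)."""
    totals = Counter()
    for (actual, _predicted), n in counts.items():
        totals[actual] += n
    return {cell: n / totals[cell[0]] for cell, n in counts.items()}


counts = confusion_counts(TRAIN_PAIRS)
normalized = normalize_by_actual(counts)
print(counts[("dog", "dog")])       # 3
print(normalized[("dog", "bird")])  # 0.25
```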

## Example: Offline HTML Template

5 changes: 3 additions & 2 deletions content/docs/command-reference/plots/modify.md
@@ -1,10 +1,11 @@
# plots modify

Modify display properties of [plot metrics](/doc/command-reference/plots) files.
Modify display properties of
[plot outputs](/doc/command-reference/plots#plot-outputs) files.

> ⚠️ Note that this command can modify only data-series plots. It has no effect
> on image-type plots. See
> [Types of metrics](/doc/command-reference/plots#types-of-metrics).
> [Types of plots](/doc/command-reference/plots#types-of-plots).
## Synopsis
