A set of commands to visualize and compare plot data: show, diff, and modify.
usage: dvc plots [-h] [-q | -v] {show,diff,modify} ...
positional arguments:
  COMMAND
    show      Generate plots from target files or plots definitions from `dvc.yaml` file.
    diff      Show multiple versions of plot data by plotting them in a single image.
    modify    Modify display properties of data-series plot outputs (has no effect on image-type plots).
DVC provides a set of commands to visualize data produced by machine learning experiments. Typical examples include AUC curves, loss functions, and confusion matrices.
This type of data is created by users, or generated by user data processing code.
`dvc plots` can visualize two types of data:
- Data series files, which can be JSON, YAML, CSV or TSV.
- Image files in JPEG, GIF, or PNG format.
DVC generates visualizations as static HTML webpages that can be opened with a web browser. They can also be saved as SVG or PNG image files from the browser.
Data-series plots use Vega-Lite for rendering (a declarative JSON grammar for defining graphics). Images are rendered using `<img>` tags directly: they are included in the HTML as-is, without additional processing.
We recommend tracking these source image files with DVC instead of Git, to prevent the repository from bloating.
Structured plots can be read from JSON, YAML 1.2, CSV, or TSV files. DVC expects to see an array (or multiple arrays) of objects (usually float numbers) in the file.
In tabular file formats such as CSV and TSV, each column is an array.
`dvc plots` subcommands can produce plots for a specified column or a set of them. For example, `epoch`, `AUC`, and `loss` are the column names below:
epoch, AUC, loss
34, 0.91935, 0.0317345
35, 0.91913, 0.0317829
36, 0.92256, 0.0304632
37, 0.92302, 0.0299015
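To make the "each column is an array" idea concrete, here is a minimal Python sketch that reads the table above into one list per column. It is only an illustration using the standard `csv` module, not DVC's own parsing code; the `read_columns` helper is a hypothetical name.

```python
import csv
import io

# The sample table from above, inlined so the sketch is self-contained.
TABLE = """epoch, AUC, loss
34, 0.91935, 0.0317345
35, 0.91913, 0.0317829
36, 0.92256, 0.0304632
37, 0.92302, 0.0299015
"""


def read_columns(text):
    """Return a dict mapping each column name to its list of float values."""
    # skipinitialspace drops the blank that follows each comma in this sample.
    reader = csv.DictReader(io.StringIO(text), skipinitialspace=True)
    columns = {name: [] for name in reader.fieldnames}
    for row in reader:
        for name, value in row.items():
            columns[name].append(float(value))
    return columns


columns = read_columns(TABLE)
print(columns["AUC"])
```

Each key of `columns` is a header name, and each value is the array that a plot for that column would draw.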
Files in hierarchical formats such as JSON and YAML should contain an array of consistent objects (sharing a common structure). All objects should contain the fields used for the X and Y axes of the plot (see DVC template anchors); extra elements are silently ignored.
`dvc plots` subcommands can produce plots for a specified field or a set of them, from the array's objects. For example, `val_loss` is one of the field names in the `train` array below:
{
"train": [
{ "val_accuracy": 0.9665, "val_loss": 0.10757 },
{ "val_accuracy": 0.9764, "val_loss": 0.07324 },
{ "val_accuracy": 0.877, "val_loss": 0.08136 },
{ "val_accuracy": 0.874, "val_loss": 0.09026 },
{ "val_accuracy": 0.8795, "val_loss": 0.0764 },
{ "val_accuracy": 0.8803, "val_loss": 0.07608 },
{ "val_accuracy": 0.8987, "val_loss": 0.08455 }
]
}
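The following Python sketch shows how one field can be pulled out of such an array to form a data series. It is an illustration only, not DVC's implementation: `field_series` is a hypothetical helper, and the document is shortened to the first three records of the example above.

```python
import json

# First three records of the example above, inlined for a self-contained sketch.
DOCUMENT = """
{
  "train": [
    { "val_accuracy": 0.9665, "val_loss": 0.10757 },
    { "val_accuracy": 0.9764, "val_loss": 0.07324 },
    { "val_accuracy": 0.877, "val_loss": 0.08136 }
  ]
}
"""


def field_series(document, array_name, field):
    """Collect the values of one field from every object in the named array."""
    records = json.loads(document)[array_name]
    return [record[field] for record in records]


val_loss = field_series(DOCUMENT, "train", "val_loss")
print(val_loss)
```

The sketch raises `KeyError` when an object lacks the requested field, which mirrors the requirement that all objects share the fields used for the axes.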
To create visualizations, users need to provide the data and (optionally) a configuration that customizes the plot. DVC provides two ways to configure visualizations: users can mark specific stage outputs as plots, or define the plot configuration inside `dvc.yaml` under the `plots` key.
Plots defined in `dvc.yaml` are especially useful when users want to compare data from different data sources residing in the same version of the project, for example, comparing training versus test results on the current branch.
To define a plot, users need to provide its data and configuration. Plots should be defined in the `dvc.yaml` file under the `plots` key. Refer to the examples for more details on the syntax.
# dvc.yaml
stages: ...
plots: ...
When using `dvc run` or `dvc stage add`, particular outputs can be marked with `--plots`/`--plots-no-cache` instead of `--outs`/`--outs-no-cache`. This tells DVC that they are intended for visualization. This special type of output comes in handy when users want to visually compare experiment results with other experiment versions, for example, comparing a new experiment with the baseline version of the project.
Users can change the way data-series plots are displayed by modifying the Vega-Lite specification, thus generating plots in the style that best fits their needs. This keeps DVC projects programming-language agnostic, as it's independent of user display configuration and visualization code.
Built-in plot templates are stored in the `.dvc/plots/` directory. The default one is called `default.json`. It can be changed with the `--template` (`-t`) option of `dvc plots show` and `dvc plots diff`. For templates in the `.dvc/plots/` directory, the path and the `json` extension are not required: you can specify only the base name, e.g. `--template scatter`.
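The shorthand described above can be pictured with a small Python sketch. The lookup rule here is an assumption made for illustration (a bare name maps to a `.json` file under `.dvc/plots/`, anything containing a directory is used as given); DVC's actual resolution logic may differ, and `resolve_template` is a hypothetical helper.

```python
from pathlib import Path


def resolve_template(name, dvc_dir=".dvc"):
    """Map a --template value to a file path (assumed rule, not DVC's code)."""
    candidate = Path(name)
    if len(candidate.parts) == 1:
        # Bare name: look in the template directory, adding .json if missing.
        filename = name if candidate.suffix == ".json" else f"{name}.json"
        return Path(dvc_dir) / "plots" / filename
    # Anything with a directory component is treated as an explicit path.
    return candidate


print(resolve_template("scatter"))
print(resolve_template("my_templates/custom.json"))
```

Under this assumed rule, `--template scatter` and `--template scatter.json` would both point at `.dvc/plots/scatter.json`.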
DVC has the following built-in plot templates:
- `default` - linear plot
- `scatter` - scatter plot
- `smooth` - linear plot with LOESS smoothing, see example
- `confusion` - confusion matrix, see example
Plot template files are Vega-Lite files that use predefined DVC anchors as placeholders for DVC to inject the plot values. You can create a custom template from scratch, or modify an existing one from `.dvc/plots/`.
💡 Note that custom templates can be safely added to the template directory.
All metrics files given to `dvc plots show` and `dvc plots diff` as input are combined into a single data array for injection into a template file. There are two important fields that DVC adds to the plot data:
- `index` - zero-based counter for the data rows/values. In many cases it corresponds to a machine learning training epoch or step number.
- `rev` - Git commit hash, tag, or branch of the metrics file. This helps distinguish between different versions when using the `dvc plots diff` command.
Note that in the case of CSV/TSV metrics files, column names from the table header (first row) are equivalent to field names.
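A rough Python sketch of this combining step is shown below. It is not DVC's implementation: the `combine` helper, the `workspace` label, and the `workspace` loss values are made up for illustration.

```python
def combine(revisions):
    """Merge rows from several revisions into one array, tagging each row.

    `revisions` maps a revision label to that revision's list of row dicts.
    """
    combined = []
    for rev, rows in revisions.items():
        for index, row in enumerate(rows):
            # index restarts at zero for every revision; rev records the source.
            combined.append({**row, "index": index, "rev": rev})
    return combined


data = combine({
    "HEAD^": [{"loss": 0.19}, {"loss": 0.11}],
    "workspace": [{"loss": 0.17}, {"loss": 0.09}],  # illustrative values
})
for row in data:
    print(row)
```

Because every row carries its own `rev`, a single template can draw one line per revision from the same array.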
- `<DVC_METRIC_DATA>` (required) - the plot data from any type of metrics files is converted to a single JSON array, and injected instead of this anchor. Two additional fields will be added: `index` and `rev` (explained above).
- `<DVC_METRIC_TITLE>` (optional) - a title for the plot, that can be defined with the `--title` option of the `dvc plots` subcommands.
- `<DVC_METRIC_X>` (optional) - field name of the data for the X axis. It can be defined with the `-x` option of the `dvc plots` subcommands. The auto-generated `index` field (explained above) is the default.
- `<DVC_METRIC_Y>` (optional) - field name of the data for the Y axis. It can be defined with the `-y` option of the `dvc plots` subcommands. It defaults to the last header of the metrics file: the last column for CSV/TSV, or the last field for JSON/YAML.
- `<DVC_METRIC_X_LABEL>` (optional) - field name to display as the X axis label
- `<DVC_METRIC_Y_LABEL>` (optional) - field name to display as the Y axis label
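The anchors listed above can be thought of as placeholder strings inside a Vega-Lite JSON document. The Python sketch below fills a deliberately tiny, hypothetical template; it only illustrates the idea of anchor injection and does not reproduce DVC's real templates or substitution code.

```python
import json

# A deliberately small, hypothetical template; real templates are richer.
TEMPLATE = """
{
  "data": {"values": "<DVC_METRIC_DATA>"},
  "title": "<DVC_METRIC_TITLE>",
  "mark": "line",
  "encoding": {
    "x": {"field": "<DVC_METRIC_X>", "type": "quantitative"},
    "y": {"field": "<DVC_METRIC_Y>", "type": "quantitative"}
  }
}
"""


def fill(node, values):
    """Recursively replace strings that are exactly an anchor name."""
    if isinstance(node, dict):
        return {key: fill(child, values) for key, child in node.items()}
    if isinstance(node, list):
        return [fill(child, values) for child in node]
    if isinstance(node, str) and node in values:
        return values[node]
    return node


rows = [
    {"loss": 0.19, "index": 0, "rev": "workspace"},
    {"loss": 0.11, "index": 1, "rev": "workspace"},
]
spec = fill(json.loads(TEMPLATE), {
    "<DVC_METRIC_DATA>": rows,
    "<DVC_METRIC_TITLE>": "loss",
    "<DVC_METRIC_X>": "index",
    "<DVC_METRIC_Y>": "loss",
})
print(json.dumps(spec, indent=2))
```

Working on the parsed JSON rather than on raw text keeps the injected data array a real JSON array instead of a quoted string.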
It's possible to supply an HTML file to `dvc plots show` and `dvc plots diff` by using the `--html-template` option. This allows you to customize the container where DVC injects the plots it generates.
⚠️ This is a separate feature from custom Vega-Lite templates.
The only requirement for this HTML file is to specify the place to inject plots with a `{plot_divs}` marker. See an example below that uses this feature to render DVC plots without an Internet connection.
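As a rough mental model, the marker works like a plain text substitution. The Python sketch below is an assumption-laden illustration, not DVC's code: `render_page` and the `<div>` fragments are hypothetical.

```python
MARKER = "{plot_divs}"


def render_page(page, divs):
    """Insert the given HTML fragments where the marker appears."""
    if MARKER not in page:
        raise ValueError(f"HTML template must contain {MARKER}")
    # Plain replacement rather than str.format, so that other braces in the
    # page (CSS rules, inline JavaScript) are left untouched.
    return page.replace(MARKER, "\n".join(divs))


PAGE = "<html><body>\n{plot_divs}\n</body></html>"
html = render_page(PAGE, ['<div id="plot0"></div>', '<div id="plot1"></div>'])
print(html)
```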
- `-h`, `--help` - prints the usage/help message, and exits.
- `-q`, `--quiet` - do not write anything to standard output.
- `-v`, `--verbose` - displays detailed tracing information.
We'll use the tabular data file `logs.csv` for this example:
epoch,loss,accuracy
1,0.19,0.81
2,0.11,0.89
3,0.07,0.93
4,0.04,0.96
Let's plot the last column (default behavior):
$ dvc plots show logs.csv
file:///Users/usr/src/dvc_plots/index.html
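The defaults used here (the last column as Y, the auto-generated `index` as X) can be sketched in Python as follows. This is an illustration of the documented defaults, not DVC's code; `default_points` is a hypothetical helper.

```python
import csv
import io

# logs.csv from above, inlined so the sketch is self-contained.
LOGS = """epoch,loss,accuracy
1,0.19,0.81
2,0.11,0.89
3,0.07,0.93
4,0.04,0.96
"""


def default_points(text):
    """Pair a zero-based row index (x) with the last column's value (y)."""
    rows = list(csv.reader(io.StringIO(text)))
    header, body = rows[0], rows[1:]
    y_field = header[-1]
    points = [(index, float(row[-1])) for index, row in enumerate(body)]
    return y_field, points


y_field, points = default_points(LOGS)
print(y_field, points)
```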
Difference in this metric between the current project version and the previous commit:
$ dvc plots diff HEAD^ --targets logs.csv
file:///Users/usr/src/dvc_plots/index.html
Visualize a specific field (`loss`) as y, using `epoch` as x:
$ dvc plots show logs.csv -y loss -x epoch
file:///Users/usr/src/dvc_plots/index.html
In some cases we would like to smooth our plot. In this example we will use a noisy plot with 100 data points:
$ dvc plots show data.csv
file:///Users/usr/src/dvc_plots/index.html
We can use the `-t` option with the `smooth` template to make it less noisy:
$ dvc plots show -t smooth data.csv
file:///Users/usr/src/dvc_plots/index.html
We'll use `classes.csv` for this example:
actual,predicted
cat,cat
cat,cat
cat,cat
cat,dog
cat,dinosaur
cat,dinosaur
cat,bird
turtle,dog
turtle,cat
...
Let's visualize it:
$ dvc plots show classes.csv --template confusion -x actual -y predicted
file:///Users/usr/src/dvc_plots/index.html
A confusion matrix template is predefined in DVC.
We can use the `confusion_normalized` template to normalize the results:
$ dvc plots show classes.csv --template confusion_normalized -x actual -y predicted
file:///Users/usr/src/dvc_plots/index.html
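For intuition about what these two templates display, the Python sketch below counts (actual, predicted) pairs and then normalizes them. It uses only the rows of `classes.csv` visible above. Normalizing per actual class is an assumption made for illustration; the source does not state which normalization `confusion_normalized` applies.

```python
from collections import Counter

# The rows of classes.csv that are visible above (the file is truncated there).
PAIRS = [
    ("cat", "cat"), ("cat", "cat"), ("cat", "cat"),
    ("cat", "dog"), ("cat", "dinosaur"), ("cat", "dinosaur"),
    ("cat", "bird"), ("turtle", "dog"), ("turtle", "cat"),
]

# Raw confusion matrix: how often each (actual, predicted) pair occurs.
counts = Counter(PAIRS)


def normalize_by_actual(pair_counts):
    """Divide each cell by the total for its actual class (assumed scheme)."""
    totals = Counter()
    for (actual, _predicted), n in pair_counts.items():
        totals[actual] += n
    return {pair: n / totals[pair[0]] for pair, n in pair_counts.items()}


normalized = normalize_by_actual(counts)
print(counts[("cat", "cat")], normalized[("turtle", "dog")])
```

Normalized cells make classes of different sizes comparable, which raw counts do not.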
Let's get back to the `logs.csv` data:
# logs.csv
epoch,loss,accuracy
1,0.19,0.81
2,0.11,0.89
3,0.07,0.93
4,0.04,0.96
The minimal plot definition we can put in `dvc.yaml` is simply the data source path, relative to the `dvc.yaml` file:
# dvc.yaml
stages:
  train:
    cmd: echo "Training the model..."
plots:
  logs.csv:
$ dvc plots show
file:///Users/usr/src/dvc_plots/index.html
We can customize it:
# dvc.yaml
stages:
  train:
    cmd: echo "Training the model..."
plots:
  logs.csv:
    x: epoch
    y: accuracy
    title: Displaying accuracy
    x_label: This is epoch
    y_label: This is accuracy
$ dvc plots show
file:///Users/usr/src/dvc_plots/index.html
Data in `training_data.csv`:
epoch,train_loss,test_loss
1,0.33,0.4
2,0.3,0.28
3,0.2,0.25
4,0.1,0.23
# dvc.yaml
stages:
  train:
    cmd: echo "Training the model..."
plots:
  test_vs_train_loss:
    x: epoch
    y:
      training_data.csv: [test_loss, train_loss]
    title: Compare loss training versus test
Let's prepare a comparison of confusion matrix data between the test set and the training set:
# train_classes.csv
actual_class,predicted_class
dog,dog
dog,dog
dog,dog
dog,bird
cat,cat
cat,cat
cat,cat
cat,dog
bird,bird
bird,bird
bird,bird
bird,dog
# test_classes.csv
actual_class,predicted_class
dog,dog
dog,dog
dog,cat
bird,bird
bird,bird
bird,cat
cat,cat
cat,cat
cat,bird
# dvc.yaml
stages:
  train:
    cmd: echo "Training the model..."
plots:
  test_vs_train_confusion:
    x: actual_class
    y:
      train_classes.csv: predicted_class
      test_classes.csv: predicted_class
    title: Compare test vs train confusion matrix
    template: confusion
    x_label: Actual class
    y_label: Predicted class
The plots generated by `dvc plots` use Vega-Lite JavaScript libraries, which by default load online resources.
There may be times when you need to produce plots without Internet access, or want to customize the plots output with extra content, such as banners or additional text. DVC allows you to replace the HTML file that contains the final plots.
Download the Vega-Lite libraries into the directory where you'll produce the `dvc plots`:
$ wget https://cdn.jsdelivr.net/npm/vega@5.20.2 -O my_vega.js
$ wget https://cdn.jsdelivr.net/npm/vega-lite@5.1.0 -O my_vega_lite.js
$ wget https://cdn.jsdelivr.net/npm/vega-embed@6.18.2 -O my_vega_embed.js
Create the following HTML file and save it in `.dvc/plots/mypage.html`:
<html>
<head>
<script src="../path/to/my_vega.js" type="text/javascript"></script>
<script src="../path/to/my_vega_lite.js" type="text/javascript"></script>
<script src="../path/to/my_vega_embed.js" type="text/javascript"></script>
</head>
<body>
{plot_divs}
</body>
</html>
Note that this is a standard HTML file with only `{plot_divs}` as a placeholder for DVC to inject plots. The `<script>` tags in this file point to the local JavaScript libraries we downloaded above. We can use it like this:
$ dvc plots show --html-template .dvc/plots/mypage.html
You can also make it the default HTML template by setting the `dvc config` parameter `plots.html_template`.
$ dvc config plots.html_template plots/mypage.html
Note that the path supplied to `dvc config plots.html_template` is relative to the `.dvc/` directory.