plots: flexible dvc.yaml spec #7086

dberenbaum · 2021-12-03T15:40:23Z

YAML doesn't permit duplicate keys. Even if we change that to arrays, we will still have the issue of duplication and discrepancy, i.e. one template under one key and another under the second key. So plot names should be unique strings (whatever you write in the target, I suppose), and the data file should go under its own key. We may still use the plot key as the data file if that field is absent, which simplifies the very common one-file case. So:

plots:
- train_vs_val:
    x: epoch
    y: loss
    data: [train_loss.csv, val_loss.csv]


- val_f1.csv:
    x: epoch
    y: [f1_class_0, f1_class_1]
    y_label: f1
    # data is absent, using key value: val_f1.csv

# Separately plot since we have TWO plots, even though with the same data file
- scores_acc:
    x: epoch
    y: acc
    data: scores.csv
- scores_auc:
    x: epoch
    y: auc
    data: scores.csv

Originally posted by @Suor in #5980 (reply in thread)
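
The fallback rule described above (use `data` when present, otherwise treat the plot key as the file path) can be sketched in Python. This is only an illustration of the proposal, not DVC's implementation; `resolve_data_sources` is a hypothetical helper name, and the plot entries are plain dicts mirroring the YAML above.

```python
from typing import Any


def resolve_data_sources(plot_key: str, spec: dict[str, Any]) -> list[str]:
    """Return the data files for one plot entry (hypothetical helper).

    Uses the optional `data` field when present; otherwise the plot key
    itself is taken to be the data file path.
    """
    data = spec.get("data")
    if data is None:
        return [plot_key]      # e.g. "val_f1.csv" doubles as the file path
    if isinstance(data, str):
        return [data]          # single file given as a scalar
    return list(data)          # several files given as an array


# Plain-dict mirror of the proposed `plots:` list.
plots = [
    {"train_vs_val": {"x": "epoch", "y": "loss",
                      "data": ["train_loss.csv", "val_loss.csv"]}},
    {"val_f1.csv": {"x": "epoch", "y": ["f1_class_0", "f1_class_1"],
                    "y_label": "f1"}},
    {"scores_acc": {"x": "epoch", "y": "acc", "data": "scores.csv"}},
]

for entry in plots:
    for key, spec in entry.items():
        print(key, "->", resolve_data_sources(key, spec))
```

Returning a list in every case keeps the single-file and multi-file plots on the same code path.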

dberenbaum · 2021-12-03T16:05:41Z

@pared I think it makes sense to implement in reverse order of how it's listed above.

Multiple plot specs from the same file:

# Separately plot since we have TWO plots, even though with the same data file
- scores_acc:
    x: epoch
    y: acc
    data: scores.csv
- scores_auc:
    x: epoch
    y: auc
    data: scores.csv

Requirements:

Have a way to define plots spec separate from a stage output.
Add an optional data field to specify the file, in which case the plot key is used as the title.
Update plots commands.
Nice to have: dvc plots add command to generate plots.

Multiple series per axis:

- val_f1.csv:
    x: epoch
    y: [f1_class_0, f1_class_1]
    y_label: f1
    # data is absent, using key value: val_f1.csv

Requirements:

Enable x, y, x-axis, and y-axis fields to be arrays.
Show multiple lines/series in plots.
Support diffs of multi-series plots -- most likely as a facet grid.

Multiple files within one plot:

- train_vs_val:
    x: epoch
    y: loss
    data: [train_loss.csv, val_loss.csv]

Requirements:

Enable data field to be an array.
Show multiple lines/series in plots and support diffs (similar to above).
Out of scope: x or y field arrays combined with data array. This can throw an error.
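
The out-of-scope rule in the last requirement could be enforced with a small check like the one below. It is a sketch under assumptions: `check_plot_spec` is a hypothetical name, any list value is treated as an "array", and the error type and message are illustrative rather than what DVC raises.

```python
def check_plot_spec(name: str, spec: dict) -> None:
    """Raise ValueError for the combination the thread marks as out of scope.

    Hypothetical helper: any list value counts as an "array" here.
    """
    array_axes = [axis for axis in ("x", "y") if isinstance(spec.get(axis), list)]
    if array_axes and isinstance(spec.get("data"), list):
        raise ValueError(
            f"plot '{name}': array field(s) {array_axes} cannot be combined "
            "with an array in 'data'"
        )


# Both of these are allowed: an array on only one side.
check_plot_spec("val_f1.csv", {"x": "epoch", "y": ["f1_class_0", "f1_class_1"]})
check_plot_spec("train_vs_val", {"x": "epoch", "y": "loss",
                                 "data": ["train_loss.csv", "val_loss.csv"]})
```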

pared · 2021-12-14T16:16:15Z

Question: can we provide parallel coordinates plots config alongside/inside plots spec?

mattlbeck · 2022-01-04T20:27:28Z

Related: #6316

dberenbaum · 2022-01-04T20:53:32Z

Question: can we provide parallel coordinates plots config alongside/inside plots spec?

IMO let's not worry about it as part of this issue.

daavoo · 2022-05-09T19:45:40Z

Commenting here to not pollute #7477 with things that are not implementation-related.

The following syntax is currently supported in the P.R. and can cover all 3 use cases above:

plots:
  scores_acc:
    x: epoch
    y: 
      scores.csv: acc
  scores_auc:
    x: epoch
    y: 
      scores.csv: auc

  val_f1:
    x: epoch
    y:
      val_f1.csv: acc
      val_f1.csv: loss
    y_label: f1

  train_vs_val:
    x: epoch
    y:
      train_loss.csv: loss
      val_loss.csv: loss

I'm missing the reasoning behind adding the data field. The above syntax looks more intuitive to me.

cc @dberenbaum @pared

dberenbaum · 2022-05-12T16:01:09Z

@daavoo I find them about equally intuitive, although getting rid of data obviously is fewer fields to manage.

I can see two distinctions using data:

It impacts the legend and axes (compare the left and right plots below). Should dvc try to find a common axis name and strip common field names (acc) from the legend in this case?

Should dvc support an x-axis from one file and y-axis from another file? For example:

plots:
  confusion:
    x: actual
    y: predicted
    data:
    - dir/actual.csv
    - dir/preds.csv
    template: confusion

Currently, this will produce two separate confusion matrices, which isn't what I want.

It won't work at all AFAIK without data, but maybe it could be supported like this:

plots:
  confusion:
    x: 
      dir/actual.csv: actual
    y:
      dir/preds.csv: predicted
    template: confusion

Thoughts @pared @daavoo?
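
To make the cross-file case concrete, here is a rough Python sketch of what "x from one file, y from another" would mean for a confusion matrix. It assumes the two files have the same number of rows and that rows line up by position, which the thread does not state. The CSV contents are made up, and `read_column` and `confusion_counts` are hypothetical helpers.

```python
import csv
import io
from collections import Counter

# Made-up stand-ins for dir/actual.csv and dir/preds.csv.
ACTUAL_CSV = "actual\ncat\ndog\ncat\n"
PREDS_CSV = "predicted\ncat\ncat\ncat\n"


def read_column(text: str, column: str) -> list[str]:
    """Read one named column from CSV text."""
    return [row[column] for row in csv.DictReader(io.StringIO(text))]


def confusion_counts(actual: list[str], predicted: list[str]) -> Counter:
    """Count (actual, predicted) pairs, assuming rows align by position."""
    if len(actual) != len(predicted):
        raise ValueError("x and y columns must have the same number of rows")
    return Counter(zip(actual, predicted))


counts = confusion_counts(
    read_column(ACTUAL_CSV, "actual"),
    read_column(PREDS_CSV, "predicted"),
)
print(counts)
```

The point is that a single matrix needs both columns paired row by row, which is different from rendering each file as its own plot.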

daavoo · 2022-05-13T10:56:28Z

although getting rid of data obviously is fewer fields to manage

And less code to maintain.

it impacts the legend and axes

I think we should just remove data and iterate on the UI for legends.

but maybe it could be supported like this:

plots:
  confusion:
    x:
      dir/actual.csv: actual
    y:
      dir/preds.csv: predicted
    template: confusion

I would prefer this new syntax.

Thoughts @pared @daavoo?

I would vote for getting rid of data.

About the x-axis flexibility, I don't really know. How hard would it be if we focus on the explicit {path}:{entry} syntax @pared ?
If it changes too many things, we could leave all this as a future follow-up. There are already a lot of new use cases we are covering with the current state of the P.R.

pared · 2022-05-13T11:04:33Z

AFAIR it should not be that big of a problem: we already have the underlying logic and just need to modify it a little bit, at least for y. x did not have this logic until now, but I still think we should be able to handle that. Currently I need to kind of search for x and handle situations where the particular column is not in the targeted files. With explicit x that would be easier, I think.

dberenbaum · 2022-05-13T16:42:47Z

So how does this look for updated syntax of the original comment at the top of this issue?

plots:
- train_vs_val:
    x: 
    - train_loss.csv:epoch
    - val_loss.csv:epoch
    y: 
    - train_loss.csv:loss
    - val_loss.csv:loss

- val_f1.csv:
    x:
    - epoch
    - epoch
    y:
    - f1_class_0
    - f1_class_1
    y_label: f1
    # data source is absent, using key value: val_f1.csv

# Separately plot since we have TWO plots, even though with the same data file
- scores_acc:
    x: scores.csv:epoch
    y: scores.csv:acc
- scores_auc:
    x: scores.csv:epoch
    y: scores.csv:auc

It is more verbose, but I like that it's more clear and explicit. My only question would be to what extent to allow shortcuts and try to infer expected behavior. For example:

If you specify the data source for only one of x or y, should dvc try to infer it for the other?
In the val_f1.csv example, do you actually need to specify the x value twice?

I don't mind requiring everything to be listed explicitly for now since it does make behavior more clear.
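
The `path:column` strings above, together with the "data source is absent, using key value" shortcut, could be parsed along these lines. This is a sketch under assumptions: `split_source` is a hypothetical helper, and splitting on the last colon means a column name that itself contains a colon would be misread.

```python
def split_source(value: str, default_path: str) -> tuple[str, str]:
    """Split 'path:column' into (path, column) (hypothetical helper).

    If there is no colon, fall back to `default_path` (for example the
    plot key) as the data source.
    """
    path, sep, column = value.rpartition(":")
    if not sep:
        return default_path, value
    return path, column


print(split_source("scores.csv:acc", default_path="scores_acc"))
print(split_source("f1_class_0", default_path="val_f1.csv"))
```

Whether the fallback should also apply when only one of x or y names a file is exactly the open question raised above; the sketch leaves that to the caller through `default_path`.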

skshetry · 2022-05-14T13:11:45Z

Another suggestion that clearly separates plot-level metadata from props, though it's a bit verbose.

plots:
  train_vs_val:
    title: title
    props:
    - path: train_loss.csv  # or, train_loss.csv:epoch as proposed above
      key: epoch
      axis: x
    - path: val_loss.csv
      key: epoch
      axis: x
      label: f1

dberenbaum · 2022-05-16T16:51:06Z

@pared and I discussed and came up with the following action points:

Remove data field support.
Add new issue for x path support.
Add new issue for CLI support.
Check that syntax is consistent with params.

Edit:

@pared We forgot to add in cleaning up the legend and labels (see the first point in #7086 (comment)).

dberenbaum · 2022-05-16T17:45:03Z

@pared The example above in #7086 (comment) uses file paths as dict keys under y, which doesn't work in one of the examples:

val_f1:
    x: epoch
    y: 
      val_f1.csv: acc
      val_f1.csv: loss
    y_label: f1

This will raise a duplicate keys error. Should it instead support a list of values here (note that this isn't supported today)?

plots:
  scores_acc:
    x: epoch
    y: 
    - scores.csv: 
      - acc
  scores_auc:
    x: epoch
    y: 
    - scores.csv:
      - auc

  val_f1:
    x: epoch
    y:
    - val_f1.csv:
      - acc
    - val_f1.csv:
      - loss
    y_label: f1

  train_vs_val:
    x: epoch
    y:
    - train_loss.csv:
      - loss
    - val_loss.csv:
      - loss

This would be consistent with how y works without paths since it expects a list if there are multiple y values, and it would be similar to how stage parameters are handled:

stages:
  train:
    cmd: python train.py
    params:
    - config.yaml:
      - alpha
      - beta
    outs:
    - model.h5

pared · 2022-05-19T10:02:03Z

@dberenbaum yeah, that makes sense, we can also support both, support for dict is in place anyway, so...
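
Supporting both shapes could amount to normalizing them into one internal form. The sketch below flattens a path-keyed `y` (either a dict or a list of single-key dicts) into `(path, column)` pairs. `normalize_y` is a hypothetical name, not a DVC function, and the plain `y: acc` or `y: [a, b]` forms without paths are deliberately not handled.

```python
def normalize_y(y) -> list[tuple[str, str]]:
    """Flatten a path-keyed `y` spec into (path, column) pairs.

    Accepts a dict ({path: column or [columns]}) or a list of such dicts.
    Hypothetical helper; forms without file paths are not covered.
    """
    items = [y] if isinstance(y, dict) else list(y)
    pairs: list[tuple[str, str]] = []
    for mapping in items:
        for path, columns in mapping.items():
            if isinstance(columns, str):
                columns = [columns]  # a scalar column becomes a one-item list
            pairs.extend((path, column) for column in columns)
    return pairs


# Dict form (already supported in the PR, per the thread).
print(normalize_y({"train_loss.csv": "loss", "val_loss.csv": "loss"}))
# List form (proposed), which can repeat the same path.
print(normalize_y([{"val_f1.csv": ["acc"]}, {"val_f1.csv": ["loss"]}]))
```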

pared · 2022-05-19T10:32:56Z

@skshetry This is definitely very explicit. The question is whether it will be too much for users to type. I think we should aim for the least amount of work needed for the user to input the data. On the other hand, that implies a lot of automatic behavior (like inferring x data sources from y), which might be too hard to grasp.

pared · 2022-05-19T13:39:55Z

@daavoo
regarding this:

val_f1:
    x: epoch
    y: 
      val_f1.csv: acc
      val_f1.csv: loss
    y_label: f1

It can be achieved with:

val_f1:
    x: epoch
    y: 
      val_f1.csv: [acc, loss]
    y_label: f1
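
The difference between the two forms can be shown in Python. The standard library has no YAML parser, so the sketch uses JSON as a stand-in, on the assumption that these flow-style mappings behave the same way with respect to repeated keys. `reject_duplicates` is a hypothetical hook and its error message is illustrative.

```python
import json


def reject_duplicates(pairs):
    """object_pairs_hook that refuses repeated keys (hypothetical helper)."""
    seen = {}
    for key, value in pairs:
        if key in seen:
            raise ValueError(f"duplicate key: {key!r}")
        seen[key] = value
    return seen


repeated = '{"val_f1.csv": "acc", "val_f1.csv": "loss"}'
list_valued = '{"val_f1.csv": ["acc", "loss"]}'

# A lenient parser silently keeps only the last value for a repeated key.
print(json.loads(repeated))

# A strict parser rejects the repeated key, while the list-valued form passes.
print(json.loads(list_valued, object_pairs_hook=reject_duplicates))
try:
    json.loads(repeated, object_pairs_hook=reject_duplicates)
except ValueError as exc:
    print("rejected:", exc)
```

Either way, the repeated-key form cannot carry both series, which is why the list-valued form is needed.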

dberenbaum assigned pared Dec 3, 2021

dberenbaum added the A: plots Related to the plots label Dec 3, 2021

dberenbaum mentioned this issue Jan 25, 2022

ref: clarify about types of metrics/plots iterative/dvc.org#2956

Closed

pared mentioned this issue Jan 28, 2022

html generation: make dvclive independent of dvc iterative/dvclive#213

Closed

dberenbaum mentioned this issue Mar 31, 2022

plots: show path in html iterative/dvc-render#22

Open

daavoo mentioned this issue Feb 10, 2022

Add GS:Visualization and Plots iterative/dvc.org#3050

Merged

This was referenced Apr 22, 2022

plots diff: unnecessary requirement for dvc.yaml to be valid #6150

Closed

Fill anchors instead of quoted anchors iterative/dvc-render#28

Closed

This was referenced May 16, 2022

plots: CLI support for flexible plots #7753

Closed

plots: support for flexible x-axis values #7754

Closed

dberenbaum mentioned this issue May 31, 2022

plots: smart legend labeling #7830

Closed

pared mentioned this issue Jun 13, 2022

plots: explicit template generation (DVC 2.9.4) iterative/dvc.org#3295

Closed

dberenbaum mentioned this issue Jun 17, 2022

Initial support for flexible plots #7477

Merged

2 tasks

pared added a commit to pared/dvc that referenced this issue Jul 1, 2022

plots: introduce flexible plots configuration to dvcfiles

a834fb0

Closes: iterative#7086

pared closed this as completed in #7477 Jul 1, 2022

pared added a commit that referenced this issue Jul 1, 2022

plots: introduce flexible plots configuration to dvcfiles

fbb98a8

Closes: #7086

jorgeorpinel mentioned this issue Jul 17, 2022

cmd-ref: plots: flexible plots docs iterative/dvc.org#3691

Closed

jorgeorpinel mentioned this issue Nov 30, 2022

docs/studio: top-level plots iterative/dvc.org#4121

Merged
