Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

html generation: make dvclive independent of dvc #213

Closed
Tracked by #225
pared opened this issue Jan 28, 2022 · 11 comments · Fixed by #227
Closed
Tracked by #225

html generation: make dvclive independent of dvc #213

pared opened this issue Jan 28, 2022 · 11 comments · Fixed by #227

Comments

@pared
Copy link
Contributor

pared commented Jan 28, 2022

In iterative/dvc#6944 we are considering extracting the rendering outside of DVC.
This issue would go nicely with #163 - because we could let both DVC and dvclive use the same rendering backend, which would make dvclive more functional.

This unravels some problems:

  1. Plot is not only some data, but also a configuration
  2. DVC has a capability to store the configuration, dvclive does not

In order to provide such functionality, we need to let dvclive provide configuration to the potential plots library.
We discussed this issue with @dberenbaum and @daavoo and it seems that we could go two ways about it:

  1. Let the dvclive render the data only at the runtime - we know what method has been called and can provide proper config from within the library - this however would make visualizing previously logged data hard. We could avoid that by saving the config on DVC side (similar to what we do with html and summary now) - though that would make it work only in DVC + dvclive use case. Standalone dvclive would be limited.

  2. Let the dvclive store the configuration - that seems like our desired behavior, though it poses a problem: how to structure the plot config and how to store it so that both standalone dvclive and dvvclive + DVC use cases work properly?
    Plot config would probably be defined by plots: flexible dvc.yaml spec dvc#7086
    In the same issue we are intending to define plots specification inside dvc.yaml - but if we want dvclive to render too, we would need to store it somewhere non-DVC related. Initial idea would be to store it in dvclive_path/.config, but in this case, when we are using dvclive alongside DVC we will need to make sure DVC reads additional plots definitions from within dvclive dirs.

Maybe we could let plots section inside dvc.yaml address other file by path?
eg:

# dvc.yaml
plots:
   plots_configs: ['dvclive/.config.yaml']
   plot1:
   ...
   plot2:
   ...
   

Which would let DVC know where to look for additional plots configurations and "expand it" on dvc plots show/diff. That would also make dvclive independent of DVC in terms of plots generation.
An additional problem for this use case is templates: do we want to let users define their own plots templates?
We will have to store them somewhere.

Did I miss something? cc @dberenbaum @daavoo

@dberenbaum
Copy link
Contributor

cc @efiop

Maybe we could let plots section inside dvc.yaml address other file by path?

This seems reasonable and has been discussed in other scenarios, like referencing dvc.yaml files in subdirs. I'm not quite sure I understand the syntax as is. How would you determine whether a key under plots is a reference to a plot or a config file? Maybe it makes more sense to have each plot be able to define its own config path?

Other options:

  • Use some convention, like keep the config next to the plots file and have DVC look for it there. This saves users from having to specify the config path and ensures the plots and their config files are together. On the downside, it is less explicit and makes it hard to configure plots that combine paths in different locations (although this could still be handled in the plots section of dvc.yaml).
  • Use the current working directory for reading/writing the plots config. This could actually replace the plots section in dvc.yaml altogether by just putting this same info into a separate file. It's also consistent with how current DVC commands like dvc repro find dvc.yaml files by default. On the downside, it means separating configuration between dvc.yaml and another config file, and it means that behavior changes when running the same script or command from different directories.

@pared
Copy link
Contributor Author

pared commented Jan 31, 2022

I'm not quite sure I understand the syntax as is.

The one I proposed is clunky, I just wanted to share the idea. Having each plot define its own config would definitely be possible .

Use the current working directory for reading/writing the plots config. This could actually replace the plots section in dvc.yaml altogether by just putting this same info into a separate file. It's also consistent with how current DVC commands like dvc repro find dvc.yaml files by default. On the downside, it means separating configuration between dvc.yaml and another config file, and it means that behavior changes when running the same script or command from different directories.

Another problem that I see with this: how do we handle it in case of standalone dvclive? I think there needs to be some standard convention where do we have a config, so that both DVC and standalone dvclive can handle that. Worth noting that user might not have to specify those paths when using DVC - it is possible that DVC-dvclive integration would be able to handle it.

@dberenbaum
Copy link
Contributor

Another problem that I see with this: how do we handle it in case of standalone dvclive? I think there needs to be some standard convention where do we have a config, so that both DVC and standalone dvclive can handle that. Worth noting that user might not have to specify those paths when using DVC - it is possible that DVC-dvclive integration would be able to handle it.

Yeah, one advantage of using the current working directory is that it can work the same way with or without DVC. In that case, standalone dvclive run from the same directory will have the same behavior.

@shcheklein
Copy link
Member

сс @mattseddon JFYI since it might be affecting certain decisions on the VS code end (the whole live section will be deprectated)

@pared
Copy link
Contributor Author

pared commented Feb 12, 2022

@shcheklein I am not sure I understand, why would be live section deprecated?

@mattseddon
Copy link
Member

@shcheklein @pared as long as the extension has access to the template, the path to the plots data and the plots data is being updated then everything should be ok. We are not currently dependent on dvclive doing anything outside of writing the data. From reading it doesn't sound as if there is going to be a breaking change to plots diff but I could be wrong, please LMK.

@pared
Copy link
Contributor Author

pared commented Feb 14, 2022

I think that there might be some misconception regarding what this change is about to do.
Currently, dvclive has to rely on DVC making the visualizations of the logged metrics. What we intend to do is to extract a library taking care of generating HTML, that would be used by both dvclive and dvc so that both can work independently of each other. Still, we cannot allow making multiple HTML's (one by dvclive and the other by dvc) in case of cooperation of the two. We will need to update the integration to make one of them create visualization in such use case. Besides that everything should stay as it was, so I don't think that will have some negative influence on vscode integration. I might be missing something though, so @shcheklein I would appreciate it if you could expand your concerns a bit.

@daavoo
Copy link
Contributor

daavoo commented Feb 14, 2022

@shcheklein I am not sure I understand, why would be live section deprecated?

@pared This is coming from https://www.notion.so/iterative/Deprecate-live-section-of-dvc-yaml-432838600d9f48e99db3e3f92fec111c . The overall idea is that we could start using metrics and plots sections instead of live once DVCLive could generate HTML by itself.

@pared
Copy link
Contributor Author

pared commented Feb 14, 2022

Ah, sorry, my bad, that makes sense. I thought we are talking here about the live plots that were supposed to be a section in vscode integration - I guess we need a clear separation of concepts we are talking about.

@dberenbaum
Copy link
Contributor

@daavoo I know deprecating live depends on this issue, but do you think we should open a separate issue in https://github.com/iterative/dvc.org? Replacing instances of live in the docs is probably all that will be needed to start, right?

@dberenbaum
Copy link
Contributor

Note that if we deprecate live, we will also want to update dvc exp init --type dl. Cc @skshetry

daavoo added a commit that referenced this issue Mar 30, 2022
Uses `dvc_render`.
Add new method `Live.make_report`.

Closes #213 #195
daavoo added a commit that referenced this issue Apr 1, 2022
Uses `dvc_render`.
Add new method `Live.make_report`.

Closes #213 #195
daavoo added a commit that referenced this issue Apr 4, 2022
Uses `dvc_render`.
Add new method `Live.make_report`.

Closes #213 #195
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants