dvc plots show: paths relative to dvc.yaml instead of $PWD in version 2.12.1 and 2.13.0 #8003

Closed
tibor-mach opened this issue Jul 11, 2022 · 7 comments · Fixed by #8004
Labels
A: plots Related to the plots regression Ohh, we broke something :-(

@tibor-mach
Contributor

Bug Report

Description

In the new release 2.13.0, dvc plots show (or dvc plots diff) does not work in a setup where the dvc.yaml file is not in the root directory and the data for the plots is on a different path relative to the root.

For example with a repo structure like this

.
├── data
├── dvc_plots
├── modules
├── notebooks
├── pipelines

where

pipelines
├── segment_X
│   └── classification
│       ├── product_A
│       │   ├── dvc.lock
│       │   ├── dvc.yaml
│       │   └── params.yaml
│       ├── product_B
│       │   ├── dvc.lock
│       │   ├── dvc.yaml
│       │   └── params.yaml

and

data
├── segment_X
│   ├── classification
│   │   ├── product_A
│   │   │   └── precision_recall_curve.csv
│   │   ├── product_B
│   │   │   └── precision_recall_curve.csv

where in each dvc.yaml file and each stage we have

wdir: ../../../..

(i.e. the working directory is the root of the repo)

you get the following warning when calling dvc plots show

WARNING: 'pipelines/segment_X/classification/product_A/data/segment_X/classification/product_A/precision_recall_curve.csv' was not found in current workspace. 

and similarly for other pipelines and plots. dvc appears to use the directory of the corresponding dvc.yaml as the working directory: the path above is indeed not in the workspace, since the data directory is in the root directory.
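To make the suspected mix-up concrete, here is a minimal sketch of the two ways the plot path could be resolved. It is not DVC's actual code: `resolve_plot_path` is a hypothetical helper, and the only inputs are the paths and the `wdir` value quoted in this report.

```python
import posixpath


def resolve_plot_path(dvcfile_dir: str, wdir: str, plot_path: str) -> str:
    """Hypothetical helper: join the dvc.yaml directory, the stage wdir and
    the plot path, then normalise the result relative to the repo root."""
    return posixpath.normpath(posixpath.join(dvcfile_dir, wdir, plot_path))


dvcfile_dir = "pipelines/segment_X/classification/product_A"
plot_path = "data/segment_X/classification/product_A/precision_recall_curve.csv"

# Ignoring wdir (treating the dvc.yaml directory as the working directory)
# reproduces the path from the warning.
ignoring_wdir = resolve_plot_path(dvcfile_dir, ".", plot_path)

# Honouring wdir: ../../../.. climbs back to the repo root first.
honouring_wdir = resolve_plot_path(dvcfile_dir, "../../../..", plot_path)

print(ignoring_wdir)
print(honouring_wdir)
```

The first result matches the path in the warning; the second is the path where the CSV actually lives.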

Reproduce

  1. Create a pipeline which is structured as above with dvc.yaml in a different directory than the root and outputs in yet another directory.
  2. Add some plots to a stage in dvc.yaml. The cause might theoretically also come from templating, so try something like
stages:
  plot:
    wdir: ../../../..
    cmd: >-
      python modules/evaluate.py
      --params=${paths.params_file}
    deps: ...
    params:
      - ${paths.params_file}:
          - paths
    metrics: ...
    plots:
      - ${paths.precision_recall_curve}:
          title: "Precision recall curve"
          x: recall
          y: precision

where the params.yaml should be in the same directory as the dvc.yaml and contain the following:

paths:
  params_file: pipelines/segment_X/classification/product_A/params.yaml
  precision_recall_curve: data/segment_X/classification/product_A/precision_recall_curve.csv

  3. Call dvc repro to create the precision_recall_curve.csv in the first place
  4. Call dvc plots show or dvc plots diff
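For reference, the `${paths...}` placeholders in step 2 are expected to expand to the values from params.yaml. The sketch below mimics that expansion with a plain regex. It is only an illustration, not DVC's templating implementation, and `interpolate` is a hypothetical name.

```python
import re


def interpolate(template: str, params: dict) -> str:
    """Replace each ${a.b.c} placeholder with the value found by walking
    the nested params dict along the dotted key."""

    def lookup(match: re.Match) -> str:
        value = params
        for key in match.group(1).split("."):
            value = value[key]
        return str(value)

    return re.sub(r"\$\{([^}]+)\}", lookup, template)


# Values copied from the params.yaml shown in the report.
params = {
    "paths": {
        "params_file": "pipelines/segment_X/classification/product_A/params.yaml",
        "precision_recall_curve": "data/segment_X/classification/product_A/precision_recall_curve.csv",
    }
}

plot_entry = interpolate("${paths.precision_recall_curve}", params)
cmd = interpolate("python modules/evaluate.py --params=${paths.params_file}", params)

print(plot_entry)
print(cmd)
```

Under this reading, the expanded plot entry is the plain path as written in params.yaml. Which directory that path is then resolved against is a separate question.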

Expected

The behaviour of dvc up to 2.12.0, where plot paths are resolved correctly, relative to $PWD instead of the directory where the corresponding dvc.yaml is located.

Environment information

Output of dvc doctor:

$ dvc doctor
DVC version: 2.13.0 (pip)
---------------------------------
Platform: Python 3.10.5 on Linux-5.18.7-1-MANJARO-x86_64-with-glibc2.35
Supports:
        azure (adlfs = 2022.4.0, knack = 0.9.0, azure-identity = 1.10.0),
        webhdfs (fsspec = 2022.5.0),
        http (aiohttp = 3.8.1, aiohttp-retry = 2.4.6),
        https (aiohttp = 3.8.1, aiohttp-retry = 2.4.6)
Cache types: hardlink, symlink
Cache directory: ext4 on /dev/nvme0n1p2
Caches: local
Remotes: azure
Workspace directory: ext4 on /dev/nvme0n1p2
Repo: dvc, git

Additional Information (if any):
The new version of dvc has two new dependencies:

  • dvc-data-0.0.23
  • dvc-objects-0.0.23

I am not sure if either of these two could be the cause, but they were both bumped from version 0.0.16 in dvc 2.13.0
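To compare environments across dvc versions, the installed versions of these packages can be read with the standard library. This is a small sketch; `installed_versions` is a hypothetical helper, and the package names are the ones listed above.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(names):
    """Map each distribution name to its installed version, or None if
    the distribution is not installed in the current environment."""
    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    for name, ver in installed_versions(["dvc", "dvc-data", "dvc-objects"]).items():
        print(f"{name}: {ver or 'not installed'}")
```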

@tibor-mach changed the title from "dvc plots show: working directory conflict (paths relative to dvc.yaml instead of $PWD) in version 2.13.0" to "dvc plots show: paths relative to dvc.yaml instead of $PWD in version 2.13.0" Jul 11, 2022
@tibor-mach changed the title from "dvc plots show: paths relative to dvc.yaml instead of $PWD in version 2.13.0" to "dvc plots show: paths relative to dvc.yaml instead of $PWD in version 2.12.1 and 2.13.0" Jul 11, 2022
@daavoo
Contributor

daavoo commented Jul 11, 2022

Sounds like an unexpected consequence of #7477 . cc @pared

@tibor-mach
Contributor Author

Could it be related to the change to dvc/repo/plots/__init__.py in this commit and also this parent commit?

Not that I understand the details of the inner workings of dvc, but these two commits seem to have changed the way plots are accessed.

@daavoo daavoo added A: plots Related to the plots regression Ohh, we broke something :-( labels Jul 11, 2022
@pared
Contributor

pared commented Jul 11, 2022

@tibor-mach not sure which one causes it, but I think it's the latter.

@pared pared self-assigned this Jul 11, 2022
@pared
Contributor

pared commented Jul 11, 2022

@tibor-mach can you check if #8004 fixes the problem? (if it's possible for you to install from the branch)

@tibor-mach
Contributor Author

@pared
Yep, works as intended with the following version

(installed from your branch with pip:
pip install git+https://github.com/pared/dvc.git@8003_fix_wdir_plots
)

DVC version: 2.3.1.dev1038+gdce18876 
---------------------------------
Platform: Python 3.10.5 on Linux-5.18.7-1-MANJARO-x86_64-with-glibc2.35
Supports:
        azure (adlfs = 2022.4.0, knack = 0.9.0, azure-identity = 1.10.0),
        webhdfs (fsspec = 2022.5.0),
        http (aiohttp = 3.8.1, aiohttp-retry = 2.4.6),
        https (aiohttp = 3.8.1, aiohttp-retry = 2.4.6)
Cache types: hardlink, symlink
Cache directory: ext4 on /dev/nvme0n1p2
Caches: local
Remotes: azure
Workspace directory: ext4 on /dev/nvme0n1p2
Repo: dvc, git

pared added a commit to pared/dvc that referenced this issue Jul 13, 2022
@tibor-mach
Contributor Author

@pared Hi, how's the progress on this one?

pared added a commit that referenced this issue Jul 26, 2022
@pared
Contributor

pared commented Jul 26, 2022

Hi @tibor-mach! This should be fixed in the next release.
