Better support project visibility / situational awareness #834

Closed
mattseddon opened this issue Sep 22, 2021 · 7 comments
Labels: A: trees (Area: SCM and DVC-tracked trees), discussion, enhancement (New feature or request), product (PR that affects product)

mattseddon (Member) commented Sep 22, 2021

Original use case:

"View all directories and files available to the project (Visibility / Situational awareness)".

In the extension we use `dvc list . target --dvc-only` to display the data in our DVC Tracked tree (generally shown below the file explorer). We want to use this tree for situational awareness with respect to data management in a workspace.
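
A minimal TypeScript sketch of the consuming side, assuming `dvc list` is asked for JSON output and returns an array of objects with `path`, `isdir` and `isout` fields. This entry shape is an assumption, and `parseListOutput` is a hypothetical helper, not code from the extension:

```typescript
// Assumed shape of one entry in the JSON that `dvc list` can print.
interface ListEntry {
  path: string;
  isdir: boolean;
  isout: boolean;
}

// Hypothetical helper: parse raw CLI stdout and split the entries into
// directories and files using the `isdir` flag.
function parseListOutput(stdout: string): { dirs: string[]; files: string[] } {
  const entries = JSON.parse(stdout) as ListEntry[];
  const dirs: string[] = [];
  const files: string[] = [];
  for (const entry of entries) {
    (entry.isdir ? dirs : files).push(entry.path);
  }
  return { dirs, files };
}

// Example input mimicking two entries of a listing.
const sample = JSON.stringify([
  { path: "data/train.csv", isdir: false, isout: true },
  { path: "data/images", isdir: true, isout: true },
]);
console.log(parseListOutput(sample));
```

With the behaviour reported in iterative/dvc#6094, a newly added file would arrive with `isdir` set to true, so a split like this one would file it under `dirs` and a tree built from it would show it as a directory.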

A user should be able to open a dataset (folder), view all of the available files, pull a selection of files, and then inspect them further once they have been pulled.

They will also be able to perform basic actions against these files in the initial release (see #569 for details of what is already there).
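
Assuming a listing that returns every file in a dataset regardless of workspace state, the extension could work out which files are still available to pull by checking what exists on disk. A sketch, where `notYetPulled` is a hypothetical helper and listed paths are assumed to be relative to the repository root:

```typescript
import { existsSync, mkdtempSync, writeFileSync } from "fs";
import { tmpdir } from "os";
import { join } from "path";

// Hypothetical helper: given the repository root and the file paths from a
// full listing, return the paths that are not present in the workspace yet.
function notYetPulled(repoRoot: string, listedFiles: string[]): string[] {
  return listedFiles.filter((relPath) => !existsSync(join(repoRoot, relPath)));
}

// Example: a throwaway workspace where only one of two listed files exists.
const root = mkdtempSync(join(tmpdir(), "dvc-demo-"));
writeFileSync(join(root, "a.csv"), "x");
console.log(notYetPulled(root, ["a.csv", "b.csv"])); // [ 'b.csv' ]
```

A file that exists on disk but differs from the tracked version still counts as pulled here; detecting that is closer to what `status` reports.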

Problem:

The data that `dvc list . target --dvc-only` provides depends on the current state of the workspace.

Need:

A command that always returns the full list of files from a directory (including additions), regardless of the current state of the workspace.

Explanation:

We see the following behaviour against tracked directories:

| Directory state | What the command shows |
| --- | --- |
| Full | All files are shown |
| Files have been removed | Remaining files |
| Files have been added | Extra files always show as directories (iterative/dvc#6094) |
| Directory has been added | List fails (iterative/dvc#6695) |
| Empty | Fails (iterative/dvc#6106) |
| Directory deleted | All files are shown |

This means that when a user pulls a single file they can no longer see what else is available to them inside of that directory.

Demos:

Adding

Screen.Recording.2021-09-27.at.10.58.35.am.mov

Removing

Screen.Recording.2021-09-20.at.12.27.32.pm.mov

Related Issues:

  1. ls: optimize --dvc-only dvc#5712
  2. list . : inconsistent behaviour dvc#5866
  3. list: when adding a file to a dataset isdir is always true dvc#6094
  4. list: fails if a DVC-tracked directory is empty but present dvc#6106
  5. list: doesn't ignore gitignored files dvc#6122
  6. pull: granular pull of a single file triggers full imported dataset download dvc#6124
  7. pull: fails with NoneType error on non-existent target inside a DVC-tracked out dvc#6125
  8. dvc list: handle local repos differently? dvc#3590
  9. DVC tracked file rename fails when file is not pulled yet #808
  10. list .: fails on added directory path (tracked dataset) dvc#6695
  11. Empty directory causes error #529

mattseddon added the discussion, enhancement, large files, product and A: trees labels on Sep 22, 2021.

mattseddon (Member, Author) commented Sep 22, 2021

@shcheklein @efiop @dberenbaum I moved this out of #772 (comment) and tidied it up. Hopefully it makes more sense now. We can discuss it next week.

mattseddon (Member, Author) commented Nov 24, 2021

We have actually fulfilled this requirement by switching over to `dvc list . -R --dvc-only` (the tree now shares data with the SCM view), but I will leave this issue open until the `status` command has been reworked. We can revisit it then.
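
Running `dvc list . -R --dvc-only` once gives a flat list of paths that several views can share. A sketch of grouping such a list into a parent-to-children map, assuming `/`-separated paths relative to the repository root (`buildChildrenMap` is a hypothetical helper):

```typescript
// Hypothetical helper: group flat, "/"-separated paths into a map from each
// directory ("" for the repository root) to its direct children.
function buildChildrenMap(paths: string[]): Map<string, Set<string>> {
  const children = new Map<string, Set<string>>();
  for (const path of paths) {
    let parent = "";
    for (const part of path.split("/")) {
      const current = parent === "" ? part : `${parent}/${part}`;
      let siblings = children.get(parent);
      if (siblings === undefined) {
        siblings = new Set<string>();
        children.set(parent, siblings);
      }
      siblings.add(current);
      parent = current;
    }
  }
  return children;
}

const tree = buildChildrenMap(["data/a.csv", "data/images/1.png"]);
console.log(Array.from(tree.get("data") ?? [])); // [ 'data/a.csv', 'data/images' ]
```

Because directories are inferred from file paths, an empty tracked directory would not appear in this map at all.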

dberenbaum (Contributor) commented

@mattseddon Can you clarify? Do you still expect status to return this info? I don't think there's much justification to show unmodified files in status unless it's needed for VS Code, so those requirements will likely impact the plans.

mattseddon (Member, Author) commented

@dberenbaum we were hoping that we could get that info via status and a hidden option.

The basic premise for us is that the fewer commands we have to run against the CLI the better.

dberenbaum (Contributor) commented

@mattseddon Makes sense. Are you running into issues with getting the info now? We can always add that hidden flag later if dvc list is already working at the moment. Not sure yet if that's an easy addition.

mattseddon (Member, Author) commented Dec 1, 2021

@dberenbaum The two main problems with the current approach are performance and having another process locking the repository (we will have to run `status` and `list` synchronously, one after the other, which takes a long time).
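
One way to avoid lock contention on the extension side, sketched here with a hypothetical `SerialQueue` class, is to chain every CLI call onto a single promise so that a `status` call and a `list` call never overlap. This does not address the performance concern: the calls still run one after another, which is why fewer commands are preferable.

```typescript
// Hypothetical helper: run async tasks strictly one at a time, in call order.
class SerialQueue {
  private tail: Promise<unknown> = Promise.resolve();

  run<T>(task: () => Promise<T>): Promise<T> {
    const result = this.tail.then(task);
    // Keep the chain usable even if a task rejects.
    this.tail = result.catch(() => undefined);
    return result;
  }
}

const sleep = (ms: number) =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Stand-ins for spawning the DVC CLI; the first task is deliberately slower.
const queue = new SerialQueue();
const order: string[] = [];
const first = queue.run(async () => {
  await sleep(20);
  order.push("status");
});
const second = queue.run(async () => {
  order.push("list");
});
Promise.all([first, second]).then(() => console.log(order)); // [ 'status', 'list' ]
```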

mattseddon (Member, Author) commented

Work here has been done. Waiting for data:status.
