dvc queue start checks out more files than required #10293

Open
PythonFZ opened this issue Feb 8, 2024 · 0 comments
Labels
A: experiments (Related to dvc exp), p2-medium (Medium priority, should be done, but less important), performance (improvement over resource / time consuming tasks)

PythonFZ commented Feb 8, 2024

Bug Report / Feature Request

I have a project with a data directory (150 GB) containing 11 files, and I added the entire directory with "dvc add data".
In my workflow, each experiment I want to run depends on only a single file in the data directory.
However, running an experiment through the DVC queue (dvc queue start) checks out the entire data directory.
It would be much faster if the queue checked out only the data files that the workflow actually requires, as declared in dvc.yaml.

Expected

The data directory in .dvc/tmp/exp/... would contain only the files specified as explicit dependencies in the dvc.yaml workflow file.
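To make the expected behavior concrete, here is a minimal Python sketch of the filtering being requested: given the list of files inside a tracked directory and the deps of the stages to run, keep only the files that some dependency covers. This is not DVC's implementation. The function name required_files, the example file names, and the assumption that deps are plain repo-relative paths are all hypothetical.

```python
from pathlib import PurePosixPath
from typing import Iterable


def required_files(
    tracked_dir: str,
    dir_entries: Iterable[str],
    stage_deps: Iterable[str],
) -> list[str]:
    """Return the files inside tracked_dir that some stage dependency covers.

    Hypothetical helper (not part of DVC):
    dir_entries -- paths relative to tracked_dir (the files in the tracked directory)
    stage_deps  -- repo-relative paths, assumed to come from the deps lists in dvc.yaml
    """
    root = PurePosixPath(tracked_dir)
    deps = [PurePosixPath(d) for d in stage_deps]
    needed = []
    for entry in dir_entries:
        path = root / entry
        # A file is needed if a dependency names it directly, or names one of
        # its parent directories (e.g. a stage that depends on all of "data").
        if any(dep == path or dep in path.parents for dep in deps):
            needed.append(str(path))
    return needed


if __name__ == "__main__":
    # Hypothetical layout: 11 files in "data", one stage depending on one of them.
    files = [f"file_{i:02d}.h5" for i in range(11)]
    deps = ["data/file_03.h5", "params.yaml"]
    print(required_files("data", files, deps))  # prints ['data/file_03.h5']
```

Under this sketch, only data/file_03.h5 would need to be materialized in the experiment's temporary workspace, while a stage that depends on the whole "data" directory would still get all 11 files.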

Environment information

Output of dvc doctor:

DVC version: 3.43.1 (pip)
-------------------------
Platform: Python 3.11.7 on Linux-6.5.0-15-generic-x86_64-with-glibc2.35
Subprojects:
        dvc_data = 3.9.0
        dvc_objects = 3.0.6
        dvc_render = 1.0.1
        dvc_task = 0.3.0
        scmrepo = 2.0.2
Supports:
        http (aiohttp = 3.9.1, aiohttp-retry = 2.8.3),
        https (aiohttp = 3.9.1, aiohttp-retry = 2.8.3),
        s3 (s3fs = 2024.2.0, boto3 = 1.34.34)
Config:
        Global: /tikhome/fzills/.config/dvc
        System: /etc/xdg/dvc
Cache types: hardlink, symlink
Cache directory: nfs on 129.69.120.13:/share/work_icp/fzills
Caches: local
Remotes: None
Workspace directory: nfs on 129.69.120.13:/share/work_icp/fzills
Repo: dvc, git
Repo.site_cache_dir: /var/tmp/dvc/repo/240bb452ebd33bc5c31f30d78040c7d2
dberenbaum added the p2-medium, performance, and A: experiments labels on Feb 20, 2024.