Pipeline is not executed for parameter with name size or nfiles #10296

Open
shcheklein opened this issue Feb 9, 2024 · 3 comments
Labels
A: pipelines Related to the pipelines feature bug Did we break something?

Comments

@shcheklein
Member

Bug Report

Description

See this link https://stackoverflow.com/questions/77962532/dvc-using-cached-run-although-parameter-changed

Reproduce

Use this repo: https://github.com/shcheklein/test-dvc-so-77962532

Run the pipeline with `size` set to 30, then change it to 40, run `dvc status`, and run `dvc repro` again. DVC does not run the stage and prints this instead:

Stage 'data_ingestion' is cached - skipping run, checking out outputs
Updating lock file 'dvc.lock'

To track the changes with git, run:

	git add dvc.lock

To enable auto staging, run:

	dvc config core.autostage true
Use `dvc push` to send your updates to remote storage.

File size stays the same.

Logs

2024-02-08 20:48:29,997 DEBUG: v3.44.0 (pip), CPython 3.11.4 on macOS-13.3.1-arm64-arm-64bit
2024-02-08 20:48:29,998 DEBUG: command: /Users/ivan/Projects/test-dvc-so/.venv/bin/dvc repro -v
2024-02-08 20:48:30,158 DEBUG: Dependency 'params.yaml' of stage: 'data_ingestion' changed because it is '{'size': 'modified'}'.
2024-02-08 20:48:30,159 DEBUG: stage: 'data_ingestion' changed.
2024-02-08 20:48:30,159 DEBUG: Removing output 'artifacts/data_ingestion' of stage: 'data_ingestion'.
2024-02-08 20:48:30,160 DEBUG: Removing '/Users/ivan/Projects/test-dvc-so/artifacts/data_ingestion'
2024-02-08 20:48:30,161 DEBUG: {}
2024-02-08 20:48:30,161 DEBUG: defaultdict(<class 'dict'>, {'params.yaml': {'size': 'modified'}})
Stage 'data_ingestion' is cached - skipping run, checking out outputs
2024-02-08 20:48:30,163 DEBUG: Removing '/Users/ivan/Projects/test-dvc-so/artifacts/.COXZdYuRz3gn4oeArpSdWQ.tmp'
2024-02-08 20:48:30,164 DEBUG: Removing '/Users/ivan/Projects/test-dvc-so/artifacts/.COXZdYuRz3gn4oeArpSdWQ.tmp'
2024-02-08 20:48:30,164 DEBUG: Removing '/Users/ivan/Projects/test-dvc-so/.dvc/cache/files/md5/.wnNey-IUBNjTwgwkjkVJoQ.tmp'
2024-02-08 20:48:30,170 DEBUG: built tree 'object 3d7dd9c155ee06ec6ff8fa04e49f49fe.dir'
2024-02-08 20:48:30,170 DEBUG: Computed stage: 'data_ingestion' md5: '91baabba76b22d5f1480db2cfe105d8b'
2024-02-08 20:48:30,173 DEBUG: built tree 'object 3d7dd9c155ee06ec6ff8fa04e49f49fe.dir'
2024-02-08 20:48:30,173 DEBUG: Preparing to transfer data from 'memory://dvc-staging-md5/2b21226c06eec22f3477afe4c6de75a80828635723b703713230e4c3c4c39626' to '/Users/ivan/Projects/test-dvc-so/.dvc/cache/files/md5'
2024-02-08 20:48:30,173 DEBUG: Preparing to collect status from '/Users/ivan/Projects/test-dvc-so/.dvc/cache/files/md5'
2024-02-08 20:48:30,173 DEBUG: Collecting status from '/Users/ivan/Projects/test-dvc-so/.dvc/cache/files/md5'
2024-02-08 20:48:30,174 DEBUG: built tree 'object 3d7dd9c155ee06ec6ff8fa04e49f49fe.dir'
2024-02-08 20:48:30,174 DEBUG: Removing '/Users/ivan/Projects/test-dvc-so/artifacts/.z_523r89dhvz_hXD3vW61g.tmp'
2024-02-08 20:48:30,174 DEBUG: Removing '/Users/ivan/Projects/test-dvc-so/artifacts/.z_523r89dhvz_hXD3vW61g.tmp'
2024-02-08 20:48:30,174 DEBUG: Removing '/Users/ivan/Projects/test-dvc-so/.dvc/cache/files/md5/.OiTk5AM8wHoSOkDtcj45sA.tmp'
2024-02-08 20:48:30,175 DEBUG: Removing '/Users/ivan/Projects/test-dvc-so/artifacts/data_ingestion/test_data.csv'
2024-02-08 20:48:30,177 DEBUG: stage: 'data_ingestion' was reproduced
Updating lock file 'dvc.lock'

To track the changes with git, run:

	git add dvc.lock

To enable auto staging, run:

	dvc config core.autostage true
Use `dvc push` to send your updates to remote storage.
2024-02-08 20:48:30,182 DEBUG: Analytics is enabled.
2024-02-08 20:48:30,222 DEBUG: Trying to spawn ['daemon', 'analytics', '/var/folders/8f/fbysfztx1mb953_gpwl477p80000gn/T/tmpf_cyrru9', '-v']
2024-02-08 20:48:30,226 DEBUG: Spawned ['daemon', 'analytics', '/var/folders/8f/fbysfztx1mb953_gpwl477p80000gn/T/tmpf_cyrru9', '-v'] with pid 6119

Expected

Running the stage.

Environment information

(.venv) √ Projects/test-dvc-so % dvc version
DVC version: 3.44.0 (pip)
-------------------------
Platform: Python 3.11.4 on macOS-13.3.1-arm64-arm-64bit
Subprojects:
	dvc_data = 3.11.0
	dvc_objects = 5.0.0
	dvc_render = 1.0.1
	dvc_task = 0.3.0
	scmrepo = 3.1.0
Supports:
	http (aiohttp = 3.9.3, aiohttp-retry = 2.8.3),
	https (aiohttp = 3.9.3, aiohttp-retry = 2.8.3)
Config:
	Global: /Users/ivan/Library/Application Support/dvc
	System: /Library/Application Support/dvc
Cache types: reflink, hardlink, symlink
Cache directory: apfs on /dev/disk3s1s1
Caches: local
Remotes: None
Workspace directory: apfs on /dev/disk3s1s1
Repo: dvc, git
Repo.site_cache_dir: /Library/Caches/dvc/repo/4883da32ce8435ea352f10b710b4a968
@shcheklein shcheklein added bug Did we break something? A: pipelines Related to the pipelines feature labels Feb 9, 2024
@skshetry
Member

skshetry commented Feb 9, 2024

Note that this only happens if the param name is size or nfiles. 😅

@skshetry skshetry changed the title Pipeline is not executed even if parameter dep chaged Pipeline is not executed for parameter with name size or nfiles Feb 9, 2024
@skshetry skshetry added the p1-important Important, aka current backlog of things to do label Feb 9, 2024
@dberenbaum
Contributor

@skshetry Do you have an idea for a fix? Or do we need to document these as reserved parameter names?

@skshetry
Member

skshetry commented Feb 9, 2024

We are recursively excluding nfiles and size before "hashing" for the stage cache, which is incorrect. But I have to think through what impact changing this can have. Most likely, we'll be able to remove size and nfiles only from outputs, and not from parameter dependencies.

return dict_sha256(cache, exclude=[Meta.PARAM_SIZE, Meta.PARAM_NFILES])
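To illustrate why a recursive exclusion causes this, here is a minimal sketch in plain Python. It is not DVC's code: `strip_keys`, `sketch_sha256`, `stage_entry`, the `protect` argument, the command string, and the shape of the entry are all hypothetical and only mimic the idea of hashing a stage description after dropping `size` and `nfiles` at every nesting level. The `protect` variant is one possible reading of the fix suggested above, not a confirmed design.

```python
import hashlib
import json


def strip_keys(obj, exclude, protect=()):
    """Recursively drop dict keys in `exclude` (hypothetical helper).

    Subtrees stored under a key listed in `protect` are copied untouched.
    """
    if isinstance(obj, dict):
        out = {}
        for key, value in obj.items():
            if key in exclude:
                continue
            out[key] = value if key in protect else strip_keys(value, exclude, protect)
        return out
    if isinstance(obj, list):
        return [strip_keys(item, exclude, protect) for item in obj]
    return obj


def sketch_sha256(obj, exclude=(), protect=()):
    """Hash a JSON-serializable structure after key exclusion."""
    payload = json.dumps(strip_keys(obj, exclude, protect), sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def stage_entry(size_param, out_size=1234):
    # Simplified, made-up shape of a stage cache entry; not DVC's real schema.
    return {
        "cmd": "python ingest.py",
        "deps": [{"path": "params.yaml", "params": {"size": size_param}}],
        "outs": [{"path": "artifacts/data_ingestion", "size": out_size, "nfiles": 1}],
    }


EXCLUDE = ("size", "nfiles")

# Naive recursive exclusion: the user's `size` parameter is dropped too,
# so both parameter values produce the same cache key.
naive_30 = sketch_sha256(stage_entry(30), exclude=EXCLUDE)
naive_40 = sketch_sha256(stage_entry(40), exclude=EXCLUDE)

# Scoped exclusion: anything under "params" is left alone, so the keys differ,
# while output metadata (size, nfiles) is still ignored.
scoped_30 = sketch_sha256(stage_entry(30), exclude=EXCLUDE, protect=("params",))
scoped_40 = sketch_sha256(stage_entry(40), exclude=EXCLUDE, protect=("params",))

print("naive keys collide:", naive_30 == naive_40)
print("scoped keys collide:", scoped_30 == scoped_40)
```

In this sketch the naive keys collide, which matches the "is cached - skipping run" behavior in the logs, and the scoped keys do not. Whether protecting the parameter subtree is safe for existing run-cache entries is exactly the open question raised in the comment.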

@skshetry skshetry removed their assignment Mar 4, 2024
@dberenbaum dberenbaum removed the p1-important Important, aka current backlog of things to do label Mar 4, 2024
Status: Backlog