Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

checkout: use dvcignore #7963

Merged
merged 1 commit into from Jul 7, 2022
Merged

Conversation

alexmojaki
Copy link
Contributor

@alexmojaki alexmojaki commented Jul 2, 2022

Fixes #7374

Note that there are many places the ignore argument could be added instead. This is the deepest call, right before delegating to dvc_data. Not sure if that's the right choice.

Will add tests soon.

assert not foo_path.exists()

remove(tmp_dir / ".dvc/cache")
dvc.pull()
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I change this to dvc.checkout() then it fails again with the same error about removing files without confirmation. I tried to fix this but got stuck, I'm not familiar enough with this codebase. I can share what I found while debugging. I'm actually not even sure if the error is expected or not. The issue gave me the impression that both pull and checkout should succeed, is that correct?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried it on my computer it also fails, but I noticed that if I status -c or fetch before the chekout then it will succeed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, so basically it seems that checkout breaks down when the cache is deleted, which seems fair and maybe not worth trying to change? fetch updates the cache, I'm guessing status -c does too. And pull does a fetch before checkout.

Do you know if there are other ways to trigger this bug of checkout previously not ignoring files properly that don't involve deleting the cache?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

status -c doesn't pull down the cache but it will pull down the .dir index. And it is used in generating a data tree object.

obj = self.get_obj(filter_info=filter_info)

I think the current problem is that, in dvc-data the diff function didn't take dvcignore into account when the tree object is missing.

https://github.com/iterative/dvc-data/blob/010cdfb0527209eebbcec4a33c476cf4f532f963/src/dvc_data/checkout.py#L252

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To solve this completely solve this problem, some modifications to dvc-data is required.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would you like to solve it yourself or wait for someone else to do this? But I think it is different problem. We can merge this one as it already solved one part of the question.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know how to solve it myself, someone else should do this.

@alexmojaki alexmojaki marked this pull request as ready for review July 3, 2022 18:59
@alexmojaki alexmojaki requested a review from a team as a code owner July 3, 2022 18:59
@@ -583,6 +583,7 @@ def _checkout(self, *args, **kwargs):
from dvc_data.checkout import CheckoutError as _CheckoutError
from dvc_data.checkout import LinkError, PromptError

kwargs.setdefault("ignore", self.dvcignore)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only two places are calling this _checkout method. One in line 583 had already had the ignore argument, and another one is in line 785 in this file. I think we should either remove both of them and add a default value to ignore or remove the default one but pass them explicitly in the calls.

@karajan1001
Copy link
Contributor

Thanks for your PR @alexmojaki.

@karajan1001
Copy link
Contributor

One thing remained, please use rebase -i to squash all of the commits into one.

@efiop efiop self-requested a review July 6, 2022 08:11
@alexmojaki alexmojaki force-pushed the checkout-ignore branch 2 times, most recently from 1b0fbb1 to 1325f19 Compare July 6, 2022 11:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

checkout/pull: dvcignored contents are removed on checkout
2 participants