
exp: speed up repro execution with untracked directories in workspace #7786

Merged: pmrowla merged 1 commit into iterative:main from feature/speed-up-repro on Jun 7, 2022


dtrifiro
Contributor

When large untracked directories are present in the workspace, a lot of time is spent collecting untracked files in `scm.status` and `scm.is_dirty` (which also calls `status`). scmrepo==0.0.23 adds an `untracked_files="no"` option that skips collecting untracked files.
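To illustrate what the `untracked_files="no"` option avoids, here is a minimal sketch that uses the plain `git` CLI instead of scmrepo. It assumes `git` is on PATH, and the helpers `porcelain_status` and `make_repo_with_untracked_dir` are hypothetical names introduced only for this example. It creates a throwaway repository containing an untracked `data/` directory and compares `git status --porcelain` with `--untracked-files=all` and `--untracked-files=no`.

```python
from __future__ import annotations

import os
import subprocess
import tempfile


def porcelain_status(repo: str, untracked_files: str) -> list[str]:
    """Run `git status --porcelain` with the given --untracked-files mode."""
    out = subprocess.run(
        ["git", "status", "--porcelain", f"--untracked-files={untracked_files}"],
        cwd=repo,
        check=True,
        capture_output=True,
        text=True,
    ).stdout
    return [line for line in out.splitlines() if line]


def make_repo_with_untracked_dir(n_files: int) -> str:
    """Hypothetical helper: fresh repo with n_files untracked files in data/."""
    repo = tempfile.mkdtemp()
    subprocess.run(["git", "init", "-q", repo], check=True)
    data_dir = os.path.join(repo, "data")
    os.makedirs(data_dir)
    for i in range(n_files):
        with open(os.path.join(data_dir, f"file{i}.bin"), "w") as f:
            f.write("x")
    return repo


repo = make_repo_with_untracked_dir(500)
all_entries = porcelain_status(repo, "all")  # one "?? data/fileN.bin" line per file
no_entries = porcelain_status(repo, "no")    # untracked files are not reported
print(len(all_entries), len(no_entries))
```

With `all`, git has to enumerate every file under `data/`, so the cost grows with the size of the untracked directory. With `no`, that scan is skipped entirely, which is the behavior this PR relies on.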


dtrifiro force-pushed the feature/speed-up-repro branch 3 times, most recently from 77c1207 to 5add9bc, on June 1, 2022 07:53
When large untracked directories are present in the workspace, a lot of
time is spent collecting untracked files in `scm.status` and `scm.is_dirty`.
dtrifiro marked this pull request as ready for review on June 7, 2022 11:41
dtrifiro requested a review from a team as a code owner on June 7, 2022 11:41
dtrifiro requested a review from pmrowla on June 7, 2022 11:41
pmrowla added the A: experiments and bugfix labels on Jun 7, 2022
pmrowla merged commit e849162 into iterative:main on Jun 7, 2022
dtrifiro deleted the feature/speed-up-repro branch on June 10, 2022 11:38
Comment on lines +401 to 402
staged, _, _ = self.scm.status(untracked_files="no")
if staged:
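The two changed lines can be sketched as a standalone function. This assumes only what the diff shows: `status()` accepts `untracked_files` and returns a 3-tuple whose first element holds the staged changes. Treating that element as a mapping that is empty when nothing is staged is an assumption. `has_staged_changes`, `SupportsStatus`, and `FakeScm` are hypothetical names, and `FakeScm` is a stub, not scmrepo.

```python
from typing import Iterable, List, Mapping, Protocol, Tuple

# Assumed shape, inferred from the 3-way unpacking in the diff:
# (staged, unstaged, untracked).
StatusResult = Tuple[Mapping[str, Iterable[str]], Iterable[str], Iterable[str]]


class SupportsStatus(Protocol):
    """Anything with a status() method shaped like the call in the diff."""

    def status(self, untracked_files: str = "all") -> StatusResult: ...


def has_staged_changes(scm: SupportsStatus) -> bool:
    # "no" asks the backend not to walk the workspace for untracked files,
    # which is where the time went with large untracked directories.
    staged, _, _ = scm.status(untracked_files="no")
    # Assumption: an empty mapping means nothing is staged.
    return bool(staged)


class FakeScm:
    """Hypothetical stub used only to exercise has_staged_changes()."""

    def __init__(self, staged: Mapping[str, Iterable[str]]) -> None:
        self._staged = staged
        self.calls: List[str] = []

    def status(self, untracked_files: str = "all") -> StatusResult:
        self.calls.append(untracked_files)
        # Mimic the option: report no untracked files when asked not to scan.
        untracked = [] if untracked_files == "no" else ["data/file0.bin"]
        return self._staged, [], untracked


clean = FakeScm({})
dirty = FakeScm({"modify": ["dvc.lock"]})
print(has_staged_changes(clean), has_staged_changes(dirty))
```

Only the first element of the tuple is consulted, so asking the backend to skip untracked files loses nothing for this check.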
skshetry (Member) commented on Jul 20, 2022:

While discussing with @dtrifiro whether to use pygit2 for status for performance reasons, I noticed that this check is equivalent to `git diff --staged --quiet`, i.e. to the following:

Suggested change (replacing both lines above):
if self.scm.pygit2.repo.diff(cached=True):

Leaving this comment for posterity's sake: the performance with dulwich plus the `untracked_files` flag is good enough and comparable.
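The `git diff --staged --quiet` equivalence can be checked with a small sketch against the `git` CLI. It assumes `git` is on PATH, and `git_has_staged_changes` is a hypothetical helper, not part of scmrepo or DVC. With `--quiet`, `git diff` exits with 1 when the index differs from HEAD and 0 otherwise. Untracked files never affect the result, which is all the `status(untracked_files="no")` check needs.

```python
import os
import subprocess
import tempfile

# Identity/signing overrides so the throwaway commit works on any machine.
GIT = ["git", "-c", "user.name=example", "-c", "user.email=example@example.com",
       "-c", "commit.gpgsign=false"]


def git_has_staged_changes(repo: str) -> bool:
    """True if the index differs from HEAD, like `git diff --staged --quiet`."""
    # --quiet implies --exit-code: 0 = no differences, 1 = differences.
    result = subprocess.run(GIT + ["diff", "--staged", "--quiet"], cwd=repo)
    if result.returncode not in (0, 1):
        raise RuntimeError(f"git diff failed with exit code {result.returncode}")
    return result.returncode == 1


repo = tempfile.mkdtemp()
subprocess.run(GIT + ["init", "-q"], cwd=repo, check=True)
subprocess.run(GIT + ["commit", "-q", "--allow-empty", "-m", "init"], cwd=repo, check=True)
print(git_has_staged_changes(repo))  # False: nothing staged yet

# An untracked file does not count as a staged change.
with open(os.path.join(repo, "new.txt"), "w") as f:
    f.write("data\n")
print(git_has_staged_changes(repo))  # False

# Once added to the index, the same file is reported.
subprocess.run(GIT + ["add", "new.txt"], cwd=repo, check=True)
print(git_has_staged_changes(repo))  # True
```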

Labels: A: experiments (Related to dvc exp), bugfix (fixes bug)
Projects: no open projects (archived in project)
Linked issues: none yet
Participants: 3