Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

scm: fix clone #7674

Merged
merged 1 commit into from Jun 7, 2022
Merged

scm: fix clone #7674

merged 1 commit into from Jun 7, 2022

Conversation

dtrifiro
Copy link
Contributor

@dtrifiro dtrifiro commented May 2, 2022

fetch_all_exp() in clone() was called with url="origin", which resulted in the operation being performed with the remote URL defined in the cloned repo's config, which did not include any credentials initially provided to clone.

Fixes #7670

@dtrifiro dtrifiro requested a review from a team as a code owner May 2, 2022 10:00
@dtrifiro dtrifiro requested a review from skshetry May 2, 2022 10:00
@efiop efiop requested review from karajan1001 and pmrowla and removed request for skshetry May 2, 2022 10:01
@dtrifiro dtrifiro marked this pull request as draft May 2, 2022 10:37
@dtrifiro dtrifiro changed the title scm: fix clone [WIP] scm: fix clone May 2, 2022
@dtrifiro
Copy link
Contributor Author

dtrifiro commented May 2, 2022

Seems windows tests are failing. Will look at this in a bit, possibly adding some tests for this scenario

@pmrowla
Copy link
Contributor

pmrowla commented May 3, 2022

fetch should work with git remote names, so I would expect origin to work after clone (origin should be a git remote name pointing to this same URL). Maybe there's an underlying issue with the HTTP token/credentials from the URL string not getting written to the git config for origin when we clone?

I'd be willing to merge this as a temporary workaround, but when we get into actually caching the erepo clones, we will need to make sure that we are properly handing URL string credentials on clone.

@karajan1001
Copy link
Contributor

dvc.scm.InvalidRemoteSCMRepo: 'file://C:\Users\runneradmin\AppData\Local\Temp\pytest-of-runneradmin\pytest-0\popen-gw0\erepo3' is not a valid Git remote or URL

Looks like under the Windows platform, The prefix file:// can not work properly before C:\

@dtrifiro
Copy link
Contributor Author

dtrifiro commented May 3, 2022

Maybe there's an underlying issue with the HTTP token/credentials from the URL string not getting written to the git config for origin when we clone?

Indeed. The issue is that when the repo is cloned, the token is stripped from the URL saved in .git/config, so when trying to fetch from origin, the token is missing, thus making fetch fail.

I'd be willing to merge this as a temporary workaround, but when we get into actually caching the erepo clones, we will need to make sure that we are properly handing URL string credentials on clone.

Do you have anything in mind? Storing tokens in credential.helper, if available, maybe?

@guyrosin
Copy link

guyrosin commented May 3, 2022

Sorry to jump in but it seems Poetry has just dealt with a similar issue a few days ago (python-poetry/poetry#5428), maybe you can use a similar approach?
They provide a fallback to system-provided git client in case of an HTTPUnauthorized error - until Dulwich will start supporting git credentials.

@pmrowla
Copy link
Contributor

pmrowla commented May 3, 2022

Do you have anything in mind? Storing tokens in credential.helper, if available, maybe?

Ideally we would implement support for credential helpers in either dulwich (jelmer/dulwich#873) or libgit2/pygit (libgit2/libgit2#4873)

Sorry to jump in but it seems Poetry has just dealt with a similar issue a few days ago (python-poetry/poetry#5428), maybe you can use a similar approach?

Yes, this would also be an option. If we raise NotImplementedError in the underlying scmrepo dulwich/pygit fetch_refspecs implementations upon getting a credentials related exception, it would force falling back to gitpython/CLI git. (fetch_refpecs is not actually implemented in our gitpython backend right now, but would be simple to add)

But IMO we should be avoiding re-introducing the use of gitpython in DVC if at all possible, and personally I would prefer the workaround from this PR as a temporary fix until we implement credentials support in dulwich over using gitpython as a fallback for fetch_refspecs.

@dtrifiro
Copy link
Contributor Author

dvc.scm.InvalidRemoteSCMRepo: 'file://C:\Users\runneradmin\AppData\Local\Temp\pytest-of-runneradmin\pytest-0\popen-gw0\erepo3' is not a valid Git remote or URL

Finally had a chance to look at this. It seems that the issue is in the tests: we use a file:// prefix for the local directory so that the local directory is treated as an url and an erepo clone is forced.

This works for Git.clone because scmrepo has a dulwich clone-specific workaround for file urls on windows but, since dulwich does not support windows file:// urls, this fails for fetch_all_exps.

dvc/dvc/scm.py

Lines 126 to 129 in dc1d3e8

git = Git.clone(url, to_path, progress=pbar.update_git, **kwargs)
if "shallow_branch" not in kwargs:
fetch_all_exps(git, "origin", progress=pbar.update_git)
return git

It seems to me that the cleanest solution is to add support for file:// urls on windows to dulwich, which is not major (patch).

@dtrifiro
Copy link
Contributor Author

Windows tests should succeed once scmrepo 0.0.23 iterative/scmrepo#72 is released and we bump its version in dvc.

@dtrifiro dtrifiro force-pushed the fix/7670-get-failure branch 3 times, most recently from a1a8df2 to 5a74202 Compare May 31, 2022 19:44
@dtrifiro
Copy link
Contributor Author

dtrifiro commented May 31, 2022

merge after scmrepo==0.0.23 version bump #7831

@dtrifiro dtrifiro marked this pull request as ready for review May 31, 2022 20:08
@dtrifiro dtrifiro changed the title [WIP] scm: fix clone scm: fix clone May 31, 2022
@pmrowla
Copy link
Contributor

pmrowla commented Jun 2, 2022

@dtrifiro looks like theres a linter issue?

@dtrifiro
Copy link
Contributor Author

dtrifiro commented Jun 2, 2022

@dtrifiro looks like theres a linter issue?

Indeed, but it was not related to this PR and was fixed in #7836. Just rebased.

`fetch_all_exp()` in `clone()` was called with `url="origin"`, which
resulted in the operation being performed with the remote URL defined
in the cloned repo's config, which did not include any credentials
initially provided to clone.

Fixes iterative#7670
@efiop
Copy link
Member

efiop commented Jun 7, 2022

@pmrowla Could you take a look, please?

Copy link
Contributor

@pmrowla pmrowla left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

approved w/the note that this is a temporary workaround until #7674 is supported

@pmrowla pmrowla merged commit 41ecd2e into iterative:main Jun 7, 2022
@dtrifiro dtrifiro deleted the fix/7670-get-failure branch June 7, 2022 14:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

get: fails to clone because "no valid credentials provided"
5 participants