import: hangs when pulling many files from GCS remote and one fails #10417
Labels
A: data-sync (Related to dvc get/fetch/import/pull/push)
performance (improvement over resource / time consuming tasks)
Bug Report
import/pull: hangs when pulling many files from GCS and one (or a few) fails
Description
I have a directory in a data registry that is tracked with DVC and contains ~1.6k files. I pushed it to a Google Cloud Storage remote, and now I am trying to either dvc import or dvc pull the data from the GCS remote to a new machine. This works well for ~99% of the files, but sometimes a few seem to simply fail silently and never actually download.
When this happens, the entire process hangs once all successful downloads have completed, leaving only the hung/failed/paused (not sure) downloads remaining. They never seem to resume and I am forced to Ctrl+C out of the process.
This is less of an issue when using dvc pull, as I can simply re-run the command and it will only download the missing files, but with dvc import I am forced to re-run the process.
Reproduce
dvc init (data registry project)
dvc add <large directory containing many moderately-sized files (each around 30MB)>
dvc remote add <google cloud storage bucket>
dvc push <google cloud storage>
git commit -am "commit msg" && git push
git init <new-project>
dvc import https://github.com/<data-registry-repo> path/to/large-dir
Most files finish downloading, but some fail (or hang) silently and prevent the entire dvc import from completing:
I eventually have to Ctrl-C to exit the process, or else it hangs for hours. Here is the full output before and after the Ctrl-C interrupt:
Expected
I expect dvc import to detect download timeouts and either restart the hanging download or complete the process with a warning. A subsequent partial dvc import should then gather only the missing files instead of requiring a complete re-download of the data.
Environment information
Output of dvc doctor:
Additional Information (if any):
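The behavior requested under Expected could look roughly like the following sketch. It is not DVC's implementation or API: fetch_all and the download callable are hypothetical names, and the timeout, retries and jobs defaults are arbitrary assumptions. Each file gets a per-attempt timeout, a stalled download is cancelled and retried, and anything still missing is reported with a warning instead of blocking the whole run.

```python
import asyncio
import logging
from typing import Awaitable, Callable, Iterable

logger = logging.getLogger("fetch_sketch")


async def fetch_all(
    paths: Iterable[str],
    download: Callable[[str], Awaitable[None]],  # hypothetical per-file downloader
    timeout: float = 60.0,  # seconds allowed for one attempt on one file
    retries: int = 2,       # extra attempts after the first one
    jobs: int = 4,          # maximum number of concurrent downloads
) -> list[str]:
    """Try to download every path; return the paths that never succeeded."""
    limit = asyncio.Semaphore(jobs)

    async def fetch_one(path: str) -> bool:
        for attempt in range(1, retries + 2):
            try:
                async with limit:
                    # wait_for cancels the download once it exceeds the timeout,
                    # so a stalled transfer cannot block the run forever.
                    await asyncio.wait_for(download(path), timeout)
                return True
            except asyncio.TimeoutError:
                logger.warning("timeout on %s (attempt %d)", path, attempt)
            except OSError as exc:
                logger.warning("error on %s (attempt %d): %s", path, attempt, exc)
        return False

    paths = list(paths)
    results = await asyncio.gather(*(fetch_one(p) for p in paths))
    missing = [p for p, ok in zip(paths, results) if not ok]
    if missing:
        logger.warning("%d file(s) still missing; a re-run only needs these", len(missing))
    return missing


async def _demo() -> list[str]:
    async def fake_download(path: str) -> None:
        # Simulated downloader: one file stalls far longer than the timeout.
        await asyncio.sleep(10 if path == "b.bin" else 0.01)

    return await fetch_all(
        ["a.bin", "b.bin", "c.bin"], fake_download, timeout=0.05, retries=1
    )


if __name__ == "__main__":
    print(asyncio.run(_demo()))
```

The returned list is what would make a partial re-run possible: a follow-up call only needs to be given the paths that are still missing, which is how re-running dvc pull already behaves according to the description above.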