old timestamp error in datalad export-to-figshare #3753

Open
dnkennedy opened this issue Oct 5, 2019 · 3 comments · May be fixed by #7450

Comments

@dnkennedy

dnkennedy commented Oct 5, 2019

What is the problem?

Using

> datalad export-to-figshare

I get an error:
[INFO   ] Exporting current tree as an archive under /Users/davidkennedy/ReproKwyk/reprokwyk since figshare does not support directories 
[**ERROR**  ] ZIP does not support timestamps before 1980 [zipfile.py:__init__:357] (ValueError)


Examining my directory shows me:
lrwxr-xr-x  1 davidkennedy  staff  125 Dec 31  1969 kwyk-img/manifest.json -> ../.git/annex/objects/v3/F2/MD5E-s830--3ab5a2ca526be51ac9234e4a2b6cd560.json/MD5E-s830--3ab5a2ca526be51ac9234e4a2b6cd560.json
lrwxr-xr-x  1 davidkennedy  staff  115 Dec 31  1969 kwyk-img/repositories -> ../.git/annex/objects/jX/Vx/MD5E-s101--6093c6e8ea97699f56198fb9dcef12e4/MD5E-s101--6093c6e8ea97699f56198fb9dcef12e4

which indeed are pretty old. How did I get a 'kwyk-img/manifest.json' or 'kwyk-img/repositories' file, you might ask?

Well, I had put the kwyk Docker image under datalad containers control:
"datalad containers-add -i kwyk-img -u dhub://neuronets/kwyk:latest-cpu kwyk"

and done a few datalad containers-run procedures, such as:
"datalad containers-run -n kwyk --input anat.nii --output "kwyk-output*" -- -m ..."

What version of DataLad are you using (run datalad --version)? On what operating system (consider running datalad wtf)?

datalad 0.11.7

You asked:
datalad wtf

WTF

configuration <SENSITIVE, report disabled by configuration>

datalad

  • version: 0.11.7
  • full_version: 0.11.7

dataset

  • path: /Users/davidkennedy/ReproKwyk/reprokwyk
  • repo: AnnexRepo
  • metadata: <SENSITIVE, report disabled by configuration>

dependencies

  • cmd:annex: 7.20190819
  • tqdm: 4.32.1
  • cmd:git: 2.23.0
  • cmd:bundled-git: UNKNOWN
  • cmd:system-git: 2.23.0
  • cmd:system-ssh: 7.4p1
  • appdirs: 1.4.3
  • boto: 2.49.0
  • git: 3.0.2
  • gitdb: 2.0.5
  • humanize: 0.5.1
  • iso8601: 0.1.12
  • keyring: 19.2.0
  • keyrings.alt: 3.1.1
  • msgpack: 0.6.1
  • requests: 2.22.0
  • six: 1.12.0
  • wrapt: 1.11.2

environment

  • PATH: /Users/davidkennedy/miniconda3/bin:/Users/davidkennedy/miniconda3/condabin:/Library/Frameworks/Python.framework/Versions/3.7/bin:/Users/davidkennedy/.rvm/gems/ruby-2.3.7/bin:/Users/davidkennedy/.rvm/gems/ruby-2.3.7@global/bin:/Users/davidkennedy/.rvm/rubies/ruby-2.3.7/bin:/Users/davidkennedy/anaconda/bin:/Users/davidkennedy/bin:/Users/davidkennedy/Library/Python/2.7/bin:/Applications/freesurfer/bin:/Applications/freesurfer/fsfast/bin:/Applications/freesurfer/tktools:/usr/local/fsl/bin:/Applications/freesurfer/mni/bin:/usr/local/bin:/usr/bin:/bin:/usr/sbin:/sbin:/opt/X11/bin:/Library/TeX/texbin:/Users/davidkennedy/.rvm/bin:/Users/davidkennedy/abin
  • LANG: en_US.UTF-8

extensions

  • container:
    • load_error: None
    • description: Containerized environments
    • module: datalad_container
    • version: 0.5.0
    • entrypoints:
      • datalad_container.containers_list.ContainersList:
        • module: datalad_container.containers_list
        • class: ContainersList
        • names:
          • containers-list
          • containers_list
        • load_error: None
      • datalad_container.containers_remove.ContainersRemove:
        • module: datalad_container.containers_remove
        • class: ContainersRemove
        • names:
          • containers-remove
          • containers_remove
        • load_error: None
      • datalad_container.containers_add.ContainersAdd:
        • module: datalad_container.containers_add
        • class: ContainersAdd
        • names:
          • containers-add
          • containers_add
        • load_error: None
      • datalad_container.containers_run.ContainersRun:
        • module: datalad_container.containers_run
        • class: ContainersRun
        • names:
          • containers-run
          • containers_run
        • load_error: None

git-annex

  • version: 7.20190819
  • build flags:
    • Assistant
    • Webapp
    • Pairing
    • S3
    • WebDAV
    • FsEvents
    • TorrentParser
    • MagicMime
    • Feeds
    • Testsuite
  • dependency versions:
    • aws-0.21.1
    • bloomfilter-2.0.1.0
    • cryptonite-0.26
    • DAV-1.3.3
    • feed-1.2.0.0
    • ghc-8.6.5
    • http-client-0.6.4
    • persistent-sqlite-2.10.5
    • torrent-10000.1.1
    • uuid-1.3.13
    • yesod-1.6.0
  • key/value backends:
    • SHA256E
    • SHA256
    • SHA512E
    • SHA512
    • SHA224E
    • SHA224
    • SHA384E
    • SHA384
    • SHA3_256E
    • SHA3_256
    • SHA3_512E
    • SHA3_512
    • SHA3_224E
    • SHA3_224
    • SHA3_384E
    • SHA3_384
    • SKEIN256E
    • SKEIN256
    • SKEIN512E
    • SKEIN512
    • BLAKE2B256E
    • BLAKE2B256
    • BLAKE2B512E
    • BLAKE2B512
    • BLAKE2B160E
    • BLAKE2B160
    • BLAKE2B224E
    • BLAKE2B224
    • BLAKE2B384E
    • BLAKE2B384
    • BLAKE2BP512E
    • BLAKE2BP512
    • BLAKE2S256E
    • BLAKE2S256
    • BLAKE2S160E
    • BLAKE2S160
    • BLAKE2S224E
    • BLAKE2S224
    • BLAKE2SP256E
    • BLAKE2SP256
    • BLAKE2SP224E
    • BLAKE2SP224
    • SHA1E
    • SHA1
    • MD5E
    • MD5
    • WORM
    • URL
  • remote types:
    • git
    • gcrypt
    • p2p
    • S3
    • bup
    • directory
    • rsync
    • web
    • bittorrent
    • webdav
    • adb
    • tahoe
    • glacier
    • ddar
    • git-lfs
    • hook
    • external
  • operating system: darwin x86_64
  • supported repository versions:
    • 5
    • 7
  • upgrade supported from repository versions:
    • 0
    • 1
    • 2
    • 3
    • 4
    • 5
    • 6
  • local repository version: 5

location

  • path: /Users/davidkennedy/ReproKwyk/reprokwyk
  • type: dataset

metadata_extractors

  • annex:
    • module: datalad.metadata.extractors.annex
    • version: None
    • load_error: None
  • audio:
    • module: datalad.metadata.extractors.audio
    • load_error: No module named 'mutagen' [audio.py::17]
  • datacite:
    • module: datalad.metadata.extractors.datacite
    • version: None
    • load_error: None
  • datalad_core:
    • module: datalad.metadata.extractors.datalad_core
    • version: None
    • load_error: None
  • datalad_rfc822:
    • module: datalad.metadata.extractors.datalad_rfc822
    • version: None
    • load_error: None
  • exif:
    • module: datalad.metadata.extractors.exif
    • load_error: No module named 'exifread' [exif.py::16]
  • frictionless_datapackage:
    • module: datalad.metadata.extractors.frictionless_datapackage
    • version: None
    • load_error: None
  • image:
    • module: datalad.metadata.extractors.image
    • load_error: No module named 'PIL' [image.py::16]
  • xmp:
    • module: datalad.metadata.extractors.xmp
    • load_error: No module named 'libxmp' [xmp.py::20]

system

  • type: posix
  • name: Darwin
  • release: 16.7.0
  • version: Darwin Kernel Version 16.7.0: Sun Jun 2 20:26:31 PDT 2019; root:xnu-3789.73.50~1/RELEASE_X86_64
  • distribution: 10.12.6/x86_64
  • max_path_length: 295
  • encoding:
    • default: utf-8
    • filesystem: utf-8
    • locale.prefered: UTF-8
@kyleam
Contributor

kyleam commented Oct 7, 2019

Thanks for the report.

Those files come from a tar file generated by docker save. Overriding the date to a fixed date (in particular, the Unix epoch, which is the date you're seeing) is common when trying to generate bit-for-bit reproducible output.
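
To illustrate why an epoch timestamp shows up as "Dec 31 1969" in a listing, here is a small sketch. It assumes the stored mtime is exactly 0 and uses a fixed UTC-5 offset purely for illustration; the reporter's actual timezone is not stated in this thread.

```python
from datetime import datetime, timedelta, timezone

EPOCH_SECONDS = 0  # assumed mtime: the Unix epoch

# The same instant rendered in UTC and in a zone west of UTC.
utc = datetime.fromtimestamp(EPOCH_SECONDS, tz=timezone.utc)
utc_minus_5 = timezone(timedelta(hours=-5))  # illustrative fixed offset
local = datetime.fromtimestamp(EPOCH_SECONDS, tz=utc_minus_5)

print(utc.isoformat())    # 1970-01-01T00:00:00+00:00
print(local.isoformat())  # 1969-12-31T19:00:00-05:00
```

Either way the year is well before 1980, which is what the ZIP error complains about.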

For an immediate workaround, you can adjust those files on your system to have a later date (e.g., with touch). To avoid this in the future, the docker adapter extension could update the file timestamps after extraction. I'll open an issue on datalad-container's end.
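
As a rough Python equivalent of the touch workaround, the sketch below walks a directory tree and bumps the mtime of every entry dated before 1980. The helper name `bump_old_timestamps`, the decision to skip `.git`, and the choice to touch the entries themselves rather than symlink targets are all assumptions made for illustration; none of this is part of datalad.

```python
import os
import time

# 1980-01-01 00:00:00 in local time, the earliest date a ZIP entry can hold.
ZIP_MIN_MTIME = time.mktime((1980, 1, 1, 0, 0, 0, 0, 0, -1))


def bump_old_timestamps(root, new_mtime=None):
    """Set the mtime of entries under root dated before 1980 to new_mtime (default: now)."""
    new_mtime = time.time() if new_mtime is None else new_mtime
    # Touch symlinks themselves where the platform allows it.
    kwargs = {}
    if os.utime in os.supports_follow_symlinks:
        kwargs["follow_symlinks"] = False
    changed = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Assumption: leave the repository internals alone.
        dirnames[:] = [d for d in dirnames if d != ".git"]
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.lstat(path)
            if st.st_mtime < ZIP_MIN_MTIME:
                os.utime(path, (st.st_atime, new_mtime), **kwargs)
                changed.append(path)
    return changed
```

Calling `bump_old_timestamps("kwyk-img")` would return the list of paths it modified, so the change can be reviewed before retrying the export.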

@yarikoptic
Member

But I also feel that we had better provide a safety net within datalad itself, e.g. verifying that timestamps are in the supported range and, if not, asking the user's permission to correct them on their behalf. What do you think?

@kyleam
Contributor

kyleam commented Oct 7, 2019

> But I also feel that we had better provide a safety net within datalad itself, e.g. verifying that timestamps are in the supported range and, if not, asking the user's permission to correct them on their behalf. What do you think?

No, in my view datalad should not stat every file that it zips and offer to adjust it for the user if it's older than 1980 (or newer than 2107).
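
For context on where the 1980 and 2107 bounds come from: ZIP entries store dates in the 16-bit MS-DOS format, which keeps the year as a 7-bit offset from 1980. The sketch below packs a date that way; `dos_date` is a hypothetical helper written for this illustration, not a function from datalad or the standard library.

```python
def dos_date(year, month, day):
    """Pack a date into the 16-bit MS-DOS date field used by ZIP entries."""
    offset = year - 1980
    if not 0 <= offset <= 0x7F:  # only 7 bits are available for the year
        raise ValueError("year outside the range a ZIP entry can store")
    # Layout: 7 bits year offset, 4 bits month, 5 bits day.
    return (offset << 9) | (month << 5) | day


MAX_YEAR = 1980 + 0x7F  # 2107
```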

Looking forward, datalad could use the strict_timestamps option that will come with Python 3.8.
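
A minimal sketch of what that option does, using only the standard library on Python 3.8 or later. The helper `zip_epoch_file` and the file names are made up for this example; how datalad would wire the option into its export code is not shown here.

```python
import os
import tempfile
import zipfile


def zip_epoch_file(strict_timestamps):
    """Zip a file whose mtime is the Unix epoch; return the stored date_time tuple."""
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "manifest.json")
        with open(src, "w") as f:
            f.write("{}")
        os.utime(src, (0, 0))  # atime and mtime set to the Unix epoch
        archive = os.path.join(tmp, "out.zip")
        # With strict_timestamps=True (the default) the write raises ValueError.
        with zipfile.ZipFile(archive, "w", strict_timestamps=strict_timestamps) as zf:
            zf.write(src, arcname="manifest.json")
        with zipfile.ZipFile(archive) as zf:
            return zf.getinfo("manifest.json").date_time
```

With `strict_timestamps=False` the entry is stored with its date clamped to 1980-01-01, so `zip_epoch_file(False)` returns `(1980, 1, 1, 0, 0, 0)`, while `zip_epoch_file(True)` raises the same `ValueError` reported above.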

adswa added a commit to adswa/datalad that referenced this issue Jul 3, 2023
This is done using zipfile.ZipFile's strict_timestamps option, which can be
used as soon as Python 3.8 is the minimum supported Python version.
Fixes datalad#3753.