[Jib core] Tar archives with same contents are not reproducible #3158

Closed
davidtron opened this issue Mar 20, 2021 · 2 comments · Fixed by #3159

davidtron (Contributor) commented Mar 20, 2021

Environment:

  • Jib version: v0.18.0-core
  • Build tool: Gradle 6.01
  • OS: macOS 10.15.7

Description of the issue:
I create a tar file from a registry image.
Jib.from(RegistryImage.named(image)).containerize(Containerizer.to(TarImage.at(outputPath).named(image)));

This is executed as part of a Bazel genrule, with the tar file as the output. We discovered that rerunning it with the same inputs, on the same machine but at a later time, produces a different tar file.

Expected behavior:
We expect the tar file to be identical if the inputs have not changed.

Steps to reproduce:

  1. Create tar file using Jib as above
  2. Rename the tar file output.tar
  3. Wait 5 seconds, and create another tar file using Jib as above, rename the tar output2.tar
  4. shasum -a 256 output.tar
  5. shasum -a 256 output2.tar
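
Steps 4 and 5 can also be scripted. The following is a minimal sketch, assuming the two archives are named output.tar and output2.tar as in the steps above. The class and method names (TarDigestCheck, sha256Hex, sameContents) are hypothetical and not part of Jib.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.security.DigestInputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

class TarDigestCheck {

  /** Returns the lowercase hex SHA-256 digest of the file at the given path. */
  static String sha256Hex(Path file) throws IOException, NoSuchAlgorithmException {
    MessageDigest digest = MessageDigest.getInstance("SHA-256");
    try (InputStream in = new DigestInputStream(Files.newInputStream(file), digest)) {
      byte[] buffer = new byte[8192];
      while (in.read(buffer) != -1) {
        // Reading drives the digest; the bytes themselves are not needed here.
      }
    }
    StringBuilder hex = new StringBuilder();
    for (byte b : digest.digest()) {
      hex.append(String.format("%02x", b));
    }
    return hex.toString();
  }

  /** Returns true when both files have the same SHA-256 digest. */
  static boolean sameContents(Path a, Path b) throws IOException, NoSuchAlgorithmException {
    return sha256Hex(a).equals(sha256Hex(b));
  }

  public static void main(String[] args) throws Exception {
    // File names follow the reproduction steps; pass other paths as arguments if needed.
    Path first = Paths.get(args.length > 0 ? args[0] : "output.tar");
    Path second = Paths.get(args.length > 1 ? args[1] : "output2.tar");
    System.out.println(sha256Hex(first) + "  " + first);
    System.out.println(sha256Hex(second) + "  " + second);
    System.out.println(sameContents(first, second) ? "identical" : "different");
  }
}
```

With the behavior described in this issue, the two digests are expected to differ even though the inputs are the same.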

Additional Information:
The issue is due to TarStreamBuilder creating entries for blobs using new TarArchiveEntry(name).
By default, this constructor sets the modification time of the entry to the current time, which makes the resulting tar image non-reproducible.

this.modTime = new Date().getTime() / MILLIS_PER_SECOND;

#3159
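
To illustrate why the modification time alone changes the archive bytes, here is a minimal sketch of a classic tar header built by hand. It is not the Jib or Commons Compress implementation: TarHeaderSketch and its methods are hypothetical names, and only the basic header fields (name, mode, uid, gid, size, mtime, checksum, type flag) are modeled.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class TarHeaderSketch {
  static final int HEADER_SIZE = 512;

  /** Writes value as zero-padded octal ASCII (width - 1 digits) followed by a NUL byte. */
  static void putOctal(byte[] header, int offset, int width, long value) {
    String octal = Long.toOctalString(value);
    StringBuilder padded = new StringBuilder();
    for (int i = octal.length(); i < width - 1; i++) {
      padded.append('0');
    }
    padded.append(octal);
    byte[] bytes = padded.toString().getBytes(StandardCharsets.US_ASCII);
    System.arraycopy(bytes, 0, header, offset, bytes.length);
    header[offset + width - 1] = 0;
  }

  /** Builds a 512-byte header for a regular file; names longer than 100 bytes are truncated. */
  static byte[] header(String name, long size, long mtimeSeconds) {
    byte[] h = new byte[HEADER_SIZE];
    byte[] nameBytes = name.getBytes(StandardCharsets.US_ASCII);
    System.arraycopy(nameBytes, 0, h, 0, Math.min(nameBytes.length, 100));
    putOctal(h, 100, 8, 0644);           // mode
    putOctal(h, 108, 8, 0);              // uid
    putOctal(h, 116, 8, 0);              // gid
    putOctal(h, 124, 12, size);          // size in bytes
    putOctal(h, 136, 12, mtimeSeconds);  // modification time, seconds since the epoch
    Arrays.fill(h, 148, 156, (byte) ' '); // checksum field counts as spaces while summing
    h[156] = '0';                        // type flag: regular file
    int sum = 0;
    for (byte b : h) {
      sum += b & 0xff;
    }
    putOctal(h, 148, 7, sum);            // six octal digits plus NUL; byte 155 stays a space
    return h;
  }

  public static void main(String[] args) {
    long now = System.currentTimeMillis() / 1000;
    // Same name and size, built five seconds apart: mtime and checksum fields differ.
    System.out.println(Arrays.equals(header("config.json", 3, now), header("config.json", 3, now + 5)));
    // Same name and size with a fixed mtime (the epoch): headers are byte-identical.
    System.out.println(Arrays.equals(header("config.json", 3, 0), header("config.json", 3, 0)));
  }
}
```

Because the checksum covers the whole header, a different mtime changes both the mtime field and the checksum field, so the digest of the archive changes even when every entry's contents are identical.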

chanseokoh (Member) commented Mar 23, 2021

Thanks for the PR!

So a "tar image" is just a local archive (for whatever purpose) that simply gathers the necessary files (image layer tars, container config JSON, etc.) which, conceptually, constitute a complete image. AFAIK, there's no standard for how one should create this archive. I believe this is outside the scope of the Docker and OCI specifications, because it's simply irrelevant to them. All that matters in the end is how an image is actually stored in a registry or a Docker daemon; it's irrelevant how one temporarily carries the necessary files from one machine to another until they are finally stored in a registry.

IMO, it's debatable whether Jib should force a specific timestamp (e.g., epoch) for file entries when creating a "tar image" archive. For building a reproducible container image (not a "tar image"), it's unfortunate that file timestamps in a container image affect reproducibility according to the current Docker and OCI specifications, and that tools (including Jib) have to make a painful compromise and force a specific timestamp for files inside a container; there's no other choice, and you can't achieve image reproducibility otherwise. If the specifications had a way to achieve reproducibility without resetting file timestamps in a container image, we (and all other tools) would certainly have kept the original file timestamps. OTOH, for a "tar image" archive, nothing forces us to reset the timestamps of tar entries; why erase useful information (i.e., the timestamp, which many people think is important) for no good reason?

However, I know that ensuring reproducibility of every build artifact is fundamental to the Bazel philosophy, so I understand where this is coming from. AFAIK, for example, when you create a zip file in Bazel (using the general rules_pkg mechanism), the Bazel rule resets the timestamps of all entries in the zip file to something like 1999-12-31, because that's the only way to make a zip file reproducible. This kind of behavior is really unique to the Bazel build system.

But I think it's not unreasonable to reset timestamps of tar entries when creating a "tar image." I don't think people will complain if we do so.
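
The timestamp-resetting idea described above for zip files can be sketched with the JDK's own zip classes. This is only an illustration under stated assumptions, not the rules_pkg or Jib implementation: FixedTimestampZip, its zip method, and the FIXED_TIME_MILLIS constant are hypothetical names, and the chosen timestamp is arbitrary.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

class FixedTimestampZip {
  // Hypothetical fixed timestamp (2000-01-01T00:00:00Z in epoch milliseconds), for illustration only.
  static final long FIXED_TIME_MILLIS = 946684800000L;

  /** Builds a single-entry zip in memory, stamping the entry with the given time. */
  static byte[] zip(String name, byte[] contents, long entryTimeMillis) throws IOException {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    try (ZipOutputStream zos = new ZipOutputStream(out)) {
      ZipEntry entry = new ZipEntry(name);
      // Without this call, ZipOutputStream stamps the entry with the current time.
      // Assumption: the stored time is converted using the JVM's default time zone,
      // so byte-identical output across machines also needs a consistent time zone.
      entry.setTime(entryTimeMillis);
      zos.putNextEntry(entry);
      zos.write(contents);
      zos.closeEntry();
    }
    return out.toByteArray();
  }

  public static void main(String[] args) throws IOException {
    byte[] contents = "hello".getBytes(StandardCharsets.UTF_8);
    byte[] first = zip("hello.txt", contents, FIXED_TIME_MILLIS);
    byte[] second = zip("hello.txt", contents, FIXED_TIME_MILLIS);
    System.out.println("identical: " + Arrays.equals(first, second));
  }
}
```

The trade-off is the one discussed in this comment: the archive becomes reproducible, but the original timestamps of the entries are lost.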

davidtron added a commit to davidtron/jib that referenced this issue Apr 8, 2021
… contents are not reproducible"

This reverts commit 562e2b8
davidtron added a commit to davidtron/jib that referenced this issue Apr 9, 2021
davidtron added a commit to davidtron/jib that referenced this issue Apr 12, 2021
@chanseokoh chanseokoh added this to the common next release milestone Apr 13, 2021
chanseokoh added a commit that referenced this issue Apr 13, 2021
…le (#3159)

* #3158 - [Jib core] Tar archives with same contents are not reproducible

* Revert "#3158 - [Jib core] Tar archives with same contents are not reproducible"
This reverts commit 562e2b8

* Update jib-core/src/main/java/com/google/cloud/tools/jib/image/ImageTarball.java

* Update jib-core/src/test/java/com/google/cloud/tools/jib/tar/TarStreamBuilderTest.java

* Update jib-core/src/main/java/com/google/cloud/tools/jib/tar/TarStreamBuilder.java

Co-authored-by: Chanseok Oh <chanseok@google.com>

mpeddada1 (Contributor) commented

@davidtron thanks again for your contribution! Jib-Core 0.19.0 has been released with your fix (#3158).