COPY layer uses cached data despite file content changes in build context #4817

Open
100paperkite opened this issue Apr 2, 2024 · 2 comments

100paperkite commented Apr 2, 2024

Description

In a COPY layer, when a file in the build context changes but its metadata (filename, size, modified time, etc.) stays the same, BuildKit appears to reuse cached data instead of copying the new file.
The problem disappears if I run buildctl prune before building, so I think an internal BuildKit cache(?) is being used.

(Note: this is not the layer cache. In this scenario only some of the files in the layer have changed, and the build output does not show the COPY step as CACHED.)

Reproduce

reproducible repository: https://github.com/100paperkite/buildkit-cache-issue

Run the command below. It generates a random number with the number of digits passed as an argument (so that the file size stays the same), creates 2 files (one named after the number and one named same, each containing the number), and then builds a COPY layer that copies these files.
After the first build, the image has the same files as the build context. After the second build, the image contains file content from the previous build.

./gen 5 # <digit of number>
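
For readers who do not want to open the repository, the setup described above can be approximated as follows. This is only a sketch and not the repository's actual gen script: the function name `generate_context`, the `seed` parameter, and the choice of pinning the modification time to the Unix epoch are assumptions made for illustration (the epoch matches the Jan 1 1970 timestamps in the output below).

```python
import os
import random
import tempfile
from pathlib import Path


def generate_context(digits, out_dir="output", seed=None):
    """Hypothetical stand-in for ./gen: write two equally sized files
    whose size and mtime are identical from one run to the next."""
    rng = random.Random(seed)
    # A number with exactly `digits` digits keeps the file size constant.
    number = str(rng.randrange(10 ** (digits - 1), 10 ** digits))
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    # Drop the files left over from the previous run.
    for old in out.iterdir():
        if old.is_file():
            old.unlink()
    # One file is named after the number, the other keeps a fixed name.
    for name in (number, "same"):
        path = out / name
        path.write_text(number)
        # Assumption: timestamps are pinned so that only content differs.
        os.utime(path, (0, 0))
    return number


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as ctx:
        print(generate_context(5, ctx))
```

Across two runs, the file named same keeps its name, size, and modification time while its content changes, which is the condition the report depends on.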

Output of the first build and run

...

 => [2/3] WORKDIR /app                                      0.1s
 => [3/3] COPY output/ 

...

Loaded image: 5494:latest
total 8
-rw-rw-r--    1 root     root             5 Jan  1  1970 5494
-rw-rw-r--    1 root     root             5 Jan  1  1970 same
equal

Output of the second build and run

...

 => CACHED [2/3] WORKDIR /app                               0.0s
 => [3/3] COPY output/ .  

...

Loaded image: 9372:latest
total 8
-rw-rw-r--    1 root     root             5 Jan  1  1970 9372
-rw-rw-r--    1 root     root             5 Jan  1  1970 same
not equal. 5494 value in 'same' file.

Expected behavior

Files in the build context should be copied identically.

Environment

  • buildctl
  • version: github.com/moby/buildkit v0.13.0 2afc050
@tonistiigi
Member

This is expected. Transferring file changes from the same local directory to the daemon uses the same semantics that rsync uses by default (based only on metadata). You need to intentionally reset the timestamp to hit this.
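
To make the difference between a metadata-only check and a content check concrete, here is a minimal sketch. It is not BuildKit's transfer code; the helper names (`metadata_key`, `content_digest`, `looks_unchanged`) are hypothetical, and which metadata fields BuildKit actually compares is an assumption (size and modification time are used here because the report names them).

```python
import hashlib
import os
import tempfile
from pathlib import Path


def metadata_key(path):
    """Cheap comparison key: size and mtime only (an assumed subset)."""
    st = os.stat(path)
    return (st.st_size, st.st_mtime_ns)


def content_digest(path):
    """Expensive comparison key: a hash over the whole file content."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def looks_unchanged(previous_key, path):
    """Metadata-only decision: skip the transfer when the key matches."""
    return previous_key == metadata_key(path)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as ctx:
        target = os.path.join(ctx, "same")

        Path(target).write_text("5494\n")
        os.utime(target, ns=(0, 0))          # timestamp reset on purpose
        old_key = metadata_key(target)
        old_digest = content_digest(target)

        Path(target).write_text("9372\n")    # same size, new content
        os.utime(target, ns=(0, 0))          # timestamp reset again

        print("metadata says unchanged:", looks_unchanged(old_key, target))
        print("content actually changed:", content_digest(target) != old_digest)
```

The metadata key is cheap because it never reads file content, which is also why it cannot see a same-size edit once the timestamp has been reset.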

Because some users hit this, we changed the semantics for uploading the Dockerfile a couple of releases ago: it now always transfers the full Dockerfile again.


100paperkite commented Apr 8, 2024

Thanks for your response. It seems that only metadata is considered, for performance reasons.
Is there any option to check the file contents as well, like the checksum option in rsync?

The reason I ask is that, for reproducible container builds, I'm currently aligning the file timestamps in the build context using the touch command. (I'm aware of the rewrite-timestamp option, but even with it there have been cases where the image SHA differed, and manually adjusting the timestamps resolved the issue.)
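
As an illustration of the timestamp alignment described above, the following sketch sets every entry under a build context to one fixed modification time. It is an assumed equivalent of the touch step, not the exact commands used; the function name `normalize_mtimes` and the default epoch of 0 are hypothetical.

```python
import os
import tempfile
from pathlib import Path


def normalize_mtimes(root, epoch=0):
    """Set atime and mtime of everything under `root` to `epoch`.

    Returns the number of entries touched, not counting `root` itself.
    Symbolic links are skipped so that their targets are left alone.
    """
    touched = 0
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue
            os.utime(path, (epoch, epoch))
            touched += 1
    os.utime(root, (epoch, epoch))
    return touched


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as ctx:
        Path(ctx, "sub").mkdir()
        Path(ctx, "sub", "a.txt").write_text("a")
        Path(ctx, "b.txt").write_text("b")
        print(normalize_mtimes(ctx))
```

Normalizing timestamps this way produces exactly the condition under which a metadata-only comparison cannot see a same-size content change. In the original report, running buildctl prune before the build made the problem disappear.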
