Add MARK and SQUASH builder instructions #12198
Can you please sign your commits following these rules: https://github.com/docker/docker/blob/master/CONTRIBUTING.md#sign-your-work
The easiest way to do this is to amend the last commit:
$ git clone -b "builder-squash" git@github.com:burke/docker.git somewhere
$ cd somewhere
$ git commit --amend -s --no-edit
$ git push -f
This will update the existing PR, so you do not need to open a new one. |
burke!!! cooooll!!!!! On Wed, Apr 8, 2015 at 2:43 PM, Gordon notifications@github.com wrote:
|
Interesting. Should I see this as a continuation of #8574 ( Some quick thoughts;
Will something like this be problematic?:
MARK a
ADD a
ENV hello=world
RUN b
MARK b
ADD c
ENV hello=galaxy
SQUASH a foo
RUN echo $hello
SQUASH b bar
Basically; nested squashes, and, not really sure how |
Good points; I think it makes sense to remove the first argument from both commands so that:
As for the rest:
What's actually going on under the hood is that all the layers that will be squashed are generated completely normally, then the squash does exactly the same thing as a git rebase -- it creates a new layer (using the last image ID, the description, and the number of layers squashed as the cache key) and orphans, but does not immediately nuke, all the squashed layers. This way, future builds can still take advantage of the cache when re-building those orphaned layers.
Your multi-mark example looks like it might break Thanks for the feedback! |
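A minimal sketch of the squash behaviour described in the comment above, as a toy model rather than the PR's Go code. `ToyBuilder`, `Layer`, `step`, `do_mark` and `do_squash` are hypothetical names, layer IDs are simplified hashes, and the only details taken from the thread are the cache key (last image ID, description, number of layers squashed) and the fact that squashed layers are orphaned rather than deleted.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class Layer:
    layer_id: str
    created_by: str  # the text `docker history` would show for this layer
    changes: dict    # path -> content, or None for a deletion


def _layer_id(parent_id, created_by):
    # Simplified ID: hash of the parent ID and the instruction text (assumption).
    return hashlib.sha256(f"{parent_id}|{created_by}".encode()).hexdigest()[:12]


class ToyBuilder:
    """Hypothetical stand-in for the builder; not Docker's real Builder struct."""

    def __init__(self):
        self.layers = []        # current image, oldest layer first
        self.mark = None        # index stashed by MARK
        self.orphans = {}       # squashed layers, kept so later builds can reuse them
        self.squash_cache = {}  # cache key -> squashed Layer

    def step(self, created_by, changes):
        parent = self.layers[-1].layer_id if self.layers else ""
        layer = Layer(_layer_id(parent, created_by), created_by, dict(changes))
        self.layers.append(layer)
        return layer

    def do_mark(self):
        self.mark = len(self.layers)

    def do_squash(self, description):
        if self.mark is None or self.mark == len(self.layers):
            raise ValueError("SQUASH needs a MARK followed by at least one layer")
        below, squashed = self.layers[:self.mark], self.layers[self.mark:]
        # Cache key as described in the thread.
        key = (squashed[-1].layer_id, description, len(squashed))
        if key not in self.squash_cache:
            # Paths that exist below the mark: deletions of these must be kept.
            existing = set()
            for layer in below:
                for path, content in layer.changes.items():
                    if content is None:
                        existing.discard(path)
                    else:
                        existing.add(path)
            merged = {}
            for layer in squashed:
                merged.update(layer.changes)
            # Drop deletions of files that were only ever created inside the squash.
            merged = {p: c for p, c in merged.items() if c is not None or p in existing}
            parent = below[-1].layer_id if below else ""
            self.squash_cache[key] = Layer(_layer_id(parent, description), description, merged)
        for layer in squashed:
            self.orphans[layer.layer_id] = layer  # orphaned, not removed
        self.layers = below + [self.squash_cache[key]]
        self.mark = None
        return self.squash_cache[key]


builder = ToyBuilder()
builder.step("FROM base", {"/etc/os-release": "base"})
builder.do_mark()
builder.step("ADD a", {"/a": "payload"})
builder.step("RUN b", {"/out": "built"})
builder.step("RUN rm /a", {"/a": None})
squashed = builder.do_squash("build and clean up")
```

In this model the three intermediate layers stay available in `orphans`, while the image itself holds only the base layer and one squashed layer whose `created_by` is the description given to `do_squash`.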
Yes, I think that's reasonable; but it should be documented somewhere so that users are aware that this is the case.
+1; they can always be added in future, if the need arises Thanks for creating this! |
* MARK sets a mark
* SQUASH squashes together all layers between a MARK and itself
* SQUASH also renames the layer as viewed by `docker history`
The scope of change required to the builder to implement this feature is surprisingly constrained.
Signed-off-by: Burke Libbey <burke@libbey.me>
I like this feature as an optimization! I also appreciate the observations concerning SQUASH's effect on an image's history and the exposure of a secret within the committed container (image cache) on the build host. However, the first two justifications arguing for SQUASH's inclusion, to eliminate secrets and build-time resources from the run-time image, aren't durable ones. It would be preferable for transient files and the processes that act upon them to never have been included in the resulting run-time image nor directly affected its environment. These reasons represent symptoms that are better addressed by isolating build-time and run-time concerns using mechanisms that don't rely on "sanitizing" the resultant image, as it's difficult to ensure complete removal of contaminants. Also, for secrets in particular, additional safeguards should be implemented to protect their visibility/accessibility by build-time processes. Since SQUASH doesn't affect the visibility of a secret, available through the build context, the secret is accessible to all processes executed by the Dockerfile. Therefore, a secret can be consumed by any of these processes and made public. This issue certainly restrains the desire to reuse Docker Hub images that employ Dockerfile commands like "ADD ." especially when attempting to build components requiring secrets, as the secret will be silently copied by the "ADD ." command. Here's a reference to a detailed explanation of this vulnerability. I hope you're not surprised by my position, as even visiting only the locomotive cab of the #8660 train would suggest the assessment above. There has been an effort to permit the inclusion of secrets via build time environment variable assignment. See pull request #9176. Finally, I sympathize with the desire to resolve issues, like build-time secrets, that resulted in the invention of SQUASH. Without a viable alternative, it might be prudent, especially in the short term, to ignore my position. |
It would be awesome if docker daemon had API endpoint for that.
I think that overall performance is way more important than caching and/or build time. I mean, yeah, it's very nice when I'm testing some image and I can continuously rebuild it in a matter of seconds. But in the end, if my image is 1 GB big and squashed version could take only 800 MB or even 600 MB and I really need to put a semi-complex bash script inside Dockerfile to reduce the final size and number of layers, I pick performance over build time. Also, I totally don't understand why instructions like Since cache was already mentioned: I really miss precise cache control. E.g. when I want to update all my packages in one |
|
I really think part of the problem with issues like this is that there are some fundamental differences around what a For example, if you view a Dockerfile as the "definition" of how to build a single Docker image then it makes sense that you might push back on the notion of However, if you view a Dockerfile more like a Makefile, a set of steps that you're trying to automate, then you'll want to add as many of those steps to the Dockerfile as you can because, as the author, you're the one who knows what's going on; knows what's supposed to be generated; and (let's be honest :-) ) we don't trust the person kicking off the build to remember to squash, turn off caching, etc.... so we need to be able to force those things to happen via the one common automation mechanism we have, Dockerfiles. The point of this... until we come to an agreement on what a Dockerfile is and what it's meant to be for, I think we'll continue to frustrate people on both sides of issues like this. It might be helpful if some old-timers on the project (e.g. @shykes @crosbymichael etc...) could help set some direction on this topic. Personally, I view it more like a Makefile and want to push as much as possible into it so that the person running the build needs to do (and remember to do) as little as possible. If they really don't like what's in there, they can fork the Dockerfile. |
@duglin well put 👍 |
People have been asking for this for well over a year. Every time we get a good solution, someone comes in and derails it. Why can't we have both docker squash and MARK/SQUASH in Dockerfile? I am tired of waiting for the "promised land"; let's give users/developers something to fix this problem, and continue working on breaking Dockerfile functionality into base primitives. |
Closing due to comment from #332 (comment)
|
This is fine (and IMO the correct course), but I'd like to point out that this has been the official position for 1.5 years. EDIT: By which I mean: refusing to address this deficiency in the builder because of other, better improvements that should be made instead. |
I think it can be defined very easily:
I personally don't care about this but can imagine that saving bandwidth can be very important for someone.
Yes.
Yes (due to performance).
Please, provide a link in this issue to the debate. |
Ok, reasonable, I guess
Obviously, this won't allow optimised images when using automated builds on Docker Hub. Which makes it not very useful, or, just as useful as creating an image interactively and committing the result |
Excellent. My single, focused issue is reducing the overall size of images to preserve disk space on servers. Reducing total bytes transferred is useful, but secondary. My standard Dockerfile development process tends towards starting with multiple run statements to gain the benefits of caching on slow operations:
RUN git clone https://github.com/foo/bar.git
RUN cd /bar && make && make install && cd ..
RUN rm -rf bar
Then when the build succeeds, I need to manually coalesce and rebuild to avoid including source files and intermediate build files in the image:
RUN git clone https://github.com/foo/bar.git && \
    cd /bar && make && make install && cd .. && \
    rm -rf bar
This is obviously more work for me with no obvious gain, so either there's a problem in my methodology or the tool. This PR removes the extra work, with the SQUASH comment handily able to supplement my existing Dockerfile commenting.
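To make the size argument concrete, here is a rough sketch of why the three-`RUN` version stores more than the coalesced one. It assumes a simplified model in which each layer stores every file it adds or rewrites in full and a deletion costs nothing; the paths and sizes (in MB) are invented for illustration, and `layered_size` and `squashed_size` are hypothetical helper names.

```python
def layered_size(layers):
    """Total stored when every layer is kept: each layer stores what it adds."""
    return sum(size for layer in layers for size in layer.values() if size is not None)


def squashed_size(layers):
    """Total stored when the layers are flattened into a single layer."""
    final = {}
    for layer in layers:
        for path, size in layer.items():
            if size is None:
                final.pop(path, None)  # deleted files vanish from the flattened result
            else:
                final[path] = size
    return sum(final.values())


# One dict per RUN instruction: path -> size in MB, None means "deleted here".
layers = [
    {"/bar/src": 40},                              # RUN git clone ...
    {"/bar/build": 25, "/usr/local/bin/bar": 5},   # RUN cd /bar && make && make install
    {"/bar/src": None, "/bar/build": None},        # RUN rm -rf bar
]

print(layered_size(layers))   # 70
print(squashed_size(layers))  # 5
```

Under these assumptions the `rm -rf` layer removes nothing from what is stored; only flattening the three steps (by hand with `&&`, or with something like SQUASH) drops the source and build files.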
Much like cloned git repos are always bigger than their archives for any non-trivial project, you can't have both a version-based layering system and the smallest image possible for its contents. The extra information has to be stored somehow, even if it has no practical use in the context you're using it. If the intermediate steps serve no purpose to the final product, why insist on them being kept to gain the benefits of build caching? The same applies for Git pull requests. You submit squashed commits based on discrete chunks of functionality, not based on the writing process that produced the code, so you have a more useful historical record, rather than a more complete but noisy one.
This is lovely, except that you can't have a Docker Hub trusted build and do this unless the squashing tool is also going to be integrated with Docker Hub. Merging this PR would improve transparency of Docker Hub images compared with the suggested fix, and Docker Hub doesn't benefit from build caching anyway. |
Any news on this issue? Maybe a link to the focused discussion @cpuguy83 ? |
@t128 This issue is closed and won't be supported. In master currently (and what will be Docker 1.10) the concept of a layer will be gone from the user perspective, though still there in implementation. |
@cpuguy83 Is there a discussion somewhere on the subject?
and
Until the distinction between those 2 dockerfiles goes away, users need to be aware of how their images are built. |
@cpuguy83 you've mentioned this on IRC too and I keep meaning to ask you about this. How are the layers gone? When I build an image I still see the layers in |
@burke Thanks a lot for your work on this feature! Are you committed to maintaining an up-to-date fork? Is there another approach which superseded your efforts? |
@wmark nope, sorry -- we work around this at a deeper level now by not using the builder in the first place. |
I guess the question though is there a difference now between
and
Are the temp files still present in the image somewhere? |
@rhatdan The fact that the engine still uses layers is unchanged, and as such the resulting fs is also still the same. |
That is what I figured and the reason people still want squash. |
the new "solution" is to hide the fact from the user why his container size is exploding? |
@t128 you can read about some of the reasons for this change here; #17924, here https://gist.github.com/aaronlehmann/b42a2eaf633fc949f93b#new-image-config, and #18378 (comment) |
TL;DR
There are actually docs in the diff, but the 10-second overview is:
Summary
This PR adds two commands -- `MARK` and `SQUASH` -- to the builder. I've wanted some way to implement "transactions", whereby multiple instructions are ultimately reduced to a single layer, in `Dockerfile`s for a long time. This was by far the least invasive way to implement it.
`MARK` stashes a tiny bit of context on the `Builder` struct so that `SQUASH` can look it up later.
`SQUASH` squashes the indicated layers between `MARK` and `SQUASH` together into a single new layer.
`SQUASH` also gives the new layer a new `ContainerConfig.Cmd`, whose only apparent use is to show up in `docker history`. This can be used to substantially prettify `docker history` output.
Why is this necessary?
`SQUASH` makes it possible to have temporary build-time secrets without them existing in the image hierarchy.
`ADD` some files, take some action with them, then to remove them. This causes the final image distribution to be much larger than necessary. If this can be compressed to a single layer, network and disk are used more efficiently.
This `MARK`/`SQUASH` would make our rather insane custom build process theoretically-achievable using only a handful of strategically-composed `Dockerfile`s.
Conclusions...
There's still a lot of testing and polish to be applied here, but before I continue, I want to get a sense of how likely this is to be accepted; i.e. is the docker project still philosophically opposed to acknowledging the existence of layers from the context of a `Dockerfile`?
Does this seem like the kind of thing that could possibly move forward?
/cc Shopify peeps: @graemej @sirupsen @thegedge @shivnagarajan
Appendix A: Obvious ways to get secrets during a build
`ADD` them, `RUN` something, then remove the secrets. This has the unfortunate effect of the secrets existing in plaintext in the `ADD` layer. This is the case that `SQUASH` ameliorates.
`RUN` command. We did this in the past, and I've heard of others doing it, but I find it rather scary.
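The first case in the appendix can be illustrated with a small sketch. It assumes a simplified model in which each layer is a mapping of path to content and a deletion is recorded as `None`; the file names, the secret value and the helper names `visible_in_final_image` and `layers_containing` are all hypothetical.

```python
def visible_in_final_image(layers, path):
    """True if the path exists in the filesystem a container would see."""
    present = False
    for layer in layers:
        if path in layer:
            present = layer[path] is not None
    return present


def layers_containing(layers, path):
    """Indexes of layers that still hold the file's content."""
    return [i for i, layer in enumerate(layers) if layer.get(path) is not None]


# One dict per instruction: path -> content, None means "deleted in this layer".
layers = [
    {"/secrets/key": "s3cr3t"},   # ADD the secret
    {"/app/vendor": "deps"},      # RUN something that uses it
    {"/secrets/key": None},       # RUN rm the secret
]

print(visible_in_final_image(layers, "/secrets/key"))  # False
print(layers_containing(layers, "/secrets/key"))       # [0]
```

The running container no longer sees the secret, but anyone with access to the image's layers can still read it from the `ADD` layer, which is the exposure that squashing those layers together is meant to avoid.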