Add MARK and SQUASH builder instructions #12198

Closed
wants to merge 1 commit into from

Conversation

burke
Contributor

@burke burke commented Apr 8, 2015

TL;DR

There are actually docs in the diff, but the 10-second overview is:

FROM scratch
MAINTAINER me
MARK
ADD a
RUN b
SQUASH ">> important things <<"
$ docker history ${id}
IMAGE               CREATED                  CREATED BY                          SIZE
2a1491d233a1        Less than a second ago   >> important things <<              20 KB
df7546f9f060        6 months ago             /bin/sh -c #(nop) MAINTAINER me     0 B
511136ea3c5a        22 months ago                                                0 B
$

Summary

This PR adds two commands -- MARK and SQUASH -- to the builder. I've wanted some way to implement "transactions" in Dockerfiles -- whereby multiple instructions are ultimately reduced to a single layer -- for a long time. This was by far the least invasive way to implement it.

  • MARK stashes a tiny bit of context on the Builder struct so that SQUASH can look it up later.
  • SQUASH squashes the indicated layers between MARK and SQUASH together into a single new layer.
  • SQUASH also gives the new layer a new ContainerConfig.Cmd, whose only apparent use is to show up in docker history. This can be used to substantially prettify docker history output.

Why is this necessary?

  1. At Shopify, some of our containers require secrets to be temporarily decrypted during build time. There are three trivially conceivable strategies for this. Two of them are currently impossible, and one is sketchy. See Appendix A. SQUASH makes it possible to have temporary build-time secrets without them existing in the image hierarchy.
  2. It's fairly common to want to ADD some files, take some action with them, and then remove them. This causes the final image distribution to be much larger than necessary. If these steps can be compressed into a single layer, network and disk are used more efficiently (see the sketch after this list).
  3. At Shopify, we enjoy having prettified image histories. For example, this is our current core platform production image history:
IMAGE               CREATED             CREATED BY                                      SIZE
3a342817ba4a        37 minutes ago      ##build 3 & finalize @d9c88f1c                  x MB
4a4f06aab8d0        9 hours ago         ##build 2 @1d9a653a                             y MB
5e004daccb40        2 days ago          ##build 1 @a1ecb04a                             z MB
38fee11f123e        2 weeks ago         ##prepare                                       w MB
623b6b3205b7        2 weeks ago         ##baseimage v1.3.11                             v MB
d0955f21bf24        2 weeks ago         /bin/sh -c #(nop) CMD [/bin/bash]               0 B
9fec74352904        2 weeks ago         /bin/sh -c sed -i 's/^#\s*\(deb.*universe\)$/   1.895 kB
a1a958a24818        2 weeks ago         /bin/sh -c echo '#!/bin/sh' > /usr/sbin/polic   194.5 kB
f3c84ac3a053        2 weeks ago         /bin/sh -c #(nop) ADD file:777fad733fc954c0c1   188.1 MB
511136ea3c5a        22 months ago                                                       0 B
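
To make point 2 above concrete, here is a minimal sketch of how the ADD/use/remove pattern could look with the proposed instructions; the file names and the description string are hypothetical, and the syntax assumed is the one shown in the TL;DR:

MARK
ADD assets.tar.gz /tmp/assets.tar.gz
RUN tar -xzf /tmp/assets.tar.gz -C /opt && rm /tmp/assets.tar.gz
SQUASH ">> install build assets <<"

Only the squashed layer ships; the tarball added in the intermediate layer should not appear in its filesystem diff.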

MARK/SQUASH would make our rather insane custom build process theoretically achievable using only a handful of strategically composed Dockerfiles.

Conclusions...

There's still a lot of testing and polish to be applied here, but before I continue, I want to get a sense of how likely this is to be accepted; i.e. is the docker project still philosophically opposed to acknowledging the existence of layers from the context of a Dockerfile?

Does this seem like the kind of thing that could possibly move forward?

/cc Shopify peeps: @graemej @sirupsen @thegedge @shivnagarajan


Appendix A: Obvious ways to get secrets during a build

  1. Volumes. Volumes don't work during build.
  2. ADD them, RUN something, then remove the secrets. This has the unfortunate effect of the secrets existing in plaintext in the ADD layer. This is the case that SQUASH ameliorates (see the sketch below).
  3. Operate an HTTP server, poke the secrets into it, and have the build retrieve, use, and delete them in a single RUN command. We did this in the past, and I've heard of others doing it, but I find it rather scary.
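
For concreteness, here is what strategy 2 plus SQUASH could look like; the key path, file names, and the decrypt-and-use command are purely illustrative, and the syntax is the one proposed in this PR:

MARK
ADD build-secrets.gpg /tmp/build-secrets.gpg
RUN decrypt-and-use /tmp/build-secrets.gpg && rm /tmp/build-secrets.gpg
SQUASH ">> provision with build-time secrets <<"

Only the squashed layer ends up in the shipped image, so the plaintext secret should not exist in the image hierarchy, although the pre-squash layers do remain in the cache of the host that ran the build.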

@GordonTheTurtle

Can you please sign your commits following these rules:

https://github.com/docker/docker/blob/master/CONTRIBUTING.md#sign-your-work

The easiest way to do this is to amend the last commit:

$ git clone -b "builder-squash" git@github.com:burke/docker.git somewhere
$ cd somewhere
$ git commit --amend -s --no-edit
$ git push -f

This will update the existing PR, so you do not need to open a new one.

@burke burke force-pushed the builder-squash branch 2 times, most recently from 36bdd42 to 3ee6186 on April 8, 2015 at 21:44
@thaJeztah thaJeztah removed the dco/no label Apr 8, 2015
@jessfraz
Contributor

jessfraz commented Apr 8, 2015

burke!!! cooooll!!!!!


@burke
Contributor Author

burke commented Apr 8, 2015

cc @jvgogh because we were chatting about weird ways to work around this at the kubernetes thing the other month.

cc also @proppy @bwilkins @dysinger @WhisperingChaos via the #8660 train

@thaJeztah
Member

Interesting. Should I see this as a continuation of #8574 (BEGIN .. COMMIT) - should that be closed?

Some quick thoughts:

  • Are squashed layers still cached individually? This may be important information for users, because those layers (possibly containing secrets) will persist on the server.
  • I don't think naming marks is needed at this point (unless nested marks are possible); SQUASH without a number could just squash everything since the last MARK; if no MARK exists, either bail out or squash everything since the beginning of the Dockerfile.

Will something like this be problematic?:

MARK a
ADD a
ENV hello=world
RUN b

MARK b
ADD c
ENV hello=galaxy

SQUASH a foo
RUN echo $hello 
SQUASH b bar

Basically: nested squashes, and I'm not really sure how ENV is handled (I haven't given that much thought yet as to what the expected result should be).

@burke
Contributor Author

burke commented Apr 8, 2015

Good points; I think it makes sense to remove the first argument from both commands so that:

  1. Only one mark is ever active
  2. SQUASH must be called after MARK -- no more integer argument option.

As for the rest:

  • The squashed layers are still present on the server doing the building. For my purposes at least, this is acceptable because it's more or less taken as axiomatic that that server has access to generate secrets.

What's actually going on under the hood is that all the layers that will be squashed are generated completely normally, then the squash does exactly the same thing as a git rebase -- it creates a new layer (using the last image ID, the description, and the number of layers squashed as the cache key) and orphans, but does not immediately nuke, all the squashed layers. This way, future builds can still take advantage of the cache when re-building those orphaned layers.

ENV is handled exactly as it should be, I think. The Config is just copied from the last image pre-squash to the new squashed image, which will necessarily include all ENV, etc.

Your multi-mark example looks like it might break ENV, but it would actually just squash away a bunch of layers, or error, or something. I don't know. I'll just remove mark names.
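
For illustration, here is roughly how that example would read with the argument-free form, under the behaviour described above (a sketch of the intended behaviour, not the code as currently pushed):

MARK
ADD a
ENV hello=world
RUN b
SQUASH "first block"
RUN echo $hello

Because the Config of the last pre-squash image is copied onto the squashed image, hello=world should still be visible to the final RUN.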

Thanks for the feedback!

@thaJeztah
Member

For my purposes at least, this is acceptable because it's kind of taken for axiomatic that that server has access to generate secrets

Yes, I think that's reasonable; but it should be documented somewhere so that users are aware that this is the case.

I'll just remove mark names.

+1; they can always be added in the future, if the need arises

Thanks for creating this!

@burke
Contributor Author

burke commented Apr 8, 2015

MARK names are now completely gone. MARK takes no arguments, and SQUASH will always and only squash back to the most recent MARK.

@burke burke force-pushed the builder-squash branch 2 times, most recently from e5476dc to 17829f8 on April 8, 2015 at 23:46
@burke burke force-pushed the builder-squash branch 8 times, most recently from fe1e886 to 42d50a2 on April 9, 2015 at 15:48
* MARK sets a mark
* SQUASH squashes together all layers between a MARK and itself
* SQUASH also renames the layer as viewed by `docker history`

The scope of change required to the builder to implement this feature is
surprisingly constrained.

Signed-off-by: Burke Libbey <burke@libbey.me>
@WhisperingChaos
Contributor

@burke

I like this feature as an optimization! I also appreciate the observations concerning SQUASH's effect on an image's history and the exposure of a secret within the committed container (image cache) on the build host.

However, the first two justifications for SQUASH's inclusion -- eliminating secrets and build-time resources from the run-time image -- aren't durable ones. It would be preferable for transient files, and the processes that act upon them, never to have been included in the resulting run-time image nor to have directly affected its environment. These reasons represent symptoms that are better addressed by isolating build-time and run-time concerns using mechanisms that don't rely on "sanitizing" the resultant image, as it's difficult to ensure complete removal of contaminants.

Also, for secrets in particular, additional safeguards should be implemented to protect their visibility/accessibility by build-time processes. Since SQUASH doesn't affect the visibility of a secret available through the build context, the secret is accessible to all processes executed by the Dockerfile. Therefore, a secret can be consumed by any of these processes and made public. This issue certainly restrains the desire to reuse Docker Hub images that employ Dockerfile commands like "ADD .", especially when attempting to build components requiring secrets, as the secret will be silently copied by the "ADD ." command. Here's a reference to a detailed explanation of this vulnerability.

I hope you're not surprised by my position, as even visiting only the locomotive cab of the #8660 train would suggest the assessment above.

There has been an effort to permit the inclusion of secrets via build time environment variable assignment. See pull request #9176.

Finally, I sympathize with the desire to resolve issues, like build-time secrets, that resulted in the invention of SQUASH. Without a viable alternative, it might be prudent, especially in the short term, to ignore my position.

@TomasTomecek
Contributor

@cpuguy83

I don't like anything that exposes the caching implementation in the Dockerfile. I think it's the wrong approach.

Caching exists to speed up the build at the expense of extra storage.
Images can be squashed completely separate from the build process.

It would be awesome if the docker daemon had an API endpoint for that.

I think if, on push (or optionally at the end of a build, where it would keep the cache layers but also make a squashed version?), we squash down to the nearest tagged image, this ought to be good enough. This would also preserve the distribution/storage benefits that layers provide by preserving the base layers.

I think that overall performance is way more important than caching and/or build time. I mean, yeah, it's very nice when I'm testing some image and I can continuously rebuild it in a matter of seconds. But in the end, if my image is 1 GB and a squashed version could take only 800 MB or even 600 MB, and I really need to put a semi-complex bash script inside the Dockerfile to reduce the final size and number of layers, I pick performance over build time.

Also, I totally don't understand why instructions like ENV, VOLUME or EXPOSE create a new layer. I feel like those should be completely separated from the build itself. (I guess that's the same thing which was already mentioned here: build time vs. runtime)

Since cache was already mentioned: I really miss precise cache control. E.g. when I want to update all my packages in one RUN instruction, I would like to be sure that I get the latest packages all the time (not a cached version which is a month old). Controlling cache within the Dockerfile or via docker build could help, but that's certainly not ideal. (I mean that I would like to invalidate the cache of some layers.)

@duglin
Contributor

duglin commented May 13, 2015

On May 13, 2015, at 11:47 AM, Tomas Tomecek notifications@github.com wrote:
Since cache was already mentioned: I really miss precise cache control. E.g. when I want to update all my packages in one RUN instruction, I would like to be sure that I get latest packages all the time (not cached version which is 1 month old). Controlling cache within dockerfile or via docker build could help, but that's certainly not ideal. (I mean that I would like to invalidate cache of some layers)

Be careful, you may be asking for #10682 to enable turning off caching at a certain point in the Dockerfile processing :-)

@duglin
Contributor

duglin commented May 14, 2015

I really think part of the problem with issues like this is that there are some fundamental differences around what a Dockerfile is and what it is for.

For example, if you view a Dockerfile as the "definition" of how to build a single Docker image, then it makes sense that you might push back on the notion of NOCACHE or MARK/SQUASH in there, because you may think that the person executing the build knows what's going on and should have total control, and the only thing you care about is the end result -- meaning the single final image -- so squashing outside of the build process seems perfectly reasonable.

However, if you view a Dockerfile more like a Makefile, a set of steps that you're trying to automate, then you'll want to add as many of those steps to the Dockerfile as you can because, as the author, you're the one who knows what's going on; knows what's supposed to be generated; and (let's be honest :-) ) we don't trust the person kicking off the build to remember to squash, turn off caching, etc.... so we need to be able to force those things to happen via the one common automation mechanism we have, Dockerfiles.

The point of this... until we come to an agreement on what a Dockerfile is and what it's meant to be for, I think we'll continue to frustrate people on both sides of issues like this.

It might be helpful if some old-timers on the project ( e.g. @shykes @crosbymichael etc...) could help set some direction on this topic.

Personally, I view it more like a Makefile and want to push as much as possible into it so that the person running the build needs to do (and remember to do) as little as possible. If they really don't like what's in there, they can fork the Dockerfile.

@rdsubhas

@duglin well put 👍

@rhatdan
Contributor

rhatdan commented May 14, 2015

People have been asking for this for well over a year. Every time we get a good solution, someone comes in and derails it. Why can't we have both docker squash and MARK/SQUASH in the Dockerfile? I am tired of waiting for the "promised land"; let's give users/developers something to fix this problem, and continue working on breaking Dockerfile functionality into base primitives.

@cpuguy83
Member

Closing due to comment from #332 (comment)

Review with @duglin @crosbymichael @diogomonica @LK4D4

The problem with this issue is that it provides a solution to a problem that has yet to be defined

The bigger question is what is the end goal for users who need to squash layers? Faster pull/push? Removing some layers (with secrets)? Maximum number of layers?

It feels like all these problems are implementation details and solving them by exposing squashing is probably the wrong way.

We suggest addressing each independent issue you guys have in more focused, separate issues.

For relieving the immediate pain, we also suggest updating the documentation on how to manually squash layers if people really need it, with external tools. @crosbymichael volunteered to write such a tool.

We're closing this issue. Would love to continue the debate on more focused issues.

@cpuguy83 cpuguy83 closed this May 14, 2015
@burke
Contributor Author

burke commented May 14, 2015

We suggest addressing each independent issue you guys have in more focused, separate issues.

This is fine (and IMO the correct course), but I'd like to point out that this has been the official position for 1.5 years.

EDIT: By which I mean: refusing to address this deficiency in the builder because of other, better improvements that should be made instead.

@TomasTomecek
Contributor

The problem with this issue is that it provides a solution to a problem that has yet to be defined

I think it can be defined very easily:

  • we can't remove "secrets" from final image
  • we can't precisely control number of layers

Faster pull/push?

I personally don't care about this but can imagine that saving bandwidth can be very important for someone.

Removing some layers (with secrets)?

Yes.

Maximum number of layers?

Yes (due to performance).

We're closing this issue. Would love to continue the debate on more focused issues.

Please, provide a link in this issue to the debate.

@thaJeztah
Member

We suggest addressing each independent issue you guys have in more focused, separate issues.

Ok, reasonable, I guess

For relieving the immediate pain, we also suggest updating the documentation on how to manually squash layers if people really need it, with external tools. @crosbymichael volunteered to write such a tool.

Obviously, this won't allow optimised images when using automated builds on Docker Hub, which makes it not very useful -- or, at best, just as useful as creating an image interactively and committing the result.

@tjdett
Contributor

tjdett commented Jun 10, 2015

The bigger question is what is the end goal for users who need to squash layers? Faster pull/push? Removing some layers (with secrets)? Maximum number of layers?

We suggest addressing each independent issue you guys have in more focused, separate issues.

Excellent. My single, focused issue is reducing the overall size of images to preserve disk space on servers. Reducing total bytes transferred is useful, but secondary.

My standard Dockerfile development process tends towards starting with multiple RUN statements to gain the benefits of caching on slow operations:

RUN git clone https://github.com/foo/bar.git
RUN cd /bar && make && make install && cd ..
RUN rm -rf bar

Then when the build succeeds, I need to manually coalesce and rebuild to avoid including source files and intermediate build files in the image:

RUN git clone https://github.com/foo/bar.git && \
  cd /bar && make && make install && cd .. && \
  rm -rf bar

This is obviously more work for me with no obvious gain, so either there's a problem in my methodology or the tool. This PR removes the extra work, with the SQUASH comment handily able to supplement my existing Dockerfile commenting.
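
For instance, with the instructions proposed here, the cached multi-step form could presumably stay as written and still collapse into a single shipped layer -- a sketch, assuming the MARK/SQUASH syntax from this PR:

MARK
RUN git clone https://github.com/foo/bar.git
RUN cd /bar && make && make install
RUN rm -rf /bar
SQUASH "build and install bar"

The per-step layers would still be cached on the build host, but the final image should carry only the squashed layer, without the source tree or intermediate build artifacts.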

It feels like all these problems are implementation details and solving them by exposing squashing is probably the wrong way.

Much like cloned git repos are always bigger than their archives for any non-trivial project, you can't have both a version-based layering system and the smallest image possible for its contents. The extra information has to be stored somehow, even if it has no practical use in the context you're using it in.

If the intermediate steps serve no purpose to the final product, why insist on them being kept to gain the benefits of build caching? The same applies for Git pull requests. You submit squashed commits based on discrete chunks of functionality, not based on the writing process that produced the code, so you have a more useful historical record, rather than a more complete but noisy one.

For relieving the immediate pain, we also suggest updating the documentation on how to manually squash layers if people really need it, with external tools.

This is lovely, except that you can't have a Docker Hub trusted build and do this unless the squashing tool is also going to be integrated with Docker Hub. Merging this PR would improve transparency of Docker Hub images compared with the suggested fix, and Docker Hub doesn't benefit from build caching anyway.

@vincentwoo
Contributor

I, too, agree with @tjdett on his ails. I also think it a bit odd that very long, storied threads like this are being closed with no link to a place to continue discussion. It's been ages now, where are the links? Would love to gripe in a more focused thread.

ping @cpuguy83

@chpio

chpio commented Dec 8, 2015

Any news on this issue? Maybe a link to the focused discussion @cpuguy83 ?

@cpuguy83
Member

cpuguy83 commented Dec 8, 2015

@t128 This issue is closed and won't be supported.
Layers are an implementation detail, the user should not even know about them -- the fact that they do is a bug.

In master currently (and what will be Docker 1.10) the concept of a layer will be gone from the user perspective, though still there in implementation.
Instead of having images made up of a bunch of layers, you just have images.
Images can still share layers, but to the user it's just an image; there is no parent image or layer. You can't download a specific layer, only an image.

@phemmer
Contributor

phemmer commented Dec 8, 2015

@cpuguy83 Is there a discussion somewhere on the subject?
I strongly disagree with hiding the concept of layers from users. There is a very significant difference in a Dockerfile between:

RUN something_which_generates_temp_files
RUN rm /path_to_temp_files

and

RUN something_which_generates_temp_files && rm /path_to_temp_files

Until the distinction between those two Dockerfiles goes away, users need to be aware of how their images are built.

@duglin
Contributor

duglin commented Dec 8, 2015

@cpuguy83 you've mentioned this on IRC too and I keep meaning to ask you about it. How are the layers gone? When I build an image I still see the layers in docker history, and I still see the "parent" field in docker inspect for the images/layers. What appears to be different is that they're now SHAs instead of just random IDs. When I walk the history to an "old" image (like ubuntu) things get a bit weird, but I assumed this was because of the migration stuff; for "new" images it sure looks like the old stuff, just with new types of IDs. Can you elaborate on how things are different?

@mark-kubacki

@burke Thanks a lot for your work on this feature! Are you committed to maintaining an up-to-date fork? Is there another approach which superseded your efforts?

@burke
Contributor Author

burke commented Dec 8, 2015

@wmark nope, sorry -- we work around this at a deeper level now by not using the builder in the first place.

@cpuguy83
Member

cpuguy83 commented Dec 8, 2015

@duglin For instance, docker images -a does not yield any untagged layers as it used to.
The only images that have a populated "parent" field (apparently... no pun intended?) are images you build locally.

@phemmer This is still available in docker history.

@rhatdan
Contributor

rhatdan commented Dec 8, 2015

I guess the question, though, is whether there is a difference now between

RUN something_which_generates_temp_files
RUN rm /path_to_temp_files

and

RUN something_which_generates_temp_files && rm /path_to_temp_files

Are the temp files still present in the image somewhere?

@cpuguy83
Member

cpuguy83 commented Dec 8, 2015

@rhatdan The fact that the engine still uses layers is unchanged, and as such the resulting fs is also still the same.

@rhatdan
Contributor

rhatdan commented Dec 8, 2015

That is what I figured and the reason people still want squash.

@chpio

chpio commented Dec 10, 2015

the new "solution" is to hide the fact from the user why his container size is exploding?

@thaJeztah
Member

@t128 you can read about some of the reasons for this change here: #17924, here: https://gist.github.com/aaronlehmann/b42a2eaf633fc949f93b#new-image-config, and in #18378 (comment)
