Squash build dependencies #6906

shykes · 2014-07-08T18:12:13Z

The current implementation of “docker commit” and “docker build” makes it difficult to strip images of their build dependencies. This causes 2 problems:

1. Many images are unnecessarily bloated.
2. To avoid the bloat, some developers avoid using “docker build”, creating unnecessary fragmentation.

unclejack · 2014-07-08T18:18:50Z

We could collapse instructions which only introduce metadata changes into single layers.

Something like:

FROM someimage
MAINTAINER someone
RUN apt-get update
RUN apt-get install -y somepackage
ENV foo bar
ENV bar baz
ENV boo foo

could be safely reduced to fewer layers:

FROM someimage
MAINTAINER someone
RUN apt-get update
RUN apt-get install -y somepackage
ENV foo bar ENV bar baz ENV boo foo

We could also introduce automated squashing which would keep the original images around to make rebuilds faster.
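
To make the collapsing idea concrete, here is a minimal Python sketch. It is not Docker's builder code: the METADATA_ONLY set and the collapse_metadata name are assumptions chosen for illustration. It groups consecutive metadata-only instructions so that each group would become a single layer.

```python
# Instructions assumed (for this sketch only) to change metadata, not the filesystem.
METADATA_ONLY = {"MAINTAINER", "ENV", "CMD", "ENTRYPOINT", "EXPOSE", "USER"}


def collapse_metadata(instructions):
    """Group consecutive metadata-only instructions into one layer each."""
    layers = []
    for line in instructions:
        keyword = line.split(None, 1)[0].upper()
        is_meta = keyword in METADATA_ONLY
        if is_meta and layers and layers[-1]["metadata"]:
            # Extend the previous metadata-only group instead of starting a new layer.
            layers[-1]["lines"].append(line)
        else:
            layers.append({"metadata": is_meta, "lines": [line]})
    return [layer["lines"] for layer in layers]


dockerfile = [
    "FROM someimage",
    "MAINTAINER someone",
    "RUN apt-get update",
    "RUN apt-get install -y somepackage",
    "ENV foo bar",
    "ENV bar baz",
    "ENV boo foo",
]
collapsed = collapse_metadata(dockerfile)
```

With the Dockerfile above, the three ENV lines end up in one group while each RUN keeps its own layer.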

shykes · 2014-07-08T18:43:09Z

@unclejack collapsing layers in this way addresses a different problem: the limit in number of filesystem layers. It doesn't address the problem of disk space. I suggest we address it in a separate issue.

aweiteka · 2014-07-14T20:48:58Z

Assumptions:

- cached image layers are only useful or meaningful to the application developer during the build process
- new Docker users naively expect each docker build to result in a single new image layer

I consider build artifact layer management a good candidate for image signing, which would squash the new build layers into a single new image layer.

docker build -t my/app . [--sign] adds one new layer to the application and optionally signs it.

trevorjay · 2014-07-16T16:14:39Z

The caching is most useful for cases like package system commands that take a long time to run. However, you often want to explicitly re-run even these commands. What about a simple --no-cache flag for the build command?

Alternatively, what about a FRESH command for Dockerfiles? Basically it would have the same effect as:

RUN echo "nonce"

but wouldn't require the author to keep changing it.

Tweaking @aweiteka's idea: since you only want to sign "finished" images anyway, what about making --sign (if and when implemented) implicitly also behave as a --no-cache ?
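
Why a changing RUN echo "nonce" line forces everything after it to rebuild can be shown with a simplified Python model of the build cache. This is a sketch under assumptions: the layer_ids and cache_hits names are hypothetical, and each cache key is derived only from the parent key and the instruction text, whereas the real builder also takes the contents of ADD/COPY files into account.

```python
import hashlib


def layer_ids(instructions, base="scratch"):
    """Return one cache key per instruction; each key depends on all earlier ones."""
    ids, parent = [], base
    for instruction in instructions:
        parent = hashlib.sha256(f"{parent}\n{instruction}".encode()).hexdigest()
        ids.append(parent)
    return ids


def cache_hits(previous, current):
    """Count the leading instructions of `current` that reuse layers from `previous`."""
    hits = 0
    for old, new in zip(layer_ids(previous), layer_ids(current)):
        if old != new:
            break  # every later key inherits the mismatch, so stop here
        hits += 1
    return hits


first_build = ["FROM debian", 'RUN echo "nonce-1"', "RUN apt-get update"]
second_build = ["FROM debian", 'RUN echo "nonce-2"', "RUN apt-get update"]
```

Comparing first_build with second_build, only the FROM line is reused; the unchanged apt-get update line is rebuilt because its parent key changed.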

proppy · 2014-07-17T18:47:24Z

A somewhat "emergent" pattern is to use a builder and runner image.

Only the builder image contains the build dependencies; the artefacts are extracted using volumes, docker cp, or even stdout, and injected into another context.

I wonder how (and if) the docker CLI / API could bless and facilitate the pattern in a way that's compatible with the hub.

And more importantly: would that fix this issue, or is it a separate discussion?

thaJeztah · 2014-07-17T22:30:05Z

> since you only want to sign "finished" images anyway, what about making --sign (if and when implemented) implicitly also behave as a --no-cache

-1 on that, or (at least) have --no-cache as a separate flag as well; if I don't need to sign my image, I still want to be able to disable caching layers or be able to squash.

shykes · 2014-07-22T01:20:56Z

@proppy yes, I think we should support the "builder / runner" pattern you talk about. That is the goal of nested builds (#7115). Note that even with nested builds, you still need dependency squashing to make sure the leftover "unpublished" build dependencies are not carried into the image.

SvenDowideit · 2014-07-22T01:50:32Z

the example I have is a Dockerfile with

RUN apt-get install make
RUN make whatever
RUN apt-get remove make

Even if I want my build artifacts, right now it's not simple enough to get rid of the build tools.

plus, it enforces a Dockerfile hygiene thing - it will encourage users to extract the

RUN apt-get update
RUN apt-get install apache
ADD certificates
RUN echo "" > domain_settings_files

that they have copied and pasted into a few places into one common image that they then FROM local-web-debian - and update and manage more carefully.
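
The reason the install / make / remove sequence above does not shrink the image can be sketched with a toy layer model in Python. The paths, sizes, and function names are hypothetical; the only assumption about Docker is that every layer is stored in full and a later deletion merely hides a file from the final filesystem.

```python
def stored_size(layers):
    """Bytes kept in the image: every layer's added files count, even if deleted later."""
    return sum(sum(layer["added"].values()) for layer in layers)


def visible_size(layers):
    """Bytes visible in the final filesystem after applying each layer in order."""
    filesystem = {}
    for layer in layers:
        filesystem.update(layer["added"])
        for path in layer["deleted"]:
            filesystem.pop(path, None)
    return sum(filesystem.values())


# One entry per RUN instruction; sizes are made-up numbers.
build = [
    {"added": {"/usr/bin/make": 400}, "deleted": set()},   # RUN apt-get install make
    {"added": {"/app/whatever": 50}, "deleted": set()},    # RUN make whatever
    {"added": {}, "deleted": {"/usr/bin/make"}},           # RUN apt-get remove make
]
```

In this model the running container sees only /app/whatever, but the image still carries the bytes of the make layer.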

phemmer · 2014-07-23T12:50:03Z

I'm just curious, how does this issue differ from #332 or #4232?

aigarius · 2014-07-28T18:52:26Z

Wasn't there supposed to be work on "ghost" layers that would be present in the build process, but disappear from the final image? If such a feature were technically possible, one could easily imagine the following Dockerfile:

FROM debian
RUN apt-get install -y libjpeg
~RUN apt-get install -y libjpeg-dev build-essential gcc
~ADD source /build
~WORKDIR /build
~RUN ./configure
~RUN make
RUN make install
CMD /usr/local/bin/myexe

Where all layers generated with lines that start with a "~" would actually not appear in the final image.

phemmer · 2014-07-28T19:56:14Z

I personally like the syntax @txomon mentioned in #332, which uses a COMMIT directive.
The reason is that it seems clearer where the resulting image would be generated.

For example

FROM debian
RUN foo
~RUN bar
~RUN baz
CMD bash

Will the resulting image have the RUN bar and RUN baz as a single image, and CMD bash as another? Or will there be one image for all three?

On the other hand, if we re-use concepts from database systems:

FROM debian
RUN foo
BEGIN
RUN bar
RUN baz
COMMIT
CMD bash

Seems very clear that RUN bar and RUN baz will be squashed into a single image, and CMD bash will be a separate one.
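
How a builder might interpret the proposed BEGIN / COMMIT markers can be sketched in a few lines of Python. BEGIN and COMMIT are the hypothetical directives from the example above, not real Dockerfile instructions, and plan_layers is a name invented for this sketch.

```python
def plan_layers(lines):
    """Return a list of layers; lines between BEGIN and COMMIT share one layer."""
    layers, block = [], None
    for raw in lines:
        line = raw.strip()
        if line == "BEGIN":
            if block is not None:
                raise ValueError("nested BEGIN is not supported in this sketch")
            block = []
        elif line == "COMMIT":
            if block is None:
                raise ValueError("COMMIT without BEGIN")
            layers.append(block)
            block = None
        elif block is not None:
            block.append(line)      # inside a block: accumulate into one layer
        else:
            layers.append([line])   # outside a block: one layer per instruction
    if block is not None:
        raise ValueError("BEGIN without COMMIT")
    return layers


example = ["FROM debian", "RUN foo", "BEGIN", "RUN bar", "RUN baz", "COMMIT", "CMD bash"]
planned = plan_layers(example)
```

For the example, RUN bar and RUN baz land in the same layer and CMD bash gets its own, which matches the reading given above.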

aigarius · 2014-07-28T20:03:26Z

I think that there is a miscommunication about what "squash" build dependencies means in the context of this ticket. I understand it as "remove", thus any changes to the filesystem that are made by tilde commands will not show up at all in the final image. This means that you can install build dependencies, do the compiles, and none of that will actually be included in the final image. Only the result of "RUN make install" in my example above (only the actual already compiled binaries in /usr/local/bin) and their runtime dependencies installed in the second line of the example would be present in the final image.

As far as I am reading into it, half of the comments here confuse this with #332 which keeps all the changes to the filesystem in the final image, but just in fewer layers.
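
The difference between the two readings can be illustrated with a small Python model. Everything here is an assumption for illustration: the "~" ghost marker is the hypothetical syntax from the example above, the file paths and sizes are invented, and squash / strip are just names for the two behaviours being contrasted.

```python
def squash(layers):
    """#332-style: merge all layers into one. Every file survives, only the layer count drops."""
    merged = {}
    for layer in layers:
        merged.update(layer["files"])
    return merged


def strip(layers):
    """This ticket: keep only files written by layers that are not ghost ("~") layers."""
    kept = {}
    for layer in layers:
        if not layer["ghost"]:
            kept.update(layer["files"])
    return kept


# Layers on top of the debian base, following the example Dockerfile (made-up sizes).
build = [
    {"ghost": False, "files": {"/usr/lib/libjpeg.so": 300}},                          # RUN apt-get install -y libjpeg
    {"ghost": True, "files": {"/usr/bin/gcc": 90000, "/usr/include/jpeglib.h": 50}},  # ~RUN apt-get install ... gcc
    {"ghost": True, "files": {"/build/main.c": 20}},                                  # ~ADD source /build
    {"ghost": True, "files": {"/build/myexe": 700}},                                  # ~RUN make
    {"ghost": False, "files": {"/usr/local/bin/myexe": 700}},                         # RUN make install
]
```

Squashing keeps all six files in one layer; stripping leaves only libjpeg and the installed binary.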

phemmer · 2014-07-28T20:13:40Z

Ahh, thank you. That makes a lot more sense. Perhaps we could call it "stripping" instead of "squashing".

TomasTomecek · 2015-01-14T09:54:15Z

Any updates?

stain · 2015-01-18T22:40:42Z

What about just having a multi-line RUN-MANY mean the equivalent of the awkward && \ chaining pattern?

RUN-MANY
  apt-get update
  apt-get install -y wget unzip build-essential
  mkdir /tmp/src
  cd /tmp/src
  wget http://example.com/source.zip
  unzip source.zip
  # CRAZY - comments allowed in the middle!
  make install
  apt-get remove -y --purge wget unzip build-essential
  apt-get -y --purge autoremove
  apt-get autoclean
  rm -rf /tmp/*
END

This would give a lean image, but also make it easier to copy-paste in any existing install-scripts.

Also - say you start with a traditional RUN style to have faster development time - now you can just insert RUN-MANY and END, search-replace the multiple RUN - and hey presto - you have the lean, autocleaning version of your script.

phemmer · 2015-01-19T06:47:23Z

@stain As you mentioned, you can already accomplish the same thing with a simple && shell operator. What this issue is about is doing things which aren't possible at all (not just inconvenient), such as removing layers which are not necessary in the final image.

thaJeztah · 2015-01-19T07:04:34Z

@phemmer to give some context; @stain's original Dockerfile used a COPY to add the source. I suggested a build-container or curl/wget workaround and pointed to existing issues on the issue tracker wrt getting rid of large intermediate layers.

stain · 2015-01-19T17:34:41Z

But isn't there something wrong when most of the Dockerfile commands can't be used in a production image because they generate enormous intermediate images that nobody else will ever need?

The tiny difference between ADD and COPY does not help.

COMMIT should be a better solution than my RUN-MANY, as it would allow you to do COPY and so on without worrying about waste, and just do multiple RUN in a naive way.

In one project I have a GitHub repository which somehow is 600 MB when checked out. Ideally I would like to host the Dockerfile right there, so the image on the hub would track updates to the GitHub code.

Then I would just COPY . /src and do the build within the Docker image (installing about 40 MB of binaries) and then clean up before a COMMIT.

Currently I need to have a massive && which in the end does all the cleanup, like deleting /tmp and deleting all the library files needed for compilation. Obviously testing this takes forever as it always does everything.

Moving from Dockerfile style during development to mega-RUN-&& style takes considerable effort, as one has to do so much more housekeeping, like installing wget and unzip, keeping track of the temporary files downloaded (or using pipes if tar.gz), and cleaning everything up afterwards.

During such a sequence not a single Dockerfile command can be used, as it would make it pointless to clean up.

Flattening layers should be straightforward, as it's just set difference (after all, it is done at runtime by the container). You can set the boundary to each Dockerfile, so you can't flatten outside layers.
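
As a rough illustration of the "set difference" idea, here is a simplified Python model of flattening the layers produced by one Dockerfile into a single layer. It is only a sketch: the layer representation, the flatten name, and the base_paths parameter are assumptions, and real layer formats record deletions as whiteout entries with more detail than shown here.

```python
def flatten(layers, base_paths):
    """Merge layers (applied in order) into one layer on top of an untouched base.

    Each layer is {"added": {path: size}, "deleted": {path, ...}}.
    Files added and later deleted within the span vanish entirely; deletions of
    files that exist in the base must be kept as whiteouts.
    """
    added, whiteouts = {}, set()
    for layer in layers:
        for path in layer["deleted"]:
            added.pop(path, None)
            if path in base_paths:
                whiteouts.add(path)
        for path, size in layer["added"].items():
            added[path] = size
            whiteouts.discard(path)  # a re-added file overrides the base copy
    return {"added": added, "deleted": whiteouts}


base_paths = {"/etc/base.conf"}
span = [
    {"added": {"/src/main.c": 600, "/tmp/build.o": 10}, "deleted": set()},   # COPY . /src, compile
    {"added": {"/usr/local/bin/app": 40}, "deleted": set()},                 # install
    {"added": {}, "deleted": {"/src/main.c", "/tmp/build.o", "/etc/base.conf"}},  # clean up
]
flat = flatten(span, base_paths)
```

Only the installed binary and the whiteout for the base file remain; the base layers themselves are never touched, which corresponds to setting the boundary at the Dockerfile.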

aigarius · 2015-01-19T20:18:08Z

@stain you are also confusing this with #332; please read my comments above for a clarification.

jessfraz · 2015-07-10T19:00:20Z

Hello!
We are no longer accepting patches to the Dockerfile syntax as you can read about here: https://github.com/docker/docker/blob/master/ROADMAP.md#22-dockerfile-syntax

Mainly:

Allowing the Builder to be implemented as a separate utility consuming the Engine's API will open the door for many possibilities, such as offering alternate syntaxes or DSL for existing languages without cluttering the Engine's codebase

Then from there, patches/features like this can be re-thought. Hope you can understand.

vbatts added the Distribution label Jul 17, 2014

shykes mentioned this issue Jul 19, 2014

ADD enhancement: allow adding from another image #4933

Closed

LK4D4 mentioned this issue Jul 23, 2014

Using ENTRYPOINT gives unexpected arguments to executable #7178

Closed

This was referenced Jul 29, 2014

Proposal: INCLUDE syntax for specifying multiple images per context #7277

Closed

Proposal: Nested builds #7115

Closed

jlhawn mentioned this issue Dec 10, 2014

Adds build option to squash newly built layers #9591

Closed

jessfraz added kind/feature, /dist/registry, area/builder labels Feb 27, 2015

cpuguy83 mentioned this issue Mar 7, 2015

"depth" function wanted #11226

Closed

phemmer mentioned this issue Apr 8, 2015

Add ability to `ADD files and then remove them in the same layer in a Dockerfile #12169

Closed

thaJeztah mentioned this issue May 5, 2015

Add 'TAG' and 'COMPRESS' command to Dockerfile #12987

Closed

jvassev mentioned this issue May 11, 2015

Serve the build context #13124

Closed

jessfraz added area/distribution and removed Distribution labels Jul 10, 2015

jessfraz closed this as completed Jul 10, 2015

phemmer mentioned this issue Dec 1, 2015

dockerfile instruction for running script from temporary location #18332

Closed

sheerun mentioned this issue Dec 15, 2015

Support COPY from other builds #18596

Closed

phemmer mentioned this issue Jan 23, 2016

Feature Request: "AND" instruction in Dockerfile #19597

Closed
