Proposal: Nested builds #7115

shykes · 2014-07-19T02:31:36Z

Some images require not just one base image, but the contents of multiple base images to be combined as part of the build process. A common example is an image with an elaborate build environment (base image #1), but a minimal runtime environment (base image #2) on top of which is added the binary output of the build (typically a very small set of binaries and libraries, or even a single static binary). See for example "create lightweight containers with buildroot" and "create the smallest possible container".

1. New Dockerfile keywords: IN and PUBLISH

IN defines a scope in which a subset of a Dockerfile can be executed. The scope is like a new build, nested within the primary build. It is anchored in a directory of the primary build.

PUBLISH changes the path of the filesystem tree to use as the root of the image at the end of the build. The default value is / (i.e. "publish the entire filesystem tree"). If it is set to e.g. /foo/bar, then the contents of /foo/bar are published as the root filesystem of the image. All filesystem contents outside of that directory are discarded at the end of the build. For example:

FROM ubuntu

RUN apt-get update && apt-get install -y build-essential
ADD . /src
RUN cd /src && make

IN /var/build {
    FROM busybox
    EXPOSE 80
    ENTRYPOINT /usr/local/bin/app
}

RUN cp /src/build/app /var/build/usr/local/bin/app

PUBLISH /var/build

Behavior of RUN

When executing a RUN command in an inner build, the runtime uses the inner build directory as the sandbox to execute the command. So for example: IN /foo { RUN touch /hello.txt } will create /foo/hello.txt.

Behavior of ADD

When executing ADD in an inner build, the original source context does not change. In other words, ADD . /dest will always result in the same content being copied, regardless of where in the Dockerfile it is invoked. Note: the destination of the ADD will change in a nested build, since the destination path is scoped to the current inner build.

The outer build can access the inner build

Note that filesystem changes caused by the inner build are visible from the outer build. For example, the inner build's /usr/local/bin was created by FROM busybox; it appears in the outer build as /var/build/usr/local/bin and is therefore accessible to the final RUN command in the build.

Behavior of PUBLISH

Also note that PUBLISH /var/build causes the result of the inner build (the busybox image) to be published. Everything else (including the outer Ubuntu-based build environment) is discarded and not included in the image.


SvenDowideit · 2014-07-21T00:15:49Z

I asked if we could invert the syntax and achieve the same function - and after lots of IRC discussion I think the answer is not really.

This Proposal has some interesting possible effects that we should list:

you can use IN and PUBLISH entirely independently.
there may be a third parameter to PUBLISH to give it a subname (perhaps registry/image/subname:tag when you docker build -t registry/image:tag)
you could PUBLISH more than once
you could overlay more than one IN / {FROM app} to do image mixins - and PUBLISH any dir you like, including leaving it as default

some of these may be bad, some may just need more info in the proposal :)

timthelion · 2014-07-21T11:50:35Z

Hm, @shykes's version makes more technical sense, whereas @SvenDowideit's version seems more logical. I'm +1 for @SvenDowideit's version.

srlochen · 2014-07-21T21:04:15Z

+1 Having the ability to inject build/test dependencies and discard them at publishing time would simplify a lot for our docker build/release pipelines.

vmarmol · 2014-07-21T21:58:21Z

It would also potentially make the final images much smaller :)

wyaeld · 2014-07-21T23:37:21Z

Can someone elaborate on where and how layer caching would fit into either use case? Given the stated goal of minimizing overall size, is the inner build cached completely as a separate container, with only the result added to the parent layer?

The build process is typically the most time-consuming step, and benefits the most from caching.

proppy · 2014-07-22T05:15:53Z

I'm not sure if the context needs to be implicitly added/bound in the inner image fs (this could maybe be introduced later and separately from this proposal).

I deleted my earlier syntax change suggestion and created a separate proposal to discuss a more explicit way to bind the context, as per IRC discussion, see #7149.

shykes · 2014-07-22T05:45:50Z

Guys I ask that you focus on criticizing the proposal instead of pushing completely different proposals in the comments. By all means create a separate issue if you have a proposal of your own!

Thanks.

proppy · 2014-07-22T05:48:32Z

@shykes, agreed, switching to constructive criticism mode.

IN defines a scope in which a subset of a Dockerfile can be executed

Please specify which subset (are ADD and COPY available?)
Also specify what is the context of an inner build (inside IN{}).

shykes · 2014-07-22T05:52:36Z

@proppy

Please specify which subset (are ADD and COPY available?)

I didn't mean a subset of available instructions (all instructions should be available), but a subset of the Dockerfile content - in other words, whatever is enclosed in the curly braces. Happy to change the wording to something more clear.

Also specify what is the context of an inner build (inside IN{}).

The source context would be the same in all images. In other words, ADD . /dest will always result in the same content being copied, regardless of where in the Dockerfile it is invoked. Note: the destination of the ADD will change in a nested build, since the destination path is scoped to the current inner build.

proppy · 2014-07-22T05:55:52Z

@shykes, thanks. I suggest adding this to your original proposal description, as those were the first questions I had while reading it.

proppy · 2014-07-22T05:59:43Z

It is anchored in a directory of the primary build

What happens if a file exists in both the anchored directory and the filesystem of the base image used in the FROM of the inner build? Does the anchored directory have to be empty, with IN failing otherwise? Are multiple INs with the same anchored directory forbidden?

fiadliel · 2014-07-22T14:53:55Z

I have a possible use case for nested builds which doesn't seem to be covered (yet) by this proposal.

In some cases, the information written into a Dockerfile is duplicated information from an existing build system, which could have been auto-generated instead.

It would be nice if (optionally) the nested build would look for a Dockerfile at the root of the filesystem for the nested build, at that point in the build process. This means that previous steps could generate the Dockerfile and build context used to create the image.

More concretely, http://www.scala-sbt.org/sbt-native-packager/DetailedTopics/docker.html#tasks shows an example where a build system can create a Dockerfile and context, ready to use with Docker.

One example implementation here could be to look for a second Dockerfile if IN /var/build included no commands to execute.

vbatts · 2014-07-22T15:07:00Z

@shykes after looking over this proposal, it satisfies the use-case that #4933 was targeting.

Also, to further this functionality, the path argument to IN ought to expand ENV variables declared in the parent Dockerfile. This way something like $DESTDIR would be a natural flow from build image to runtime image.

Another topic: how will this relationship be tracked in the stored image metadata? Will the IN image track the outer image or its FROM as the parent? Or will there need to be an additional field for this? Or perhaps a no-op record layer that indicates where the image came from, or which image copied bits into it?

shykes · 2014-07-22T21:50:28Z

@proppy updated

SvenDowideit · 2014-07-23T03:32:42Z

@shykes on IRC, you mentioned the possibility of having more than one PUBLISH instruction in a single Dockerfile. Until the subname functionality lands, can you please define what happens when there are multiple PUBLISH instructions, possibly with different paths, and possibly in different places in the Dockerfile?

Similarly, can you define what happens with multiple INs?

Oh, and nesting: can I have an IN inside an IN, and how deep? And can I have a PUBLISH inside an IN, and what does that do?

I'm curious how IN will work - will the outer build create a context, upload that to the Daemon to build fresh, then download it and insert the result, or will it happen in the same build, thus possibly have access to the original context?

Can we define what happens when the IN /dir is not empty? The options I see: (1) error, (2) the contents are discarded before we enter, (3) the new image starts from there and magically mixes its FROM filesystem in.

I'm thinking I could use this as a build pipeline for boot2docker, with the final PUBLISHed image containing the docker and boot2docker binaries and the installers - each of which is built IN separate inner sections, and all the working is discarded. (or better, each is PUBLISHed separately). Is that a useful use-case?

ibuildthecloud · 2014-07-24T18:19:39Z

I very much like (and need) this functionality. My main comment is that when I first read the Dockerfile, I didn't understand what was going on. It took me a bit to get it. So a couple comments

I think IN is a bit too abstract of a keyword. What about BUILDIN to indicate you are doing a build in that directory?
If we go with this feature, I think people will immediately want to externalize the Dockerfile of the inner build. So a syntax like BUILDIN /var/build Dockerfile, where Dockerfile is interpreted the same as the SRC in an ADD command.
PUBLISH directory seems a bit problematic. It seems you should only be able to publish a directory that was first specified by IN. You wouldn't want to allow PUBLISH of any random folder, because then the resulting image would have to be a full cp/tar of the directory. We would lose the image layering (unless there's a clever approach I don't know). I wonder if we can invent a syntax in which the IN context is named, like IN /var/build BINARIES { ... } and then PUBLISH BINARIES. The name should be optional, because people may not always want to publish the inner context.

A final general comment is how we are going to layer the inner context. It seems that with each ADD or RUN command in the outer Dockerfile context you could be modifying the contents of /var/build. So (assuming we're bind mounting /var/build) you would need to create a new layer for the parent context and then for all inner contexts, for every Dockerfile directive. It seems the implementation of this could be messy.

It would be cleaner to implement if we explicitly knew for each Dockerfile instruction whether it was going to modify one of the contexts. For example, the syntax below would be easier to implement IMO, but it is uglier.

FROM ubuntu

RUN apt-get update && apt-get install -y build-essential
ADD . /src
RUN cd /src && make

BUILD BINARIES {
    FROM busybox
    EXPOSE 80
    ENTRYPOINT /usr/local/bin/app
}

WITH ["BINARIES:/var/build"] RUN cp /src/build/app /var/build/usr/local/bin/app

PUBLISH BINARIES

icecrime · 2014-07-25T07:45:59Z

I can see how the proposal elegantly solves the issue of complex build workflows, but don't you fear it'll be misused as a means to "sum" images? For example in:

FROM busybox
IN /redis/ { FROM redis }
IN /python/ { FROM python }

Perhaps IN and PUBLISH should be merged in a single keyword which does both (run a nested build and publish its result as output of the outer build), which would in effect restrict the feature to a way of defining "build steps" rather than a way of combining images.

tianon · 2014-07-25T18:50:45Z

Honestly, I see that use as a cool bonus feature, especially since the two
images are placed neatly in separate directories, although the image size
will likely balloon in that case, but I don't think that's really avoidable
with this feature unless it's implemented very very cleverly (which is
obviously possible :P).

ibuildthecloud · 2014-07-26T03:47:58Z

@tianon I don't think this needs to be implemented by actually copying the contents of the inner build to the outer layer. Instead, set up two rootfs directories for the outer and inner contexts and mount the inner one at /var/build. This means that if you don't publish the inner context, the resulting image will have none of its contents, because it was only bind mounted.

This approach also means this feature would not be able to "sum" up a bunch of images (which is not something we want to allow).

SvenDowideit · 2014-07-26T03:55:40Z

just to note - I would like to be able to sum up a bunch of images.

Doing so makes Docker interesting from a 'replacement for packages' perspective.

It's basically a way to turn off (or make a shared space in) the FS namespace.

so @ibuildthecloud @icecrime could you perhaps expand on your opinion - as it doesn't sound like we all have the same fear of doing it :)

ibuildthecloud · 2014-07-26T05:01:48Z

@SvenDowideit I can't say I'm totally opposed to it in general, but it is a separate topic. This proposal is to address the very real issue of separating your build and runtime environments in an elegant way. Anytime a new feature is proposed you must consider how it might be used in some unexpected way and what is that impact.

Allowing one to sum up a bunch of images will fundamentally change the nature of images. As you indicated, you move from an image essentially being a "full OS image" to an image being a "package." If we were to go in this direction we will need to invent new concepts and technology to describe, manage, and create images. At this point in time I don't think it would be helpful to bifurcate the nascent image ecosystem. Instead we should focus on the specific issue at hand and not focus on changing the nature of images.

icecrime · 2014-07-26T08:07:43Z

@SvenDowideit Don't give my opinion too much credit, I'm a beginner with Docker ;-) TBH I'm not sure I understand how the 'replacement for packages' perspective relates to images combination.

I just have the impression that "how can I get both X and Y in my Docker image" is a recurring beginner question (that I've been asking myself): there's no easy way to do this today, which is probably a good thing as it encourages the "one process for one container" approach.

To sum up: using IN without PUBLISH as in my previous comment seems to me like providing an accessible way to do a discouraged thing (both technically by resulting in a bloated image, and functionally by facilitating multiple-responsibilities container). Thus my question: should we be able to use them independently?

proppy · 2014-07-28T07:10:35Z

What makes me uncomfortable with the proposal in its current form is the tight coupling between the inner instructions and the outer ones.

In the example of the description, the outer RUN cp has to know about .../usr/local/bin to match the inner ENTRYPOINT /usr/local/bin/....

And the inner instructions don't need to ADD the binary, unlike a regular Dockerfile used with a binary context.

This creates a model where the set of inner Dockerfile instructions and outer ones are unlikely to be composable across images, even more so if this is combined later with something like INCLUDE: /me imagines Dockerfiles only suitable for use in an IN block.

With the existing build model this is nicely abstracted by the notion of a context, and some Docker users already compose builds today by chaining multiple docker build invocations with external scripts, where the output of the previous build is passed as the context of the next one.

Maybe the description could expand a little more on the methods used today, and which tradeoffs (if any) the proposal has to make to simplify and improve them.

jakirkham · 2015-10-14T16:38:08Z

Also, would be really interested in seeing something like this or some variant. From my understanding, it would be very helpful for testing a layer without including artifacts from testing in the final tagged commit.

ghost · 2015-11-06T04:09:15Z

+1 , would like this feature. really useful.

sleaze · 2016-03-07T23:10:52Z

+1 for docker multiple inheritance functionality

koliyo · 2016-04-01T08:52:32Z

👍

ionelmc · 2016-04-14T12:45:16Z

Does this https://github.com/docker/docker/blob/master/ROADMAP.md#22-dockerfile-syntax mean this proposal won't be implemented any time soon? (if ever)

sr-nu · 2016-05-05T02:59:08Z

The proposal needs to be split so that each part can be discussed and closed independently.

mercuriete · 2016-07-26T21:29:24Z

👍
Another use case...
Maven image to build Java artifacts (jar),
then put these artifacts inside a smaller runtime image (Java JRE)
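
With the multi-stage syntax that shipped in Docker 17.05 (so after this comment was written), that use case might look roughly like the sketch below. The image tags and the artifact name target/app.jar are assumptions chosen for illustration, not something specified in this thread.

```dockerfile
# Build stage: Maven and a full JDK. Nothing from this stage reaches the
# final image unless it is copied explicitly.
# Assumption: the maven:3-jdk-8 tag; use whichever Maven image you rely on.
FROM maven:3-jdk-8 AS build
WORKDIR /src
COPY . .
RUN mvn -B package

# Runtime stage: JRE only.
# Assumption: the openjdk:8-jre tag and a build that produces target/app.jar.
FROM openjdk:8-jre
COPY --from=build /src/target/app.jar /app/app.jar
ENTRYPOINT ["java", "-jar", "/app/app.jar"]
```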

hiroshi · 2016-10-16T03:35:21Z

Hi, I'm working on a small tool. It can build a small Docker image in multiple steps. Some may find it useful.

graingert · 2016-10-18T13:34:27Z

have a look at https://github.com/6si/shipwright

rescribet · 2016-12-01T11:11:59Z

Since we're posting utilities, I've built one to build minimal Golang images in two steps based on the scratch image

xenoterracide · 2016-12-23T18:18:00Z

I think multiple inheritance is a bad idea (see the diamond problem), but composable traits are a good one. I wrote on the multiple inheritance ticket how I think it could be accomplished safely, syntactically.

That said, glancing at this, the issue I'm interested in is syntactic sugar around temporary build layers for multiple && commands.

for example this nasty piece of code

# Oracle hackery that lies to its bad installer
RUN mv /usr/bin/free /usr/bin/free.bak \
    && printf "#!/bin/sh\necho Swap - - 2048" > /usr/bin/free \
    && chmod +x /usr/bin/free \
    && mv /sbin/sysctl /sbin/sysctl.bak \
    && printf "#!/bin/sh" > /sbin/sysctl \
    && chmod +x /sbin/sysctl \
    && rpm --install /tmp/oracle-xe-$VERSION-1.0.x86_64.rpm \
    && rm /tmp/oracle-xe-$VERSION-1.0.x86_64.rpm* \
    && mv /usr/bin/free.bak /usr/bin/free \
    && mv /sbin/sysctl.bak /sbin/sysctl

The rpm command is actually expensive and takes a while during the build, so if something fails after it (while developing the image) I have to do the whole thing again. What'd be nice is a way to denote layers that are to be flattened in the final build.

RUN mv ... 
FLT curl
FLT tar 
FLT rm tar

or something like that, where if say the tar failed (because I typoed the path) I wouldn't necessarily have to run the curl again, while developing the file. In the final image these would just look like one layer.
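
One way to approximate this with syntax that exists today is a multi-stage build (Docker 17.05 or later, so newer than this comment): keep each expensive step as its own cached RUN in a throwaway stage, then copy that stage's whole filesystem into a fresh final stage, which yields a single layer. A minimal sketch, in which the base image, URL, and paths are placeholders rather than anything from the Oracle example above:

```dockerfile
# Throwaway stage: every RUN is its own cached layer, so a failure in a
# later step does not force the earlier steps to run again.
# Assumption: debian:bookworm-slim is only a stand-in base image.
FROM debian:bookworm-slim AS install
RUN apt-get update && apt-get install -y curl ca-certificates
# Hypothetical URL and paths, for illustration only.
RUN curl -fsSL -o /tmp/app.tar.gz https://example.com/app.tar.gz
RUN mkdir -p /opt/app && tar -xzf /tmp/app.tar.gz -C /opt/app
RUN rm /tmp/app.tar.gz

# Final stage: one COPY of the entire filesystem, hence one layer.
FROM scratch
COPY --from=install / /
CMD ["/bin/sh"]
```

If the tar step fails, a rebuild reuses the cached curl layer. Only files are copied this way: ENV, ENTRYPOINT, and other metadata set in the install stage have to be declared again in the final stage.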

AkihiroSuda · 2017-04-04T02:31:59Z

Given that we have multistage build now, can we update the status of this "roadmap" issue?
cc @tonistiigi
#32063 #31257

tonistiigi · 2017-04-10T03:44:55Z

Thanks for the ping @AkihiroSuda . Let's close this as #32063 that addresses this problem is merged.
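
For comparison, the example from the original proposal can be expressed with the multi-stage syntax roughly as below. This is a sketch, not taken from the linked pull requests: the stage name build is arbitrary, and it assumes make produces a statically linked binary at /src/build/app so that it runs on busybox.

```dockerfile
# Build stage: the full toolchain lives here and is left out of the
# final image.
FROM ubuntu AS build
RUN apt-get update && apt-get install -y build-essential
COPY . /src
RUN cd /src && make

# Runtime stage: only this stage ends up in the tagged image.
# Assumption: /src/build/app is statically linked.
FROM busybox
COPY --from=build /src/build/app /usr/local/bin/app
EXPOSE 80
ENTRYPOINT ["/usr/local/bin/app"]
```

The last FROM plays the role that PUBLISH had in the proposal, and COPY --from replaces the outer RUN cp into the anchored directory.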

mercuriete · 2017-04-28T19:48:47Z

thank you very much
@tonistiigi
I was waiting for this sooo long
👍

shykes added the Distribution label Jul 19, 2014

shykes mentioned this issue Jul 19, 2014

ADD enhancement: allow adding from another image #4933

Closed

erikh added the Proposal label Jul 21, 2014

erikh removed the Proposal label Jul 21, 2014

shykes mentioned this issue Jul 22, 2014

Squash build dependencies #6906

Closed

proppy mentioned this issue Jul 22, 2014

Proposal: Dockerfile BUILD instruction #7149

Closed

cpuguy83 mentioned this issue Jul 28, 2014

Allow specifying of a dockerfile as a path, not piping in. #2112

Closed

duglin mentioned this issue Oct 14, 2015

Docker add TEST command #16993

Closed

jakirkham mentioned this issue Oct 14, 2015

Docker purge unneeded files jupyter/notebook#589

Merged

phemmer mentioned this issue Nov 6, 2015

Feature request: Mount volume when run docker build #17745

Closed

sheerun mentioned this issue Dec 11, 2015

Support COPY from other builds #18596

Closed

bfirsh added the roadmap label Dec 11, 2015

thaJeztah added the area/builder label Dec 11, 2015

This was referenced Mar 2, 2016

Separate out docker image build and deployment steps into an independent command line tool distribution/distribution#1503

Closed

Separate out docker image build and deployment steps into an independent command line tool #20873

Closed

ndeloof mentioned this issue Jun 14, 2016

Extend docker cp to permit copying from images #16079

Closed

NikolausDemmel mentioned this issue Aug 5, 2016

Add tmpfs as a valid volume source command. #13587

Merged

thaJeztah mentioned this issue Aug 9, 2016

Dockerfile Multiple Inheritance with nested builds #25531

Closed

rdsubhas mentioned this issue Nov 18, 2016

build time only -v option #14080

Open

thaJeztah mentioned this issue Dec 27, 2016

[Feature Request]: add UNION to Dockerfile syntax #29719

Closed

tonistiigi closed this as completed Apr 10, 2017

thaJeztah added this to the 17.05.0 milestone Apr 10, 2017

jephir mentioned this issue Apr 21, 2017

node:7.9-alpine unable to build package due python is not installed nodejs/docker-node#384

Closed
