Proposal: Incorporate docker run IMAGE into BUILD #8660

Closed
WhisperingChaos opened this issue Oct 20, 2014 · 7 comments
Labels: area/builder, kind/feature

Comments

@WhisperingChaos
Contributor

Description

The processes, environments, and resources required to construct artifacts that compose a desired image are typically irrelevant to its runtime. For example, the tool chain: language compilers, their libraries, …, employed to build an application, when incorporated into an image for delivery, encumber the transport of the resultant image and potentially the execution of its derivative container(s). To avoid image pollution and facilitate delivery of a minimally sized one, the Docker BUILD environment must properly separate (isolate) the concerns of image construction from that of its runtime.

This proposal essentially recommends incorporating the docker run IMAGE command into Docker’s BUILD environment to impart the same benefits of containment (isolation/encapsulation), performance, and reusability that have contributed to Docker’s success. It does so by remapping the concepts/implementation of the docker run IMAGE command to a function idiom represented by a new Dockerfile operator called “FUN” (FUNction run) that executes an already known image. A discussion of its benefits, a sketch of its syntax, an overview of its semantics, and an example are provided below.

In addition to the new FUN operator, this proposal introduces another one: DEF FUN, to declare/define the body of a transient image (function) within a Dockerfile. The body of a transient image is the set of Dockerfile commands (conceptually a Dockerfile within a Dockerfile) needed to construct it.

Benefits

  • Separates concerns: Fully separates the build environment from the resultant image’s runtime by leveraging the inherent isolation mechanism of a container. Build processes run in their own containers, in whatever environment those containers support, and can affect only their own container’s file system. When BUILD completes, all transient containers can be destroyed, eliminating potential side effects when the image’s BUILD is re-executed.
  • Permits encoding a Dockerfile through the widely understood function idiom. Besides the familiarity developers have with applying functions to compose a solution, the proposal leverages less obvious/assumed mechanisms to reduce harmful coupling.

The function idiom includes an interface definition that provides a coupling point at the boundary separating a function’s internals (its implementation) from the surrounding external invocation environment. At this coupling point, the interface features a mechanism to bind one or more external arguments to a corresponding set of variables internal to the function. This binding mechanism correctly correlates external arguments with their internal counterparts even when their names differ, eliminating the need to synchronize argument names with the variable names internal to a function and avoiding binding to a function’s (or the invocation environment's) implementation. Finally, since binding occurs at each function invocation, it encourages function reuse, as the same function body can be called at various locations with differing argument names and values.

  • Improves Dockerfile performance, as the encapsulation provided by the function idiom, together with the assumed purity of encoded functions, makes it possible to:
    • Limit cache evaluation to what's necessary: the input/output argument bindings, the input file checksums defined by its invocation, and the invoked image's checksum. The individual commands representing the function's body are ignored, as they are reflected by the image's checksum.
    • Realize concurrent/parallel execution of independent functions. Since a function's defined interface permits easy recognition of dependencies between it and other functions, independent invocations, functions whose inputs aren't dependent on another function's output(s), can be executed in parallel, accelerating the apparent execution speed (not CPU time), when compared to serially running the invocations.
    • Avoid repeated evaluations for function invocations whose input argument values are the same. In this situation, the previously computed output value(s) can simply replace the function invocation.
  • Dynamically extends the Dockerfile command set through images shared via Docker Hub.
  • A function’s interface limits its coupling to only those artifacts needed or produced by the function.
  • Preserves the current Dockerfile semantics of the first FROM and its “pull” model when assembling images.
  • Adding or removing generated artifacts can be accomplished by adding or removing a function invocation, encoded as a few consecutive lines of cohesive code.
  • Improves a Developer’s uptake of FUN by leveraging his/her experience with the existing docker run IMAGE command.
  • This function invocation method could be generalized and made available to Docker’s runtime environment.
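To make the caching and reuse claims above more concrete, here is a minimal Python model; it is not Builder's actual cache. It assumes an invocation is identified by the invoked image's ID, its --IN and --OUT bindings, and the checksums of its input files. The names `input_key`, `cache_key`, and `PureInvocationMemo` are hypothetical. The commands in the function's body never enter the key, because the image ID already reflects them.

```python
import hashlib
import json


def _digest(material) -> str:
    """SHA-256 over a canonical (sorted-key) JSON encoding of `material`."""
    encoded = json.dumps(material, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


def input_key(image_id, in_bindings, input_files) -> str:
    """Key over what an invocation consumes: the invoked image's ID (which
    already reflects every command in the function's body), the --IN
    bindings, and the checksums of the input files."""
    return _digest({
        "image": image_id,
        "in": [list(pair) for pair in in_bindings],
        "files": {path: hashlib.sha256(data).hexdigest()
                  for path, data in input_files.items()},
    })


def cache_key(image_id, in_bindings, out_bindings, input_files) -> str:
    """Build-cache key for one FUN statement: the input key plus --OUT bindings."""
    return _digest({
        "inputs": input_key(image_id, in_bindings, input_files),
        "out": [list(pair) for pair in out_bindings],
    })


class PureInvocationMemo:
    """Reuses the outputs of an earlier invocation with identical inputs.
    Only valid under the assumption that the invoked function is pure."""

    def __init__(self):
        self._outputs = {}
        self.executions = 0  # how many times a container was actually "run"

    def invoke(self, key, run):
        if key not in self._outputs:
            self.executions += 1
            self._outputs[key] = run()
        return self._outputs[key]
```

Under this model, the three touchIT invocations in the Nested Build comparison later in this issue share one `input_key` (same image, no inputs), so a memo keyed on it runs the container once, while their `cache_key`s still differ because each --OUT binding targets a different file.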

Syntax

The syntax presented below provides a means to explore concepts. For example, words beginning with '--', like --CONTEXT, reflect keywords whose final form remains undecided.

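FUN <ImageID>
  [--CONTEXT { <BuildContext>[:<BuilderPhaseContext>]
         [<BuildContext>[:<BuilderPhaseContext>]... ]
    | --FROM_IMAGE <InvokingImageContext>:[<BuilderPhaseContext>]
        [<InvokingImageContext>[:<BuilderPhaseContext>]... ] } ]
  [ { --IN <BuildContext> <FunctionContext> [<BuildContext> <FunctionContext>]...
    | --IN_IMAGE <InvokingImageContext> <FunctionContext>
        [<InvokingImageContext> <FunctionContext>]... } ]
  [--OUT <FunctionContext> <InvokingImageContext> [<FunctionContext> <InvokingImageContext>]... ]
   {--NOCOMMAND | [<COMMAND>] [<ARGS>]}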

<ImageID>: see docker run IMAGE command.
<BuildContext>: The files provided by the PATH or URL supplied to the initiating BUILD command.
<BuilderPhaseContext>: A build context assembled from <BuildContext> and/or files available from the <InvokingImageContext>. This assembled context conforms to the interface expected by (exposed to) the image when performing BUILD processing.
<FunctionContext>: File paths/environment variables to be resolved within the function's (<ImageID>'s) body. Although a developer could specify a value instead of an environment variable name, avoiding harmful coupling to a function's implementation requires a level of indirection and an associated resolution process, which the Dockerfile ENV provides. Using ENV also offers a way to minimally document the function's interface via the docker inspect command.
<InvokingImageContext>: File paths/environment variables to be resolved within the context of the image being built at the moment of invocation.
<COMMAND>: see docker run IMAGE command.
<ARGS>: see docker run IMAGE command.
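The ENV indirection described for <FunctionContext> can be sketched as follows. This is a hypothetical Python model, not part of the proposal: a token is looked up as an ENV variable of the relevant image and, when undefined, used as a literal path. It illustrates that an invocation stays unchanged when the function's author relocates the function's internal directories.

```python
def resolve(token, env):
    """Treat `token` as an ENV variable name when `env` defines it,
    otherwise as a literal path."""
    return env.get(token, token)


def bind_arguments(in_pairs, out_pairs, function_env, invoking_env):
    """Turn --IN and --OUT pairs into concrete (source, target) copy operations.

    in_pairs:     (<BuildContext> path, <FunctionContext> token)
    out_pairs:    (<FunctionContext> token, <InvokingImageContext> token)
    function_env: the invoked image's ENV, as docker inspect would report it.
    invoking_env: ENV of the image being built at the moment of invocation.
    """
    copy_in = [(src, resolve(dst, function_env)) for src, dst in in_pairs]
    copy_out = [(resolve(src, function_env), resolve(dst, invoking_env))
                for src, dst in out_pairs]
    return copy_in, copy_out
```

With appCompile's ENV (`IN_SOURCE=/src`, `OUT_EXECUTABLE=/src/build/app`), the invocation `--IN . IN_SOURCE --OUT OUT_EXECUTABLE /usr/local/bin/app` resolves to copying `.` into `/src` and `/src/build/app` out to `/usr/local/bin/app`. If the function later moved its sources to `/work`, only its ENV would change; the invocation text would not.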

Semantics

FUN's behavior is presented below using Dockerfile/docker operations where possible.

# Assemble the <BuilderPhaseContext> from --CONTEXT specifier(s).
# Each <BuildContext>[:<BuilderPhaseContext>] pair generates:
ADD <BuildContext> <BuilderPhaseContext>
ADD...
ADD...
...
# Each <InvokingImageContext>:[<BuilderPhaseContext>] pair identified by the
# --FROM_IMAGE keyword generates: 
COPY_FROM_IMAGE <InvokingImageContext> <BuilderPhaseContext>
COPY_FROM_IMAGE ...
COPY_FROM_IMAGE ...
...
# Inherit file system from an existing image or replace below 
# with DEF FUN body and protect it with a container layer.
# Use <BuilderPhaseContext> as build context. 
FROM <ImageID>
# Build complete.  Container reflects state right before executing "docker run IMAGE".
# Each --IN argument pair is translated to an ADD.
ADD <BuildContext> <FunctionContext>
ADD...
ADD...
...
# Each --IN_IMAGE argument pair generates:
COPY_FROM_IMAGE <InvokingImageContext> <FunctionContext>
COPY_FROM_IMAGE ...
COPY_FROM_IMAGE ...
...
# Images Entrypoint/<COMMAND> is executed along with its optional <ARGS>.
docker run [<COMMAND>] [<ARGS>]
# Each --OUT argument pair generates:
COPY_TO_IMAGE <FunctionContext> <InvokingImageContext>
COPY_TO_IMAGE ...
COPY_TO_IMAGE ...
...
RETURN

The sketch above is intended to convey the proposed FUN semantics by leveraging experience with familiar commands. It's not a definitive implementation spec. Here's a written description of FUN's invocation:

  • When one or more --CONTEXT keywords exist, assemble the <BuilderPhaseContext>. Specifying multiple --CONTEXTs creates an aggregate one from disjoint file/directory references, while overlapping references generate a build-time error, at least when they occur within the same --CONTEXT specification. If a --CONTEXT specification omits the optional [:<BuilderPhaseContext>], the resources specified by the source context, either <BuildContext> or <InvokingImageContext>, are copied to the <BuilderPhaseContext>, replicating their source paths and file names. Absence of --CONTEXT produces an empty <BuilderPhaseContext>. --CONTEXT enables the expression of minimal interfaces to avoid this vulnerability.
  • Allocate the function's file system and protect it with a container layer. Use <BuilderPhaseContext> as the build context. For prebuilt images, run ONBUILD triggers, if they exist. For transient images, run the Dockerfile commands.
  • The image now reflects its state immediately before executing "docker run IMAGE".
  • Copy by value (--IN) all the input arguments, resolving the source references within the build context of the "docker build" command and the target references within the function's context. For example, target references can be environment variables established during construction of the function being called. During a function's invocation, they are expanded within the function's context and reflect their values as established when the image (function) was built.
  • Copy by value (--IN_IMAGE) all the input arguments, resolving the source references within the invoking image's context (ENV variables & file system) and the target references within the function's context.
  • Run the function's (image's) entrypoint/specified command with the arguments provided when the function (image) was created or those stated by the FUN operator. If the function invocation specifies the --NOCOMMAND keyword, do not execute the docker run IMAGE command. Use --NOCOMMAND in situations where the ONBUILD triggers initiate the RUN command and produce all the desired output artifacts.
  • Copy by value (--OUT) all the output arguments, resolving the source references within the function's context and the target references within the image being built.
  • Upon return, the allocated container can either be immediately or lazily destroyed. A lazy destruction would allow caching the output artifacts for situations where the function is called repeatedly (within a given Dockerfile), with the same input arguments and values. Under these circumstances, and when the function is considered pure, the semantics of FUN can be short circuited to only execute the COPY_TO_IMAGE operations.
  • FUN complete. Perform Dockerfile implicit commit to create an (intermediate/final) image.
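The steps above can be modeled in a few lines of Python. This is an illustrative sketch under simplifying assumptions, not an implementation plan: file systems are in-memory dicts of path to contents, the entrypoint is a Python callable standing in for "docker run IMAGE", a token names an ENV variable when the image defines it (otherwise it is a literal path), and --CONTEXT assembly and ONBUILD triggers are omitted. `FunctionImage`, `copy_paths`, and `fun` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, Tuple

FileSystem = Dict[str, bytes]
Pairs = Iterable[Tuple[str, str]]


@dataclass
class FunctionImage:
    env: Dict[str, str]
    fs: FileSystem
    # Stand-in for the image's ENTRYPOINT: mutates the container's file system.
    entrypoint: Callable[[FileSystem, Dict[str, str]], None]


def copy_paths(src_fs: FileSystem, src: str, dst_fs: FileSystem, dst: str) -> None:
    """Copy a single file, a directory, or '.' (everything) from src_fs to dst_fs."""
    src, dst = src.rstrip("/"), dst.rstrip("/")
    for path, data in list(src_fs.items()):
        if src == ".":
            dst_fs[f"{dst}/{path}"] = data
        elif path == src:
            dst_fs[dst] = data
        elif path.startswith(src + "/"):
            dst_fs[f"{dst}/{path[len(src) + 1:]}"] = data


def fun(image: FunctionImage, build_context: FileSystem, invoking_fs: FileSystem,
        invoking_env: Dict[str, str], in_pairs: Pairs = (), in_image_pairs: Pairs = (),
        out_pairs: Pairs = (), nocommand: bool = False) -> FileSystem:
    env = image.env
    container = dict(image.fs)        # container layer protects the image itself
    for src, dst in in_pairs:         # --IN: copy by value from the build context
        copy_paths(build_context, src, container, env.get(dst, dst))
    for src, dst in in_image_pairs:   # --IN_IMAGE: copy by value from the invoking image
        copy_paths(invoking_fs, invoking_env.get(src, src), container, env.get(dst, dst))
    if not nocommand:                 # the "docker run IMAGE" step
        image.entrypoint(container, env)
    result = dict(invoking_fs)        # new layer of the image being built
    for src, dst in out_pairs:        # --OUT: copy results back by value
        copy_paths(container, env.get(src, src), result, invoking_env.get(dst, dst))
    return result                     # `container` is simply discarded here
```

For example, with an appCompile stand-in whose entrypoint concatenates the `*.c` files under `$IN_SOURCE` into `$OUT_EXECUTABLE`, calling `fun(app_compile, context, busybox, {}, in_pairs=[(".", "IN_SOURCE")], out_pairs=[("OUT_EXECUTABLE", "/usr/local/bin/app")])` returns busybox's files plus `/usr/local/bin/app`, and neither the function image nor busybox's original file system is modified.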

Example

Given: An image called “appCompile” already created by the following Dockerfile:

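FROM ubuntu
# Tool chain needed only at build time.
RUN apt-get update && apt-get install -y build-essential
ENV IN_SOURCE /src
RUN mkdir -p $IN_SOURCE
# Generate the entrypoint script that runs make in the source directory.
RUN echo '#!/bin/bash'           >  /appBuild.sh && \
    echo "cd $IN_SOURCE && make" >> /appBuild.sh && \
    chmod +x /appBuild.sh
# Location of the compiled executable, exposed as part of the function's interface.
ENV OUT_EXECUTABLE $IN_SOURCE/build/app
ENTRYPOINT ["/appBuild.sh"]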

Create the “app” image via this second Dockerfile:

FROM busybox
FUN appCompile --IN . IN_SOURCE --OUT OUT_EXECUTABLE /usr/local/bin/app
EXPOSE 80
ENTRYPOINT /usr/local/bin/app

An additional, more substantive Docker Hub example uses the google/golang image; it also contrasts the Function Idiom approach with Nested/Chained Build.

Description

In addition to the FUN operator described above, the proposal would also include a mechanism to permit the construction of supporting transient functions (images) within a Dockerfile. The mechanism is similar to an inline function declaration which emerged during this discussion with Alexander Larsson.

Benefits

  • Permits refactoring a Dockerfile using the widely understood function idiom. For example, a Dockerfile for an image requiring a complex tool chain can be segmented into cohesive functions. These functions can be organized both to layer concerns and to reflect concerns other than build vs. runtime.
  • Transient build functions and their build contexts unique to constructing a resultant image are aggregated into a single Dockerfile, eliminating the effort to maintain separate Dockerfiles and build contexts.
  • Reduces dependencies on resources external to the resulting image’s Dockerfile. If so inclined, the external dependency list might be reduced to just the initial FROM image request.
  • Facilitates the transition from a locally defined image to an external one by extracting the inline function and producing a Dockerfile from it. Any invocations in the original Dockerfile would continue to operate without changing the original Dockerfile. The same benefit applies when converting from an external function to a locally defined one.

Syntax

DEF FUN <ImageID>              
# Dockerfile commands.
...
...
...
END FUN

<ImageID>: See docker run IMAGE command. Typically, it will be a human-readable label reflecting the function’s primary responsibility, using the <name>[:<tag>] form. When the tag is omitted, the default of “latest” is assumed. However, it could also be any other valid image label, like a short/long GUID.

Semantics

DEF FUN declares the start of a function (image) definition. When it is recognized, the current BUILD process writes the commands to a cached file until it detects the matching END FUN. The <ImageID> is placed into a function resolution table maintained by the current BUILD process. Whenever an image name requires resolution, to satisfy either a FUN or FROM operation, the resolution process first reviews the current local function resolution table for the given name. When the name is found there, it spawns a child build process and passes it the <BuilderPhaseContext> assembled by the initiating function invocation. Once the child build process completes, the function (image) it produced is executed within the parent. The image generated by the child build process can be cached to satisfy future requests initiated by the same parent or a spawned (child) level. When two functions share the same <ImageID>, the definition nearest to the FUN operator is used. Local inline function definitions override any external function (image) that shares the same <ImageID>.
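One possible reading of these resolution rules is sketched below in Python, assuming each BUILD process holds its own table plus a link to the process that spawned it, and that "nearest" means the innermost build level. `BuildScope`, the `registry` dict (standing in for images available externally, e.g. on Docker Hub), and the `build_child` callable (standing in for running the child build) are all hypothetical.

```python
class BuildScope:
    """One BUILD process: its own function resolution table plus a link to
    the BUILD process that spawned it (None for the top-level build)."""

    def __init__(self, parent=None, registry=None, build_child=None):
        self.parent = parent
        self.local = {}  # <ImageID> -> DEF FUN body recorded by this process
        # Child processes share the root's external registry, builder and cache.
        self.registry = registry if parent is None else parent.registry
        self.build_child = build_child if parent is None else parent.build_child
        self.cache = {} if parent is None else parent.cache

    def define(self, image_id, body):
        """Record a DEF FUN ... END FUN block."""
        self.local[image_id] = body

    def spawn_child(self):
        return BuildScope(parent=self)

    def resolve(self, image_id):
        """Nearest local definition wins; otherwise fall back to an external image."""
        scope = self
        while scope is not None:
            if image_id in scope.local:
                body = scope.local[image_id]
                if body not in self.cache:  # build each distinct body only once
                    self.cache[body] = self.build_child(body)
                return self.cache[body]
            scope = scope.parent
        return self.registry[image_id]
```

Keying the cache on the function body means a child level that resolves a parent's definition reuses the image the parent already built, while a child that redefines the same <ImageID> gets its own build.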

Example

Previous example rewritten to employ DEF FUN aggregating the two distinct Dockerfiles into a composite one:

# declare and define application compiler function:
DEF FUN appCompile
    FROM ubuntu
    RUN apt-get update && apt-get install -y build-essential
    ENV IN_SOURCE /src
    RUN mkdir -p $IN_SOURCE
    RUN echo '#!/bin/bash'           >  /appBuild.sh && \
        echo "cd $IN_SOURCE && make" >> /appBuild.sh && \
        chmod +x /appBuild.sh
    ENTRYPOINT ["/appBuild.sh"]
    ENV OUT_EXECUTABLE $IN_SOURCE/build/app
END FUN
# now create the surviving image:
FROM busybox
# call inline function:
FUN appCompile --IN . IN_SOURCE --OUT OUT_EXECUTABLE /usr/local/bin/app
EXPOSE 80
ENTRYPOINT /usr/local/bin/app

An additional DEF FUN example contrasts the Function Idiom approach with Nested/Chained Build.

@erikh
Contributor

erikh commented Oct 20, 2014

This proposal is amazingly detailed, so thanks for that -- it is a model for proposals. That said, I'm not sure it's a good fit for the builder. Builder is extremely declarative and simple, deliberately so. Having switches and the notion of a function really detracts from the goals here.

Although I'm not the final decider on such topics (that would be @shykes), there's plenty of opportunity for the community to convince us as well.

/cc @tiborvass

@WhisperingChaos
Contributor Author

@erikh

Thank you for your kind assessment concerning the detail of my proposal!

Before replying in depth to your post, I would like to confirm my understanding of Builder’s declarative nature and its goals. Neither a Google search for “docker builder declarative goal” nor a review of a few files in the Builder GitHub repository revealed them. Therefore, a link to these definitions, or a statement of them, would be appreciated to inform the conversation for others and me.

I’ve also examined Dockerfile syntax and its execution by Builder to unearth its declarative nature and would appreciate feedback regarding the observations below.

Computational Dependency

In several declarative languages, operators may execute in any order unless a computational dependency dictates a specific sequence. When a dependency is necessary, it’s typically expressed through a coupling mechanism encoded (implemented) via syntax. For example, in 5 * sin(90°), the multiplication and sin operators are syntactically coupled to one another, such that sin(90°) must be (deterministically) calculated before computing the multiplication.

However, in a Dockerfile, it seems adjacency/ordering of a command in the file determines the computational dependency, such that, given a command (C1), its successor, the one that follows it (C2=C1+1), is always dependent on the given command (C1). In other words, a successive statement deterministically couples to the one that precedes it. This is similar to the pipeline notion of a monad.

Since Dockerfile statement ordering conveys deterministic intention, Ex: 1 would be considered different from Ex: 2, although order in this case doesn’t affect the outcome:

Ex: 1

FROM ubuntu
ADD ContextFile1 ImageFile1
ADD ContextFile2 ImageFile2

Ex: 2

FROM ubuntu
ADD ContextFile2 ImageFile2
ADD ContextFile1 ImageFile1

However, reordering the following commands of Ex: 3 will affect the targeted image of the initiating docker build command:

Ex: 3

FROM ubuntu
ADD ContextFile1 ImageFile1
ADD ContextFile2 ImageFile1

Although statement reordering changes the outcome of Ex: 3, this different outcome isn’t considered a side effect because the dependency between the statements, that’s encoded through the ordering of statements within a Dockerfile, would also be (deterministically) reordered. In other words, as currently encoded above, ADD ContextFile1 ImageFile1 will always (deterministically) be executed before ADD ContextFile2 ImageFile1. If the statements were exchanged, ADD ContextFile2 ImageFile1 would always (deterministically) be executed before ADD ContextFile1 ImageFile1.

The following example, a rewrite of Ex: 3, employs syntactic coupling to express an equivalent encoding of a Dockerfile. In this example, Dockerfile commands are depicted as terms and parentheses denote precedence.

Ex: 4

ADD (ADD (FROM ubuntu)  ContextFile1 ImageFile1) ContextFile2 ImageFile1 
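Ex: 1 through Ex: 4 can be checked with a small Python model that treats each ADD as a pure function from one file system to the next and a Dockerfile as a left fold over its statements, which is exactly the nesting Ex: 4 spells out. The dict-based file system and the `add`/`build` helpers are assumptions made for illustration, not Builder code.

```python
from functools import reduce


def add(context, context_file, image_file):
    """ADD as a pure step: returns a new file system, leaving its input untouched."""
    def step(fs):
        new_fs = dict(fs)
        new_fs[image_file] = context[context_file]
        return new_fs
    return step


def build(base_fs, steps):
    """Apply steps in file order, so build(base, [s1, s2]) == s2(s1(base))."""
    return reduce(lambda fs, step: step(fs), steps, base_fs)
```

In this model, swapping the two statements of Ex: 1 yields the same image, swapping those of Ex: 3 changes what ImageFile1 contains, and the explicitly nested Ex: 4 equals the sequential build of Ex: 3.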

Is a successive statement deterministically coupled to the one that precedes it?

Argument: File System & Image Metadata

In addition to conveying the notion of dependency, the coupling mechanism delivers the desired thing, a variable’s value, as an argument (input) to the next computation. Referring to the 5 * sin(90°) computation above, the term sin(90°) generates the variable value (1) that will be passed as an input to the multiplication.

If, as proposed above, statements are coupled according to their ordering, then so too is the delivery of a variable value. In a Dockerfile, the variable value can be considered an aggregate of an intermediate image’s file system and its metadata, the metadata being the values available through the docker inspect command. Although it’s not apparent from the syntax, each Dockerfile command following the first FROM receives this committed aggregate value as an argument (input).

Similar to the examples above, Dockerfile syntax has been extended to convey the delivery of a variable value in a syntactic form for its FROM and ADD commands:

<ExistingFileSystemImage> FROM <ExistingFileSystemImageName> 
    > Select an existing File System Image identified by <ExistingFileSystemImageName>.
    > Return it as <ExistingFileSystemImage>.

<NewFileSystemImage> ADD <ExistingFileSystemImage> <BuildContextFileOrDirectoryName> <ImageContextFileOrDirectoryName> 
    >Given an existing file image <ExistingFileSystemImage>, make a copy of it called <NewFileSystemImage>.
    >Now copy the contents referenced by the <BuildContextFileOrDirectoryName> to the <ImageContextFileOrDirectoryName> location within the <NewFileSystemImage>.
    >Return <NewFileSystemImage>.
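The extended FROM/ADD syntax above can be read as functions over an immutable aggregate value. The Python sketch below is a hypothetical model: an `Image` pairs a read-only file system with read-only metadata (standing in for what docker inspect reports), and each command returns a new `Image` instead of modifying its argument, which is the single assignment form discussed next.

```python
from dataclasses import dataclass, replace
from types import MappingProxyType
from typing import Mapping


def freeze(mapping) -> Mapping:
    """Read-only snapshot of a dict."""
    return MappingProxyType(dict(mapping))


@dataclass(frozen=True)
class Image:
    fs: Mapping        # path -> contents
    metadata: Mapping  # e.g. ENV values, as reported by docker inspect


def from_image(registry, name):
    """<ExistingFileSystemImage> FROM <ExistingFileSystemImageName>"""
    return registry[name]


def add(existing, context, context_path, image_path):
    """<NewFileSystemImage> ADD <ExistingFileSystemImage> <src> <dst>"""
    new_fs = dict(existing.fs)  # copy; the existing image is never overwritten
    new_fs[image_path] = context[context_path]
    return replace(existing, fs=freeze(new_fs))


def env(existing, key, value):
    """ENV updates the metadata half of the aggregate value the same way."""
    new_meta = dict(existing.metadata)
    new_meta[key] = value
    return replace(existing, metadata=freeze(new_meta))
```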

Although not syntactically encoded as an argument, is the file system and image metadata successively passed as an argument to the next statement?

Avoiding Side Effects: Pure Functions & Single Assignment Form

Nearly all Dockerfile commands are considered pure functions because the side effects within the imperative Go implementation language have no observable effect on Dockerfile state, their semantics adhere to a single assignment form (eliminating destructive reassignments), and the commit executed between each command ensures atomicity, preventing unintentional overlapping state changes.

True?

The Exceptional RUN

RUN deviates from other Dockerfile commands, as it executes one or more arbitrary Linux commands/scripts without the protective benefit of any of the mechanisms mentioned above. For example, destructive reassignments can occur not only between Linux commands, due to the absence of a commit, but also within the execution of a specific command, as all commands read and write the same image file system state.

True?
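The contrast between committed commands and RUN can be illustrated with a hypothetical Python model: a dict stands in for the container's file system, each shell command is a callable that reads and writes it, and a commit is modeled as working on a copy. Real Docker layers are copy-on-write, so this only mirrors the observable behavior.

```python
def write(path, data):
    """A command that overwrites `path`, whatever it held before."""
    def command(fs):
        fs[path] = data
    return command


def append(path, data):
    """A command that reads `path` and writes it back extended."""
    def command(fs):
        fs[path] = fs.get(path, b"") + data
    return command


def run(fs, *commands, commit=True):
    """Model of RUN: all commands share one working file system.

    commit=True models a new layer (the input is left untouched);
    commit=False models the absence of a commit, so the input itself changes.
    """
    working = dict(fs) if commit else fs
    for command in commands:
        command(working)
    return working
```

Running `append("/tmp/x", b"1")`, `write("/tmp/x", b"2")`, and `append("/tmp/x", b"3")` in one RUN leaves `b"23"`: the first command's value is destroyed by the second. With `commit=False`, a write also changes the previous layer itself.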

@jessfraz added the kind/feature, Proposal, and area/builder labels Feb 26, 2015
@WhisperingChaos
Contributor Author

Reply to Concerns
Compare Function Idiom to Nested/Chained Build
  Simple Dockerfile
  Substantive Dockerfile

Reply to Concerns

The reply below responds to @erikh's concerns and provides an example that contrasts this proposal's 'Function Idiom' with another solution, called 'Nested Build' (#7149), in an effort to demonstrate more concretely the technical merits of the Function Idiom proposal.

Without feedback explaining Docker's notion of “extremely declarative”, a concept called 'declarative benefit' is defined below to provide some guidelines with which to qualify the quality of being declarative.

Using this Declarative Programming definition and its associated links, the term 'declarative benefit' measures the quality/fidelity of a language mechanism to:

  • Reduce observable side effects. Example: the purer a function (the less it relies on observable side effects), the greater its declarative benefit.
  • Ensure decoupling/detachment of a language element’s purpose (“what”) from its implementation (“how”). The greater the degree of isolation between a language element's “what” and “how”, the greater its declarative benefit.
  • Encourage a solution's encoded form to better reflect/map onto its design.

Contrary to the assertion (“…a function really detracts from the goals here”) that incorporating functions would oppose Builder’s goal to achieve “extremely declarative” semantics, functions can significantly improve Builder’s declarative benefit.

  • Functions, written adhering to particular constraints, are known to be purely (extremely) declarative. See Haskell.
  • Even imperative functions deliver some level of declarative benefit, as the function idiom encoded in essentially all imperative computer languages separates (decouples) a function invocation from its declaration/body (what from how). Additionally, a function's argument binding mechanism enforces encapsulation, greatly reducing observable side effects, first by isolating variables internal to a function from externally defined ones and second through a transitory coupling mechanism: the temporary ability of a function's arguments to reflect values from an external invocation. This transitory mechanism detaches a variable's value from its external variable name and then couples (assigns) this external value to an internally named variable. Without this ability, a solution's body would need to be recoded to reflect the variable names used at each invocation.
  • Function refactoring supports both decomposing operations/concepts into more elemental ones and aggregating them into more complex ones. Therefore, functions encourage concision, as they can be refactored into reusable building blocks whose shape/complexity reflects the problem domain.
  • Builder implements Dockerfile commands as imperative Go functions. Therefore, the proposal simply asks that the same internal mechanism be made publicly available to Dockerfile developers.
  • RUN’s ability to execute arbitrary functions, the very feature that the assertion (“…a function really detracts from the goals here”) implies opposes Builder’s objectives, is the primary reason RUN is indispensable when encoding Dockerfiles.

Contrary to the assertion that Builder is “extremely declarative”, its supported Dockerfile commands:

  • Represent very primitive operations especially from the perspective of a declarative operator that better matches the conceptual/abstract operations needed to construct an image. For example, a declarative operator, like “INSTALL”, conceptually (if not actually) would generate and execute one or more ADD & RUN operations.
  • Do not provide mechanisms to either combine or partition a series of Dockerfile commands to better reflect the abstract operators needed by its problem domain.
  • Actually execute imperative functions written in GO benefiting from intentional side effects, instead of encoding the body of these functions in purely declarative language.
  • Include RUN, which executes one or more arbitrary commands (functions) whose input and solution spaces include the image’s file system. Unlike ADD, whose input is limited to a protected context and whose execution implicitly commits the file system before the next Dockerfile command runs, RUN can destructively write to its inputs or even overwrite its own functions. This increases the opportunity for side effects (especially harmful ones), potentially diminishing a Dockerfile’s declarative benefit.

In addition to solving the separation of Build vs. Runtime concerns, incorporating functions would:

  • Encourage innovation by quickly and easily extending the Dockerfile language in a very democratic way, as any Dockerfile developer can add functions in any implementation language they choose. Also, these function extensions can be shared with other developers through Docker Hub.
  • Extend Dockerfile abilities through a composition of parts offered by Docker Hub, thereby, maintaining the current size, stability, and reliability of the Docker Daemon eliminating the need to extend Builder with all the various combinations of functions offered by Docker Hub. In other words, incorporate your own and call other functions available as images on Docker Hub to provide endless possibilities without materially affecting Builder or the Docker Daemon.
  • Improve the reliability of the Docker Daemon since the containers used to execute user defined functions, would be isolated from the Daemon's process space, thereby, reducing the risk that anomalous side effects will negatively impact the execution of the Docker Daemon.
  • Relieve core developers from identifying and encoding language extensions permitting Docker to focus its efforts on other priorities.
  • Permit thousands of developers to experiment and discover language elements with more declarative benefit than the existing ones, as they can create their own declarative language using functions as building blocks.

It’s imperative that the concepts/mechanisms presented by a proposal first be examined for their technical merit before considering implementation concerns, such as the syntax used to explain a mechanism or the difficulty of adapting the current code base.

  • The proposal's use of 'switches' like '--OUT' follows a widely understood convention that differentiates keywords from actual argument variables. If there's demonstrated interest in advancing the proposal, the proposed form of the keywords used to map external variables to the function's input and output arguments can be amended to be more aesthetically appealing.

Compare Function Idiom to Nested/Chained Build

Simple Dockerfile Example

What follows below is a simple Nested Build example taken from #8021 which will be used to contrast this approach proposed by #7149, as implemented by #8021, to the Function Idiom.

Parent build step can be also used for dynamic Dockerfile creation. In this case BUILD instruction needs to be the final instruction in the Dockerfile.

FROM ubuntu
ADD . /src
RUN /src/build-new-dockerfile.sh > /src/Dockerfile
BUILD /src

Number of build steps isn't limited. Last image gets the tag.

cat <<EOT > Dockerfile
FROM busybox
RUN /bin/mkdir -p /out && /bin/touch /out/a
BUILD /out

FROM busybox
ADD . /out
RUN /bin/mkdir -p /out && /bin/touch /out/b
BUILD /out

FROM busybox
ADD . /out
RUN /bin/mkdir -p /out && /bin/touch /out/c

EOT

Below, the above Nested Build example is encoded using the Function Idiom approach.

DEF FUN touchIT
    FROM ubuntu
    ENV OUT_FILE /out/touchfile
    RUN echo '#!/bin/bash'                        >  /touchIT.sh && \
        echo 'mkdir -p /out && touch "$OUT_FILE"' >> /touchIT.sh && \
        chmod +x /touchIT.sh
    ENTRYPOINT ["/touchIT.sh"]
END FUN

FROM busybox
FUN touchIT --OUT OUT_FILE /out/a
FUN touchIT --OUT OUT_FILE /out/b
FUN touchIT --OUT OUT_FILE /out/c
Function Idiom (FI) Nested Build (NB) Remarks
FUN invocation atomically (one statement) describes what to be done and the arguments involved, not how to do it. Therefore, touchIT’s implementation/body could change, without effecting its invocation, as long as its interface doesn’t change. There is no separation between what & how. Variables are intimately coupled to implementation. FI: delivers a solution with demonstrated declarative benefit as it separates what from how and concisely specifies what in a single statement.
Function body statements couple to variables and variable names internal to the function (image), Binding statements, ADD and BUILD couple to adjacent BUILD and ADD statements as well as core statements, like RUN, which either consume resources provided by ADD (net inputs) or produce net output resources created/copied to the resulting build context passed by BUILD to the subsequent step. FI Functions are easily reused and composable, as its body reflects coupling to objects internal to itself. Popular functions can be shared through Docker Hub. NB: difficult to reuse or compose a solution from build steps, as statements within a step bind to both the initially provided build context and the build context pass to the next step. Also, current NB proposal would need to be extended to support labeling build steps, in order to effectively reuse them.
Binding commands for '–OUT' variables are automatically generated and provide encapsulation. Binding commands of ADD and BUILD must be encoded by the developer. FI: offers greater declarative benefit than NB, as FI automatically generates primitive operations
The FUN invocations can obviously be reordered without affecting the resulting image. The binding statements, ADD and BUILD, that surround the core statements, in this case RUN, must be reviewed to determine if they're compatible to the structure of adjacent build contexts determined by the prior BUILD step as well as the build context structure expected by a succeeding BUILD step. Additionally, changes to ADD may or BUILD will require alteration to core statements to either reflect a new source location for consumed resources or new destination location for produced resources. Due to FI's encapsulation mechanisms, it's much easier to avoid harmful coupling and other side effects, reducing dependencies between statements, thereby, permitting their unencumbered reordering.
FI: Only a single, implicit build context.
NB: One implicit and three explicit build contexts.
Remarks: FI better matches the problem space; NB introduces needless build contexts.

FI: Improves performance when building from cache. A cache check is performed only for the invocation statement; in this situation, 4 comparisons are performed.
NB: A cache check is performed for every nested build statement; in this example, 10 cache evaluations are performed.
Remarks: FI's encapsulation mechanisms limit cache evaluation to what's necessary: the input/output argument bindings and the input file checksums defined by each invocation. Since NB exposes its implementation, a checksum comparison must be performed for every statement.

FI: Potentially speeds up the build, as only the first function invocation requires execution. The subsequent invocations can be short-circuited to a single copy operation instead of actually running the container.
NB: All statements would be executed.
Remarks: Encapsulation facilitates reasoning by abstracting the function's body, which improves FI's declarative benefit and, in this case, makes local/spot optimizations easy to recognize. If the function 'touchIt' is considered a pure function, then, since the input variable values for each invocation are identical (there are none), it should generate the same output variable values, in this case an empty file. Therefore, each subsequent invocation need only execute a single copy operation to transfer the output file produced by the first invocation to the resultant image's file system. Because NB's encoding reflects its implementation, the reasoning innate to the FI approach has to be distilled from NB's implementation to identify similarities. This would require an analytical algorithm to convert NB build steps into an abstract form that could then be searched for compatible build steps. Without the complexity of such an algorithm, NB cannot discern optimizations that are freely available to FI.

FI: In 4 statements, concisely expresses what's to be done in a manner that's intuitively understood by programmers.
NB: Lacks a concise statement of what, because the what is integral to the how. Introduces a new "chained/nested build" approach.
Remarks: FI leverages a universally understood programming idiom to accelerate a solution's development and its assimilation. NB introduces an unfamiliar idiom requiring training, diminishing an experienced programmer's ability to quickly develop new solutions or understand existing ones. To appreciate the differences between these approaches, simply change the final destination directory of the three files 'a', 'b', and 'c' from '/out' to '/touched' for each approach without adding statements. FI requires adapting only 3 easily identifiable statements, while NB requires changes to 7.
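
The caching and short-circuit rows above can be illustrated with a small Python sketch. It assumes a hypothetical builder that reduces each FUN invocation to its image, command, --IN/--OUT bindings, and input file checksums, and derives the cache key from those values alone. `Invocation`, `cache_key`, `FunCache`, and the container path `/tmp/empty` are illustrative names, not Docker APIs. Because the three touchIt calls differ only in where their output is copied, the container runs once and the other two calls reduce to copies.

```python
import hashlib
from dataclasses import dataclass
from typing import Callable, Dict, Tuple

Binding = Tuple[str, str]  # (source path, destination path)


@dataclass(frozen=True)
class Invocation:
    """Hypothetical model of a FUN invocation's interface (not a Docker API)."""
    image: str
    ins: Tuple[Binding, ...] = ()   # (build-context path, container path)
    outs: Tuple[Binding, ...] = ()  # (container path, resulting-image path)
    command: str = ""


def cache_key(inv: Invocation, checksums: Dict[str, str]) -> str:
    """Hash only what can change the function's output.

    Image-side --OUT destinations are deliberately left out: they decide where
    artifacts are copied, not what the function produces.
    """
    h = hashlib.sha256()
    h.update(f"{inv.image}\0{inv.command}\0".encode())
    for src, dst in inv.ins:
        h.update(f"IN\0{src}\0{dst}\0{checksums.get(src, '')}\0".encode())
    for src, _ in inv.outs:
        h.update(f"OUT\0{src}\0".encode())
    return h.hexdigest()


class FunCache:
    """Runs a container only on a cache miss; a hit degrades to copying files."""

    def __init__(self, run_container: Callable[[Invocation], Dict[str, bytes]]):
        self._run = run_container  # stand-in for "docker run IMAGE"
        self._results: Dict[str, Dict[str, bytes]] = {}
        self.runs = 0

    def invoke(self, inv: Invocation, checksums: Dict[str, str],
               image_fs: Dict[str, bytes]) -> None:
        key = cache_key(inv, checksums)
        if key not in self._results:
            self.runs += 1
            self._results[key] = self._run(inv)  # container path -> contents
        artifacts = self._results[key]
        for src, dst in inv.outs:  # on a hit, this copy is the only work done
            image_fs[dst] = artifacts[src]


# Usage: three argument-less touchIt calls writing to /out/a, /out/b, /out/c.
def fake_touch(inv: Invocation) -> Dict[str, bytes]:
    return {"/tmp/empty": b""}  # hypothetical: the function creates one empty file


cache = FunCache(fake_touch)
fs: Dict[str, bytes] = {}
for name in ("a", "b", "c"):
    cache.invoke(Invocation("touchIt", outs=(("/tmp/empty", f"/out/{name}"),)), {}, fs)
print(cache.runs, sorted(fs))  # 1 ['/out/a', '/out/b', '/out/c']
```

Leaving the image-side destination out of the key is a design choice of this sketch. A builder that keyed on the full binding would still skip unchanged invocations on a rebuild, but would not share one execution across the three calls.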

Substantive Dockerfile Example

Purpose:

To present a simple but more realistic Dockerfile example involving an existing Docker Hub image that transforms one or more input files into a dependent output file. The example consists of an initial Go application that decides which of two competing strategies to execute when solving a problem. The competing strategies are also written in Go. All Go programs are statically linked executables. The selected Docker Hub image, "google/golang", runs the Go compiler to convert source file(s) into a dependent executable. The resultant image's Dockerfile generates the executables and adds them to the minimal "scratch" image.

Build Context:

./goCompileStatic.sh
    #!/bin/bash
    export CGO_ENABLED=0
    go get -a -ldflags '-s' app
/app
    main.go
/stgt1
    main.go
/stgt2
    main.go

# Function Idiom Dockerfile solution:
FROM scratch
  FUN google/golang --IN ./app/*   /$GOPATH/src/app/ ./goCompileStatic.sh / --OUT /$GOPATH/bin/app /app   /goCompileStatic.sh
  FUN google/golang --IN ./stgt1/* /$GOPATH/src/app/ ./goCompileStatic.sh / --OUT /$GOPATH/bin/app /stgt1 /goCompileStatic.sh
  FUN google/golang --IN ./stgt2/* /$GOPATH/src/app/ ./goCompileStatic.sh / --OUT /$GOPATH/bin/app /stgt2 /goCompileStatic.sh
  ENTRYPOINT  [ "/app", "/stgt1", "/stgt2" ]
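
The three FUN lines above share one shape. The Python sketch below shows how a builder might split such a line into its interface parts. It rests on an assumption drawn only from these three invocations: tokens after --IN and --OUT are (source, destination) pairs, and the unpaired token at the end (/goCompileStatic.sh, which an --IN pair copies to /) is the command the container runs. `parse_fun` and `FunCall` are hypothetical names, and a command with arguments would need a delimiter this reading does not define.

```python
import shlex
from typing import List, NamedTuple, Tuple


class FunCall(NamedTuple):
    image: str
    ins: List[Tuple[str, str]]   # (build-context source, container destination)
    outs: List[Tuple[str, str]]  # (container source, resulting-image destination)
    command: List[str]           # trailing unpaired token, if any


def _pairs(tokens: List[str]) -> List[Tuple[str, str]]:
    return list(zip(tokens[0::2], tokens[1::2]))


def parse_fun(line: str) -> FunCall:
    """Split a FUN line into image, --IN pairs, --OUT pairs and a command.

    Interpretation, not a spec: each flag takes (source, destination) pairs,
    and an odd token left over after --OUT is taken to be the command.
    """
    tokens = shlex.split(line)  # no $VAR or glob expansion happens here
    if len(tokens) < 2 or tokens[0] != "FUN":
        raise ValueError("not a FUN instruction")
    image = tokens[1]
    sections = {"--IN": [], "--OUT": []}
    current = None
    for tok in tokens[2:]:
        if tok in sections:
            current = tok
        elif current is None:
            raise ValueError(f"unexpected token before --IN/--OUT: {tok!r}")
        else:
            sections[current].append(tok)
    ins, outs = sections["--IN"], sections["--OUT"]
    if len(ins) % 2:
        raise ValueError("--IN expects (source, destination) pairs")
    command = [outs.pop()] if len(outs) % 2 else []
    return FunCall(image, _pairs(ins), _pairs(outs), command)


call = parse_fun("FUN google/golang --IN ./app/* /$GOPATH/src/app/ ./goCompileStatic.sh / "
                 "--OUT /$GOPATH/bin/app /app /goCompileStatic.sh")
print(call.ins)      # [('./app/*', '/$GOPATH/src/app/'), ('./goCompileStatic.sh', '/')]
print(call.outs)     # [('/$GOPATH/bin/app', '/app')]
print(call.command)  # ['/goCompileStatic.sh']
```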

# Nested/Chained Build Dockerfile solution:
  FROM google/golang
  ADD ./app/* $GOPATH/src/app/
  ADD ./goCompileStatic.sh /
  ENV outApp=/context/ outBin=/context/bin/
  RUN /goCompileStatic.sh        ; \
      mkdir -p $outBin           ; \
      mv /$GOPATH/bin/app $outBin
  ADD ./goCompileStatic.sh $outApp
  ADD ./stgt1 ${outApp}stgt1/
  ADD ./stgt2 ${outApp}stgt2/
  BUILD $outApp

  FROM google/golang
  ADD ./stgt1/* $GOPATH/src/app/
  ADD ./goCompileStatic.sh /
  ENV outApp=/context/ outBin=/context/bin/

  RUN /goCompileStatic.sh        ; \
      mkdir -p $outBin           ; \
      mv /$GOPATH/bin/app $outBin/stgt1
  ADD ./goCompileStatic.sh $outApp
  ADD ./bin $outBin
  ADD ./stgt2 ${outApp}stgt2/
  BUILD $outApp

  FROM google/golang
  ADD ./stgt2/* $GOPATH/src/app/
  ADD ./goCompileStatic.sh /
  ENV outApp=/context/ outBin=/context/bin/

  RUN /goCompileStatic.sh        ; \
      mkdir -p $outBin           ; \
      mv /$GOPATH/bin/app $outBin/stgt2
  ADD ./bin $outBin
  BUILD $outApp

  FROM scratch
  ADD ./bin /
  ENTRYPOINT  [ "/app", "/stgt1", "/stgt2" ]
Function Idiom (FI) Nested Build (NB) Remarks
FI: Enhances "apparent" build performance (but not total consumed CPU time), as it's easy to execute each function simultaneously as a separate process. Once complete, the output artifacts of each function can be added to the resulting image in the order of their invocation, conforming to a Dockerfile's implicit commit constraint.
NB: Would require an optimization algorithm, complex relative to FI's approach, to decouple the artificial output-to-input dependencies forged by the chained build contexts (artifacts generated by step N that pass through intermediate steps until reaching their final destination M, where M > N+1). Additionally, the optimization process would have to identify and isolate the essential core statements of a chained step (a step's body) so they can be encapsulated and executed independently of other chained steps.
Remarks: The encapsulation and dynamic coupling mechanisms innate to FI simplify the optimizer required to realize concurrent/parallel execution. A function's defined interface makes dependencies between it and other functions easy to recognize; since in this example no function's inputs depend on the output of a prior function call, the invocations can be executed in parallel, accelerating apparent execution speed compared to running them serially. Also, the algorithm needed to isolate and dispatch code to a parallel process would simply transfer the code already delineated by the function's body. NB requires a more costly and complex optimization scheme: it needs an initial, additional step to reconstitute, from the code, the interface and body elements innate to the FI approach, before running the next optimization step, the one that reviews the dependencies between function invocations to determine whether they can be executed independently.

FI: Maintains concision in an example more reflective of actual usage, where output artifacts are produced from input ones: 5 manually encoded operations/statements in total.
NB: Its concision rapidly deteriorates due to the compounding effect of transferring both input and output artifact dependencies through artificial build contexts, and the duplication required to copy the implementation of core statements (RUN): 25 operations in total (28 statements minus the 3 ENV statements, which aren't necessary). Although the number of Dockerfile statements can be reduced to 21 by specifying multiple copy operations in a single Dockerfile command, the number of manually encoded operations remains 25.
Remarks: Due to its encapsulation mechanisms, FI maintains concision, increasing its declarative benefit when encoding solutions that reflect the complexity of actual usage. NB's limited encapsulation and tight coupling rapidly erode its declarative benefit, resulting in a solution that's difficult to understand and maintain, and whose cached-build performance slumps significantly, as it requires 21 cache checks vs. FI's 5.

FI: Composes solutions directly from Docker Hub images without requiring inline functions.
NB: Composes solutions directly from components on Docker Hub.
Remarks: Both FI and NB can compose solutions from existing images.
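
The parallel-execution row can be sketched in Python under stated assumptions: each invocation is reduced to a hypothetical `Call` listing the paths it reads and the paths it writes, an invocation depends on an earlier one only if it reads a path the earlier one writes, independent invocations run concurrently in waves, and results are committed strictly in invocation order to respect the implicit one-commit-per-instruction rule. `Call`, `waves`, `build`, and `fake_compile` are illustrative, and threads stand in for the separate container processes a real builder would start.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, NamedTuple, Tuple


class Call(NamedTuple):
    name: str
    inputs: Tuple[str, ...]   # paths the function reads
    outputs: Tuple[str, ...]  # paths it writes into the resulting image


def _under(path: str, root: str) -> bool:
    return path == root or path.startswith(root.rstrip("/") + "/")


def depends_on(later: Call, earlier: Call) -> bool:
    """True if `later` reads something `earlier` writes."""
    return any(_under(i, o) for i in later.inputs for o in earlier.outputs)


def waves(calls: List[Call]) -> List[List[int]]:
    """Group call indices so that each group depends only on earlier groups."""
    level: List[int] = []
    for j, call in enumerate(calls):
        deps = [level[i] for i in range(j) if depends_on(call, calls[i])]
        level.append(1 + max(deps) if deps else 0)
    grouped: Dict[int, List[int]] = {}
    for idx, lv in enumerate(level):
        grouped.setdefault(lv, []).append(idx)
    return [grouped[lv] for lv in sorted(grouped)]


def build(calls: List[Call], run: Callable[[Call, Dict[int, str]], str],
          commit: Callable[[Call, str], None], workers: int = 4) -> None:
    results: Dict[int, str] = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for wave in waves(calls):
            futures = {i: pool.submit(run, calls[i], results) for i in wave}
            done = {i: f.result() for i, f in futures.items()}  # wait for the whole wave
            results.update(done)
    for i, call in enumerate(calls):  # commit in invocation order, not finish order
        commit(call, results[i])


calls = [
    Call("app", ("./app", "./goCompileStatic.sh"), ("/app",)),
    Call("stgt1", ("./stgt1", "./goCompileStatic.sh"), ("/stgt1",)),
    Call("stgt2", ("./stgt2", "./goCompileStatic.sh"), ("/stgt2",)),
]
print(waves(calls))  # [[0, 1, 2]]: no call reads another's output


def fake_compile(call: Call, upstream: Dict[int, str]) -> str:
    time.sleep(0.05 if call.name == "app" else 0.0)  # "app" finishes last
    return f"binary:{call.name}"


committed: List[str] = []
build(calls, fake_compile, lambda call, out: committed.append(call.name))
print(committed)  # ['app', 'stgt1', 'stgt2']
```

A second wave appears only when a dependency exists: a hypothetical fourth call that reads `/app` would be scheduled alone, after the first three finish.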

@bketelsen

Best proposal ever. You win the Internet.

@WhisperingChaos
Contributor Author

Thanks @bketelsen for your encouraging review of the proposal!

I'm hoping the community's support and the proposal's technical merit will convince core maintainers of its utility to improve a Dockerfile's ability to address a range of build concerns.

@WhisperingChaos
Contributor Author

Adoption of this proposal would eliminate the need for, or complement, the following proposed features:

  • Proposal: Dockerfile BUILD instruction #7149 @proppy 'Dockerfile BUILD instruction'
    The Function Idiom proposal negates the need for Proposal: Dockerfile BUILD instruction #7149, for the reasons given, and as demonstrated through an example, in a prior post to this thread.

  • Dockerfiles should have a way to perform multiple build actions in one commit #2439 @bwilkins 'Dockerfiles should have a way to perform multiple build actions in one commit'

  • BEGIN/COMMIT syntax for Dockerfile #8574 @burke 'BEGIN/COMMIT syntax for Dockerfile'
    Both proposals, Dockerfiles should have a way to perform multiple build actions in one commit #2439 and BEGIN/COMMIT syntax for Dockerfile #8574, implement a scoped, explicit commit operator, essentially to avoid polluting the resulting image with transient build artifacts that may no longer be directly accessible through the file system but continue to exist and consume space in a hidden layer. Since the Function Idiom approach isolates build artifacts from the resulting image's file system and allows selective copying of a given function's artifacts from its container to the resulting image's file system, the concerns that prompted these proposals would be eliminated.

    Additionally, because Function Idiom can aggregate one or more build operations, a function can be devised to represent the desired commit scope, relying on the current implicit commit that occurs after each Dockerfile command. Again, this negates the need for an explicit commit.

  • Proposal: Dockerfile add INCLUDE #735 @dysinger 'Proposal: Dockerfile add INCLUDE'
    Although I appreciate the applicability of Proposal: Dockerfile add INCLUDE #735 in situations involving inclusion without differentiation, for example, running packaging commands that assume sensible defaults, INCLUDE lacks both the isolation between build and runtime concerns and the encapsulation provided by Function Idiom. Even though package management functions address most of these concerns, for example, by executing clean-up routines after a successful install and thereby removing build artifacts from the runtime image, other issues requiring isolation, like license keys, aren't evident until they surface.

    What if INCLUDE presented itself as a template coupled to environment variables? The additional adaptability would make this approach more useful; however, for complex INCLUDE bodies, the number of environment variables needed to bind Dockerfile commands to their proper values may be cognitively oppressive. Additionally, since the environment variable namespace is the same for all INCLUDE bodies, it would be problematic to support environment variables that happen to have the same name but require different values. That said, adapting INCLUDE to incorporate an interface, similar to template/macro parameters, and an explicit parameter substitution phase might improve the approach's encapsulation, permitting its wider application to more complex problems. If INCLUDE were implemented more like a parametrized template, then it and the Function Idiom approach would be complementary.
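
The parametrized-template variant of INCLUDE suggested above can be sketched with Python's `string.Template`. This illustrates the idea only; it is not an existing Dockerfile feature. `BUILD_LIB` and `expand_include` are hypothetical, and the ADD/RUN lines they produce are placeholders. Each inclusion binds its own parameters in an explicit substitution step, so two bodies can use the same parameter names with different values.

```python
from string import Template
from typing import Dict, List

# Hypothetical INCLUDE body written as a template with two parameters.
BUILD_LIB = Template("ADD $SRC $DEST\nRUN make -C $DEST install")


def expand_include(body: Template, params: Dict[str, str]) -> List[str]:
    """Explicit substitution phase: each INCLUDE gets its own parameter scope.

    Template.substitute raises KeyError for an unbound parameter, so a missing
    binding fails at expansion time instead of silently expanding to "".
    """
    return body.substitute(params).splitlines()


# Two inclusions reuse the parameter names SRC and DEST with different values,
# which is awkward with a single shared environment-variable namespace.
dockerfile = (expand_include(BUILD_LIB, {"SRC": "./libfoo", "DEST": "/src/libfoo"})
              + expand_include(BUILD_LIB, {"SRC": "./libbar", "DEST": "/src/libbar"}))
print("\n".join(dockerfile))
```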

@jessfraz
Contributor

Hello!
We are no longer accepting patches to the Dockerfile syntax as you can read about here: https://github.com/docker/docker/blob/master/ROADMAP.md#22-dockerfile-syntax

Mainly:

Allowing the Builder to be implemented as a separate utility consuming the Engine's API will open the door for many possibilities, such as offering alternate syntaxes or DSL for existing languages without cluttering the Engine's codebase

Then from there, patches/features like this can be re-thought. Hope you can understand.
