Proposal: Extending Build Context with Intermediate State #12415
Comments
@erikh @tiborvass @proppy @burke @discordianfish Mechanisms promoted by the following proposals:
guided the formulation of the following proposals, that together, may lead to an incremental solution to properly separating build-time and run-time concerns:
Read #12415 TLDR and view its Example, as it concisely (for me) demonstrates both mechanisms. It's worth a look ... |
The "Proposal: SLINK instruction for Dockerfile" #7654 @cmfatih furthers the notion of addressability conferring the following benefits as it:
I would suggest that checksums produced by SLINK represent the "actual" file/path checksums not the file/path checksum computed for the symbolic link, as the symbolic link should present a proxy indistinguishable from the actual artifact it represents. |
@thaJeztah |
I worry that this may be too complicated for the typical Docker use-case.
|
Neither feature is required. However, if you want added security/reliability, the ability to reformulate the build context to match the ONBUILD triggers executed by library go and/or ruby images, and a true separation of build-time and run-time concerns, then supplying input/output mapping specifications seems a small price to pay. It's also familiar, as it acts like an argument list to a function call. Do you have something in mind that's more minimal than specifying the data dependencies? |
Honestly, it may take me a bit to grok this full PR. To be honest, that may be its downfall. When I read through the example, it's not obvious what is going on. Docker generally favors usability and simplicity, often at the cost of correctness. So I will try to understand the concepts of this PR and then maybe they can be presented in a slightly different form. |
I appreciate your responding to my request to review the two proposals.
Sometimes, the perspective of a stranger in a strange land at least provides some level of insight into a problem that may be improved by others. Remember what happens if you don't grok... I can be available via IRC/Skype if you wish to discuss the mechanisms and reasoning behind their application. Also, I'm interested in understanding the “complexity” argument. Certainly there's much to read, but the essential mechanism is a bind mount influenced by the set Union operator. It is this composite mechanism that implements the mapping function common to CONTEXT and MOUNT. The mapping function's arguments are pairs of file/directory names which simply declare the associations that, in the case of CONTEXT, capture the specific input dependencies & structure required by the processes running within the Dockerfile commands associated with a given FROM statement, and, in the case of MOUNT, expose the output artifacts those processes produce. Since these dependencies are unavoidable, exposing them facilitates their management by, for example, automatically generating the binding code, reducing the complexity required to write a Dockerfile. Perhaps comparing this Proposal's example solution (6 statements) above to a similarly encoded Chained Build solution (23 statements) below may help?:
Note - every Dockerfile statement other than FROM represents binding requests to either couple build context objects, that are inputs to the go compiler, to the particular container's file system, or bind generated output objects and those input objects not processed yet, to form a new (extended) build context for the next build step. Implementing CONTEXT and MOUNT eliminates the complexity of manually coding, debugging, and maintaining these 17 extra Dockerfile commands. There are other important differences between the approaches but I don't want to overwhelm. |
I would be very excited to see this implemented! My few thoughts on this are:
The bigger issue I am seeing is that people using Node.js, Python, Ruby or other dynamic language don't realise that they have heaps of build-time-only dependencies bundled into their runtime images, things like the compiler that is used to build native extensions and all the Speed of deployment and security surface are probably the most important aspects, it's not the size on disk or theoretical purity. |
Thanks! Community support is critically important to convince core maintainers to adopt both proposals. Therefore, if you know of others interested in achieving what's proposed, please let them know.
I'll put together a TLDR section for the companion proposal #12072. I hope the TLDR above addresses your concern for this one? When implementing proposals whose effect broadly impacts a critical component, like Builder, I would want a thoughtful assessment presenting the reasoning behind the proposal and its scope. Besides, what you see is an approach that's a bit ingrained, as I prefer to deconstruct concepts into more abstract ones in order to identify similarities that can then be leveraged to hopefully arrive at a minimal solution. For example, both CONTEXT and MOUNT are very closely related, which should improve the reliability of the resultant code and minimize the time required to develop it.
|
Why do you worry that this may be too complicated for the typical Docker use-case? |
Hello! Mainly:
Then from there, patches/features like this can be re-thought. Hope you can understand. |
TOC
Background
Essential
TLDR
Syntax
Semantics
Benefits
Example
Description
This proposal outlines another solution to separate build-time from run-time concerns. Its predecessors #7115, #7149, and #8660 present thorough summaries of the issues arising from the current build system's inability to completely isolate these concerns. Therefore, this proposal, at this time, will not duplicate their content but instead, will focus on describing its solution.
Background
Highly declarative languages rely on the notion of immutable values when executing a program. The docker build context embodies this notion, as it represents an immutable set of values employed to create an image(s). However, although the initial provided state to a declarative program represents all the information that's necessary to produce the final result, the program itself is generally decomposed into a set of transforms which themselves provide/represent intermediate state information required to eventually compute the result. Once produced, this intermediate state information must also be immutable and its production must not destructively overwrite existing execution state. For example, the declarative program (-1 * cos(X)) where X is bound to the value 0 degrees, requires the computation of the transform cos(0) before evaluating the multiplication operation. Computing cos(0) extends the execution state/context to include value 1. Notice, extending the execution state preserves value 0, (it's not destructively overwritten) leaving variable X still bound to this same value.
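The cos example above can be sketched in a few lines of Python. This is only an illustration of nondestructive state extension, not part of the proposal: the `extend` helper and the key names `"cos(X)"` and `"result"` are assumptions made for the sketch.

```python
import math
from types import MappingProxyType


def extend(state, name, value):
    """Return a new read-only state that also binds name to value.

    Hypothetical helper: the existing state is never modified, and an
    already-bound name cannot be rebound.
    """
    if name in state:
        raise KeyError(f"{name!r} is already bound; state is never overwritten")
    return MappingProxyType({**state, name: value})


# Initial state: X is bound to 0 degrees.
initial = MappingProxyType({"X": 0.0})

# Intermediate transform: computing cos(X) extends the state with a new value.
after_cos = extend(initial, "cos(X)", math.cos(math.radians(initial["X"])))

# Final transform: the multiplication consumes the intermediate value.
final = extend(after_cos, "result", -1 * after_cos["cos(X)"])
```

Each step yields a larger state while every earlier state, including the binding of X, stays readable and unchanged.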
In an analogous way, a Dockerfile FROM and its associated Dockerfile commands (its ImageContext) can implement intermediate transforms when executing process(es), either directly by specifying RUN or indirectly through an ONBUILD RUN trigger. As above, these intermediate transforms must nondestructively extend the current execution state with the intermediate or final values required, respectively, by a succeeding, dependent transform or to construct the desired image. The new values must also be addressable, so they can be bound (coupled), making their values visible (accessible), or unbound, preventing access to them.
Indeed, the Nested/Chained Build proposal #7149 accomplished the objectives of nondestructively extending the build context and ensuring the addressability of execution state, as each intermediate build step would reconstitute the entire build context required to satisfy the remaining ones. Reconstructing the build context included operators to selectively propagate an existing value, by assigning it an address (path/file name), or to extend the build context by including values generated by the current step. Both the addressability and extension operators accomplished their tasks by physically copying execution state to a directory within the current step's filesystem. For example, the addressability operator, ADD, transfers the pre-existing build context values required by the remaining steps to the image's file system at a specific address (path), while the implementation of the extension operator relies on processes executing within an ImageContext to eventually write their values to the same directory/subdirectory targeted by the addressability operator (ADD).
Although Nested/Chained Build provides a means to extend the build context and permits its addressability, its lack of a coupling mechanism to declare a value to be either bound (visible) or unbound (inaccessible) to a variable (path/file reference) results in tightly coupled code, artificial dependencies between build steps, and other undesirable traits as discussed in these posts: #7149 comment 1, #7149 comment 2, and demonstrated through coding examples: Compare Function Idiom to Nested/Chained Build. That said, what if Nested/Chained Build's incremental, evolutionary approach to delivering isolation between build-time and run-time concerns, as compared to #8660 Function Idiom's more extensive one, could be realized without Nested/Chained Build's drawbacks?
Essential
As outlined above a solution would require mechanisms to:
Since this Proposal: Dynamic Coupling via Local Build Context details and suggests syntax to implement a capable coupling/binding mechanism, the remainder of this description will outline the implementation of a language feature to deliver the other required mechanisms mentioned above.
Through its implicit container commits, docker build already preserves intermediate execution state and enforces its immutability, eliminating the need to formulate another mechanism. Therefore, what remains unimplemented is a suitable addressability mechanism, which until recently was absent from the Dockerfile language. An appropriate addressability mechanism can be found in the deferred proposal: Build multiple tagged images per Dockerfile #3251, which alluded to "Adding layers to a base image from arbitrary points in the current build" and, in particular, this post demonstrates an addressability mechanism, based on the TAG command, which would eliminate the unnecessary physical copying imposed by Nested/Chained Build's implementation. Although deferred, the proposal's addressability mechanism reappears with the recent introduction of the Dockerfile LABEL command as discussed and implemented by Proposal: One Meta Data to Rule Them All => Labels #9882.
TLDR
Borrowing from these efforts, this proposal presents an addressability mechanism that integrates LABEL, but doesn't necessitate its use, to extend the initial build context with references to immutable execution state from other already performed and terminated ImageContexts encapsulated within the given Dockerfile. This addressability mechanism is similar to a bind mount or symbolic link in that it extends an addressing scheme, in this instance the build context directory structure, with references to another addressed context, in this case an ImageContext:directory, to provide addressability to the immutable values exposed by ImageContext:directory.
The remainder of this proposal will discuss the feature's Syntax and Semantics, provide an Example, and list its Benefits.
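As a rough model of this idea, the sketch below treats the build context as a table of addresses that point at immutable stores, and extends it with references into a finished ImageContext rather than copies. Everything here is hypothetical and chosen for illustration: the `STORES` table, the label `stgt1-build`, the paths, and the helpers `initial_context`, `extend_context`, and `read` are not the proposal's syntax or Docker's implementation.

```python
from types import MappingProxyType

# Hypothetical immutable stores: the initial build context plus the committed
# state of one finished ImageContext, keyed by a LABEL-like name.
STORES = {
    "context": MappingProxyType({"/app/main.go": "app source"}),
    "stgt1-build": MappingProxyType({"/go/bin/stgt1": "stgt1 executable"}),
}


def initial_context():
    """Address table for the initial build context: path -> (store, path)."""
    return MappingProxyType({path: ("context", path) for path in STORES["context"]})


def extend_context(context, label, directory, mount_point):
    """Return a new table that also addresses label:directory at mount_point.

    Only references are added, in the manner of a bind mount or symbolic
    link; no content is copied and the old table is left untouched.
    """
    prefix = directory.rstrip("/") + "/"
    links = {
        mount_point.rstrip("/") + "/" + path[len(prefix):]: (label, path)
        for path in STORES[label]
        if path.startswith(prefix)
    }
    clash = links.keys() & context.keys()
    if clash:
        raise ValueError(f"paths already addressed: {sorted(clash)}")
    # Union of the two address tables.
    return MappingProxyType({**context, **links})


def read(context, path):
    """Resolve an address to the immutable value it references."""
    label, source_path = context[path]
    return STORES[label][source_path]


base = initial_context()
extended = extend_context(base, "stgt1-build", "/go/bin", "/bin")
```

Here `read(extended, "/bin/stgt1")` resolves through the reference to the value held by the finished ImageContext, while `base` still addresses only the original build context.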
Syntax
Semantics
Benefits
Example
Purpose:
The example consists of an initial go application that decides which one of two competing strategies to execute when solving a problem. The competing strategies are also written in go. Assume all go programs are linked as static images. The selected Docker Hub image: google/golang-runtime executes a go compiler request converting source to a static executable via ONBUILD triggers. The Dockerfile reflects the task of generating the executables and adding them to the minimal "scratch" image.
Build Context:
/
    Dockerfile
    /app
        main.go
    /stgt1
        main.go
    /stgt2
        main.go
Dockerfile
docker build's -t option since it's not exposing intermediate state to be used by other build steps, it's most likely a resultant image that accepts content from other FROMs.