[aws-lambda-python]: Ability to customize build environment #16234

Closed
SamStephens opened this issue Aug 25, 2021 · 14 comments
Labels
@aws-cdk/aws-lambda-python effort/medium Medium work item – several days of effort feature-request A feature should be added or improved. p2

Comments

@SamStephens
Contributor

SamStephens commented Aug 25, 2021

Allow the ability to customize the environment Python builds are done within, whilst still taking advantage of the simplicity of what PythonFunction provides.

Specifically, without requiring us to provide a custom docker image, allow us to specify custom docker volumes, and custom shell commands to run before the build.

This is a similar but different requirement to Allow the use of CodeArtifact, and it may be that the same solution applies for both use cases.

Please note that I'm explicitly avoiding customisation of the Docker image build; this is customisation of how the python build is executed using a Docker image.

Use Case

I want my Lambda code to be able to have dependencies on packages in GitHub private repositories. To allow for this, I want to be able to copy my SSH keys from my host machine into the build Docker volume, so that the build can authenticate to GitHub using my SSH keys.

Proposed Solution

Allow for syntax something like:

    aws_lambda_python.PythonFunction(
        scope=self,
        id="FunctionId",
        handler="handler",
        runtime=aws_lambda.Runtime.PYTHON_3_8,
        entry="source-entry",
        prebuild_command=[
            "bash",
            "-c",
            "cp -r /tmp/ssh/* ~/.ssh/",
        ],
        build_docker_volumes=[
            core.DockerVolume(
                container_path="/tmp/ssh",
                host_path=f"{Path.home()}/.ssh",
            ),
        ],
    )

This is a 🚀 Feature Request

@SamStephens SamStephens added feature-request A feature should be added or improved. needs-triage This issue or PR still needs to be triaged. labels Aug 25, 2021
@setu4993
Contributor

Hey @SamStephens! Good request. I have a PR that would make this possible here: #15324.

Curious if you have thoughts around modifying it to better support your use case as well.

@DarrenForsythe

I'm not sure that would solve my use case. Currently we need to add an intermediate certificate in some circumstances, which I believe would require the ability to at least run something similar to the following on the container:

ADD dir-containing-intercepting-cert /usr/local/share/ca-certificates
RUN update-ca-certificates

@SamStephens
Contributor Author

SamStephens commented Sep 28, 2021

I'm not sure that would solve my use case. Currently we need to add an intermediate certificate in some circumstances, which I believe would require the ability to at least run something similar to the following on the container:

ADD dir-containing-intercepting-cert /usr/local/share/ca-certificates
RUN update-ca-certificates

So what you're looking for is customisation of the Docker image, which is exactly what I'm trying to avoid here. Every existing solution the CDK provides to the problems I'm trying to solve requires you to use a custom Docker image.

I want the flexibility to run shell commands and mount Docker volumes without having to worry about Docker images, and to be able to rely on aws_lambda_python.PythonFunction to know which Docker image I should use for building with my Python runtime.

Basically your problem is orthogonal to mine (unless you could do your customisation by running shell commands instead of customising the Dockerfile).

What @setu4993 is describing will solve your needs though, I think.

@SamStephens
Contributor Author

@setu4993 what you're doing is interesting, but I think it's orthogonal to what I'm suggesting. You're customising the way the Docker image is built, allowing customisation of the SAM provided Python build image, which is really useful, and likely solves the problem @DarrenForsythe faces.

However what I'm needing is customisation of how the Docker container is then invoked after it is built.

I guess including SSH keys could be done during Docker build.

But then it doesn't meet the need of cases like providing Code Artifact credentials, or other short lived credentials; to provide these credentials, we want to be able to customise the invocation of the Docker container, not how it's built.

@setu4993
Contributor

@DarrenForsythe :

I'm not sure that would solve my use case. Currently we need to add an intermediate certificate in some circumstances, which I believe would require the ability to at least run something similar to the following on the container:

ADD dir-containing-intercepting-cert /usr/local/share/ca-certificates
RUN update-ca-certificates

That's exactly the type of use case #15324 is supposed to help with :). If #15324 is merged, you'd be able to provide a custom Dockerfile.build (with those additional docker steps) and use that instead of the default build image.

@SamStephens :

what I'm needing is customisation of how the Docker container is then invoked after it is built.

I see. I might have misunderstood the request in the issue, my bad.

But then it doesn't meet the need of cases like providing Code Artifact credentials, or other short lived credentials; to provide these credentials, we want to be able to customise the invocation of the Docker container, not how it's built.

I said this on the PR (#15324 (comment)) but addressing separately since it's a slightly larger point here. Being able to customize the build image solves the issue of credentials (Code Artifact or others) required for the purpose of building the Lambda function package.

However, if the credentials are required at function runtime, my 2 cents are that a more appropriate place might be to run it in the function.

If the goal is:

I want my Lambda code to be able to have dependencies on packages in GitHub private repositories. To allow for this, I want to be able to copy my SSH keys from my host machine into the build Docker volume, so that the build can authenticate to GitHub using my SSH keys.

that to me seems like a requirement for function build time, not runtime, which #15324 could resolve.

@setu4993
Contributor

Specifically, without requiring us to provide a custom docker image, allow us to specify custom docker volumes, and custom shell commands to run before the build.

I could be mistaken, but volumes can be referenced at runtime, not at build time.

@SamStephens
Contributor Author

@setu4993 I should apologise in advance for anything I misunderstand, because I'm not all that familiar with how Docker images are built for use in the CDK.

To clarify, nothing here involves function runtime. The distinction is whether customisation of the function build is done via Dockerfile.build to change the Docker image used for the build, or whether customisation is done by changing how the Docker image is invoked after it is built.

My idea of the way to do the customisation I need is changing how the docker image is invoked, by allowing me to provide custom shell commands and mount volumes on the docker image. Essentially I want to do what #10298 (comment) is doing, except I want to continue to allow aws_lambda_python.PythonFunction to provide the SAM build docker image, and only customise how it is invoked.

It may be that customising Dockerfile.build can also achieve the things I want, and I simply misunderstand how Docker images are built. If so, apologies for my lack of understanding making this more drawn out than it needs to be.

There are two preconceptions I have about the Docker build, I would love to be wrong about both of them:

  • That once a docker image is built for a particular Dockerfile.build file, it is cached. It will then only be rebuilt if Dockerfile.build changes or if the upstream image changes.
  • That if this is not true, or if I'm missing a way to use this build process to make the image update as the credentials do, having to rebuild the Docker image on every build will be substantially slower than building and caching the image once and simply changing the parameters I invoke it with to deal with the changing credentials.

FYI, I hope I'm not seeming too negative here; what you've done is super useful, and even if it doesn't suit Code Artifact short lived credentials, I'll be using it for SSH keys, so thanks.

@setu4993
Contributor

Hey @SamStephens! Thanks for that input. I don't think you're being negative or rude. I can share a few things that might help clarify the confusion and why I continue to believe #15324 would satisfy both the issues.

On Code Artifact

Code Artifact can be authenticated in multiple different ways. It can be done with a pip.conf (as illustrated in #10298 (comment)) or by exporting environment variables (what I proposed and use as in #10298 (comment)).
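To illustrate the environment-variable route, here is a minimal sketch. It assumes the token has already been fetched (for example with `aws codeartifact get-authorization-token`) and that pip picks up its index from the `PIP_INDEX_URL` environment variable. The helper name `codeartifact_pip_env` and every argument value are hypothetical placeholders, not anything from the linked comments.

```python
def codeartifact_pip_env(token, domain, owner, region, repository):
    """Return environment variables that point pip at a CodeArtifact PyPI repo.

    All arguments are placeholders supplied by the caller; the token is
    short lived, so this would be recomputed for each build.
    """
    index_url = (
        f"https://aws:{token}@{domain}-{owner}.d.codeartifact."
        f"{region}.amazonaws.com/pypi/{repository}/simple/"
    )
    return {"PIP_INDEX_URL": index_url}


# Hypothetical values, for illustration only.
env = codeartifact_pip_env(
    token="EXAMPLE_TOKEN",
    domain="my-domain",
    owner="111122223333",
    region="us-east-1",
    repository="my-repo",
)
```

Because the token ends up inside the URL, the resulting variable should be treated as a secret wherever it is passed.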

My best understanding of building Lambda functions (normal or Python) is below, which illustrates how they are different and why my proposal solves requests from both the issues.

On Building Lambda Functions

If using aws_lambda.Function, a separate bundling image can be used (as shown in #10298 (comment)). When doing that, what occurs is:

  1. A previously built Docker image is run.
  2. The volumes specified are attached.
  3. The commands specified are used to install the packages in a certain directory (which is why that comment includes a pip install ...).
  4. The directory is extracted from the container into a package that is used at runtime for the Lambda function.
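Steps 1 to 3 above correspond roughly to a single `docker run` invocation. The sketch below only assembles the argument list to make that mapping concrete; it is not how the CDK implements bundling. The helper `docker_run_args`, the image name, and the paths (including `/asset-output` as the output directory) are assumptions for illustration.

```python
from pathlib import Path


def docker_run_args(image, volumes, command):
    """Assemble a `docker run` argument list.

    image:   name of a previously built image (step 1).
    volumes: list of (host_path, container_path) pairs to mount (step 2).
    command: command to run inside the container (step 3).
    """
    args = ["docker", "run", "--rm"]
    for host_path, container_path in volumes:
        args += ["-v", f"{host_path}:{container_path}"]
    args.append(image)
    args += command
    return args


# Hypothetical image name and paths, for illustration only.
args = docker_run_args(
    image="example-build-image",
    volumes=[(f"{Path.home()}/.ssh", "/tmp/ssh")],
    command=["bash", "-c", "pip install -r requirements.txt -t /asset-output"],
)
```

Step 4 would then amount to collecting whatever the command wrote to the output directory.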

But building Python functions is different.

On Building Python Lambda Functions

If using aws_lambda_python.PythonFunction, the packaging step is split into 2 steps: installing dependencies and copying local packages. Both are executed as part of building Docker images, from which the package is extracted.

(Separating these into 2 steps is great because credentials would only exist when dependencies are being installed.)

The current values in CDK ensure that steps 1, 3 and 4 from above are set, and CDK is opinionated about them. My PR (#15324) would make 1 and 3 customizable while not supporting 2 (which is a Docker image runtime feature); 4 is conducted on a separate image (which continues to be this Dockerfile).

How to Pass in GitHub Certificate

The Dockerfile.build could be customized to an edited version of Dockerfile.dependencies in which a local SSH key is copied into the build dependencies Docker image (which would really come down to adding lines similar to #16234 (comment)).

@SamStephens
Contributor Author

Thanks, clearly my preconceptions were incorrect, thanks for your generosity explaining. Your feature sounds powerful given that, thanks.

I guess my one minor quibble is that if I understand you correctly, to use your feature to (for example) customise to include my SSH keys, I'm going to have to duplicate the contents of Dockerfile.dependencies in my custom Dockerfile.dependencies and then add the additional lines needed to copy in my SSH keys. To me this isn't ideal, because it leaves room for your customised Dockerfile.dependencies to drift apart from the canonical aws-lambda-python Dockerfile.dependencies as the CDK evolves. I'd prefer to be able to specify additional commands to include in the canonical aws-lambda-python Dockerfile.dependencies.

Having said that, I don't think this is a major concern, considering Dockerfile.dependencies will likely change very slowly, and what you are offering is worlds ahead of where we are now.

@setu4993
Contributor

setu4993 commented Oct 4, 2021

I guess my one minor quibble is that if I understand you correctly, to use your feature to (for example) customise to include my SSH keys, I'm going to have to duplicate the contents of Dockerfile.dependencies in my custom Dockerfile.dependencies and then add the additional lines needed to copy in my SSH keys. To me this isn't ideal, because it leaves room for your customised Dockerfile.dependencies to drift apart from the canonical aws-lambda-python Dockerfile.dependencies as the CDK evolves. I'd prefer to be able to specify additional commands to include in the canonical aws-lambda-python Dockerfile.dependencies.

I absolutely agree. My ideal case would be to have that be easier to customize and reuse instead of copied over and then customized.

One of the reasons I think the current setup of Dockerfile.dependencies and Dockerfile is better than the one for aws_lambda.Function is because it splits the step into 2 parts: One for getting dependencies, another for getting function code. Splitting over to a different setup just for dependencies is nice because that can be customized depending on project-level requirements rather than CDK-level.

That allows for the ability to customize a few different things:

  1. Customizing how packages get installed (pip, poetry, something else).
  2. Adding credentials to the build step (Code Artifact, SSH keys, etc.).
  3. Adding caching.
  4. Passing in custom build secrets, environment variables.

We usually run in a build environment that doesn't require pipenv (which PythonFunction supports). So, for our use case, we'd likely just drop that line (and conditionals) from our custom Dockerfile.dependencies.

It's not a perfect solution and still relies on some duplication, yes, but it affords enough flexibility that IMHO makes it worth it.

Again, all of this is contingent on the PR being accepted :).

@setu4993
Contributor

setu4993 commented Oct 4, 2021

which I believe would require the ability to at least run something similar to the following on the container:

ADD dir-containing-intercepting-cert /usr/local/share/ca-certificates
RUN update-ca-certificates

add the additional lines needed to copy in my SSH keys

FWIW, @SamStephens and @DarrenForsythe, Docker recently started supporting passing in SSH keys using build secrets, which might be a more secure way than copying them in or using volumes.
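A rough sketch of what that could look like, assuming BuildKit's SSH forwarding (`docker build --ssh default` on the host, paired with a `RUN --mount=type=ssh ...` instruction in the Dockerfile). The helper `docker_build_with_ssh` is hypothetical and only assembles the command and environment; nothing here is taken from the CDK.

```python
import os


def docker_build_with_ssh(dockerfile, context="."):
    """Assemble a `docker build` command that forwards the host SSH agent.

    The Dockerfile is assumed to contain an instruction such as:
        RUN --mount=type=ssh pip install -r requirements.txt
    so the key is available during that step only and is not stored
    in an image layer.
    """
    args = ["docker", "build", "--ssh", "default", "-f", dockerfile, context]
    # BuildKit must be enabled for --ssh; newer Docker versions enable it
    # by default, older ones need this variable set.
    env = {**os.environ, "DOCKER_BUILDKIT": "1"}
    return args, env


build_args, build_env = docker_build_with_ssh("Dockerfile.build")
# To actually run it: subprocess.run(build_args, env=build_env, check=True)
```

This relies on an SSH agent running on the host with the relevant key loaded.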

@nija-at nija-at added effort/medium Medium work item – several days of effort p2 and removed needs-triage This issue or PR still needs to be triaged. labels Oct 14, 2021
@nija-at nija-at removed their assignment Oct 14, 2021
mergify bot pushed a commit that referenced this issue Dec 30, 2021
…mage (#18082)

This refactors the bundling process to match the NodeJs and Go Lambda functions and allows providing a custom bundling docker image.

Changes:
- refactor bundling to use `cdk.BundlingOptions`
- Use updated `Bundling` class
- Update tests to use updated `Bundling` class


Fixes #10298, #12949, #15391, #16234, #15306

BREAKING CHANGE: `assetHashType` and `assetHash` properties moved to new `bundling` property.

----

*By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license*
@setu4993
Contributor

@SamStephens : #18082 added support for customizing the bundling Docker image for Python Lambdas. So that might be an alternative to use here (with SSH keys passed as build secrets).

@corymhall
Contributor

@SamStephens I'm going to close this issue in favor of a couple of more specific issues. With the recent updates that were made this should allow you to customize the build environment. If there are any other missing features that you are looking for we can create specific issues to address those.

  1. (lambda-python): add local bundling to PythonFunction #18290 local bundling
  2. (aws-lambda-python): Command Hook support #18621 command hooks

TikiTDO pushed a commit to TikiTDO/aws-cdk that referenced this issue Feb 21, 2022
…mage (aws#18082)

5 participants