
aws-s3-deployment: Transfer asset without extracting contents of .zip files #8065

Closed
1 of 2 tasks
baer opened this issue May 18, 2020 · 27 comments
Labels
@aws-cdk/aws-s3-deployment effort/medium Medium work item – several days of effort feature-request A feature should be added or improved. p1

Comments

@baer
Contributor

baer commented May 18, 2020

Allow the S3 Deployment module to transfer assets to a new bucket without extracting the contents.

Use Case

Some AWS services work natively with zip files. An example is Elastic Beanstalk's SourceBundle. Today, in order to work with these services you must do one of the following:

  • Rely on zip files in the CDK bucket containing assets
  • Zip files locally, wrap them in a directory (so S3 Deployment doesn't unzip them) and continue using S3 Deployment
  • Use an event to re-zip files after transfer
  • Manual transfer

All of these are viable, but clumsy next to the fantastic S3 Deployment module.

Proposed Solution

Add a property to BucketDeploymentProps that disables this line of code in the deployment lambda.
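
For illustration, a minimal sketch of how such a property might be used (the property name extract and the bucket/path names below are placeholders, not an agreed-upon API):

new s3deploy.BucketDeployment(this, 'DeployZippedBundle', {
  sources: [s3deploy.Source.asset('./bundle')],
  destinationBucket: bundleBucket,
  // hypothetical flag: skip the unzip step in the deployment lambda
  extract: false,
});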

  • 👋 I may be able to implement this feature request
  • ⚠️ This feature might incur a breaking change

This is a 🚀 Feature Request

@baer baer added feature-request A feature should be added or improved. needs-triage This issue or PR still needs to be triaged. labels May 18, 2020
@iliapolo
Contributor

Hi @baer

Just to make sure I fully understand, are you saying you want to zip a local directory and transfer it in its zipped form to a bucket? What do you mean by:

  • Use an event to re-zip files after transfer
  • Manual transfer

I would imagine the first two steps should be sufficient.

@iliapolo iliapolo added the response-requested Waiting on additional info and feedback. Will move to "closing-soon" in 7 days. label May 20, 2020
@baer
Contributor Author

baer commented May 21, 2020

Yes, that's right. I'd like to:

  1. Zip a directory
  2. Send that directory to CDK's asset bucket as a zip file
  3. Use aws-s3-deployment to move that zip file to a new bucket without unzipping it.

This is basically what eb deploy does in the Elastic Beanstalk CLI, but could be generally useful for all sorts of things.

By "use an event to re-zip files after transfer" I meant that it would be possible to do this by creating a lambda + event notification on the target bucket and create a zipped version of the files that aws-s3-deployment adds. It's a bad idea. I was just trying to illustrate that it's currently difficult to use a zip file with CDK unless you want it to be expanded.

@github-actions github-actions bot removed the response-requested Waiting on additional info and feedback. Will move to "closing-soon" in 7 days. label May 21, 2020
@iliapolo
Contributor

@baer Got it. Thanks.

My concern here is that the fact that CDK even zips up the asset directory is somewhat of an internal implementation detail. Notice that when a user defines a directory as an asset, nowhere does it say that the asset is going to be zipped:

new s3deploy.BucketDeployment(this, 'DeployWebsite', {
  sources: [s3deploy.Source.asset('./website-dist')],
  destinationBucket: websiteBucket,
 });

The intent of the user here is to copy the directory from the local machine to the destination bucket. Incidentally, CDK implements this by zipping up and extracting. But that could also have been implemented in a different way.

Having said that, I'm not strictly opposed to adding such functionality, but if we do it naively, it might cause confusion. For example, say we did something like:

new s3deploy.BucketDeployment(this, 'DeployWebsite', {
  sources: [s3deploy.Source.asset('./website-dist')],
  destinationBucket: websiteBucket,
  extract: false
});

That would be confusing, because the property extract: false only makes sense to users who know exactly how the internal mechanism works. For everyone else, I believe, it would be unclear.

Off the top of my head, perhaps we can do something like:

new s3deploy.BucketDeployment(this, 'DeployWebsite', {
  sources: [s3deploy.Source.zip('./website-dist')],
  destinationBucket: websiteBucket,
});

Source.zip will zip up the asset and wrap it in a directory. This API conveys the user's intent to treat the directory as a zip.

What do you think? Would you like to continue the discussion on the API and take a stab at implementing it?

@iliapolo iliapolo added the effort/medium Medium work item – several days of effort label May 22, 2020
@SomayaB SomayaB removed the needs-triage This issue or PR still needs to be triaged. label May 27, 2020
@baer
Contributor Author

baer commented May 27, 2020

Your concern is valid, and I'm very much on-board with trying to hide implementation details! I was confused about how Assets are supposed to be used when I was first learning CDK, so I'd love to help clarify this for others. Let's talk docs and communication first and the actual code second.

Docs

The docs around assets don't line up with the intent in this issue. The three asset docs (s3-asset, s3-deployment, and the CDK User Guide on Assets) all say or imply different things.

aws-s3-asset - Construct Docs

This doc says that an Asset is a thing that your stack needs and that it will be deployed to an S3 bucket. It then goes on to explain how this works, including info about zip and packaging. A few relevant quotes are below.

"when the stack is deployed, the toolkit will package the asset (i.e. zip the directory)"

Explaining the implementation details runs counter to your stated desire to treat asset deployment, and assets by proxy, as a black box.

"When an asset is defined in a construct, a construct metadata entry aws:cdk:asset is emitted with instructions on where to find the asset and what type of packaging to perform (zip or file)."

This sort of implies the flexibility that I'm suggesting despite it not being possible.

aws-s3-deployment - Construct Docs

This doc says that an s3-deployment is a way to move files/folders, often Assets, into S3 buckets. It has some extensive explanation about how it works, including a bit that gives a specific CLI statement used in the implementation (aws s3 sync --delete against the destination bucket), which also runs counter to the idea of s3-deploy as a "black box."

One particular confusion about this doc is that the examples seem to imply that the bucket used for Assets is NOT where your assets should live long-term. Instead, you should use the Asset construct to upload your files initially, and the s3-deployment to move them to a final home. If this is what AWS' best practice is, I think it would be good to add a note to the Asset docs that says something like "Assets should be moved to a permanent bucket as a part of your stack. See s3-deployment."

CDK User Guide

Finally, there is the User Guide. This seems to imply that Assets shouldn't really be used directly at all, much less as a place to store things long-term by saying that "You typically reference assets through APIs that are exposed by specific AWS constructs."

There are similar disclaimers in several of the CDK construct docs that say "You will probably use ___ with the Construct.addX method, rather than directly." BasePathMapping is an example of this. If it's the intention of the CDK team that, over time Assets will mostly be used via the helper methods like lambda.Code.fromAsset, it would be good to add that to the docs.

Code

It's hard to know what to implement without first understanding the direction the team has for Assets. An additional option would be to consider this a corner case and expose elasticBeanstalk.Code.fromAsset once that construct does more than the Cfn-generated code it is today.

@iliapolo
Contributor

Hi @baer - Sorry for the delayed response.

Thanks for taking the time to detail the confusion and intent around asset usage in the CDK. It definitely sounds like it's worth some clarification and alignment.

Specifically, to focus on this use case, I think the elasticBeanstalk.Code.fromAsset you suggested is actually pretty nice. This does seem appropriate since it conveys Elastic Beanstalk's ability to work with zipped files.
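
For the sake of discussion, a rough sketch of how that might read (neither Code.fromAsset nor ApplicationVersion exist in the Elastic Beanstalk module today; the names are purely illustrative):

// Hypothetical API sketch -- illustrative names only, no such constructs exist yet.
const version = new elasticBeanstalk.ApplicationVersion(this, 'Version', {
  application: myApplication,
  sourceBundle: elasticBeanstalk.Code.fromAsset(path.join(__dirname, 'app')),
});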

Are you still up for maybe pitching in here? Let me know and we can come up with the final API together and then you can hack away :)

Thanks!

@baer
Contributor Author

baer commented Jun 30, 2020

No problem. It looks like, as of right now, Elastic Beanstalk does not have a module in CDK beyond the auto-generated Cfn-prefixed resources. I haven't looked at that code, but I'm assuming that to create something like elasticBeanstalk.Code.fromAsset, somebody would have to scaffold out the initial boilerplate of the non-generated Elastic Beanstalk module. Since this could have a large-ish code volume and isn't on the roadmap, is this even a PR y'all would accept?

@iliapolo
Contributor

@baer for sure. It’s probably gonna take a bit longer but if you’re up for it we would gladly accept it.

If you’re willing, we can hash out the MVP for this API by first creating a small README.

@iliapolo
Contributor

@baer P.S regarding the scaffolding, you can have a look at our example construct library package to get a sense of what it should look like.

@kgunny

kgunny commented Jul 24, 2020

I was able to create a Node EBS App with Version and Env using Python CDK 1.54.0. So EBS is supported by the CDK, and I ran into this very issue trying to set up the EBS App Version using the source_bundle property. That CDK part is fine, but when attempting to upload the zip file for the version using aws_s3_deployment, the zip file contents were extracted, whereas the EBS App Version source_bundle property needs the zip file un-extracted in the bucket.
ebs_version_bucket = aws_s3.Bucket.from_bucket_name(self, 'ebs_version_bucket', ebs_version_bucket_name)

ebs_version_bucket_deployment = s3_deployment.BucketDeployment(
    self, "ebs_version_bucket_deployment",
    sources=[s3_deployment.Source.asset("my-app-version-1.zip")],
    destination_bucket=ebs_version_bucket,
    destination_key_prefix='my-app-versions/my-app-version-1.zip'
)
my-app-version-1.zip ends up being a folder on S3.

@kgunny

kgunny commented Jul 24, 2020

Just to confirm: placing the zip in a folder, then making the folder the asset, works in that the sub-level zip file is not extracted.

@iliapolo iliapolo added the p2 label Aug 3, 2020
@iliapolo
Contributor

iliapolo commented Aug 3, 2020

@kgunny Thanks for the info 👍

@IngussNeilands

Would be nice to have this, very much needed for lambda zip assets

@nbaillie
Contributor

Also looking for this, to send code from a git export into S3 for use as a CodePipeline Source. I had looked at doing it directly from an asset, but the toolkit bucket has no versioning enabled and is therefore not supported by CodePipeline. So I wanted to move to seeding the source with aws-s3-deployment; the unzipping was not intuitive.

I can confirm that the below worked for me.

const sourceBucket = new Bucket(this, 'sourceBucket', {
  versioned: true
});

new BucketDeployment(this, 'sourceBucketDeploy', {
  sources: [Source.asset({PATH_FOR_FOLDER_WITH_ZIP_INSIDE})],
  destinationBucket: sourceBucket,
  retainOnDelete: false
});

There may be cases where extraction is useful for me though, so it might be good to have a flag as described above:

extract: false

Or, more specifically for my case, an AssetSourceAction added to @aws-cdk/aws-codepipeline-actions would have been easy to use and would abstract all this. However, it might be that this is an edge case.

@cmoore1776
Contributor

This is also needed beyond Elastic Beanstalk. For example, deploying Ansible playbooks for use with SSM (see https://aws.amazon.com/blogs/mt/keeping-ansible-effortless-with-aws-systems-manager/).

I'd like to keep a directory structure in source for my playbooks (without zipping), then have CDK zip and deploy to S3 WITHOUT extracting, so they can be referenced in the CreateAssociation call (currently implemented in CDK as aws_cdk.aws_ssm.CfnAssociation).
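
For reference, a minimal sketch of what this use case looks like with the extract flag that was eventually added later in the thread (#21805); bucket and path names here are illustrative only:

const playbookBucket = new s3.Bucket(this, 'PlaybookBucket');

new s3deploy.BucketDeployment(this, 'DeployPlaybooks', {
  // the local ansible/ directory is zipped up as a single asset...
  sources: [s3deploy.Source.asset(path.join(__dirname, 'ansible'))],
  destinationBucket: playbookBucket,
  // ...and stays zipped in the destination bucket
  extract: false,
});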

@McDoit
Contributor

McDoit commented Jun 1, 2021

I'm using double zipping as a workaround for my use case of using CDK to create StackSets.

I have my pre-zipped assets in a folder, which then gets zipped by the bucket deployment, but it would be the same to zip manually and point the bucket deployment at that zip.

@iliapolo iliapolo removed their assignment Jun 27, 2021
@jtmichelson

AWS MWAA plugins are also expected to be zipped; definitely looking forward to this improvement to aws-s3-deployment to deploy a directory as a zip. :)

@tyildirim24

Same with Lambda function binaries for Golang.

@github4es

I have a similar use case for AWS Glue, which runs a Python script that itself imports modules.
AWS Glue can't see the modules unless I zip them and upload them to an S3 bucket. The way it works now, I have to manually zip up the modules folder myself.

Example Project Structure

s3://bucket/make_car.py
s3://bucket/modules.zip

As of now, to create modules.zip, I have to zip the modules manually. I wish BucketDeployment had an automated way to zip up a directory and have it deployed to the destination_bucket.

s3_deployment.BucketDeployment(
    self,"my_s3_deployment",
    sources=[s3_deployment.Source.asset("../glue")],
    zip_this_local_dir="../glue/modules/" ... 
)

@github-actions

This issue has received a significant amount of attention so we are automatically upgrading its priority. A member of the community will see the re-prioritization and provide an update on the issue.

@wanjacki
Contributor

Are we opposed to this solution proposed by iliapolo?

new s3deploy.BucketDeployment(this, 'DeployWebsite', {
  sources: [s3deploy.Source.zip('./website-dist')],
  destinationBucket: websiteBucket,
});

If not, I can go ahead and implement it.
@TheRealAmazonKendra Let me know your thoughts as well, as we need this feature (as do others looking at this thread) and are willing to prioritize and implement it ourselves.

@TheRealAmazonKendra
Contributor

I've asked @iliapolo to take another look at this to confirm.

@iliapolo
Contributor

iliapolo commented Sep 5, 2022

Looking back at this issue, it looks like the prevalent use-case described here is:

  1. I have a local directory, and I need to reference it in another service in zip form

And it seems like we haven't mentioned that this type of use-case is already handled in the CDK. So I want to make sure we are aligned. Usually this is done via support from the service in question, for example the lambda.Code class when configuring the source for a lambda function. This is the recommended path.
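
As a minimal illustration of that recommended path (the function name and paths here are just examples), lambda.Code.fromAsset packages a local directory as a zip asset and wires its S3 location into the function for you:

const handler = new lambda.Function(this, 'Handler', {
  runtime: lambda.Runtime.NODEJS_14_X,
  handler: 'index.handler',
  // the directory becomes a zipped asset; no s3-deployment involved
  code: lambda.Code.fromAsset(path.join(__dirname, 'handler')),
});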

However, even if the service construct doesn't offer it explicitly, it can still be achieved by using regular assets, no need for the s3-deployment module. For example:

const app = new s3Assets.Asset(this, 'App', {
  path: path.join(__dirname, 'app'),
});

new bean.CfnApplicationVersion(this, 'Version', {
  applicationName: 'app',
  sourceBundle: {
    s3Bucket: app.s3BucketName,
    s3Key: app.s3ObjectKey,
  }
});

With this code, the zipped-up app directory remains in the CDK asset bucket for the lifetime of the application.
Note that if your use case is:

  2. I have a pre-packaged zip, and I need to reference it in another service in zip form

you can do pretty much the same:

const app = new s3Assets.Asset(this, 'App', {
  path: path.join(__dirname, 'app.zip'),
});

new bean.CfnApplicationVersion(this, 'Version', {
  applicationName: 'app',
  sourceBundle: {
    s3Bucket: app.s3BucketName,
    s3Key: app.s3ObjectKey,
  }
});

@wanjacki Does this address your use case? Or do you have some restriction that forces moving the zipped-up app directory to a different bucket?

The use case currently unsupported is:

I have a local directory, and I need to place it in a custom bucket in zip form

For this use case, the workaround is to do the pre-packaging yourself and fall back to 2). The solution we proposed still has its quirks, and I'm not fully on board with it. I would be interested to know what exact use case you are trying to solve.

@wanjacki
Contributor

wanjacki commented Sep 5, 2022

@iliapolo Yes, we are trying to implement #20690.

Essentially, we need to share this asset with another account, meaning we need to give another user permission to read that bucket, and sharing the entire CDK asset bucket is not ideal (there is also encryption on the CDK asset bucket, I believe). Customers have also specified that they want to define their own bucket in which we will place all the assets that need to be shared.

This use-case falls under the one that is currently unsupported exactly as you say:
I have a local directory, and I need to place it in a custom bucket in zip form.
We do have a restriction that requires us to move the zip file to a different bucket.

The workaround of double zipping should work for us, but it seems hacky, so we thought it made more sense to have actual support for this. It seems like the ways to support this are to allow a custom location for a CDK Asset, to have s3-deployment not unzip to the destination bucket, or even to support sharing a CDK Asset cross-account.

Please advise whether you think we should continue pursuing support for this (I believe it will be useful, and future customers will no doubt run into this issue again for their own use cases) or go ahead with our feature by implementing it with the workaround. We do have a deadline of 10/05 we are trying to meet for #20690.

If you are curious or would like even more context, there is a draft PR we have been working on for our feature, with lots of insightful comments:
#21508

@iliapolo
Contributor

iliapolo commented Sep 7, 2022

@wanjacki I see, thanks for sharing. Your use-case sounds legit.

So as far as the solution goes, the more I think about it the less I like my proposal of Source.zip. The problem is the semantic inconsistency it creates, because the function names are supposed to refer to the input, not the side-effect.

I was initially opposed to adding an extract property because I felt it exposes implementation details that won't be clear to users. However, looking at the Source API, I now think an extract property does have merit.

For example, in the case of Source.bucket, the user is expected to provide a zip object key. It makes perfect sense to add an extract: false option instructing the deployment to keep the zip object as is in the destination bucket:

new s3Deploy.BucketDeployment(this, 'Deployment', {
  sources: [s3Deploy.Source.bucket(someBucket, 'archive.zip')],
  destinationBucket: bucket,
  extract: false, // don't extract archive.zip
});

I do think it's less clear in the case of configuring a Source.asset that points to a directory:

new s3Deploy.BucketDeployment(this, 'Deployment', {
  sources: [s3Deploy.Source.asset(path.join(__dirname, 'directory'))],
  destinationBucket: bucket,
  extract: false, // don't extract...what exactly?
});

However, the pragmatic thing to do here is to add this property.

This also goes back to @baer's comment, which rightly points out that we already kind of expose the fact that assets are stored in zip format in the bootstrap bucket.

In any case, your PR seems good to me (but please rename unzipFile to extract; it's just more concise).

A note about locations

When extract is turned off, the resulting object in the destination bucket will have a name in the form of: asset.<asset-hash>.zip, regardless of the original file / directory you provided. This means that users will have a hard time constructing a location to that file, because they don't have access to the underlying asset hash. To solve this, I suggest adding a method to BucketDeployment that gives the location of a source in the destination.

const myApp = s3Deploy.Source.asset(path.join(__dirname, 'app'));
const deployment = new s3Deploy.BucketDeployment(this, 'Deployment', {
  sources: [myApp],
  destinationBucket: bucket,
  extract: false, // keep the source zipped in the destination
});

const myAppLocation = deployment.s3Location(myApp);

The asset hashes themselves are already available in the BucketDeployment construct:

SourceObjectKeys: sources.map(source => source.zipObjectKey),

But you'll need to figure out a way to map those to their original sources.
Another option would be to somehow expose the underlying asset so that users could call asset.assetHash and manually construct the location. And the final option is to ignore this use-case for now, and assume users are only interested in the file getting to the bucket, and they don't need its specific location.

@TheRealAmazonKendra for your consideration.

@wanjacki
Contributor

wanjacki commented Sep 7, 2022

@iliapolo
Thanks for getting back to us and for the very detailed and informative response. We will go ahead and reopen that PR for review with the property renamed to extract.
As for the location part, there is very likely a use case for this, but it is not particularly useful for our use case, so we can ignore it for now as we already have access to the hash (generated by Asset) and can construct the file location.

mergify bot pushed a commit that referenced this issue Sep 22, 2022

Currently, S3 Deployment will unzip all files when deployed to S3. This PR adds an additional optional prop `extract` that, when set to false, will allow files to remain zipped when deployed to S3.

Adds the feature described in issue: #8065 

----

### All Submissions:

* [x] Have you followed the guidelines in our [Contributing guide?](https://github.com/aws/aws-cdk/blob/main/CONTRIBUTING.md)

### Adding new Unconventional Dependencies:

* [ ] This PR adds new unconventional dependencies following the process described [here](https://github.com/aws/aws-cdk/blob/main/CONTRIBUTING.md/#adding-new-unconventional-dependencies)

### New Features

* [x] Have you added the new feature to an [integration test](https://github.com/aws/aws-cdk/blob/main/INTEGRATION_TESTS.md)?
	* [x] Did you use `yarn integ` to deploy the infrastructure and generate the snapshot (i.e. `yarn integ` without `--dry-run`)?

*By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license*

----
Co-authored-by: Alex Makoviecki [b2alexpilot@gmail.com](mailto:b2alexpilot@gmail.com)
hacker65536 pushed a commit to hacker65536/aws-cdk that referenced this issue Sep 30, 2022
homakk pushed a commit to homakk/aws-cdk that referenced this issue Dec 1, 2022
@corymhall
Contributor

Looks like this was fixed in #21805

@github-actions

⚠️COMMENT VISIBILITY WARNING⚠️

Comments on closed issues are hard for our team to see.
If you need more assistance, please either tag a team member or open a new issue that references this one.
If you wish to keep having a conversation with other community members under this issue feel free to do so.
