[suggestion] split the composer package into separate components? #808

Closed
Alorel opened this issue Oct 30, 2015 · 47 comments
Labels
feature-request A feature should be added or improved. needs-major-version Can only be considered for the next major release no-autoclose This issue should not be auto-closed by stale-issue-cleanup action.

Comments

@Alorel

Alorel commented Oct 30, 2015

Hi!

Our company only uses the S3 and SQS libraries. However, to install these via Composer, the entire AWS package has to be downloaded, which inflates Composer's autoload files. Would it be possible to split the AWS SDK into separately installable components (e.g. composer require aws/s3-php-dk), each with requirements of its own (e.g. the s3 package would require aws/common-php)?

@jeskew
Contributor

jeskew commented Oct 30, 2015

Hi @Alorel,

This is something we've considered but that would be difficult to implement cleanly in a backwards compatible way. One thing we have yet to figure out is if this approach would be any faster: instead of just resolving a single dependency, aws/aws-sdk-php, composer would need to resolve a dependency graph. In your case, this wouldn't just be s3-sdk-php and sqs-sdk-php but also supporting packages that handle fetching credentials, marshalling requests, signing requests, etc. My understanding is that composer's speed tends to be tied more closely to the complexity of a project's dependency graph than to the number of bytes pulled down, so we would need to gauge the size and nature of how such a change would affect installing the SDK.

Are you seeing any particular issues that you believe a smaller dependency would solve?

@jeskew jeskew added the feature-request A feature should be added or improved. label Oct 30, 2015
@Alorel
Author

Alorel commented Oct 30, 2015

@jeskew It's mostly about personal preference on not having to download packages you don't need.

In my specific use case (which is not, by far, applicable to a lot of users) having fewer components to download would be a significant speedup as my company requires its employees to live-sync all changes to a virtual host on the development server instead of using localhost. Because Composer's update process involves fully removing a package and then simply replacing it with a new one, the IDE ends up taking a lot of time deleting the entire SDK from the remote server and then re-uploading the new version even though, in many cases, only a few files have changes in them.

@jeskew
Contributor

jeskew commented Nov 4, 2015

@Alorel It sounds like you might benefit from using rsync -c or something else that checksums files before replacing them instead of SCPing or SFTPing the whole kit and caboodle.

I'm going to leave this issue open in case anyone else wants to chime in on why they would or would not want a modularized SDK.
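
A minimal sketch of the checksum-based sync suggested above, assuming rsync is installed on both machines. The function name sync_vendor and the host and path in the usage line are hypothetical.

```shell
# Hypothetical helper: mirror a local directory to a destination,
# transferring only files whose checksums differ.
#   -r        recurse into directories
#   -c        decide what to send by checksum, not by size and mtime
#   --delete  remove destination files that no longer exist in the source
sync_vendor() {
  rsync -rc --delete "$1" "$2"
}

# Example (hypothetical host and path):
#   sync_vendor vendor/ deploy@dev.example.com:/var/www/app/vendor/
```

The trailing slash on the source means "the contents of this directory", so the destination path should name the matching directory on the other side.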

@texdc

texdc commented Dec 8, 2015

Pros:

  • Use only what you need (the AWS mantra)
  • If one AWS service changes, the update is localized

Cons:

  • It's a pain to split everything up and maintain version history (but, it's possible)
  • Increased management complexity

Generally speaking, I'd favor keeping packages specific to their function/purpose instead of lumping everything together. Obviously, there are drawbacks. I'm probably overlooking other issues/points.

@jeremeamia
Contributor

More Pros:

  • People's Satis repos will be smaller.
  • Can release an "S3-Only" package, which people occasionally look for.
  • Service clients with special dependencies (e.g., openssl, yaml, etc.) can have their own dependency chains.

More Cons:

  • No compelling argument to support the time required to make the change. Are there really any compelling and tangible benefits?
  • Currently service descriptions are all in one folder and the tools that help build the SDK rely on this fact.
  • Messaging – Making sure people know when the services they are using are updated may become more difficult. For example, are the release notes split up as well?
  • People who don't use Composer (there are still a lot) won't care about this at all and might be confused by it.
  • Cross-advertising services via the tooling will no longer happen naturally.

@jeskew
Contributor

jeskew commented Dec 9, 2015

For people who don't use Composer, this change would be invisible. We would still distribute the SDK as a phar.

@cdaguerre

cdaguerre commented Jun 4, 2016

Having a single package bloats autoloading and actually has runtime performance impacts, which in my opinion are equally important to dev time comfort.
For instance, your composer classmap gets an extra 358 classes only to use S3.
Maybe at least an S3 subtree split would make some people happy!
It would however require a sort of common package for shared classes, of course.
I'd definitely +1 the possibility to require specific components only.
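
One rough way to check a figure like the one above on your own project is to regenerate an optimized classmap and count the entries that point into the SDK. This is a sketch that assumes the default vendor/ layout; count_sdk_classes is a hypothetical helper name.

```shell
# Regenerate an optimized (classmap) autoloader first:
#   composer dump-autoload --optimize

# Hypothetical helper: count classmap entries whose file lives in the SDK.
# $1 = path to vendor/composer/autoload_classmap.php
count_sdk_classes() {
  # grep -c exits non-zero when the count is 0, so keep the status clean.
  grep -c '/aws/aws-sdk-php/' "$1" || true
}

# Example: count_sdk_classes vendor/composer/autoload_classmap.php
```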

@rlweb

rlweb commented Jul 6, 2016

+1

@ericofusco

I'd love to see this split on Composer too. I'm not sure it's worth the effort, as others have said, but I believe most applications use just a few services.

For my case I use S3 and DynamoDB only.

@jjok

jjok commented Nov 30, 2016

+1

@timbotron

+1

@AviBueno

+1
Having to pull the entire SDK just to be able to send emails via SES is overkill.
On a different project I use SQS and SES and it's still overkill to pull, deploy and autoload heaps of unused code.

@mattkamsler

+1
I actually came here to see if this had been suggested yet. I am working on a project that just requires S3. I think it would be worth looking into.

@chippyash

+1
The AWS composer package is so big that it blows composer memory limits on local and build servers and we have to employ the COMPOSER_MEMORY_LIMIT=-1 trick just because of it. Composer run time is significantly increased because of the size of the lib.

We've just had a situation where composer gave up trying to install aws-sdk because it timed out and left us with a broken package that broke the site completely. Fortunately we had a copy of it elsewhere and were able to upload it manually.

Like others, we really only use S3 so making a separate package for it makes a huge amount of sense to me.

If you can publish the dependency graph for S3 usage, I'd even be prepared to write a script that repackages just the stuff required.
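
For reference, the memory workaround mentioned above is an environment variable set for a single Composer run. A sketch: the wrapper name composer_unbounded is hypothetical and the timeout value is an arbitrary example. COMPOSER_PROCESS_TIMEOUT controls how long Composer waits for the commands it runs, so it may or may not apply to the timeout described above.

```shell
# Hypothetical wrapper: run Composer with no memory cap and a longer
# process timeout (in seconds), for this invocation only.
composer_unbounded() {
  COMPOSER_MEMORY_LIMIT=-1 COMPOSER_PROCESS_TIMEOUT=600 composer "$@"
}

# Example: composer_unbounded update aws/aws-sdk-php
```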

@chippyash

I've got a package that just takes in the S3 stuff now, with a utility to run to check for updates to the main package. It would be useful if one of the AWS guys could review it.

@chumanfu

+1

@Pierstoval

I also agree about splitting the SDK into components.

This would still be possible with a simple git subtree, exactly like what @symfony does. Then you can keep the same workflow for upgrades, versions, etc., and hook in the git/github process to apply the modifications to the subsplits.

Maybe you could watch this conference talk: https://www.youtube.com/watch?v=ZVsDA6GhKOU to get some inspiration 🙂

@mindplay-dk

+1

Hundreds of largely unrelated components and services rolled into one package.

If there's a breaking change to any one of these, every user of any of these hundreds of dependencies will be affected.

I mean, for crying out loud. This is not how Composer works. This is just not how it's done.

I'd have thought a huge company like Amazon would know better 🙄

This would still be possible with a simple git subtree, exactly like what @symfony does

I'd like to note that this is a really bad idea as well - it only solves the somewhat esoteric problem of deploying too many files (which is mostly not really a problem, assuming you have byte-code caching enabled, and you let Composer generate an optimized auto-loader, which you should be doing in production anyhow) and doesn't address the versioning issue. (All of Symfony's sub-packages get bumped whenever the framework master package gets bumped, regardless of whether there are any breaking changes in the sub-trees, the actual packages - these should have been individually versioned to ensure a meaningful upgrade process for developers; it's just laziness, really.)

I can't really believe I'm out here explaining this to Amazon 😐

@Pierstoval

Pierstoval commented Jan 8, 2019

(all of Symfony's sub-packages get bumped whenever the framework master package gets bumped, regardless of whether there's any breaking changes in the sub-trees, the actual packages - these should have been individually versioned to ensure a meaningful upgrade process for developers; it's just laziness, really.)

It's not laziness. The framework (symfony/symfony) is not actually "one full package"; it is dozens and dozens of packages that should each follow the same release process and rules, and therefore the same versioning policy.

There are 3 active branches for Symfony, which means that at least 150 git tags are created each month. Bugs fixed on lower branches should be ported to more recent branches if the bug exists there, so in the end the majority of packages will benefit from at least one new commit on each release.
And if one release does not bring anything to a package (this was the case for symfony/asset:v4.2.0...v4.2.1 for example, check here), then there's no harm in adding a tag. This helps the community and the developers remember that x.y.z should be the same for all the packages they use. This is also highly efficient with symfony/flex, which can use the extra.symfony.require Composer parameter to restrict to a specific Symfony version that will apply to EVERY package that is in the main symfony/symfony repository.

I think having such a system for AWS could be almost painless, since AWS don't really need full BC and don't seem to need to support "older" branches. There would therefore be only one tag per package, and global maintenance can still be in the main repo (and it must, actually).

Fabien Potencier created splitsh to ease the setup of such a system, but git subtree could be nice too; it's just a bit more complex, but completely feasible.

It saddens me that we have to download 11MB of package when the SDK releases a new version, while we could download like 200KB instead for just the S3 client (it's more than 50 times lighter!).
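
A sketch of what a subtree split looks like with stock git, assuming the git-subtree command is installed (it ships in git's contrib area and is not always present). The prefix src/S3 and the branch and remote names are illustrative only; a usable package would also need its own composer.json and the shared code it depends on.

```shell
# Hypothetical helper: extract the history of one directory into its own branch.
# $1 = repository path, $2 = directory prefix, $3 = name of the new branch
split_component() {
  git -C "$1" subtree split --prefix="$2" -b "$3"
}

# Example (illustrative names):
#   split_component . src/S3 split-s3
#   git push git@github.com:example/s3-only.git split-s3:master
```

The resulting branch has the contents of the prefix at its root, which is the shape a standalone read-only repository would need.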

@barryvdh

I would usually say that size doesn't matter that much, but for things like AWS Lambda size DOES matter. The SDK keeps getting bigger with new services, while often only a few are required. A split could still let this package exist as it is, but also create standalone versions. This main repo would just 'replace' all split versions, so it doesn't break or conflict with existing apps.

It does require a core/common package for things like http/auth, but those dependencies can be configured once.
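
To make the 'replace' idea concrete, here is a sketch that writes an illustrative composer.json for the umbrella package. The split package names are hypothetical (no such packages are confirmed in this thread); "replace" and "self.version" are standard Composer schema features.

```shell
# Hypothetical helper: write an illustrative manifest into directory $1.
# The umbrella package declares that it replaces each split package at its
# own version, so installing both cannot produce duplicate classes.
write_umbrella_manifest() {
  cat > "$1/composer.json" <<'JSON'
{
    "name": "aws/aws-sdk-php",
    "replace": {
        "aws/core-sdk-php": "self.version",
        "aws/s3-sdk-php": "self.version",
        "aws/sqs-sdk-php": "self.version"
    }
}
JSON
}
```

With a manifest like this, a project that already requires the full SDK would satisfy any dependency on one of the split packages without installing it separately.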

@Pierstoval

We are in an era where bandwidth is really becoming important.

Even @symfony made a small change to make their downloaded packages smaller (symfony/symfony#33579), I think AWS should also embrace some new practices.

@Simperfit

Simperfit commented Dec 9, 2019

Is this going to be something we can expect in the future?

@mnapoli

mnapoli commented Dec 17, 2019

Running PHP on Lambda makes this problem much more critical: a larger vendor directory implies longer cold starts, longer deployments and a greater chance of hitting the Lambda max disk size.

Is there any way we can help?

@Simperfit

I would help too if needed

@chippyash

One of the most annoying things that AWS does is include the .changes directory in the package. Totally unnecessary. It's documentation and can live elsewhere.

I adopted the practice of building rpm packages for my php applications a long time ago to ease rollbacks and ensure non-php dependencies are met. I simply exclude the .changes directory from the package build (using fpm, if you haven't come across it.) This is simple to do, but without some way of creating a dependency graph of the required files it would become an irksome task to apply across the whole aws package. We've even considered installing that 'centrally' on our servers and symlinking to it for the production builds, which I used to do back in the days of hosting multiple clients running the same core code.

Won't help @mnapoli with Lambda, but where you are running multiple services on a single server, it might be a way out until AWS get their act together.

@mnapoli

mnapoli commented Dec 20, 2019

@chippyash that's interesting!

What is the .changes for? We can easily exclude it from Composer releases if it's not needed when using the package.

@chippyash

@mnapoli .changes is just a list of package changes, one per release by the looks of it. And it's big. Totally unnecessary. Ditto for the vendor/aws/aws-sdk-php/.github directory. Not even sure why that is in the repo at all.

@mnapoli

mnapoli commented Jan 13, 2020

If we can get a maintainer to confirm that these directories are not useful at runtime, I can send a pull request excluding them.

@Alorel maybe?

@barryvdh

There is a Pull Request open already: #1949

@mnapoli

mnapoli commented Jan 13, 2020

Awesome thanks!

@stof

stof commented Mar 4, 2020

Installing the SDK currently takes 33MB on disk (as reported by du -h -d 1 vendor/aws/aws-sdk-php |sort -h) and keeps increasing (I just upgraded from 3.133.6 to 3.133.28 and it was only 32MB before). Among this, 22MB come from the src/data folder containing API descriptions for each service. If I use only S3, I only need 220KB from this folder

Such a huge package size is a drawback in any place where the size of the source code matters (AWS Lambda is such a place, but any place relying on containers also prefers getting them smaller).

So such a split should be considered (the impact on BC is for classes which don't belong to a service, but these could potentially even be kept in their current namespace even if they get moved to a separate folder in the mono-repo with some PSR-4 config).

Regarding releases, Symfony chose to release all packages in sync. But it could also be possible to release them separately instead, by releasing only modified packages.
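
To see which service descriptions account for the size figures quoted above, a per-folder breakdown of src/data can help. This is a sketch; largest_data_dirs is a hypothetical helper name, and sizes are printed in kilobytes to avoid relying on GNU-only sort options.

```shell
# Hypothetical helper: list the N largest service description folders, in KB.
# $1 = path to the installed SDK, $2 = how many entries to show
largest_data_dirs() {
  du -sk "$1"/src/data/*/ | sort -rn | head -n "$2"
}

# Example: largest_data_dirs vendor/aws/aws-sdk-php 10
```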

@chippyash

chippyash commented Mar 5, 2020

@stof Do you know if that data directory is required to actually run the library in production, or just there as another documentation and/or test aide for AWS? If not required, I can add it to my ignore list when building packages.

[edit]
I took a quick look: https://github.com/aws/aws-sdk-php/blob/master/build/build-manifest.php Looks like it is used to build API documentation. If anyone can confirm this then perhaps it should also be added to the export-ignore list. On the basis that the build directory is in the ignore list, then it makes a good candidate.

@stof

stof commented Mar 5, 2020

@chippyash it is also used by the Client class to know which endpoints are available in each service and what are the expected arguments for each of them.

@chippyash

@stof

it is also used by the Client class to know which endpoints are available in each service and what are the expected arguments for each of them.

That's a shame.

@howardlopez howardlopez added the no-autoclose This issue should not be auto-closed by stale-issue-cleanup action. label Aug 27, 2020
@PhilETaylor

Installing the SDK currently takes 33MB on disk (as reported by du -h -d 1 vendor/aws/aws-sdk-php |sort -h) and keeps increasing (I just upgraded from 3.133.6 to 3.133.28 and it was only 32MB before). Among this, 22MB come from the src/data folder containing API descriptions for each service. If I use only S3, I only need 220KB from this folder

Hmmm.. v3.171.6 today....

du -h -d 1 vendor/aws/aws-sdk-php |sort -h

 36K	vendor/aws/aws-sdk-php/.github
392K	vendor/aws/aws-sdk-php/features
780K	vendor/aws/aws-sdk-php/build
2.7M	vendor/aws/aws-sdk-php/tests
3.3M	vendor/aws/aws-sdk-php/.changes
106M	vendor/aws/aws-sdk-php/src
114M	vendor/aws/aws-sdk-php

114 MB - gulp.

@Nyholm
Contributor

Nyholm commented Dec 26, 2020

I know that this is a super old issue. I also know that there are a lot of reasons why this has not been fixed yet.

If you are one of the people who really need a small AWS client, there is an AsyncAws organisation that (among other things) addresses this issue. See https://github.com/async-aws/aws

Normally I would never promote a competing library like this, especially one that I've created. However, any AWS API client is helping the user to buy more AWS products. So in that sense, AsyncAws is not a competing library, just a complement.


Even @symfony made a small change to make their downloaded packages smaller

Thank you for appreciating my PR. =)


@PhilETaylor, make sure you download the non-source version, i.e.:

rm -rf vendor/aws
composer update --prefer-dist

@barryvdh

Is it not possible to generate the required data (endpoints) in a compact PHP format, with a build script for each release, and exclude the data folders?

@SamRemis
Member

Hi everyone,
This is on the roadmap for the next version of the SDK. There is quite a bit that goes into automatically generating the smaller packages and their upkeep. We are looking into scalable and reliable ways to do that, but the current infrastructure of the SDK makes this difficult.
-Sam

@SamRemis SamRemis added the needs-major-version Can only be considered for the next major release label Dec 29, 2020
@barasimumatik

I'm using Docker, and want to minimize the container size as much as possible, so I figured a temporary solution would be to delete unnecessary files after composer has installed them. Is there a safe way to determine which files can be deleted if you know which services you need?

@SamRemis
Member

SamRemis commented Mar 5, 2021

@barasimumatik that's a tough question, but I'll do my best to answer. It would be nearly impossible for me to test this without a more specific use case, so this may not apply to your personal container.

  • The easiest thing to delete would be the test folder. Since they're test files, I'd say it's safe to delete the entire test folder assuming that you never run them.
  • You could probably delete any of the clients that you are not directly using. Some clients with interconnected services do call each other, but unless you're using a client or passing it in as a variable to a function, deleting it should be safe. For example, if you aren't using ApplicationInsights, you could delete the entire src/ApplicationInsightsClient folder, as well as the src/data/applicationinsights folder. This won't work for every client; some are referenced by major files (like the S3Client), but the majority should be safe to delete. A good rule of thumb is that if a client's parent folder has more than just the client itself and an 'Exceptions' folder, don't delete it.
  • If you're really trying to minimize space, you can likely delete any of the .json files from the src/data folder as well. We use the generated .json.php files instead when parsing the files in that folder.
  • You could delete many of the folders in the base of the SDK, including 'build', 'features', '.github', and '.changes'.

I have only done some minimal testing on this, so delete them at your own risk. I hope this helps :)
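
The steps in the list above can be sketched as two small shell helpers. Both function names are hypothetical, the exception folder spelling is an assumption (either "Exception" or "Exceptions" is accepted), and the same at-your-own-risk caveat applies. Composer restores the full package on each install or update, so a cleanup like this would have to be rerun afterwards.

```shell
# Hypothetical helper: delete one unused service client and its data folder.
# $1 = SDK root, $2 = client folder under src/, $3 = folder under src/data/
prune_service() {
  client_dir="$1/src/$2"
  data_dir="$1/src/data/$3"
  [ -d "$client_dir" ] || { echo "not found: $client_dir" >&2; return 1; }
  # Rule of thumb from the list above: leave the folder alone if it holds
  # anything besides the client class and an exception folder.
  for entry in "$client_dir"/*; do
    [ -e "$entry" ] || continue
    case "$(basename "$entry")" in
      *Client.php|Exception|Exceptions) ;;
      *) echo "keeping $2: found extra entry $(basename "$entry")" >&2; return 2 ;;
    esac
  done
  rm -rf "$client_dir" "$data_dir"
}

# Hypothetical helper: delete each .json file under src/data that has a
# generated .json.php sibling next to it.
remove_redundant_json() {
  find "$1/src/data" -type f -name '*.json' | while IFS= read -r json; do
    if [ -f "$json.php" ]; then rm -f "$json"; fi
  done
}

# Example (folder names are placeholders for a service you do not use):
#   prune_service vendor/aws/aws-sdk-php SomeService someservice
#   remove_redundant_json vendor/aws/aws-sdk-php
```

prune_service returns 2 and deletes nothing when a folder looks like more than a plain client, which errs on the side of keeping code that other parts of the SDK may reference.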

@barasimumatik

@SamRemis wow, thank you very much! The SDK dwarfs any other composer dependency I have, and accounts for about 15% of the whole container image right now, so it will definitely help 😄

@stof

stof commented Mar 8, 2021

Note that if you install from dist, the .gitattributes file already takes care of removing all useless files (tests, build/, data/*/*.json, etc...).
So on your side, you are mostly left with removing the clients you don't use.
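
A sketch of how the export-ignore mechanism can be checked locally, assuming a git checkout of the repository: git archive applies the export-ignore rules committed in .gitattributes, so listing an archive of HEAD approximates what a dist download contains. The helper name list_dist_files is hypothetical.

```shell
# Hypothetical helper: list the files an archive of HEAD would contain.
# $1 = path to a git checkout
list_dist_files() {
  git -C "$1" archive HEAD | tar -tf -
}

# A rule of the kind discussed in this thread looks like this
# inside .gitattributes:
#   /.changes export-ignore

# Example: list_dist_files . | grep -c '^tests/'
```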

@mbardelmeijer

@SamRemis splitting out the AWS PHP package has been on the backlog for a bit now. Is there any update regarding this split? We wish to use only the S3Client, for instance, but that currently requires the full install. Thanks!

@SamRemis
Member

That's not something that's currently being worked on for this version of the SDK unfortunately; the way it's designed would make that very difficult.
I can't make any promises, but when we migrate to a new major version, this is the biggest change I would personally like to see

@stobrien89
Member

stobrien89 commented Apr 8, 2022

Like @SamRemis mentioned above, this isn't something we're in a position to do currently, but it's on the horizon via the next major version. We recently had a similar feature request which we've converted into a discussion regarding reducing the V3 package size. We're trying to gauge interest in a feature that cleans up unused services and/or unnecessary directories, but we're open to other suggestions.

@stobrien89
Member

Closing this since modularization is planned for the next major version. We'd still like to get feedback or suggestions related to reducing the v3 package size (see my comment above); the hyperlink there will take you to a discussion we have open on the topic.

@github-actions

⚠️COMMENT VISIBILITY WARNING⚠️

Comments on closed issues are hard for our team to see.
If you need more assistance, please either tag a team member or open a new issue that references this one.
If you wish to keep having a conversation with other community members under this issue feel free to do so.
