Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CodePipeline: still getting 'Maximum policy size of 10240 bytes exceeded for role' for cross-account pipeline #19835

Closed
cprice404 opened this issue Apr 8, 2022 · 9 comments · Fixed by #20400
Assignees
Labels
@aws-cdk/aws-iam Related to AWS Identity and Access Management bug This issue is a bug. p1

Comments

@cprice404
Copy link

cprice404 commented Apr 8, 2022

Describe the bug

We recently filed an issue about how CodePipeline was failing due to the generation of extremely large IAM policy documents:

#19243

Our issue was closed and it was suggested that this PR might fix it:

#19114

This tracking issue was closed via that PR:

#19276

We have since upgraded our CDK version to 2.19.0 and set @aws-cdk/aws-iam:minimizePolicies to true.

This did seem to decrease the size of some IAM policies, but we are still hitting the same failure.

Expected Behavior

Expected #19114 to reduce generated IAM policy sizes so that our pipeline would deploy.

Current Behavior

Pipeline still fails to deploy with Maximum policy size of 10240 bytes exceeded for role xxx.

See attachment for example of the offending IAM policy. The policy contains three statements, and the third one is the offending one. It is granting AssumeRole to all of the roles for all of the actions in all of the accounts that this pipeline is trying to deploy to.

See attachment for example (sanitized) policy.
pipelineofpipelinesstackcoreinfrastructurepipeline03AEF9CA.template.JUST_OFFENDING_POLICY.sanitized.json.gz

There are 97 resources in that statement, they look like this:

                {
                  "Fn::GetAtt": [
                    "pipelinecoreinfrastructureDeployProdcelluseast1produseast1synthCodePipelineActionRole8A60FDE1",
                    "Arn"
                  ]
                },

and this:

                {
                  "Fn::Join": [
                    "",
                    [
                      "arn:",
                      {
                        "Ref": "AWS::Partition"
                      },
                      ":iam::111111111111:role/core-infrastructure-pipelhangesactionroleb355ae8d7595154ac81b"
                    ]
                  ]
                },

Reproduction Steps

Create a CodePipeline that creates additional Pipelines. In one of those additional pipelines, create a large number of actions that target multiple AWS accounts.

It would take some non-trivial effort for me to extract a concise reproducer from our current (private) code, but pending the responses to the questions above about whether this is still expected behavior, how to override the Role, etc., I am willing to try to put one together!

Possible Solution

It's not clear to me whether this behavior is still expected after the fix in #19114 was released.

We are looking for guidance on:

  • Whether this is still a known issue in CDK
  • Whether it is expected that additional fixes will be merged to address this

Also, there seem to be a ton of other tickets related to this, and in several of them it is hinted that users with advanced use cases like ours may need to opt out of this automatic policy generation, and override the Role with one that we manage ourselves (which might use wildcards or multiple policy attachments, etc.). If that is the current prescription, it would be wonderful if there were some official docs or examples that illustrated how to do this; I haven't really seen anything so far that gives me a concrete idea of how to even attempt it.

Additional Information/Context

No response

CDK CLI Version

2.19.0

Framework Version

2.19.0

Node.js Version

v16.5.0

OS

Amazon Linux 2

Language

Typescript

Language Version

TypeScript 4.4.2

Other information

No response

@cprice404 cprice404 added bug This issue is a bug. needs-triage This issue or PR still needs to be triaged. labels Apr 8, 2022
@github-actions github-actions bot added the @aws-cdk/aws-codepipeline Related to AWS CodePipeline label Apr 8, 2022
@skinny85
Copy link
Contributor

skinny85 commented Apr 8, 2022

@rix0rrr you want to take a look at this one?

@skinny85 skinny85 assigned rix0rrr and unassigned skinny85 Apr 12, 2022
@skinny85 skinny85 added @aws-cdk/aws-iam Related to AWS Identity and Access Management and removed @aws-cdk/aws-codepipeline Related to AWS CodePipeline labels Apr 12, 2022
@cprice404
Copy link
Author

cprice404 commented Apr 13, 2022

To work around this, we had to do the following:

  1. provide our own Role and Policy
  2. Use withoutPolicyUpdates to prevent CDK from trying to generate the policy for it
  3. Add a DependsOn from the Pipeline to the Policy to prevent CFN from trying to deploy them simultaneously
  4. Manage our own cross-region replication buckets, due to the bug described in CodePipeline: when providing custom Pipeline Role, cross-region support stacks cannot be deployed #19881 .

Just in case it helps anyone else, here is roughly what we ended up with:

  const pipelineRolePolicyName = `pipelinepolicy-${pipelineName}`;
  const pipelineRolePolicy = new iam.Policy(scope, pipelineRolePolicyName, {
    policyName: pipelineRolePolicyName,
    // These policy statements were built up by hand after examining the
    // statements in the CDK-generated policy for several pipelines.
    statements: [
      // Grants the pipeline access to the s3 artifact buckets.
      new iam.PolicyStatement({
        effect: iam.Effect.ALLOW,
        actions: [
          's3:Abort*',
          's3:DeleteObject*',
          's3:GetBucket*',
          's3:GetObject*',
          's3:List*',
          's3:PutObject',
          's3:PutObjectLegalHold',
          's3:PutObjectRetention',
          's3:PutObjectTagging',
          's3:PutObjectVersionTagging',
        ],
        // Note that these bucket names correspond to our self-managed replication buckets, we are
        //  opting-out of CDK managing those for us.
        resources: [`arn:aws:s3:::pipe-artifacts-${pipelineAccount}-*`],
      }),
      
      // Grants the pipeline AssumeRole permissions into the various CDK-generated roles, for
      // cross-account / cross-region deploys etc.
      new iam.PolicyStatement({
        effect: iam.Effect.ALLOW,
        actions: ['sts:AssumeRole'],
        resources: targetAccounts.map(targetAccount => {
          // Here we are granting permissions to assume some roles that are generated by CDK
          // for use in pipelines.  We need to wild-card them so that the policy doesn't get
          // too large for IAM.  The roles will begin with a string like `cacheadmin-pipeline`,
          // unless the name of the pipeline is too long; CDK will truncate the prefix portion
          // after 25 chars, and the suffix varies per role, so we just use the first 25 chars.
          // NOTE: your pipeline names must all be lower case!
          const pipelineRoleNamePrefix = `${pipelineName}-pipeline`
            .substr(0, 25);
          return `arn:aws:iam::${targetAccount}:role/${pipelineRoleNamePrefix}*`;
        }),
      }),
    ],
  });

  const pipeline = new codepipeline.Pipeline(
    this,
    `pipeline-${props.pipelineName}`,
    {
      pipelineName: props.pipelineName,
      crossAccountKeys: true,
      reuseCrossRegionSupportStacks: true,

      // passing the role via `withoutPolicyUpdates` prevents CDK from trying to manually manage the policy.  See:
      //
      // https://github.com/aws/aws-cdk/issues/19835
      // https://github.com/aws/aws-cdk/issues/16244#issuecomment-906738231
      // https://docs.aws.amazon.com/cdk/api/v1/docs/aws-iam-readme.html#opting-out-of-automatic-permissions-management
      role: pipelineRole.withoutPolicyUpdates(),

      crossRegionReplicationBuckets: getCrossRegionReplicationBuckets(
        this,
        supportStackInfoForAllRegions
      ),
    }
  );

  // We need to enforce a `DependsOn` from the Pipeline to the Policy,
  // otherwise CFN will try to create them at the same time, and the Pipeline
  // will fail due to missing permissions.  However, if we just call the
  // main CDK `addDependency` function, it will add our policy as a dependency
  // to every resource that the pipeline creates, which will cause circular
  // dependency failures.  We need to use the "escape hatch" to access the CfnResources
  // directly and then add the dependency that way, so that it only gets added to the
  // Pipeline and no other resources.
  const pipelineCfnResource = pipeline.node.defaultChild as CfnResource;
  pipelineCfnResource.addDependsOn(
    pipelinePolicy.node.defaultChild as CfnResource
  );

@rix0rrr
Copy link
Contributor

rix0rrr commented Apr 13, 2022

Can you give me a sense of how many stacks we’re talking about?

@cprice404
Copy link
Author

Our biggest pipeline right now has 4 stacks, all of which are deployed to 4 different accounts (so, 16 stack deploys). We have about 5 actions per deploy (includes generating ChangeSet, and manual approval). So that's about 80 actions. In the generated policy, the AssumeRole policy statement contains one resource for each action. When we tipped the scales and started hitting the IAM policy size limit there were 97 resources in that policy statement.

@peterwoodworth peterwoodworth added p1 and removed needs-triage This issue or PR still needs to be triaged. labels Apr 19, 2022
@rix0rrr
Copy link
Contributor

rix0rrr commented Apr 20, 2022

Best solution I can come up with here is to start reusing the same action role for multiple actions.

@cprice404
Copy link
Author

Wouldn't it also be possible to split the policy into multiple separate policies, and attach multiple policies to the role?

@rix0rrr
Copy link
Contributor

rix0rrr commented Apr 20, 2022

The limit is about the final policy document on the Role. Those fragments will all add up to the same total.

@cprice404
Copy link
Author

Ah. Then perhaps providing an opt-in flag that would switch the AssumeRole policies to use a wildcard that will match all of the Action roles?

rix0rrr added a commit that referenced this issue May 3, 2022
Two changes:

- Collapse CodeBuild action Roles: each CodeBuild step used to create a
  fresh Role to run the CodeBuild action. Change to use one Role for all
  CodeBuild actions. This saves a lot of resources and policy space when
  using a lot of CodeBuild steps, and doesn't appreciably change the
  security posture of the Pipeline (note: this is *not* about the
  Execution Role of the CodeBuild projects, this is about the Role
  assumed by the Pipeline to initiate execution of the Project).
- If inline policies grow bigger than 10k, split additional statements
  off into ManagedPolicies.

Since we want to do the splitting post-merging (to get the most bang for
our buck), we now need to do statement merging during the `prepare`
phase (that is, pre-rendering, instead of post-rendering). That means it
had to be modified to work on `PolicyStatement` objects, instead of on
raw IAM JSON documents.

Closes #19276, closes #19939, closes #19835.
rix0rrr added a commit that referenced this issue May 3, 2022
Two changes:

- Collapse CodeBuild action Roles: each CodeBuild step used to create a
  fresh Role to run the CodeBuild action. Change to use one Role for all
  CodeBuild actions. This saves a lot of resources and policy space when
  using a lot of CodeBuild steps, and doesn't appreciably change the
  security posture of the Pipeline (note: this is *not* about the
  Execution Role of the CodeBuild projects, this is about the Role
  assumed by the Pipeline to initiate execution of the Project).
- If inline policies grow bigger than 10k, split additional statements
  off into ManagedPolicies.

Since we want to do the splitting post-merging (to get the most bang for
our buck), we now need to do statement merging during the `prepare`
phase (that is, pre-rendering, instead of post-rendering). That means it
had to be modified to work on `PolicyStatement` objects, instead of on
raw IAM JSON documents.

Closes #19276, closes #19939, closes #19835.
rix0rrr added a commit that referenced this issue May 18, 2022
(This change has been split off from #20189 because that PR was growing
too big)

Collapse CodeBuild action Roles: each CodeBuild step used to create a
fresh Role to run the CodeBuild action. Change to use one Role for all
CodeBuild actions. This saves a lot of resources and policy space when
using many CodeBuild steps, and doesn't appreciably change the
security posture of the Pipeline (note: this is not about the
Execution Role of the CodeBuild projects, this is about the Role
assumed by the Pipeline to initiate execution of the Project).

Relates to #19276, #19939, #19835.
rix0rrr added a commit that referenced this issue May 18, 2022
We add all Role policy statements to the Inline policy, which
has a maximums size of 10k. Especially when creating CDK Pipelines
that deploy to a lot of environments, the list of Role ARNs
the Pipeline should be allowed to assume exceeds this size.

Roles also have the ability to have Managed Policies attached
(10 by default, 20 with a quota increase), each of them can be 6k
in size. By spilling over from inline policies into Managed Policies
we can get a total of 70k of statements attached to reach Role.

This PR introduces `IComparablePrincipal` to be able to value-compare
two principals: since we want to merge first before we split (to get the
most bang for our buck), we now need to do statement merging during the
prepare phase, while we are still working on the object graph (instead
of the rendered CloudFormation template).
* That means statement merging had to be modified to work on
  PolicyStatement objects, which requires being able to compare
  Principal objects.

Closes #19276, closes #19939, closes #19835.
mergify bot pushed a commit that referenced this issue May 24, 2022
(This change has been split off from #20189 because that PR was growing
too big)

Collapse CodeBuild action Roles: each CodeBuild step used to create a
fresh Role to run the CodeBuild action. Change to use one Role for all
CodeBuild actions. This saves a lot of resources and policy space when
using many CodeBuild steps, and doesn't appreciably change the
security posture of the Pipeline (note: this is not about the
Execution Role of the CodeBuild projects, this is about the Role
assumed by the Pipeline to initiate execution of the Project).

Relates to #19276, #19939, #19835.


----

### All Submissions:

* [ ] Have you followed the guidelines in our [Contributing guide?](https://github.com/aws/aws-cdk/blob/master/CONTRIBUTING.md)

### Adding new Unconventional Dependencies:

* [ ] This PR adds new unconventional dependencies following the process described [here](https://github.com/aws/aws-cdk/blob/master/CONTRIBUTING.md/#adding-new-unconventional-dependencies)

### New Features

* [ ] Have you added the new feature to an [integration test](https://github.com/aws/aws-cdk/blob/master/INTEGRATION_TESTS.md)?
	* [ ] Did you use `yarn integ` to deploy the infrastructure and generate the snapshot (i.e. `yarn integ` without `--dry-run`)?

*By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license*
@mergify mergify bot closed this as completed in #20400 May 25, 2022
@github-actions
Copy link

⚠️COMMENT VISIBILITY WARNING⚠️

Comments on closed issues are hard for our team to see.
If you need more assistance, please either tag a team member or open a new issue that references this one.
If you wish to keep having a conversation with other community members under this issue feel free to do so.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
@aws-cdk/aws-iam Related to AWS Identity and Access Management bug This issue is a bug. p1
Projects
None yet
4 participants