Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

(cdk-assets): osx cannot run identical docker login commands in parallel #20116

Closed
misterjoshua opened this issue Apr 28, 2022 · 1 comment · Fixed by #20117
Closed

(cdk-assets): osx cannot run identical docker login commands in parallel #20116

misterjoshua opened this issue Apr 28, 2022 · 1 comment · Fixed by #20117
Assignees
Labels
@aws-cdk/assets Related to the @aws-cdk/assets package bug This issue is a bug. needs-triage This issue or PR still needs to be triaged. p1

Comments

@misterjoshua
Copy link
Contributor

misterjoshua commented Apr 28, 2022

Describe the bug

Per @brianfrantz in #19367 (comment):

It appears that this change has exposed a race condition when publishing multiple ECS images from a Mac. It is causing our deployments to fail with:

[83%] fail: docker login --username AWS --password-stdin https://285872988623.dkr.ecr.us-east-1.amazonaws.com exited with error code 1: Error saving credentials: error storing credentials - err: exit status 1, out: The specified item already exists in the keychain.

It appears that this issue is caused by a combination of the CDK performing a “docker login” on each published asset, along with specific logic in the docker project for storing credentials in the OSX keychain.

The source code for docker-credentials-osxkeychain doesn’t update existing OSX keychain entries if present. Instead it always does a SecKeychainAddInternetPassword call, which fails if the entry is already in the keychain. That error isn’t caught and handled. Instead the code attempts to avoid this issue by deleting the entry in the keychain before each add attempt. This logic works fine for the normal use case of doing a single login and then making multiple authenticated docker calls (single-threaded or in parallel). In fact, the logic has been in place for five-plus years.

Unfortunately, the CDK calls “docker login” before each “docker push” call instead of once for the batch. Two publish calls started at the same time will (most of the time) result in a credentials sequence of “delete, delete, add, add” such that the second add will fail because the first add created the entry.

The best method for fixing this issue in the CDK would involve refactoring the publishing code to perform a single “docker login” for the batch of container assets. Unfortunately this would be a significant redesign as the logic is shared between lambda assets and container assets.

Another option might be to catch the failed login and try again after a small delay, with or without detecting the specific error message. This doesn’t feel super clean, but is less effort and is less likely to cause side effects. That said, while it should work fine for small numbers of container assets, it may become problematic when there are many publish calls all logging in and retrying all at once.

Expected Behavior

docker login should be run once per ecr repository any docker image asset is being pushed to.

Current Behavior

It runs docker login for every push, concurrently, and this causes the login command to fail.

Reproduction Steps

app.ts

import { App, Stack } from 'aws-cdk-lib';
import { ContainerImage, FargateTaskDefinition } from 'aws-cdk-lib/aws-ecs';

const devEnv = {
  account: process.env.CDK_DEFAULT_ACCOUNT,
  region: process.env.CDK_DEFAULT_REGION,
};

const app = new App();

const stack = new Stack(app, 'cdk-concurrency-bug', { env: devEnv });

for (let i = 0; i < 30; i++) {
  const task = new FargateTaskDefinition(stack, `Task${i}`, {});
  task.addContainer('asset', {
    image: ContainerImage.fromAsset(__dirname, {
      buildArgs: {
        VAR: `var-${i}`,
      },
    }),
  });
}

app.synth();

Dockerfile

FROM debian:11-slim
ARG VAR=foo

WORKDIR /app
RUN echo $VAR >var.txt

Possible Solution

No response

Additional Information/Context

No response

CDK CLI Version

2.21.1

Framework Version

2.21.1

Node.js Version

14.19.1

OS

OSX

Language

Typescript

Language Version

No response

Other information

No response

@misterjoshua misterjoshua added bug This issue is a bug. needs-triage This issue or PR still needs to be triaged. labels Apr 28, 2022
@github-actions github-actions bot added the @aws-cdk/assets Related to the @aws-cdk/assets package label Apr 28, 2022
@ryparker ryparker added the p1 label Apr 28, 2022
@mergify mergify bot closed this as completed in #20117 May 18, 2022
mergify bot pushed a commit that referenced this issue May 18, 2022
Changes container image publishing so that publishing doesn't re-run docker logins for the same repository and so logins are run one at a time.

Fixes #20116

----

### All Submissions:

* [x] Have you followed the guidelines in our [Contributing guide?](https://github.com/aws/aws-cdk/blob/master/CONTRIBUTING.md)

### Adding new Unconventional Dependencies:

* [ ] This PR adds new unconventional dependencies following the process described [here](https://github.com/aws/aws-cdk/blob/master/CONTRIBUTING.md/#adding-new-unconventional-dependencies)

### New Features

* [ ] Have you added the new feature to an [integration test](https://github.com/aws/aws-cdk/blob/master/INTEGRATION_TESTS.md)?
	* [ ] Did you use `yarn integ` to deploy the infrastructure and generate the snapshot (i.e. `yarn integ` without `--dry-run`)?

*By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license*
@github-actions
Copy link

⚠️COMMENT VISIBILITY WARNING⚠️

Comments on closed issues are hard for our team to see.
If you need more assistance, please either tag a team member or open a new issue that references this one.
If you wish to keep having a conversation with other community members under this issue feel free to do so.

wphilipw pushed a commit to wphilipw/aws-cdk that referenced this issue May 23, 2022
Changes container image publishing so that publishing doesn't re-run docker logins for the same repository and so logins are run one at a time.

Fixes aws#20116

----

### All Submissions:

* [x] Have you followed the guidelines in our [Contributing guide?](https://github.com/aws/aws-cdk/blob/master/CONTRIBUTING.md)

### Adding new Unconventional Dependencies:

* [ ] This PR adds new unconventional dependencies following the process described [here](https://github.com/aws/aws-cdk/blob/master/CONTRIBUTING.md/#adding-new-unconventional-dependencies)

### New Features

* [ ] Have you added the new feature to an [integration test](https://github.com/aws/aws-cdk/blob/master/INTEGRATION_TESTS.md)?
	* [ ] Did you use `yarn integ` to deploy the infrastructure and generate the snapshot (i.e. `yarn integ` without `--dry-run`)?

*By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license*
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
@aws-cdk/assets Related to the @aws-cdk/assets package bug This issue is a bug. needs-triage This issue or PR still needs to be triaged. p1
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants