fix(aws): handle ECR repositories in different regions #6217

Open: knrc wants to merge 1 commit into main from ecr_multi_region

@knrc knrc commented Feb 28, 2024

Description

This PR modifies the ECR integration so that it obtains authorization tokens from the region hosting the ECR repository. The current behaviour is to use the default region, which results in authentication errors such as:

2024-02-26T18:56:03.738Z	ERROR	Error during vulnerabilities or misconfiguration scan: scan error: unable to initialize a scanner: unable to initialize an image scanner: 4 errors occurred:
	* docker error: unable to inspect the image (127647282379.dkr.ecr.us-east-1.amazonaws.com/undistro-test-image:1.25.3): Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
	* containerd error: containerd socket not found: /run/containerd/containerd.sock
	* podman error: unable to initialize Podman client: no podman socket found: stat podman/podman.sock: no such file or directory
	* remote error: GET https://127647282379.dkr.ecr.us-east-1.amazonaws.com/v2/undistro-test-image/manifests/1.25.3: unexpected status code 401 Unauthorized: Not Authorized

2024-02-26T18:56:03.738Z	ERROR	Error during vulnerabilities or misconfiguration scan: scan error: unable to initialize a scanner: unable to initialize an image scanner: 4 errors occurred:
	* docker error: unable to inspect the image (127647282379.dkr.ecr.sa-east-1.amazonaws.com/undistro-test-image:1.25.3): Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
	* containerd error: containerd socket not found: /run/containerd/containerd.sock
	* podman error: unable to initialize Podman client: no podman socket found: stat podman/podman.sock: no such file or directory
	* remote error: GET https://127647282379.dkr.ecr.sa-east-1.amazonaws.com/v2/undistro-test-image/manifests/1.25.3: unexpected status code 401 Unauthorized: Not Authorized

This was raised in #1026, which is now closed although the underlying issue doesn't appear to be addressed.
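
As a minimal sketch of the idea in the description, the region can be read from the registry hostname that appears in the errors above. The helper name `regionFromDomain` and the exact pattern are assumptions for illustration, not the code in this PR; the pattern only covers the common `<account>.dkr.ecr.<region>.amazonaws.com` layout (optionally ending in `.cn`).

```go
package main

import (
	"fmt"
	"regexp"
)

// ecrDomainPattern assumes the usual private ECR hostname layout:
// <12-digit account id>.dkr.ecr.<region>.amazonaws.com (optionally ending in .cn).
var ecrDomainPattern = regexp.MustCompile(`^\d{12}\.dkr\.ecr\.([a-z0-9-]+)\.amazonaws\.com(\.cn)?$`)

// regionFromDomain is a hypothetical helper, not the function used in the PR.
// It returns the region embedded in an ECR registry hostname.
func regionFromDomain(domain string) (string, error) {
	m := ecrDomainPattern.FindStringSubmatch(domain)
	if m == nil {
		return "", fmt.Errorf("%q does not look like an ECR domain", domain)
	}
	return m[1], nil
}

func main() {
	for _, d := range []string{
		"127647282379.dkr.ecr.us-east-1.amazonaws.com",
		"127647282379.dkr.ecr.sa-east-1.amazonaws.com",
	} {
		region, err := regionFromDomain(d)
		if err != nil {
			fmt.Println("skip:", err)
			continue
		}
		// The authorization token would then be requested in this region.
		fmt.Printf("%s -> request the ECR token in %s\n", d, region)
	}
}
```

The extracted region would then be used when requesting the authorization token, instead of whatever the default region happens to be.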

Related issues

Checklist

  • I've read the guidelines for contributing to this repository.
  • I've followed the conventions in the PR title.
  • I've added tests that prove my fix is effective or that my feature works.
  • I've updated the documentation with the relevant information (if needed).
  • I've added usage information (if the PR introduces new options)
  • I've included a "before" and "after" example to the description (if the PR is a user interface change).

CLAassistant commented Feb 28, 2024

CLA assistant check
All committers have signed the CLA.

@knrc knrc changed the title from "bug(aws): handle ECR repositories in different regions" to "fix(aws): handle ECR repositories in different regions" on Feb 28, 2024
Contributor

@DmitriyLewen DmitriyLewen left a comment

Hello @knrc
Thanks for your report!

I left some comments. Please take a look when you have time.

Regards, Dmitriy

pkg/fanal/image/registry/ecr/ecr.go (outdated, resolved)
pkg/fanal/image/registry/ecr/ecr.go (outdated, resolved)
pkg/fanal/image/registry/ecr/ecr_test.go (outdated, resolved)
pkg/fanal/image/registry/ecr/ecr.go (outdated, resolved)
@knrc knrc force-pushed the ecr_multi_region branch 4 times, most recently from 1d8902b to ed18edb on February 29, 2024 13:48
Contributor

@DmitriyLewen DmitriyLewen left a comment

@knrc Thanks for your work!

@@ -46,11 +46,34 @@ func (e *ECR) CheckOptions(domain string, option types.RegistryOptions) error {
		return err
	}

	// override region with the value from the repository domain
	cfg.Region = region
Contributor

@knrc I found one interesting case: what if the AWS_REGION environment variable differs from the region in the domain?
Should we use AWS_REGION (we are currently overwriting this value)?

IIUC this case is a user mistake (wrong AWS_REGION), but perhaps it makes sense to log a warning about it.

Author

@DmitriyLewen The point of the PR is to override the AWS_REGION setting. If we don't do that, we end up with an authentication token for one region and have no visibility of containers hosted in other regions.

Our use case is multiple private repositories in multiple regions.

Contributor

I meant that we need to tell the user when the region from AWS_REGION differs from the region in the domain.
Something like this:

func getSession(region string, option types.RegistryOptions) (aws.Config, error) {
	// create custom credential information if option is valid
	if option.AWSSecretKey != "" && option.AWSAccessKey != "" && option.AWSRegion != "" {
		if region != option.AWSRegion {
			log.Logger.Warnf("The region from AWS_REGION (%s) is incorrect. The region from domain (%s) was used.", option.AWSRegion, region)
		}
		return config.LoadDefaultConfig(
			context.TODO(),
			config.WithRegion(region),
			config.WithCredentialsProvider(credentials.NewStaticCredentialsProvider(option.AWSAccessKey, option.AWSSecretKey, option.AWSSessionToken)),
		)
	}
	return config.LoadDefaultConfig(context.TODO(), config.WithRegion(region))
}

I am also worried about the ASFF template. We use the AWS_REGION environment variable in this template, so perhaps we need to set AWS_REGION when we have overridden the region.

Author

@DmitriyLewen Ah gotcha. We can certainly add a message, although I'm not sure it would make much sense as it is likely to have been set by the webhook to match the EKS installation. If you consider our use case, with multiple private repositories in different regions, then it would be impossible for the user to set the region appropriately so it would be defaulted to the webhook's view.

I can take a look at the template today, I didn't consider that, and can certainly pass the parameter through to getSession as that seems cleaner than rewriting it.

Author

I've pushed the changes for getSession and the warning, looking at the template.

Contributor

But it looks like when using asff template, users will have AWS_REGION set. In addition, we display a warning.
We can start with these changes.

If problems arise, we will think about fixing them (as another solution, we can add your regex to asff.tpl).

Author

@knrc knrc Mar 5, 2024

should Region match the image region?

AWS docs(https://docs.aws.amazon.com/securityhub/1.0/APIReference/API_AwsSecurityFinding.html#securityhub-Type-AwsSecurityFinding-ProductArn) say:

Region

    The Region from which the finding was generated.

+1, in my view Arn and Region are not related but the template assumes they are.

Author

But it looks like when using asff template, users will have AWS_REGION set. In addition, we display a warning. We can start with these changes.

If problems arise, we will think about fixing them (as another solution, we can add your regex to asff.tpl).

Yes, since the output doesn't change with this PR we are no worse off. I do think there is change needed for the template but that should be a separate issue.

Contributor

Author

@DmitriyLewen I don't think those other links change anything for this PR.

Contributor

@DmitriyLewen DmitriyLewen left a comment

Thanks for your work @knrc

@knqyf263 I approved this PR.
If you agree with #6217 (comment), we can merge it.

Author

knrc commented Mar 21, 2024

@DmitriyLewen I ran into another problem, it turns out the registry code (at least in 0.49.1) is broken. The three registries (google, azure and ECR) are invoked concurrently, which means their state gets overwritten each time while still being used.

I have some fixes for that, I'll check with 0.50.0 and push up another version of this PR some time next week.

Author

knrc commented Mar 26, 2024

@DmitriyLewen I pushed up the changes so you can see the difference, I'm just about to test them on 0.50.0. I'll rebase the PR on the latest once I've validated it.

Author

knrc commented Mar 26, 2024

@DmitriyLewen I've tested and rebased the PR, it's ready again

Contributor

Hello @knrc

The three registries (google, azure and ECR) are invoked concurrently, which means their state gets overwritten each time while still being used.

I'm a little confused

Trivy checks registries sequentially:

for _, registry := range registries {
	err := registry.CheckOptions(domain, opt)
	if err != nil {
		continue
	}
	username, password, err := registry.GetCredential(ctx)
	if err != nil {
		// only skip check registry if error occurred
		log.Logger.Debug(err)
		break
	}
	return authn.Basic{
		Username: username,
		Password: password,
	}
}
return authn.Basic{}

Which field is overwritten?

Author

knrc commented Mar 27, 2024

I'm a little confused

Trivy checks registries sequentially:

It does, within GetToken, however GetToken is called concurrently.

Which field is overwritten?

Line 37 calls CheckOptions, which creates client resources in the singleton registry based on the domain. This client is then used later within GetCredential. Since the singletons are accessed concurrently, the client is not guaranteed to be the one created by the preceding call to CheckOptions in the loop.
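
One way to avoid the overwrite described above, sketched under assumptions: have CheckOptions return a per-call client instead of storing it on the shared registry, so concurrent callers never see each other's state. The types `client` and `registry`, the fake token, and `tokensFor` are hypothetical stand-ins for illustration; this is not necessarily how the PR resolves the problem.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// client is a hypothetical stand-in for a region-bound ECR API client.
type client struct{ region string }

// GetCredential returns a fake token so the sketch runs without AWS access.
func (c *client) GetCredential() string { return "token-for-" + c.region }

// registry keeps no per-domain state, so it is safe to share between workers.
type registry struct{}

// CheckOptions returns a fresh client for the domain instead of storing it
// on the shared registry value.
func (registry) CheckOptions(domain string) (*client, error) {
	parts := strings.Split(domain, ".")
	if len(parts) < 6 || parts[1] != "dkr" || parts[2] != "ecr" {
		return nil, fmt.Errorf("%q is not an ECR domain", domain)
	}
	return &client{region: parts[3]}, nil
}

// tokensFor resolves credentials for all domains concurrently,
// one goroutine per domain, mimicking parallel scan workers.
func tokensFor(domains []string) map[string]string {
	var (
		wg  sync.WaitGroup
		mu  sync.Mutex // guards out only; the registry itself needs no lock
		out = make(map[string]string, len(domains))
		reg registry
	)
	for _, d := range domains {
		wg.Add(1)
		go func(d string) {
			defer wg.Done()
			c, err := reg.CheckOptions(d)
			if err != nil {
				return // not an ECR domain, nothing to record
			}
			tok := c.GetCredential()
			mu.Lock()
			out[d] = tok
			mu.Unlock()
		}(d)
	}
	wg.Wait()
	return out
}

func main() {
	tokens := tokensFor([]string{
		"127647282379.dkr.ecr.us-east-1.amazonaws.com",
		"127647282379.dkr.ecr.sa-east-1.amazonaws.com",
	})
	for domain, tok := range tokens {
		fmt.Println(domain, "=>", tok)
	}
}
```

Because each goroutine holds its own client, the credential it requests always matches the domain it checked, regardless of how the workers interleave.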

Contributor

I think I understand your logic.

But I don't see any place where we use GetToken function (or upper function) using goroutines.
We also use 1 image. Therefore, if we overwrite ECRClient -> it will be a new ECRClient but with the same settings.

But I may be missing something. It would be great if you could show an example.

Anyway, I think these changes should be made in another PR.
Can you undo the last changes and create a new PR with those changes and the example in the new PR?

Author

knrc commented Apr 3, 2024

But I don't see any place where we use GetToken function (or upper function) using goroutines. We also use 1 image. Therefore, if we overwrite ECRClient -> it will be a new ECRClient but with the same settings.

But I may be missing something. It would be great if you could show an example.

The artifacts are scanned by different workers in parallel, see

p := parallel.NewPipeline(s.opts.Parallel, !s.opts.Quiet, resourceArtifacts, onItem, onResult)
err = p.Do(ctx)

Contributor

@DmitriyLewen DmitriyLewen left a comment

@knrc sorry for the wait for an answer.

The artifacts are scanned by different workers in parallel, see

Thank you for showing me this. I'm currently seeing this problem!

Your changes look correct for this case.

I left 1 comment about google test.

Author

knrc commented Apr 18, 2024

@knrc sorry for the wait for an answer.

No worries, we all have our day jobs.

The artifacts are scanned by different workers in parallel, see

Thank you for showing me this. I'm currently seeing this problem!

:)

Your changes look correct for this case.

I left 1 comment about google test.

Sounds good, I'll take a look. I'm at Open Source Summit this week, but will try to get to this as quickly as I can.

@knrc knrc force-pushed the ecr_multi_region branch 2 times, most recently from d0f4e93 to ed0beb2 on April 18, 2024 13:33
Author

knrc commented Apr 18, 2024

Sounds good, I'll take a look. I'm at Open Source Summit this week, but will try to get to this as quickly as I can.

@DmitriyLewen I rebased and updated the PR for your comment, can you take another look?

Contributor

@knrc can you fix linter error?

Author

knrc commented Apr 19, 2024

@knrc can you fix linter error?

Yes, I can add that to the list since I'm in those files anyway.

Signed-off-by: Kevin Conner <kev.conner@getupcloud.com>
Author

knrc commented Apr 19, 2024

@DmitriyLewen try now

Contributor

@DmitriyLewen DmitriyLewen left a comment

@knrc Thanks for your work!

@knqyf263 take a look, when you have time, please.

Successfully merging this pull request may close these issues.

AWS ECR registry authentication only works in the same/default region as caller
3 participants