Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

S3 - Support virtual-hosted style buckets #1823

Draft
wants to merge 15 commits into
base: master
Choose a base branch
from

Conversation

tatsuya6502
Copy link

@tatsuya6502 tatsuya6502 commented Sep 19, 2020

[WIP] This pull request adds support for S3 virtual-hosted addressing style to rusoto_s3 crate.

Without this change, users of rusoto_s3 crate cannot access Amazon S3 buckets created after September 30, 2020. (Update: AWS has announced that they will delay the deprecation) This is because AWS has deprecated path-style addressing, which is the only style supported by current rusoto_s3 release. See this blog post from AWS for more details of the path-style deprecation.

Fixes: #1482, #1789

Breaking Changes

This pull request includes breaking changes:

  • Change the default S3 addressing style to auto, which makes S3Client to use virtual-hosted-style whenever possible and to fall back to path-style when necessary.
    • This should be the most preferable choice for the users using Amazon S3.
    • However, other users who use S3 compatible object storages such as Ceph and MinIO may have problem accessing their storages (e.g. by a DNS error; no such host). This is because their storages may not have configured for virtual-hosted-style. For example, they will need to add a DNS entry every time a new bucket is created.
    • They can fix this by changing the addressing style in S3Client back to path-style.
    • This will be documented in the following places:
  • Change the return type of S3 get_presigned_url() from String
    to Result<String, InvalidDnsNameError>.
    • This change is needed to support different S3 addressing styles in presigned URL.

API Changes

This pull request has the following API additions and changes:

  • Add an enum rusoto_s3::util::AddressingStyle with variants Auto, Virtual and Path.
    • The default variant is Auto.
    • These variants behaves exactly like the options in s3.addressing_style in Boto3.
  • Add a struct rusoto_s3::util::S3Config, which has only one public field addressing_style.
  • Add a private filed config: S3Config to rusoto_s3::S3Client.
  • Add a public field addressing_style to rusoto_s3::util::PreSignedRequestOption.
  • Add a struct rusoto_core::error::InvalidDnsNameError.
    • This serves the exact same purpose to InvalidDNSNameError exception in Boto3.
  • Add a variant to rusoto_core::error::Rusoto
  • As mentioned above, chang the return type of S3 get_presigned_url() from String
    to Result<String, InvalidDnsNameError>.

The Behavior of AddressingStyle

AddressingStyle provides the same behavior to s3.addressing_style in Boto3 (Reference).

Here are some details (copied from an issue comment of #1482)

At first this seemed pretty straight forward, but having a look at how it is implemented in botocore it becomes clear that there is some more logic involved.
Quick write-up of the main points:

  • A ListAllBuckets request does not have a bucket associated with it so "virtual hosting" style is not possible at all.
  • bucket names have to comply with valid DNS names, otherwise "path style" will be used. This is the case for:
    • bucket names containing "."
    • bucket_name.length < 3 or bucket_name.length > 63
    • if the bucket name does not match the regex [a-z0-9][a-z0-9\-]*[a-z0-9]
  • The transformation is skipped if its a GetBucketLocation request to avoid any sigv4 issues (not sure why there are issues)

Automated Testing (CI)

This pull request includes some additional unit tests (IN-PROGRESS).

The existing unit tests are updated to use AddressingStyle::Auto and some unit tests are added for ListAllBuckets and GetBucketLocation, which require AddressingStyle::Path.

The integration tests provided by rusoto_tests crate has the following changes:

  • Introduced the following environment variables to the test module rusoto_tests::s3:
    • S3_PROXY to set the HTTP proxy URL to the test S3/HTTP clients.
    • S3_ADDRESSING_STYLE to set the addressing style in S3 client and presigned URL. (Only value path is recognized for Path variant, otherwise Auto variant is used)
  • Integration tests using MinIO is changed to tests virtual-hosted-style addressing with AddressingStyle::Auto.
    • This is done by:
      • Setting the S3 endpoint http://s3.us-east-1.rusoto.example.com:9000 to MinIO server and also to S3Client.
      • And setting the IP address of the S3 endpoint http:://172.0.0.1:9000 as the proxy of the S3Client and, for testing presigned URLs, the proxy of reqwest::Client.
  • Integration tests using Ceph kept testing path-style addressing with AddressingStyle::Path.

Notes

- rusoto_s3 crate:
    - For now, exploring a solution by manually modifying "generated.rs".
      Later, I will apply the changes to the code generators in
      service_crategen.
    - Add AddressingStyle enum to s3::util module and make the Auto variant
      as the default.
    - Add S3Config struct with addressing_style field to s3::util module.
- rusoto_core crate:
    - Add InvalidDnsName variant to RutotoError in rusoto_core crate.
- rusoto_signature crate:
    - Add extract_hostname() method to Region enum (WIP)
- service_crategen crate:
    - Update codegen, rest_xml and rest_request_generator modules to
      generate extra codes for S3 service to handle different addressing
      styles.
- rusoto_s3 crate:
    - No more manual edits on "generated.rs".
    - Change the default valiant for AddressingStyle enum from Auto to
      Path for now, in order to make auto-generate unit tests to pass.
- rusoto_signature crate:
    - Remove extract_hostname() method from Region enum. This method was
      added by a previous commit.
- rusoto_s3 crate:
    - Update PreSignedRequest implementation to support S3 addressing styles.
- service_crategen crate:
    - Add `pub` to generated build_s3_hostname() method of S3Client.
    - Remove a blank line in generated codes right above the client struct.
- rusoto_tests crate (aka integration_tests)
    - Fix compile errors due to the changes in PreSignedRequest
      implementation in previous commit.
@tatsuya6502
Copy link
Author

There is an open pull request #1817 to remove regex dependency from rusoto_credential crate.

In this pull request, I have added a regex dependency to rusoto_s3 crate. Maybe I should remove it. The regex matcher here is simple enough and can be easily replaced with some Rust codes.

rusoto/services/s3/src/custom/util.rs

fn is_valid_dns_name(bucket_name: &str) -> bool {
    use lazy_static::lazy_static;
    use regex::Regex;

    // ...

    lazy_static! {
        static ref LABEL_RE: Regex = Regex::new(r"[a-z0-9][a-z0-9\-]*[a-z0-9]").unwrap();
    }
    LABEL_RE.is_match(bucket_name)
}

- rusoto_s3 crate:
    - Remove regex crate usage from is_valid_dns().
- rusoto_s3 crate:
    - Change the default of AddressingStyle from Path to Auto.
    - Update the custom_tests to check virtual-hosted-style URI.
- rusoto_tests crate (aka integration_tests)
    - Update the TestS3Client to set S3Client's addressing style based
      on the value of an environment variable S3_ADDRESSING_STYLE.
    - Update tests with Ceph and Minio to keep using S3 path-style URI
      because we do not have DNS setup for them and virtual-hosted-style
      will not work.
@tatsuya6502
Copy link
Author

OK. I removed the regex dependency from this pull request in this commit.

- rusoto_tests crate (aka integration_tests)
    - Update tests with Minio to use S3 virtual-hosted-style URI.
    - Update TestS3Client and reqwest::Client to set HTTP proxy to the URI
      specified by an environment variable S3_PROXY. This setting is needed
      to run tests with virtual-hosted-style URI without having a local DNS
      server.
- rusoto_tests crate (aka integration_tests)
    - Create another set of Minio and Ceph tests so that we can test both
      S3 virtual-hosted-style URI and path-style URI.
    - Fix some presigned-URL tests in the s3 module; they were using
      virtual-hosted-style URI regardless the S3_ADDRESSING_STYLE value.
- rusoto_tests crate (aka integration_tests)
    - Reorder test scripts in .semaphoreci so that CI does not have to
      pull and purge the same Docker images twice.
@kennytm
Copy link

kennytm commented Oct 26, 2020

Hello is there any progress on this? 😄

@tatsuya6502
Copy link
Author

@kennytm – Hi. I am basically done with all coding stuff here. I want to add some test cases and documentation. I will work on them in coming weekend.

@tatsuya6502
Copy link
Author

Sorry for the big delay. I am planning to work on this in this week.

@meh
Copy link

meh commented Feb 8, 2021

Would this also fix #1742?

@williamstar
Copy link

This feature is so important but without any update .. 🎱

Conflicts:
    .semaphoreci/test_11_clean_docker_images.sh
    ci/test_11_ceph_path_style.sh
    ci/test_11_clean_docker_images.sh
    ci/test_12_clean_docker_images.sh
    ci/test_21_clean_docker_images.sh
    ci/test_21_minio_path_style.sh
    ci/test_22_clean_docker_images.sh
    integration_tests/README.md
    rusoto/services/s3/src/generated.rs
Remove obsolete CI scripts.
Update the Hyper version for the integration tests.
@tisonkun
Copy link

@tatsuya6502 is there a schedule on reviewing this feature? https://github.com/tikv/rusoto already integrates your patch and push it onto production by tikv/tikv#9425 .

If we can get this feature in upstream, we can brining it to a wider range of developers. Looking forward to your update.

@AdamLeyshon
Copy link

Looks good, @tatsuya6502, can you convert this from a draft to a full PR please? 👍🏻
I'd love for this to get merged.

@tatsuya6502
Copy link
Author

Hi @iliana. Some users still need virtual hosted style access for S3. I did some work here but it is pretty outdated now, so I will need to rebase it. If I do it, is there any chance that this PR will be merged? I see the followings on the README, so I am curious:

https://github.com/rusoto/rusoto#maintenance-status

⚠️ Rusoto is in maintenance mode. ⚠️

The current maintainers only have the bandwidth to review dependency bumps and obvious bugfixes. Our bandwidth for reviewing new features is extremely limited.

While you are welcome to submit PRs that implement new features or refactor existing code, they are unlikely to be merged unless we can find more active maintainers.

Please see #1651 for details.

@zoonman
Copy link

zoonman commented Aug 26, 2022

At this point this PR is a bugfix and should get prioritized.
I created a bucket yesterday and the rusoto is no longer capable to access it.

@davidbarsky
Copy link
Contributor

At this point this PR is a bugfix and should get prioritized.

@zoonman I don't believe that there's any maintainer of Rusoto left, as most people have left Amazon or are no longer interested in supporting Rusoto. You're probably better off using the AWS Rust SDK at this point.

@AdamLeyshon
Copy link

The issue with the AWS Rust SDK is that the project README clearly states that it is not production ready.
I certainly hope that some people interested in Rusoto will take over the project soon as the original maintainers have asked for.

@davidbarsky
Copy link
Contributor

I certainly hope that some people interested in Rusoto will take over the project soon as the original maintainers have asked for.

The official AWS SDK for Rust is that project.

The issue with the AWS Rust SDK is that the project README clearly states that it is not production ready.

Full disclosure: I used to work on the AWS Rust SDK, but haven't for a little bit, so I hope don't think that I'm speaking out of turn for the team that currently works on it. I have shared similar things sentiments with my folks at my current employer with the team's okay, but since this issue might have larger reach, I'll speak in more general terms.

With that out of the way: if you stick with the high-level, vanilla clients in a relatively vanilla way (e.g., using the AWS Rust SDK clients as you would use the Rusoto clients), run cargo update occasionally, and are okay with breaking changes that impact user-configurable, but advanced features such as middleware or updates to pre-1.0 dependencies, then you can comfortably use the official Rust SDK with minimal breaking changes.

For context, AWS has an extremely high bar for what they consider to be "generally available" or "production ready". They consider security backports to older versions and something like a 5+ year stability policy necessary to be labeled "generally available"; the AWS Rust SDK cannot yet commit to such things. If you need those things, then yes: the AWS SDK for Rust isn't ready for you, but if you're commenting on this issue, I strongly suspect that it's not a hard requirement.

@tatsuya6502
Copy link
Author

FYI. The official AWS SDK S3 for Rust now supports virtual-hosted style bucket. It is enabled by default.

https://github.com/awslabs/aws-sdk-rust/releases/tag/release-2023-01-13

  • ⚠🎉 (Endpoints 2.0 smithy-lang/smithy-rs#1784, Enable Endpoints 2.0 smithy-lang/smithy-rs#2074) Integrate Endpoints 2.0 into the Rust SDK. Endpoints 2.0 enables features like S3 virtual addressing & S3 object lambda. As part of this change, there are several breaking changes although efforts have been made to deprecate where possible to smooth the upgrade path.
    ...

  • ⚠🎉 (Endpoints 2.0 smithy-lang/smithy-rs#1784, Enable Endpoints 2.0 smithy-lang/smithy-rs#2074) Add additional configuration parameters to aws_sdk_s3::Config.

    The launch of endpoints 2.0 includes more configuration options for S3. The default behavior for endpoint resolution has been changed. Before, all requests hit the path-style endpoint. Going forward, all requests that can be routed to the virtually hosted bucket will be routed there automatically.

    • force_path_style: Requests will now default to the virtually-hosted endpoint <bucketname>.s3.<region>.amazonaws.com
    • use_arn_region: Enables this client to use an ARN’s region when constructing an endpoint instead of the client’s configured region.
    • accelerate: Enables this client to use S3 Transfer Acceleration endpoints.

Note: the AWS SDK for Rust does not currently support Multi Region Access Points (MRAP).

omegaphoenix added a commit to omegaphoenix/rusoto that referenced this pull request Mar 8, 2023
omegaphoenix added a commit to omegaphoenix/rusoto that referenced this pull request Mar 9, 2023
omegaphoenix added a commit to omegaphoenix/rusoto that referenced this pull request Mar 9, 2023
omegaphoenix added a commit to omegaphoenix/rusoto that referenced this pull request Mar 9, 2023
omegaphoenix added a commit to omegaphoenix/rusoto that referenced this pull request Mar 9, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Convert S3 URL from "path-style" to "subdomain-style"
8 participants