`s3-algo`

High-performance algorithms for batch operations in Amazon S3, built on top of rusoto. Reliability and performance are achieved through a configurable timeout/retry/backoff algorithm designed for a high volume of requests. Progress can be monitored closely with closures that are called for every finished request, which allows accurate user feedback.

For background, see the Amazon S3 performance guidelines: https://docs.aws.amazon.com/AmazonS3/latest/dev/optimizing-performance-guidelines.html

Upload multiple files with s3_upload_files.
List files with s3_list_objects or s3_list_prefix, and then execute deletion or copy on all the files.

This crate is still in its infancy, and we happily welcome PRs, feature requests, and suggestions for improving the API.

Running tests and examples

Both tests and examples require that an S3 service such as minio is running locally at port 9000. Tests assume that a credentials profile exists - for example in ~/.aws/credentials:

[testing]
aws_access_key_id = 123456789
aws_secret_access_key = 123456789

Listing, deleting and copying objects

These operations are all done through the entry points s3_list_objects() or s3_list_prefix(), which return a ListObjects object that can delete and copy files. Example:

s3_list_prefix(s3, "test-bucket".to_string(), "some/prefix".to_string())
    .delete_all()
    .await
    .unwrap();

Upload

Features of the `s3_upload_files` function

As generic as possible, to support many use cases.
It is possible to collect detailed data from the upload through a closure - one can choose to use this data to analyze performance, or for example to implement a live progress percentage report.
Backoff mechanism: the timeout increases on each successive retry.
Fast. Several mechanisms are in place, such as aggressive timeouts, parallelization and streaming files from file system while uploading.
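
To illustrate the closure-based reporting mentioned above, here is a minimal sketch of a live progress percentage. It does not use the crate's API: `RequestReport`, `simulate_uploads` and `progress_percentages` are hypothetical names, and the "uploader" is a stand-in that performs no network traffic. Only the pattern is the point: a closure is called once per finished request and accumulates the bytes done.

```rust
/// Hypothetical per-request report. The fields mirror some of the columns
/// printed by the `perf_data` example; this is not the crate's actual type.
#[derive(Debug, Clone)]
pub struct RequestReport {
    pub attempts: u32,
    pub bytes: u64,
    pub total_ms: u64,
}

/// Stand-in for an uploader: "uploads" each size and calls `on_finished`
/// once per finished request. No network traffic happens here.
pub fn simulate_uploads<F>(sizes: &[u64], mut on_finished: F)
where
    F: FnMut(&RequestReport),
{
    for &bytes in sizes {
        let report = RequestReport { attempts: 1, bytes, total_ms: 0 };
        on_finished(&report);
    }
}

/// Returns the percentage of bytes done after each finished request.
pub fn progress_percentages(sizes: &[u64]) -> Vec<f64> {
    let total: u64 = sizes.iter().sum();
    let mut done: u64 = 0;
    let mut percentages = Vec::with_capacity(sizes.len());
    simulate_uploads(sizes, |report| {
        // The closure owns the progress state; the uploader only reports.
        done += report.bytes;
        let pct = if total == 0 {
            100.0
        } else {
            100.0 * done as f64 / total as f64
        };
        percentages.push(pct);
    });
    percentages
}
```

In a real program the closure would typically update a progress bar or log line instead of pushing into a vector.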

Algorithm details

The documentation for UploadConfig may help illuminate the components of the algorithm. Currently, the most important aspect of the algorithm is deciding timeout values, that is, how long to wait for a request before trying again. For good performance, the timeout must be tight enough. The main mechanism to this end is an estimate of the upload bandwidth, maintained as a running exponential average of the upload speed of individual files (measured on success). Additionally, the timeout increases by some factor on each successive retry (back-off).
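
The timeout logic described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the crate's implementation: `TimeoutEstimator` and all of its fields (`alpha`, `safety_factor`, `backoff`) are hypothetical names, and the exact formula used by the crate may differ (see UploadConfig for the real parameters).

```rust
use std::time::Duration;

/// Hypothetical timeout estimator; not the crate's actual type.
pub struct TimeoutEstimator {
    /// Current upload bandwidth estimate in bytes per second.
    bytes_per_sec: f64,
    /// Weight of the newest sample in the running average, in (0, 1].
    alpha: f64,
    /// Expected transfer time is multiplied by this to get the first timeout.
    safety_factor: f64,
    /// The timeout is multiplied by this factor for each retry.
    backoff: f64,
}

impl TimeoutEstimator {
    pub fn new(initial_bytes_per_sec: f64, alpha: f64, safety_factor: f64, backoff: f64) -> Self {
        assert!(initial_bytes_per_sec > 0.0, "bandwidth estimate must be positive");
        assert!(alpha > 0.0 && alpha <= 1.0, "alpha must be in (0, 1]");
        Self { bytes_per_sec: initial_bytes_per_sec, alpha, safety_factor, backoff }
    }

    /// Folds the speed of one successful upload into the exponential average.
    pub fn record_success(&mut self, bytes: u64, elapsed: Duration) {
        let secs = elapsed.as_secs_f64();
        if secs > 0.0 {
            let sample = bytes as f64 / secs;
            self.bytes_per_sec = self.alpha * sample + (1.0 - self.alpha) * self.bytes_per_sec;
        }
    }

    /// Timeout for uploading `bytes` on the given retry (0 = first attempt).
    pub fn timeout(&self, bytes: u64, attempt: u32) -> Duration {
        let expected_secs = bytes as f64 / self.bytes_per_sec;
        let factor = self.safety_factor * self.backoff.powi(attempt as i32);
        Duration::from_secs_f64(expected_secs * factor)
    }

    /// Current bandwidth estimate in bytes per second.
    pub fn bytes_per_sec(&self) -> f64 {
        self.bytes_per_sec
    }
}
```

With an initial estimate of 1 MB/s, a safety factor of 2 and a back-off of 1.5, a 1 MB file gets a 2 s timeout on the first attempt and 3 s on the first retry. A faster successful upload raises the estimate and therefore tightens later timeouts.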

Yet to consider

Is the algorithm considerate of other processes that want to use the same network, for example in the case of congestion? It does implement increasing back-off intervals after failed requests, but the real effect on a shared network should be tested.

Examples

`perf_data`

Command-line interface for uploading any directory to any bucket and prefix in a locally running S3 service (such as minio). Example:

cargo run --example perf_data -- -n 3 ./src test-bucket lala

Prints:

          attempts             bytes        success_ms          total_ms              MBps          MBps est
                 1              1990                32                32           0.06042           1.00000
                 1             24943                33                33           0.74043           1.00000
                 1              2383                29                29           0.08211           1.00000
                 1               417                13                13           0.03080           1.00000
                 1              8562                16                16           0.51480           1.00000

total_ms is the total time including all retries, and success_ms is the time of only the last attempt. The distinction between these two is useful in real cases where attempts is not always 1.

You can then verify that the upload happened by entering the container. Something like:

$ docker exec -it $(docker ps --filter "ancestor=minio" --format "{{.Names}}") bash
[user@144aff4dae5b ~]$ ls s3/
test-bucket/ 
[user@144aff4dae5b ~]$ ls s3/test-bucket/
lala
