Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ListObjectVersionsPagesWithContext can get into an infinite loop when using URL encoding #4672

Open
clumio-nick opened this issue Dec 23, 2022 · 1 comment
Assignees
Labels
bug This issue is a bug. m Effort estimation: medium p3 This is a minor priority issue

Comments

@clumio-nick
Copy link

clumio-nick commented Dec 23, 2022

Describe the bug

When EncodingType is set to s3.EncodingTypeUrl, ListObjectVersionsPagesWithContext can call "fn" repeatedly with the same keys. It never reaches the end of the listing, and only stops if "fn" returns false.

This happens if there is an object key containing a backslash ("\").

Expected Behavior

I expect each call to "fn" to have different keys, and I expect the listing to eventually end even if "fn" always returns true.

Current Behavior

"fn" is called repeatedly with the same keys.

Reproduction Steps

Run this program as "go run main.go -bucket <your-s3-bucket> -prefix <prefix-with-no-objects>.
It will create two objects under the prefix and attempt to list them with ListObjectVersionsPagesWithContext.
However, it gets into an infinite loop returning the first object again and again.
Note: this program does not clean up the objects it created, you have to delete them manually.

Sample output:

##### Create objects #####
Put object key: test1/object\1, version ID: ""
Put object key: test1/object\2, version ID: ""

##### List objects #####
Listed key: test1/object\1, version ID: null
Finished one page, next key marker: test1/object%5C1

Listed key: test1/object\1, version ID: null
Finished one page, next key marker: test1/object%5C1

Listed key: test1/object\1, version ID: null
Finished one page, next key marker: test1/object%5C1

^Csignal: interrupt

The program

package main

import (
	"bytes"
	"context"
	"flag"
	"fmt"
	"net/url"
	"strings"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/s3"
)

func main() {
	bucket := flag.String("bucket", "", "S3 bucket name")
	prefixFlag := flag.String("prefix", "", "Prefix under which to create objects")
	flag.Parse()

	// Add a trailing "/" to the prefix if there isn't one
	prefix := *prefixFlag
	if !strings.HasSuffix(prefix, "/") {
		prefix += "/"
	}

	sess := session.Must(session.NewSession())
	api := s3.New(sess)

	fmt.Println("##### Create objects #####")

	// Create two objects under the provided prefix. The object keys contain a
	// backslash, which is converted to "%5C" by URL encoding.
	keys := []string{
		"object\\1",
		"object\\2",
	}
	for i, key := range keys {
		fullKey := prefix + key
		putInput := &s3.PutObjectInput{
			Bucket: bucket,
			Key:    aws.String(fullKey),
			Body:   bytes.NewReader([]byte{byte(i)}),
		}
		output, err := api.PutObjectWithContext(context.Background(), putInput)
		if err != nil {
			panic(fmt.Sprintf("Failed to put object %v: %v", fullKey, err))
		}
		fmt.Printf("Put object key: %v, version ID: \"%v\"\n",
			fullKey, aws.StringValue(output.VersionId))
	}

	fmt.Println()
	fmt.Println("##### List objects #####")

	// List objects under the prefix with URL encoding and MaxKeys set to 1.
	listInput := &s3.ListObjectVersionsInput{
		Bucket:       bucket,
		Prefix:       aws.String(prefix),
		EncodingType: aws.String(s3.EncodingTypeUrl),
		MaxKeys:      aws.Int64(1),
	}
	err := api.ListObjectVersionsPagesWithContext(context.Background(), listInput,
		func(output *s3.ListObjectVersionsOutput, cont bool) bool {
			for _, content := range output.Versions {
				fmt.Printf("Listed key: %v, version ID: %v\n",
					urlDecode(aws.StringValue(content.Key)),
					urlDecode(aws.StringValue(content.VersionId)))
			}

			for _, marker := range output.DeleteMarkers {
				fmt.Printf("Listed delete marker: %v, version ID: %v\n",
					urlDecode(aws.StringValue(marker.Key)),
					urlDecode(aws.StringValue(marker.VersionId)))
			}

			fmt.Printf("Finished one page, next key marker: %v\n",
				aws.StringValue(output.NextKeyMarker))
			fmt.Println()

			return true
		})
	if err != nil {
		panic(fmt.Sprintf("Listing error: %v\n", err))
	}

	fmt.Println("Done listing")
}

// Decode a URL-encoded string. Panic if there is any error.
func urlDecode(encoded string) string {
	decoded, err := url.PathUnescape(encoded)
	if err != nil {
		panic(fmt.Sprintf("Unabled to URL decode string \"%v\": %v", encoded, err))
	}
	return decoded
}

Possible Solution

It's possible that ListObjectVersionsPagesWithContext doesn't URL-decode NextKeyMarker from the ListObjectVersions response before using it as the KeyMarker for the next call to ListObjectVersions.

The URL-encoding of "\" is "%5C", and "%" comes before "\" lexicographically.

Additional Information/Context

No response

SDK version used

github.com/aws/aws-sdk-go v1.44.83

Environment details (Version of Go (go version)? OS name and version, etc.)

go version go1.19.4 darwin/amd64, macOS Ventura 13.1

@clumio-nick clumio-nick added bug This issue is a bug. needs-triage This issue or PR still needs to be triaged. labels Dec 23, 2022
@RanVaknin RanVaknin self-assigned this Jan 10, 2023
@RanVaknin RanVaknin added investigating This issue is being investigated and/or work is in progress to resolve the issue. and removed needs-triage This issue or PR still needs to be triaged. labels Jan 10, 2023
@RanVaknin
Copy link
Contributor

Hi @clumio-nick,

Thanks for letting us know. I have ran your code and confirm that this is an issue.
After taking a deeper look into the problem I see that the EncodingType: aws.String(s3.EncodingTypeUrl) flag is working as expected in the sense that the Key returned from the service is urlEncoded.

If a user passes the key in this encoded form then S3 will return the first page again (does not continue pagination).
The issue is that there is no protocol level support / modeling capabilities that tell SDKs how to handle this correctly for users.

To give you a little background: all the SDKs are code generated from the various AWS service models, this should have been specified correctly in the service's model.
As a matter of fact, I tried testing this paginator with the JS V3 SDK, and the JS code generator did not even generated that paginator which tells me theres an issue with it.

Since upstreaming this to the service might take a very long time, I think we might need to handwrite the paginator which will also take a while.

In the meantime my only suggestion to you is not to use the EncodingType parameter or that paginator as all, if you can avoid it.

Sorry for the inconvenience.

All the best,
Ran~

@RanVaknin RanVaknin added needs-review This issue or pull request needs review from a core team member. p3 This is a minor priority issue m Effort estimation: medium and removed investigating This issue is being investigated and/or work is in progress to resolve the issue. labels Jan 14, 2023
@RanVaknin RanVaknin removed the needs-review This issue or pull request needs review from a core team member. label Apr 11, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug This issue is a bug. m Effort estimation: medium p3 This is a minor priority issue
Projects
None yet
Development

No branches or pull requests

2 participants