Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

storage: Initial request for resumable uploads larger than ChunkSize not retried #6652

Closed
jamesl33 opened this issue Sep 12, 2022 · 2 comments · Fixed by googleapis/google-api-go-client#1690
Assignees
Labels
api: storage Issues related to the Cloud Storage API. priority: p2 Moderately-important priority. Fix may not be included in next release. type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns.

Comments

@jamesl33
Copy link

jamesl33 commented Sep 12, 2022

Description
I'm currently investigating a case where we're seeing a large number of failures due to 503 status codes
(ServiceUnavailable). My understanding was that these errors are retried by default, however, I've observed a case
where that doesn't appear to be true.

  • Requests that fall under the default chunk threshold (16MiB) are retried
  • The first chunk for requests that are equal, or larger to the chunk threshold will not be retried
  • Through modifying the SDK, I've observed that subsequent requests do appear to be retried (although it would be great to confirm my findings)

Walking through the code, it appears this is because the first request for an upload has req.GetBody set to nil:

// google.golang.org/api@v0.46.0/storage/v1/storage-gen.go:10253

body, getBody, cleanup := c.mediaInfo_.UploadRequest(reqHeaders, body)
defer cleanup()

<snip>

req.GetBody = getBody

This results in gensupport.SendRequestWithRetry not retrying the request because it can't retrieve the request body
for the retry.

// google.golang.org/api@v0.46.0/internal/gensupport/send.go:96

// Check if we can retry the request. A retry can only be done if the error
// is retryable and the request body can be re-created using GetBody (this
// will not be possible if the body was unbuffered).
if req.GetBody == nil || !shouldRetry(status, err) {
    break
}

I suspect this may be why we've seen so many 503 status codes surface as errors where I'd expect them to be
retried.

Client

Storage

Environment

$ uname -a
Linux <hostname> 5.19.7-arch1-1 #1 SMP PREEMPT_DYNAMIC Mon, 05 Sep 2022 18:09:09 +0000 x86_64 GNU/Linux

Go Environment

$ go version
go version go1.19 linux/amd64

$ go env
GO111MODULE=""
GOARCH="amd64"
GOBIN="/home/james/.local/bin"
GOCACHE="/home/james/.cache/go-build"
GOENV="/home/james/.config/go/env"
GOEXE=""
GOEXPERIMENT=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOINSECURE=""
GOMODCACHE="/home/james/.local/lib/golang/pkg/mod"
GOOS="linux"
GOPATH="/home/james/.local/lib/golang:/home/james/Projects/go"
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/usr/lib/go"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/usr/lib/go/pkg/tool/linux_amd64"
GOVCS=""
GOVERSION="go1.19"
GCCGO="gccgo"
GOAMD64="v1"
AR="ar"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD="/dev/null"
GOWORK=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -Wl,--no-gc-sections -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build2865911939=/tmp/go-build -gno-record-gcc-switches"

Code

// server.go

package main

import (
	"fmt"
	"net/http"
)

func main() {
	http.HandleFunc("/upload/storage/v1/b/bucket/o", func(w http.ResponseWriter, req *http.Request) {
		// Log something useful so we can verify the test client is connecting
		fmt.Printf("Received request from %s\n", req.RemoteAddr)

		// Simulate that GCS is unavailable
		w.WriteHeader(http.StatusServiceUnavailable)

		// The SDK expects a JSON response
		w.Write([]byte("{}"))
	})

	http.ListenAndServe(":12345", nil)
}
// client.go

package main

import (
	"context"
	"io"
	"strings"

	"cloud.google.com/go/storage"
	"google.golang.org/api/googleapi"
	"google.golang.org/api/option"
)

func main() {
	client, err := storage.NewClient(
		context.Background(),
		option.WithEndpoint("http://localhost:12345"),
		option.WithoutAuthentication(),
	)
	if err != nil {
		panic(err)
	}

	// Setup and enable our custom request retry handling which extends that implemented in the SDK
	client.SetRetry(
		storage.WithPolicy(storage.RetryAlways),
	)

	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()

	writer := client.Bucket("bucket").Object("key").NewWriter(ctx)

	// Should see indefinite retries
	body := strings.Repeat("a", googleapi.DefaultUploadChunkSize-1)

	// Should see an immediate failure
	// body := strings.Repeat("a", googleapi.DefaultUploadChunkSize)

	if _, err := io.Copy(writer, strings.NewReader(body)); err != nil {
		panic(err)
	}

	if err := writer.Close(); err != nil {
		panic(err)
	}
}

Steps to Reproduce

  1. Create a directory
  2. Create a server.go file with the server code provided
  3. Create a client.go file with the client code provided
  4. Run go mod init github.com/<username>/gcs-retries && go mod tidy
  5. Run go run server.go
  6. Run go run client.go (we should see multiple requests logged in the server)
  7. Comment/uncomment/modify the code to use the larger body size
  8. Run go run client.go (we should an immediate failure)

Expected behavior

I would expect to see this 503 error retried, however, there may be a good reason for not retrying the first request
(e.g. forced non-idempotency for requests that setup resumable uploads).

Actual behavior

I observe that the first request for an upload larger than ChunkSize is not retried.

@jamesl33 jamesl33 added the triage me I really want to be triaged. label Sep 12, 2022
@product-auto-label product-auto-label bot added the api: storage Issues related to the Cloud Storage API. label Sep 12, 2022
@shaffeeullah shaffeeullah added type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns. priority: p2 Moderately-important priority. Fix may not be included in next release. and removed triage me I really want to be triaged. labels Sep 12, 2022
@cojenco cojenco self-assigned this Sep 12, 2022
@tritone
Copy link
Contributor

tritone commented Sep 13, 2022

Thanks for flagging this, we were able to reproduce the problem and you're entirely correct. We are working on a fix in the library that handles resumable upload logic, google-api-go-client. Follow along at googleapis/google-api-go-client#1690

tritone added a commit to googleapis/google-api-go-client that referenced this issue Sep 21, 2022
… w/ non-nil getBody (#1690)

This allows retries when initiating a resumable upload session.

Add support in media.go to return a getBody function so the request body can be recreated during retry attempts.

Tests pass locally in google-cloud-go/storage

Updates googleapis/google-cloud-go#6652

Co-authored-by: Chris Cotter <cjcotter@google.com>
@tritone
Copy link
Contributor

tritone commented Sep 21, 2022

The fix has been released in google-api-go-client; we will bump the dependency in this library as well and release tomorrow.

tritone added a commit to tritone/google-cloud-go that referenced this issue Sep 21, 2022
Bump apiary dependency to pick up the fix for googleapis#6652
gcf-merge-on-green bot pushed a commit that referenced this issue Sep 22, 2022
Bump apiary dependency for storage to pick up the fix for #6652
rhu713 pushed a commit to rhu713/google-api-go-client that referenced this issue Nov 4, 2022
… w/ non-nil getBody (googleapis#1690)

This allows retries when initiating a resumable upload session.

Add support in media.go to return a getBody function so the request body can be recreated during retry attempts.

Tests pass locally in google-cloud-go/storage

Updates googleapis/google-cloud-go#6652

Co-authored-by: Chris Cotter <cjcotter@google.com>
rhu713 pushed a commit to rhu713/google-api-go-client that referenced this issue Nov 4, 2022
… w/ non-nil getBody (googleapis#1690)

This allows retries when initiating a resumable upload session.

Add support in media.go to return a getBody function so the request body can be recreated during retry attempts.

Tests pass locally in google-cloud-go/storage

Updates googleapis/google-cloud-go#6652

Co-authored-by: Chris Cotter <cjcotter@google.com>
rhu713 pushed a commit to rhu713/google-api-go-client that referenced this issue Nov 4, 2022
… w/ non-nil getBody (googleapis#1690)

This allows retries when initiating a resumable upload session.

Add support in media.go to return a getBody function so the request body can be recreated during retry attempts.

Tests pass locally in google-cloud-go/storage

Updates googleapis/google-cloud-go#6652

Co-authored-by: Chris Cotter <cjcotter@google.com>
rhu713 pushed a commit to cockroachdb/google-api-go-client that referenced this issue Nov 4, 2022
… w/ non-nil getBody (googleapis#1690)

This allows retries when initiating a resumable upload session.

Add support in media.go to return a getBody function so the request body can be recreated during retry attempts.

Tests pass locally in google-cloud-go/storage

Updates googleapis/google-cloud-go#6652

Co-authored-by: Chris Cotter <cjcotter@google.com>
rhu713 pushed a commit to cockroachdb/google-api-go-client that referenced this issue Nov 7, 2022
… w/ non-nil getBody (googleapis#1690)

This allows retries when initiating a resumable upload session.

Add support in media.go to return a getBody function so the request body can be recreated during retry attempts.

Tests pass locally in google-cloud-go/storage

Updates googleapis/google-cloud-go#6652

Co-authored-by: Chris Cotter <cjcotter@google.com>
rhu713 pushed a commit to cockroachdb/google-api-go-client that referenced this issue Nov 7, 2022
… w/ non-nil getBody (googleapis#1690)

This allows retries when initiating a resumable upload session.

Add support in media.go to return a getBody function so the request body can be recreated during retry attempts.

Tests pass locally in google-cloud-go/storage

Updates googleapis/google-cloud-go#6652

Co-authored-by: Chris Cotter <cjcotter@google.com>
rhu713 pushed a commit to cockroachdb/google-api-go-client that referenced this issue Nov 17, 2022
… w/ non-nil getBody (googleapis#1690)

This allows retries when initiating a resumable upload session.

Add support in media.go to return a getBody function so the request body can be recreated during retry attempts.

Tests pass locally in google-cloud-go/storage

Updates googleapis/google-cloud-go#6652

Co-authored-by: Chris Cotter <cjcotter@google.com>
rhu713 pushed a commit to cockroachdb/google-api-go-client that referenced this issue Jan 18, 2023
… w/ non-nil getBody (googleapis#1690)

This allows retries when initiating a resumable upload session.

Add support in media.go to return a getBody function so the request body can be recreated during retry attempts.

Tests pass locally in google-cloud-go/storage

Updates googleapis/google-cloud-go#6652

Co-authored-by: Chris Cotter <cjcotter@google.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api: storage Issues related to the Cloud Storage API. priority: p2 Moderately-important priority. Fix may not be included in next release. type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns.
Projects
None yet
4 participants