Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

storage: CopierFrom large files, progress resets and copy never finishes #6857

Closed
miguelb opened this issue Oct 12, 2022 · 5 comments · Fixed by #6863
Closed

storage: CopierFrom large files, progress resets and copy never finishes #6857

miguelb opened this issue Oct 12, 2022 · 5 comments · Fixed by #6863
Assignees
Labels
api: storage Issues related to the Cloud Storage API. priority: p1 Important issue which blocks shipping the next release. Will be fixed prior to next release. type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns.

Comments

@miguelb
Copy link

miguelb commented Oct 12, 2022

We are copying a large file from a regional bucket to a multi-regional bucket, and seeing some strange behaviors.

  • If you do not specify the context timeout, the copier goes into an endless error/retry. Before we noticed this issue, we went through a Petabyte of egress in 1-2 weeks.
  • We then added a timeout, copying continues to retry until it reaches the timeout period.
  • We also added a retry strategy of "never" but it still to continues to retry.

Client
GCS

Environment
golang:1.19.1-alpine

Go Environment
1.19.1

Code

func newGCPClient(ctx context.Context) (*storage.Client, error) {
	options := []option.ClientOption{option.WithCredentialsJSON([]byte(config.Env.GoogleCredentials))}
	return storage.NewClient(ctx, options...)
}

func CopyFile(fromBucket string, fromKey string, toBucket string, toKey string) error {
	ctx := context.Background()
	client, err := newGCPClient(ctx)
	if err != nil {
		return err
	}
	defer client.Close()

	src := client.Bucket(fromBucket).Object(fromKey)
	dst := client.Bucket(toBucket).Object(toKey)

	ctx, cancel := context.WithTimeout(ctx, 5*time.Minute)
	defer cancel()

	copier := dst.CopierFrom(src)
	copier.ProgressFunc = func(copiedBytes, totalBytes uint64) {
		fmt.Println(fmt.Sprintf("progress: %d / %d", copiedBytes, totalBytes))
	}
	if _, err := copier.Run(ctx); err != nil {
		return err
	}
	return nil
}

func main() {
	config.LoadVariables()
	fromBucket := ""
	fromKey := ""
	toBucket := ""
	toKey := ""
	if err := CopyFile(fromBucket, fromKey, toBucket, toKey); err != nil {
		fmt.Println("ERROR: ", err)
		return
	}
	fmt.Println("SUCCESS")
}

Expected behavior
Progress increments and eventually finishes

Actual behavior
Progress seems to reset and it never finishes

Here are logs of the progress:

progress: 1862270976 / 8209196233
progress: 1904214016 / 8209196233
progress: 1929379840 / 8209196233
progress: 2231369728 / 8209196233
progress: 2038431744 / 8209196233
progress: 1551892480 / 8209196233
progress: 2298478592 / 8209196233
progress: 1929379840 / 8209196233
progress: 2449473536 / 8209196233
progress: 2021654528 / 8209196233
ERROR:  context deadline exceeded
Exiting.

Also, if we run the same code on two GCS buckets that are both multi-regional. The copy happens almost immediately.

Here are the logs

progress: 8209196233 / 8209196233
SUCCESS
Exiting.
@miguelb miguelb added the triage me I really want to be triaged. label Oct 12, 2022
@codyoss codyoss changed the title cloud.google.com/go/storage: CopierFrom large files, progress resets and copy never finishes storage: CopierFrom large files, progress resets and copy never finishes Oct 13, 2022
@codyoss codyoss added the api: storage Issues related to the Cloud Storage API. label Oct 13, 2022
@tritone
Copy link
Contributor

tritone commented Oct 13, 2022

Hi @miguelb , thanks for filing the issue. Could you let us know what release of the storage client you are using? That will help us reproduce. Also, if I understand correctly, this particular use case (regional to multi region bucket) never worked; it's not something that started at a certain date or after updating the client version -- can you confirm?

@miguelb
Copy link
Author

miguelb commented Oct 13, 2022

This is our first time running into this issue, this is a new codebase. It doesn't work if buckets are different region configuration. If the GCS buckets have the same region configuration, copy happens immediately between the buckets.

cloud.google.com/go/storage v1.27.0

@miguelb
Copy link
Author

miguelb commented Oct 13, 2022

I ran our test against previous versions and cloud.google.com/go/storage v1.24.0 does work. Its slow, but eventually does finish.

Here are the logs:

progress: 2759852032 / 8209196233
progress: 4236247040 / 8209196233
progress: 5494538240 / 8209196233
progress: 7004487680 / 8209196233
progress: 8209196233 / 8209196233
SUCCESS
Exiting.

@tritone
Copy link
Contributor

tritone commented Oct 13, 2022

Thanks very much for the details and for confirming that this worked in an earlier library version. We made a major refactor of the client internals in v1.25.0 so it's likely that there was a bug introduced at that point for your specific use case.

I'll work on a repro and fix for the issue.

@tritone tritone added type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns. priority: p1 Important issue which blocks shipping the next release. Will be fixed prior to next release. and removed triage me I really want to be triaged. labels Oct 13, 2022
tritone pushed a commit that referenced this issue Oct 14, 2022
This fixes infinite retry issues.

Will follow up with tests to catch this behaviour.

Fixes #6857
@tritone
Copy link
Contributor

tritone commented Oct 14, 2022

We found a bug that was introduced in Copier during the refactor that was preventing the Copy operation from completing when the object is large enough to require multiple requests to the API. We've merged a fix which will go out with the next release of storage (sometime next week). Thanks again for bringing this to our attention, and apologies for the inconvenience.

We'll also work on expanding our integration test coverage of this case. If you have concerns about the cost of data egress you incurred due to this bug, please contact your account manager and have them get in touch with me (they can find me internally via my github handle).

tritone added a commit to tritone/google-cloud-go that referenced this issue Mar 17, 2023
Retract storage releases affected by googleapis#6857. This will prevent the
Go tool from selecting these versions using MVS. Builds where these
releases are currently selected will still work, but the user will
receive a warning.
tritone added a commit that referenced this issue Mar 17, 2023
Retract storage releases affected by #6857. This will prevent the
Go tool from selecting these versions using MVS. Builds where these
releases are currently selected will still work, but the user will
receive a warning.

Co-authored-by: gcf-merge-on-green[bot] <60162190+gcf-merge-on-green[bot]@users.noreply.github.com>
adamcstephens added a commit to adamcstephens/terraform-provider-lxd that referenced this issue Apr 5, 2023
```
go: warning: cloud.google.com/go/storage@v1.27.0: retracted by module author: due to googleapis/google-cloud-go#6857
go: to switch to the latest unretracted version, run:
	go get cloud.google.com/go/storage@latest
  ```
adamcstephens added a commit to adamcstephens/terraform-provider-lxd that referenced this issue Apr 6, 2023
go: warning: cloud.google.com/go/storage@v1.27.0: retracted by module author: due to googleapis/google-cloud-go#6857
go: to switch to the latest unretracted version, run:
	go get cloud.google.com/go/storage@latest
aledbf added a commit to gitpod-io/gitpod that referenced this issue Apr 13, 2023
aledbf added a commit to gitpod-io/gitpod that referenced this issue Apr 14, 2023
adamcstephens added a commit to adamcstephens/terraform-provider-lxd that referenced this issue Apr 16, 2023
go: warning: cloud.google.com/go/storage@v1.27.0: retracted by module author: due to googleapis/google-cloud-go#6857
go: to switch to the latest unretracted version, run:
	go get cloud.google.com/go/storage@latest
adamcstephens added a commit to adamcstephens/terraform-provider-lxd that referenced this issue Apr 16, 2023
go: warning: cloud.google.com/go/storage@v1.27.0: retracted by module author: due to googleapis/google-cloud-go#6857
go: to switch to the latest unretracted version, run:
	go get cloud.google.com/go/storage@latest
jtopjian pushed a commit to terraform-lxd/terraform-provider-lxd that referenced this issue Apr 16, 2023
* run tf-sdk-migrator v2upgrade

* schema.Removed removed in v2, change to deprecated

https://developer.hashicorp.com/terraform/plugin/sdkv2/guides/v2-upgrade-guide#removal-of-helper-schema-schema-removed

* SetPartial was removed in v2

https://developer.hashicorp.com/terraform/plugin/sdkv2/guides/v2-upgrade-guide#removal-of-helper-schema-resourcedata-setpartial

* ResourceProvider removed in v2

https://developer.hashicorp.com/terraform/plugin/sdkv2/guides/v2-upgrade-guide#removal-of-the-terraform-resourceprovider-interface

* go/storage@v1.27.0 was retracted

go: warning: cloud.google.com/go/storage@v1.27.0: retracted by module author: due to googleapis/google-cloud-go#6857
go: to switch to the latest unretracted version, run:
	go get cloud.google.com/go/storage@latest

* update plugin/v2 deps and manually merge require sections

open bug about require sections: golang/go#56471

* publish image needs fields for computed resources

* set default for file mode. ensure proper string-int conversion

the lxd client api works in ints, but mode is a string for our users

* use set-specific attribute test function

* simplify provider tests
roboquat pushed a commit to gitpod-io/gitpod that referenced this issue Apr 17, 2023
* Update k8s dependencies to v0.26.2

* Update controller-runtime to v0.14.6

* Update cloud storage
 googleapis/google-cloud-go#6857

* Update copy options

* Update wolfi image

* Remove controller-runtime replace directives

* Fix integration tests
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api: storage Issues related to the Cloud Storage API. priority: p1 Important issue which blocks shipping the next release. Will be fixed prior to next release. type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants