Potential data corruption issue for PythonSDK retry on binary data upload #203

Open
jasonyin opened this issue Dec 24, 2019 · 1 comment

Comments

@jasonyin
Contributor

jasonyin commented Dec 24, 2019

Update: this issue is now resolved in versions 2.9.0 and later.

If you are using ObjectStorageClient.put_object with retries enabled, or are using UploadManager.upload_file, you may be impacted.

For example, suppose you are uploading a 350 MB file and the connection drops twice during the upload, so the client must retry 2 times. If the first attempt fails after 50 MB was uploaded and the second attempt fails after a further 200 MB was uploaded, the third attempt sends only the remaining 100 MB, and the end result is that only the last 100 MB of the file are stored by the service.
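The arithmetic in this scenario can be reproduced with a small simulation. This is only a sketch under an assumption the issue text does not state outright: that each retry keeps reading the request body from wherever the previous attempt stopped, rather than from the start. The `simulate_upload` helper is hypothetical and not part of the SDK, and one byte stands in for one megabyte.

```python
import io


def simulate_upload(stream, fail_after):
    """Simulate upload attempts that share one stream and never rewind it.

    fail_after lists how many bytes each failed attempt sends before the
    connection drops. The final attempt sends whatever is left in the stream.
    Returns the bytes a (hypothetical) service would end up storing.
    """
    for sent in fail_after:
        stream.read(sent)  # consumed by an attempt that then fails
    return stream.read()   # successful attempt starts mid-stream


data = bytes(i % 256 for i in range(350))  # 1 byte stands in for 1 MB
stored = simulate_upload(io.BytesIO(data), fail_after=[50, 200])

print(len(stored))           # 100
print(stored == data[250:])  # True: only the tail of the file survives
```

Under that assumption, 50 + 200 units are consumed by the failed attempts, leaving exactly the last 100 of 350 for the attempt that succeeds.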

We are actively working on a fix for this issue. In the meantime, here are some workarounds that will allow you to avoid the issue:

For ObjectStorageClient.put_object, you are not affected by default, because retries are not configured by default in the Python SDK (except on UploadManager). If you are using ObjectStorageClient.put_object with retries enabled, you can use any of the following workarounds to avoid the issue:

  • Disable retries on ObjectStorageClient.put_object
  • Explicitly set the content_length parameter of ObjectStorageClient.put_object to the file size of the file to upload
  • Explicitly set the content_md5 parameter of ObjectStorageClient.put_object to the Base64-encoded MD5 hash of the file to upload
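A minimal sketch of the content_length and content_md5 workarounds follows. The helper names `upload_integrity_kwargs` and `put_file` are hypothetical, not part of the SDK. The sketch assumes `client` is an ObjectStorageClient (or any object with a compatible `put_object` method) and that `put_object` takes the namespace, bucket, object name, and body positionally, with `content_length` and `content_md5` as keyword arguments. According to the list above either parameter on its own should be enough; setting both is a belt-and-braces choice made here.

```python
import base64
import hashlib
import os


def upload_integrity_kwargs(path, chunk_size=1024 * 1024):
    """Return content_length and content_md5 keyword arguments for a file."""
    md5 = hashlib.md5()
    with open(path, "rb") as f:
        # Hash in chunks so large files are never loaded into memory at once.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            md5.update(chunk)
    return {
        "content_length": os.path.getsize(path),
        # Base64 of the raw 16-byte digest, not of the hex string.
        "content_md5": base64.b64encode(md5.digest()).decode("ascii"),
    }


def put_file(client, namespace_name, bucket_name, object_name, path):
    """Upload a file with explicit length and MD5 (hypothetical wrapper)."""
    kwargs = upload_integrity_kwargs(path)
    with open(path, "rb") as body:
        return client.put_object(
            namespace_name, bucket_name, object_name, body, **kwargs
        )
```

With a real client, a call would look something like `put_file(client, namespace, "my-bucket", "backup.bin", "/tmp/backup.bin")`, where all of the argument values are placeholders.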

For UploadManager.upload_file, by default this operation is impacted by this issue, and a workaround is required to avoid the problem:

This issue can also affect other operations in the Python SDK that send binary bodies. By default you are not affected in any of those operations, because retries are not configured by default in the Python SDK (except on UploadManager). Even if you have retries configured for one of these operations, in most cases you will get a service error (rather than a silent failure, as with Object Storage) because the service will notice the data corruption. To be safest, however, explicitly disable retries on any operation that sends a binary body until this issue is resolved.

We apologize for any inconvenience this issue may cause you. We are working to promptly resolve this issue. Please stay tuned and watch this GitHub issue for updates.

@jasonyin added the SDK label (Issue pertains to the SDK itself and not specific to any service) Dec 24, 2019
@jodoglevy pinned this issue Jan 7, 2020
@jodoglevy
Member

Update: this issue is now resolved in versions 2.9.0 and later.

Please update to version 2.9.0, or a newer version, to avoid this issue.
