
Allow tracking upload progress. #27

Open
bachirelkhoury opened this issue Jun 23, 2019 · 9 comments
Assignees
Labels
api: storage Issues related to the googleapis/python-storage API. type: feature request ‘Nice-to-have’ improvement, new feature or different behavior or design.

Comments

@bachirelkhoury

This is related to googleapis/google-cloud-python#1830 reopening here as this seems to have been closed many years ago.

We would very much like this feature, as we need to monitor large files being uploaded to Google Storage buckets. I am surprised that not many people are asking for something this essential, which makes me think either we haven't done our research properly or the solution is obvious or trivial.

Can someone please share an example of how we could track progress during upload?

Update: Should we be looking at google-resumable-media? Will try that out and report back.

Thanks

@tseaver
Contributor

tseaver commented Jun 24, 2019

@frankyn Please help prioritize this feature. See the discussion in googleapis/google-cloud-python#1830 / googleapis/google-cloud-python#1077 for the tradeoffs involved.

@tseaver tseaver changed the title Upload progress for google storage API Storage: Allow tracking upload progress. Jun 24, 2019
@mf2199 mf2199 removed their assignment Sep 2, 2019
@crwilcox crwilcox transferred this issue from googleapis/google-cloud-python Jan 31, 2020
@product-auto-label product-auto-label bot added the api: storage Issues related to the googleapis/python-storage API. label Jan 31, 2020
@yoshi-automation yoshi-automation added 🚨 This issue needs some love. triage me I really want to be triaged. labels Feb 3, 2020
@frankyn frankyn added type: feature request ‘Nice-to-have’ improvement, new feature or different behavior or design. and removed 🚨 This issue needs some love. triage me I really want to be triaged. labels Feb 4, 2020
@pdex

pdex commented Jun 30, 2020

Here's a workaround using tqdm.wrapattr:

import os

from google.cloud import storage
from tqdm import tqdm

def upload_blob(client, bucket_name, source, dest, content_type=None):
  bucket = client.bucket(bucket_name)
  blob = bucket.blob(dest)
  with open(source, "rb") as in_file:
    total_bytes = os.fstat(in_file.fileno()).st_size
    # tqdm.wrapattr intercepts in_file.read() and advances the progress bar
    # by however many bytes the upload machinery pulls from the stream.
    with tqdm.wrapattr(in_file, "read", total=total_bytes, miniters=1, desc="upload to %s" % bucket_name) as file_obj:
      blob.upload_from_file(
        file_obj,
        content_type=content_type,
        size=total_bytes,
      )
      return blob

if __name__ == "__main__":
  upload_blob(storage.Client(), "bucket", "/etc/motd", "/path/to/blob.txt", "text/plain")

@zLupa

zLupa commented Oct 7, 2020

It has been a year since this issue was opened. Any updates?

@Shreeyak

Shreeyak commented Oct 9, 2020

This is an essential feature for large file uploads/downloads. I resorted to using gsutil via subprocess call just for the download progress bar.
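For anyone taking the same route, here is a minimal sketch of shelling out to gsutil (assumptions: gsutil is installed and authenticated; the bucket name and paths are placeholders):

```python
import subprocess

def gsutil_cp_cmd(source, bucket_name, dest):
    # Build the gsutil copy command; gsutil renders its own progress meter.
    return ["gsutil", "cp", source, "gs://%s/%s" % (bucket_name, dest)]

# Requires the Cloud SDK's gsutil on PATH, e.g.:
# subprocess.run(gsutil_cp_cmd("/etc/motd", "my-bucket", "path/to/blob.txt"), check=True)
```

This gets you a progress meter for free, at the cost of a dependency on the Cloud SDK being installed alongside the Python client.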

@kamal94

kamal94 commented Apr 7, 2021

To add to @pdex 's submission:
I am generating upload URLs via blob.generate_signed_url and passing them to my application's client to upload a user-generated file. Here is what worked for me:

import os
import uuid

import requests
from tqdm import tqdm

def upload_user_file(filename):
    object_address = str(uuid.uuid4())
    upload_url, upload_method = get_upload_url(object_address)  # fetches signed upload URL
    size = os.path.getsize(filename)
    with open(filename, "rb") as in_file:
        total_bytes = os.fstat(in_file.fileno()).st_size
        with tqdm.wrapattr(
            in_file,
            "read",
            total=total_bytes,
            miniters=1,
            desc="Uploading to my bucket",
        ) as file_obj:
            # requests streams the wrapped file object as the request body,
            # so the progress bar advances as bytes go out on the wire.
            response = requests.request(
                method=upload_method,
                url=upload_url,
                data=file_obj,
                headers={"Content-Type": "application/octet-stream"},
            )
            response.raise_for_status()
    return object_address, size

@Mohab25

Mohab25 commented Jul 12, 2022

@frankyn any updates on this? It has been open since 2019.

@frankyn
Member

frankyn commented Jul 12, 2022

Thanks for the ping. @andrewsg this has +17 upvotes could you please take a look when you have a moment?

@andrewsg andrewsg self-assigned this Jul 12, 2022
@andrewsg
Contributor

We have some long-term plans around async code and transport mechanisms that may make fully integrated support for a progress meter feasible in the future, but until then, there are two main options: chunk media operations and report status in between chunks, or use a file object wrapper that tracks how much data is written or read.

As it happens, large uploads are already chunked by default using the resumable upload API. However, upload functions in the Python client library are agnostic as to the upload strategy and so we can't easily add callback functionality to upload functions in a way that will work for all uploads - they would only work for resumable uploads, and communicating that to the user would be awkward. At any rate, they will only report completed chunks, so they're inferior to the file object wrapper method.

I'll look into implementing a good first-party turnkey solution for the file object wrapper strategy. Until then, I recommend use of the tqdm attribute wrapper as shown in the comments above.
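For completeness, the file object wrapper strategy can also be done without tqdm. A minimal dependency-free sketch (class and callback names are illustrative, not a committed API): the wrapper reports bytes consumed on every read, so it works regardless of which upload strategy the library picks, because the library just reads from the stream.

```python
import io

class ProgressReader:
    """Wraps a readable binary stream and reports bytes consumed via a callback."""

    def __init__(self, stream, callback):
        self._stream = stream
        self._callback = callback
        self.bytes_read = 0

    def read(self, size=-1):
        data = self._stream.read(size)
        self.bytes_read += len(data)
        self._callback(self.bytes_read)
        return data

    # Delegate anything else (seek, tell, ...) to the underlying stream.
    def __getattr__(self, name):
        return getattr(self._stream, name)

# Usage sketch: blob.upload_from_file(ProgressReader(open(path, "rb"), print), size=total_bytes)
```
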

@nom

nom commented Mar 14, 2024

+1 to this
