Upload blob from HTTPResponse #28
Comments
Update: wrapping the HTTP response in a class that trivially implements tell:

from google.cloud import storage
client = storage.Client()
bucket = client.bucket('my-bucket')
blob = bucket.blob('my-file.csv', chunk_size=1 << 20)
import urllib.request
a_few_megs_of_data = 'https://baseballsavant.mlb.com/statcast_search/csv?all=true&batter_stands=&game_date_gt=2018-09-06&game_date_lt=2018-09-09&group_by=name&hfAB=&hfBBL=&hfBBT=&hfC=&hfFlag=&hfGT=R%7CPO%7CS%7C&hfInn=&hfNewZones=&hfOuts=&hfPR=&hfPT=&hfRO=&hfSA=&hfSea=2018%7C&hfSit=&hfZ=&home_road=&metric_1=&min_abs=0&min_pitches=0&min_results=0&opponent=&pitcher_throws=&player_event_sort=h_launch_speed&player_type=batter&position=&sort_col=pitches&sort_order=desc&stadium=&team=&type=details'
response = urllib.request.urlopen(a_few_megs_of_data)
class HTTPResponseWithTell(object):
    def __init__(self, http_response):
        self.http_response = http_response
        self.number_of_bytes_read = 0

    def tell(self):
        return self.number_of_bytes_read

    def read(self, *args, **kwargs):
        buffer = self.http_response.read(*args, **kwargs)
        self.number_of_bytes_read += len(buffer)
        return buffer

response_with_tell = HTTPResponseWithTell(response)
blob.upload_from_file(response_with_tell)

This reads the response 1 MB at a time and uploads it to cloud storage without ever storing the whole thing in memory. However, after reading through the code and understanding
Thanks for providing this feedback. It seems we would need to alter the inner workings to not depend on being able to reverse through the stream. This is supported, to my knowledge, in our Node client, so it isn't an unreasonable ask for Python. Thanks for the feedback!
This would definitely be super helpful for our team as well!
I'm trying to use Blob.upload_from_file to upload an http.client.HTTPResponse object without saving it to disk first. It seems like this, or a version of this that wraps the HTTPResponse in an io object, should be possible.
However, because the response may be larger than _MAX_MULTIPART_SIZE, Blob.upload_from_file creates a resumable upload, which depends on tell to make sure the stream is at the beginning. Here is the code that reproduces this issue:
Traceback:
Is it possible to read an HTTP response in chunks and write it to the blob without using the filesystem as an intermediary, or is this bad practice? If it is possible and not discouraged, what is the recommended way to do this?
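For illustration only, here is a minimal sketch of the "wrap it in an io object" idea mentioned above. It can be run without network access or a bucket. The names ForwardOnlySource, TellableReader and read_positions are hypothetical and not part of google-cloud-storage. ForwardOnlySource is a stand-in for an HTTP response that only supports read. Whether Blob.upload_from_file accepts such an adapter is an assumption that depends on the library version, since the upload path may also rely on other stream methods.

```python
import io


class ForwardOnlySource:
    """Hypothetical stand-in for an HTTP response: read() only, no tell/seek."""

    def __init__(self, payload):
        self._inner = io.BytesIO(payload)

    def read(self, size=-1):
        return self._inner.read(size)


class TellableReader(io.RawIOBase):
    """Hypothetical io adapter that tracks how many bytes have been read."""

    def __init__(self, source):
        super().__init__()
        self._source = source
        self._position = 0

    def readable(self):
        return True

    def readinto(self, buffer):
        # Pull at most len(buffer) bytes from the wrapped object.
        data = self._source.read(len(buffer))
        count = len(data)
        buffer[:count] = data
        self._position += count
        return count

    def tell(self):
        # Report the running byte count instead of delegating to seek().
        return self._position


def read_positions(stream, chunk_size):
    """Read the stream chunk by chunk and record tell() after each chunk."""
    positions = []
    while stream.read(chunk_size):
        positions.append(stream.tell())
    return positions


reader = TellableReader(ForwardOnlySource(b"x" * 2560))
print(reader.tell())                 # 0
print(read_positions(reader, 1024))  # [1024, 2048, 2560]
```

Subclassing io.RawIOBase means read and readall come for free from readinto, so the byte counter lives in one place. The adapter still reports seekable() as False, so any code path that needs to rewind the stream would continue to fail.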