Are we looking for S3 integration, where the input file could be anything (XML/CSV) and the output a dict? #17
Comments
I don't understand what you need that isn't already possible. What do you mean? Could you show me an example of what you would like to do?
Suppose we have to use the method below: it takes a URL as its first input. What if the URL belongs to Amazon S3, or to an S3-compatible location hosting a file? I am checking whether this kind of feature is already there or whether work is going on.
Ok, it's clear. If you do it, feel free to submit a PR.
Yes, I am using
I think you must just change the
@uknow2009 could you show me your boto3 implementation to download the file?
Using boto3 as below to download:

```python
import boto3
from boto3.s3.transfer import TransferConfig

S3_HOST = "s3host"
ACCESS_KEY = "accesskey"
ACCESS_SECRET = "accesssecret"
bucket = "bucket"
obj_key = "objectkey"
download_file_path = "download_file_path"

config = TransferConfig(multipart_threshold=1024 * 25, max_concurrency=10,
                        multipart_chunksize=1024 * 25, use_threads=True)
try:
    s3_conn = boto3.resource('s3', endpoint_url=S3_HOST,
                             aws_access_key_id=ACCESS_KEY,
                             aws_secret_access_key=ACCESS_SECRET)
    s3_conn.meta.client.download_file(bucket, obj_key, download_file_path, Config=config)
    print("Download successful")
except Exception as e:
    print("Unable to download file", str(e))
```
@uknow2009 I think that wrapping all this code in a single method that requires 8 arguments would not be a good idea... what do you think about it?
@fabiocaccamo Yes, passing all the arguments doesn't seem good. What can be done instead is to read some settings from our own S3 config file and pass only a few arguments (host, bucket, and object key) via method params. The download file path can be set to the process root directory or a tmp location, since it is only an intermediate step and later used by
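A rough sketch of the "config file plus a few params" idea being described. The `S3_CONFIG` dict, the environment variable names, and the `build_download_path` helper are all hypothetical illustrations, not part of benedict:

```python
import os
import tempfile

# Hypothetical: settings read once from the environment (or a user-defined
# config file), so method calls only need host/bucket/object key.
S3_CONFIG = {
    "endpoint_url": os.environ.get("S3_HOST", ""),
    "aws_access_key_id": os.environ.get("S3_ACCESS_KEY", ""),
    "aws_secret_access_key": os.environ.get("S3_ACCESS_SECRET", ""),
}

def build_download_path(obj_key, base_dir=None):
    """Return the intermediate local path for a downloaded object.

    Defaults to the system tmp directory, as suggested in the thread.
    """
    base_dir = base_dir or tempfile.gettempdir()
    return os.path.join(base_dir, os.path.basename(obj_key))
```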
@uknow2009 this library could be the solution: https://github.com/dask/s3fs |
@fabiocaccamo yes, as per documentation |
@tasneem-hyder We do a lot of this with benedict and s3fs (Thanks @fabiocaccamo BTW). Below is a code snippet that shows how we do it. We thought about opening a pull request but haven't because boto3 is a big library that would make benedict a much heavier package. But we're happy to do so if this would be useful.
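The snippet itself didn't survive in this copy of the thread, but a sketch of its likely shape follows. Everything here is a reconstruction: `json.loads` stands in for benedict's own parsing, and `fs` is duck-typed as anything exposing s3fs's `open(path, mode)` API (for example `s3fs.S3FileSystem()`):

```python
import json

def read_s3_dict(fs, path):
    """Read a JSON object from an s3fs-style filesystem into a dict.

    `fs` is duck-typed: anything with s3fs's open(path, mode) works,
    so tests can pass a fake while production passes S3FileSystem().
    """
    with fs.open(path, "r") as f:
        return json.loads(f.read())
```

In benedict-based code the resulting dict would then be wrapped in a benedict instance for keypath access.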
|
And, for completeness, here is some code that writes XML to S3
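This snippet was also stripped from the thread; a hypothetical duck-typed sketch of the write path (the function name is invented, `fs` is again any s3fs-style filesystem, and `xml_string` would in practice come from a dict-to-XML serializer):

```python
def write_xml_to_s3(fs, path, xml_string):
    """Write an XML document string to an s3fs-style filesystem.

    In production `fs` would be s3fs.S3FileSystem(), so the path can be
    an s3://bucket/key string; here anything with open() works.
    """
    with fs.open(path, "w") as f:
        f.write(xml_string)
    return path
```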
|
@hudgeon thanks for pointing out your implementation, it looks pretty easy to add S3 support using I agree, the What do you think about this solution? As the library is getting bigger, this approach could also be extended to the other I/O formats; this way it would be possible to keep the library as lightweight as possible (#18).
@fabiocaccamo I reckon that'd be pretty useful to a lot of people. Are you thinking that if the first argument of the I/O method starts with s3:// then it uses s3fs as the 'filesystem', perhaps with read_s3_file and write_s3_file functions in here: https://github.com/fabiocaccamo/python-benedict/blob/eb950c57c0a17c58dfc247fd85a6e3baffb77e5f/benedict/dicts/io/io_util.py
That's exactly what I would do! If the
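The dispatch being discussed could look something like this. Only the `s3://` prefix check comes from the thread; `is_s3_url`, `route_read`, and the returned backend labels are illustrative placeholders:

```python
def is_s3_url(url):
    """True when an I/O source string points at an S3 location."""
    return isinstance(url, str) and url.startswith("s3://")

def route_read(source):
    """Sketch of the dispatch: pick a backend from the source prefix."""
    if is_s3_url(source):
        return "s3fs"        # would delegate to a read_s3_file() helper
    if source.startswith(("http://", "https://")):
        return "http"
    return "filesystem"
```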
|
:) Nice that we're on the same page. How would you like to proceed with this work and how can we assist? |
@fabiocaccamo FYI, I've just taken a look at s3fs and it actually doesn't use boto3. It uses https://github.com/aio-libs/aiobotocore and https://github.com/boto/botocore. |
@hudgeon could you submit a PR including tests? |
@fabiocaccamo No worries. We should be able to get to it this weekend. |
@fabiocaccamo We've also implemented it and have been using it from our private pip repository for several weeks now - but didn't push it to your branch because we haven't written tests for it. :) In order to do the tests properly, we reckon we'll need to use localstack (https://github.com/localstack/localstack) which we haven't done before. Currently our S3 tests use an S3 bucket we own. Can you think of an approach to testing that wouldn't involve using localstack? |
@hudgeon I never did tests with an S3 bucket, probably adding Do you think we can avoid Also checking how |
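One generic option that avoids both localstack and a live bucket is stubbing the filesystem object in unit tests, so only the call contract is exercised. This is a plain `unittest.mock` sketch, not an approach the thread settled on:

```python
import io
from unittest import mock

def load_text_from_s3(fs, path):
    """Function under test: read text via an s3fs-style filesystem."""
    with fs.open(path, "r") as f:
        return f.read()

# Stub the filesystem so the test needs no credentials, bucket, or
# localstack container.
fake_fs = mock.Mock()
fake_fs.open.return_value = io.StringIO('{"ok": true}')

assert load_text_from_s3(fake_fs, "s3://bucket/key.json") == '{"ok": true}'
fake_fs.open.assert_called_once_with("s3://bucket/key.json", "r")
```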
@tasneem-hyder @hudgeon |
Input: files at S3 locations (all file formats supported by benedict).
All operations should be supported thereafter.