File gets cached as folder when s3-compatible storage does not receive correct region #329

Open
georgeboot opened this issue May 11, 2023 · 1 comment
Labels
bug Something isn't working


georgeboot commented May 11, 2023

Given the following code:

from cloudpathlib import CloudPath, S3Client

%env AWS_ACCESS_KEY_ID=SCWxxxxxxxxxxxx
%env AWS_SECRET_ACCESS_KEY=xxxxxxxxxxxx

s3_client = S3Client(endpoint_url='https://s3.fr-par.scw.cloud')
bucket = CloudPath('s3://my-scw-bucket-name', client=s3_client)
file = bucket.joinpath('folder/file.txt')
file.read_text(encoding="utf-8") # error!

This will result in the following error:

IsADirectoryError: [Errno 21] Is a directory: '/tmp/tmpcwvt0_jn/my-scw-bucket-name/folder/file.txt'

However, when we also set the region (by adding %env AWS_DEFAULT_REGION=fr-par at the top of the snippet), it works.

The region should probably always be set for S3-compatible storage, but this error is very misleading. Is there any way to make this easier to debug?

To be fair, when running the above code with Cloudflare R2, it spits out an 'unsupported region' error on the first operation. Somehow, Scaleway's S3 storage seems to accept some requests with an incorrect region.
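
Until the error message improves, a fail-fast guard before building the client would at least surface the missing region early. The following is only a minimal sketch, assuming the region is supplied through the AWS_DEFAULT_REGION environment variable as in the snippet above; `require_region` is a hypothetical helper, not part of cloudpathlib or boto3, and it does not look at a region set in a shared AWS config file.

```python
import os


def require_region(env=None):
    """Return the configured region, or raise a clear error if it is unset.

    Hypothetical helper: only checks the AWS_DEFAULT_REGION variable.
    """
    env = os.environ if env is None else env
    region = env.get("AWS_DEFAULT_REGION", "").strip()
    if not region:
        raise RuntimeError(
            "AWS_DEFAULT_REGION is not set; S3-compatible endpoints may "
            "behave unexpectedly without an explicit region"
        )
    return region
```

Calling `require_region()` right before constructing `S3Client(endpoint_url=...)` would turn the misleading IsADirectoryError into an explicit configuration error.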

@jayqi jayqi added the bug Something isn't working label May 13, 2023

pjbull commented May 25, 2023

What's the full stack trace on that error?

Also, can you check what happens on file.is_dir() and file.exists()? Curious if there's something not right in these assumptions for that provider with no region set:

def _s3_file_query(self, cloud_path: S3Path):
    """Boto3 query used for quick checks of existence and if path is file/dir"""
    # check if this is an object that we can access directly
    try:
        # head_object accepts all download extra args (note: Object.load does not accept extra args so we do not use it for this check)
        self.client.head_object(
            Bucket=cloud_path.bucket,
            Key=cloud_path.key.rstrip("/"),
            **self.boto3_dl_extra_args,
        )
        return "file"
    # else, confirm it is a dir by filtering to the first item under the prefix plus a "/"
    except (ClientError, self.client.exceptions.NoSuchKey):
        key = cloud_path.key.rstrip("/") + "/"
        return next(
            (
                "dir"  # always a dir if we find anything with this query
                for obj in (
                    self.s3.Bucket(cloud_path.bucket)
                    .objects.filter(Prefix=key, **self.boto3_list_extra_args)
                    .limit(1)
                )
            ),
            None,
        )
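
The checks asked about above, plus the full stack trace, could be collected in one go with something like the following. This is just a sketch: `diagnose` is a hypothetical helper, not part of cloudpathlib, and it only assumes the object passed in exposes `exists()`, `is_dir()`, `is_file()` and `read_text()` the way a `CloudPath` does.

```python
import traceback


def diagnose(path):
    """Run the existence/type checks and a read, recording results or tracebacks.

    Hypothetical helper: `path` is any object with exists(), is_dir(),
    is_file() and read_text(), such as a cloudpathlib CloudPath.
    """
    report = {}
    for check in ("exists", "is_dir", "is_file"):
        try:
            report[check] = getattr(path, check)()
        except Exception:
            # Keep the full stack trace instead of only the message
            report[check] = traceback.format_exc()
    try:
        path.read_text(encoding="utf-8")
        report["read_text"] = "ok"
    except Exception:
        report["read_text"] = traceback.format_exc()
    return report
```

Running `diagnose(file)` once with and once without AWS_DEFAULT_REGION set, and pasting both reports here, should show which of the checks changes its answer.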
