Ambiguous S3FileSystem.isfile behavior when an object exists with a "/" suffix #800
Comments
I agree that isfile() should return True in this case. @ianthomas23, yet another edge...
If you spend your whole life using Supporting pre-existing
I already "fixed" my pipeline by deleting all the fake directory objects. I'm now investigating how they were created in the first place. I couldn't find any code in my system that does something like this. All we use is Pandas + PyArrow + s3fs. Maybe some combination of versions of those is creating the directory objects; I don't know. I will investigate more. @ianthomas23 Feel free to close the issue if you don't think it's worth pursuing.
The main place I've seen these "placeholders" created is in the AWS S3 console, with the "create directory" button.
In my case, I had an object for every directory in my partitioned Parquet dataset. The dataset root itself had an object as well. So this makes me think it was done programmatically in some way.
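For anyone trying to spot these placeholders before deleting them, here is a rough sketch. It assumes a placeholder is a zero-byte object whose key ends with "/", and that you already have a listing as (key, size) pairs from whatever client you use. The function name and the sample listing are made up for illustration; this is not part of s3fs or pyarrow.

```python
from typing import Iterable, List, Tuple


def find_placeholder_keys(listing: Iterable[Tuple[str, int]]) -> List[str]:
    """Return keys that look like directory placeholders.

    Assumption: a placeholder is a zero-byte object whose key ends with '/'.
    """
    return [key for key, size in listing if key.endswith("/") and size == 0]


# Hypothetical (key, size) listing of a partitioned dataset.
listing = [
    ("dataset/", 0),
    ("dataset/year=2023/", 0),
    ("dataset/year=2023/part-0.parquet", 1024),
    ("dataset/year=2024/", 0),
    ("dataset/year=2024/part-0.parquet", 2048),
]

print(find_placeholder_keys(listing))
```

Reviewing the returned keys before deleting anything seems safer than deleting on match, since a real zero-byte object with a trailing slash would be caught by the same rule.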
I was debugging a problem with my pipeline, and I've reduced it to the following code snippet:
This is failing because pyarrow uses the S3FileSystem.find("my-bucket/dataset", withdirs=True, detail=True) method to list all files in a partitioned parquet dataset, and s3fs is listing s3://my-bucket/dataset/ as both a file and a directory, while S3FileSystem.isfile("s3://my-bucket/dataset/") is returning False. So I think there is an ambiguity happening here, as S3 doesn't have the concept of a directory.
Another example of the problem:
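This is not the reporter's snippet, just a toy model of the ambiguity described above. It treats the bucket as a flat key-to-size mapping (an assumption that ignores everything else about S3) and uses a hypothetical classify helper; it does not reproduce how s3fs implements find or isfile.

```python
from typing import Dict


def classify(keys: Dict[str, int], path: str) -> Dict[str, bool]:
    """Classify a path against a flat key -> size mapping (toy model)."""
    prefix = path.rstrip("/") + "/"
    return {
        # An object is stored under exactly this key.
        "is_object": path in keys,
        # At least one other key lives under this path used as a prefix.
        "is_prefix": any(k.startswith(prefix) and k != prefix for k in keys),
    }


# Hypothetical bucket contents: a placeholder object plus one data file.
keys = {"dataset/": 0, "dataset/part-0.parquet": 1024}

print(classify(keys, "dataset/"))
print(classify(keys, "dataset/part-0.parquet"))
```

In this model "dataset/" is both a stored object and a prefix of another key, so any API that has to answer "file or directory?" must pick one, and different methods can end up disagreeing.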