fsspec v2022.10.0 breaks MultiZarrToZarr
#246
Comments
Can you be more specific about what this sorting error is? The merge offsets function is regularly used by parquet reads, so I really hope it isn't broken!
@martindurant I've had limited time to debug so far; this is where I am at.
Oh I see. So sorting the lists before this step should be enough to solve this.
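One reading of "sorting the lists" is that the parallel URL, start, and end lists have to be sorted together, by URL and then by start offset, so that every entry keeps its pairing. The sketch below is illustrative only. `sort_together` is a hypothetical helper and is not part of fsspec. It also assumes every start offset is an integer, so whole-file references with `None` offsets would need separate handling.

```python
from typing import List, Tuple


def sort_together(
    urls: List[str], starts: List[int], ends: List[int]
) -> Tuple[List[str], List[int], List[int]]:
    """Sort three parallel lists by (url, start) without breaking their pairing.

    Hypothetical helper for illustration; assumes integer start offsets.
    """
    # Compute one ordering from the combined key, then apply it to every list.
    order = sorted(range(len(urls)), key=lambda i: (urls[i], starts[i]))
    return (
        [urls[i] for i in order],
        [starts[i] for i in order],
        [ends[i] for i in order],
    )


urls = ["b.nc", "a.nc", "a.nc"]
starts = [0, 100, 0]
ends = [10, 150, 100]
print(sort_together(urls, starts, ends))
# (['a.nc', 'a.nc', 'b.nc'], [0, 100, 0], [100, 150, 10])
```

Calling `sorted()` on each list separately would be wrong here. It would reorder starts and ends independently of one another and pair them with the wrong URLs.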
Do you have time to test whether switching the arg to
Yes, hopefully. I'm not through with debugging yet.
Yes, I will post an update here.
Yep,
My environment:
Ubuntu Jammy, Python 3.8
I am creating virtual Zarr files from NetCDF4 files and combining them into a single virtual Zarr as follows:
This worked with `kerchunk` v0.0.9 and `fsspec` 2022.8.2. However, after upgrading `fsspec` to 2022.10.0, I get an error when calling `MultiZarrToZarr(...).translate()`:

I dove into the code, and the culprit seems to be a change to `ReferenceFileSystem` in `fsspec`: fsspec/filesystem_spec#1063. When `ReferenceFileSystem.cat()` is called within `MultiZarrToZarr.second_pass()`, the start/end positions of the datasets are not preserved correctly, which leads to some zero-length data. While debugging, I traced the erratic starts/ends to this subroutine, which has a sorting error: https://github.com/fsspec/filesystem_spec/blob/2022.10.0/fsspec/implementations/reference.py#L337-L344