Slow writes of Zarr files using S3Map #820

Open · scottyhq opened this issue Nov 6, 2023 · 4 comments

@scottyhq commented Nov 6, 2023

I'm noticing an order-of-magnitude difference when writing a Zarr store to an S3 bucket: xarray.to_zarr(store=S3Map) takes ~10 s, versus ~1 s for s3.put(local_store, remote_store).

The data is public; there is a simple demonstration here:
https://gist.github.com/scottyhq/effa642f00112971e2350d921a0aed9d
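For reference, the two code paths being compared look roughly like this (bucket name and paths are placeholders, not the exact gist code):

import s3fs
import xarray as xr

s3 = s3fs.S3FileSystem()
ds = xr.tutorial.open_dataset("air_temperature")  # stand-in for the public dataset in the gist

# slow path (~10 s in my tests): write directly through an S3Map
store = s3fs.S3Map("my-bucket/direct.zarr", s3=s3)
ds.to_zarr(store, mode="w")

# fast path (~1 s): write locally first, then upload the whole tree in one bulk call
ds.to_zarr("local.zarr", mode="w")
s3.put("local.zarr", "my-bucket/from_local.zarr", recursive=True)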

In the s3fs logs, the Xarray path issues many head_object and get_object calls, compared to only put_object calls when using s3.put with a local copy.

I realize this may be considered an Xarray or Zarr issue. However, I'm interested in interpreting this s3fs log, and in whether there are non-default settings for s3fs.S3FileSystem() or s3fs.S3Map() that are better suited to writing.

Truncated log:

22:47:12.167 DEBUG Get directory listing page for nasa-cryo-scratch/scottyhq/zarr_from_xarray.zarr
22:47:12.215 DEBUG Get directory listing page for nasa-cryo-scratch/scottyhq/zarr_from_xarray.zarr/.zgroup
22:47:12.229 DEBUG CALL: list_objects_v2 - () - {'MaxKeys': 1, 'Bucket': 'nasa-cryo-scratch'}
22:47:12.243 DEBUG CALL: put_object - () - {'Bucket': 'nasa-cryo-scratch', 'Key': 'scottyhq/zarr_from_xarray.zarr/.zgroup'}
22:47:12.272 DEBUG CALL: head_object - ({},) - {'Bucket': 'nasa-cryo-scratch', 'Key': 'scottyhq/zarr_from_xarray.zarr/.zarray'}
22:47:12.293 DEBUG Client error (maybe retryable): An error occurred (404) when calling the HeadObject operation: Not Found
22:47:12.293 DEBUG CALL: list_objects_v2 - ({},) - {'Bucket': 'nasa-cryo-scratch', 'Prefix': 'scottyhq/zarr_from_xarray.zarr/.zarray/', 'Delimiter': '/', 'MaxKeys': 1}
22:47:12.330 DEBUG CALL: get_object - () - {'Bucket': 'nasa-cryo-scratch', 'Key': 'scottyhq/zarr_from_xarray.zarr/.zgroup'}
22:47:12.352 DEBUG CALL: head_object - ({},) - {'Bucket': 'nasa-cryo-scratch', 'Key': 'scottyhq/zarr_from_xarray.zarr/longitude/.zarray'}
22:47:12.364 DEBUG Client error (maybe retryable): An error occurred (404) when calling the HeadObject operation: Not Found
22:47:12.364 DEBUG CALL: list_objects_v2 - ({},) - {'Bucket': 'nasa-cryo-scratch', 'Prefix': 'scottyhq/zarr_from_xarray.zarr/longitude/.zarray/', 'Delimiter': '/', 'MaxKeys': 1}

Versions:

fsspec: 2023.10.0
xarray: 2023.10.1
zarr  : 2.16.1
s3fs  : 2023.10.0
@martindurant (Member)

At a glance, it seems the code is doing

  • is this path a directory?
  • is this a group?
  • is this an array?

Clearly, when the intent is to overwrite, these are unnecessary, and some seem to query the path just written. The only thing you might want to do is rm(recursive), if you know you are writing a whole dataset from scratch.
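For example (bucket/path is a placeholder):

import s3fs

s3 = s3fs.S3FileSystem()
target = "my-bucket/output.zarr"
if s3.exists(target):               # clear anything under the prefix before a from-scratch write
    s3.rm(target, recursive=True)
ds.to_zarr(s3fs.S3Map(target, s3=s3), mode="w")   # ds: the xarray.Dataset being written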

s3fs probably cannot do anything about this: while you could prospectively pre-list a directory tree, the listings cache will be regularly invalidated because there are write ops in there too. Perhaps something clever could be done to update an already-cached listing.

I am not immediately sure if xarray or zarr is making the calls. It would be interesting to see what zarr alone does. I tried the following:
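(here m is a mapper on a scratch bucket, presumably created along these lines)

import s3fs
import zarr

s3 = s3fs.S3FileSystem()
m = s3fs.S3Map("mymdtemp/zarr", s3=s3)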

In [19]: c = zarr.open_array(m, shape=(10, 10), chunks=(5, 5), mode="w", dtype="int64")
2023-11-06 21:13:32,243 - s3fs - DEBUG - _lsdir -- Get directory listing page for mymdtemp/zarr
2023-11-06 21:13:32,538 - s3fs - DEBUG - _lsdir -- Get directory listing page for mymdtemp/zarr/.zarray
2023-11-06 21:13:32,600 - s3fs - DEBUG - _call_s3 -- CALL: put_object - () - {'Bucket': 'mymdtemp', 'Key': 'zarr/.zarray'}
2023-11-06 21:13:32,720 - s3fs - DEBUG - _call_s3 -- CALL: get_object - () - {'Bucket': 'mymdtemp', 'Key': 'zarr/.zarray'}

and probably we see the whole behaviour here.

It's also worth noting that zarr processes each variable/array sequentially, whereas these tiny metadata files could all be sent concurrently, as put() does. zarr has no operations that are concurrent across variables, only for getting/putting multiple data chunks (even though s3fs supports it).
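For comparison, the bulk fsspec calls are already concurrent; uploading a handful of small metadata blobs in one batch could look like this (paths and contents are placeholders):

import s3fs

s3 = s3fs.S3FileSystem()
# pipe() accepts a dict of path -> bytes; on async filesystems like s3fs the
# individual put_object calls are issued concurrently
s3.pipe({
    "my-bucket/ds.zarr/.zgroup": b'{"zarr_format": 2}',
    "my-bucket/ds.zarr/x/.zarray": b"...placeholder metadata...",
    "my-bucket/ds.zarr/y/.zarray": b"...placeholder metadata...",
})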

Maybe fsspec could provide a way to combine "caching" and "transactions" so that the effect of local-write->put() is conveniently achieved.

@scottyhq (Author) commented Nov 8, 2023

Clearly, when the intent is to overwrite, these are unnecessary, and some seem to query the path just written. The only thing you might want to do is rm(recursive)

Thanks for the detailed response @martindurant. The same requests are happening from Xarray even with an empty bucket to begin with.

probably we see the whole behaviour here.

Yes! Your minimal example captures the same log output as xr.to_zarr(), so I think we can focus on the interaction of Zarr and s3fs. I'm not sure why the list, get, and head calls are necessary in the mode='w' case; shouldn't there just be puts? Would this require a change in s3fs or in Zarr?

zarr is processing each variable/array sequentially, whereas these tiny metadata files can all be sent concurrently, as put() does... Maybe fsspec could provide a way to combine "caching" and "transactions" so that the effect of local-write->put() is conveniently achieved.

Interesting, thanks for clarifying! A streamlined local-write->async puts pattern could indeed be very useful. Is it possible to do this directly from the in-memory dataset with current fsspec functionality? I'm not sure how to utilize "transactions" to achieve this, but would be happy to test something out.

@martindurant (Member)

I tried last night to write some code around simplecache that would be general for any backend, but I don't think it's doing the right thing, partly because zarr still wants to list and read from the target, where the new files don't yet exist.

It might be better to write smarter Transaction class(es) that know what to do with filesystems that have bulk put/commit operations (cc fsspec/filesystem_spec#1416), but they need to be read/write.

It's also worth pointing out that zarr already supports separating the metadata mapper from the data mapper, so one could write a specialised deferred-write store that wraps FSStore.
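In zarr v2 that separation is exposed via the chunk_store argument; a minimal sketch of the idea, with a plain dict standing in for the deferred-write metadata store and a placeholder bucket name:

import s3fs
import zarr

s3 = s3fs.S3FileSystem()
data_store = s3fs.S3Map("my-bucket/ds.zarr", s3=s3)   # chunk data goes straight to S3
meta_store = {}                                       # .zgroup/.zarray/.zattrs land here instead

g = zarr.open_group(store=meta_store, chunk_store=data_store, mode="w")
a = g.create_dataset("x", shape=(10, 10), chunks=(5, 5), dtype="int64")
a[:] = 1

# the collected metadata could then be pushed to S3 in one concurrent call, e.g.
# s3.pipe({f"my-bucket/ds.zarr/{k}": v for k, v in meta_store.items()})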

@martindurant (Member)

I think fsspec/filesystem_spec#1434 goes some way towards resolving the problem: it caches writes during a transaction, puts them to the remote in one concurrent operation, and rereads from local in the meantime. It does not address directory listing.
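If that lands, usage might end up looking something like this (speculative sketch; whether S3Map writes are routed through the transaction depends on that PR):

import numpy as np
import s3fs
import xarray as xr

s3 = s3fs.S3FileSystem()
store = s3fs.S3Map("my-bucket/ds.zarr", s3=s3)
ds = xr.Dataset({"x": ("t", np.arange(10))})

# writes issued inside the transaction are staged locally and committed to S3
# in one concurrent batch when the block exits
with s3.transaction:
    ds.to_zarr(store, mode="w")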
