Allow non-memory zarr stores in to_zarr with distributed #10422
… zarr.storage.Store as long as it isn't MemoryStore
Not sure about the etiquette/workflow but maybe I should tag @fjetter since they have already looked at the issue?
Thanks @GFleishman!
It looks like the code linter is unhappy -- could you run pre-commit to handle those? See https://docs.dask.org/en/stable/develop.html#code-formatting for more details.
Also could you add a test to make sure we're covering this case (if we're not already)?
@jrbourbeau I think everything is actually working now. The code linter did its thing and I added tests for two new conditions: (1) calling The original case of calling CI is still failing something, but if you look at the error log it's not related to anything I added - something to do with dataframes. When I run my tests locally, they work fine, i.e.:
(bigstream) fleishmang@h07u01:confocal> ipython
Python 3.10.11 | packaged by conda-forge | (main, May 10 2023, 18:58:44) [GCC 11.3.0]
Type 'copyright', 'credits' or 'license' for more information
IPython 8.8.0 -- An enhanced Interactive Python. Type '?' for help.
In [1]: import dask.tests.test_distributed as tests
...: from distributed import Client
...: c = Client()
...: tests.test_zarr_distributed_with_explicit_directory_store(c)
...: tests.test_zarr_distributed_with_explicit_memory_store(c)
In [2]:
If I understand correctly, if either test were to fail then I would get an AssertionError or a RuntimeError.
Just checking on this PR. I'm currently under the impression that the changes are sufficient and tested but that CI has some kind of problem.
Following up on this. This bug prevents using to_zarr with a disk-backed zarr object as the destination on a local cluster. Can this please be merged in?
@quasiben maybe this is something your team can help resolve?
@mrocklin Just checking if there is anything else needed from me to help this out?
In dask.array.core.to_zarr I swapped the check for distributed scheduler + MutableMapping to distributed scheduler + zarr.storage.MemoryStore, which seems to be the only zarr.storage type that is backed by memory, though I'm unsure about the LMDB store. I also removed the import of MutableMapping since this was the only place in all of dask.array.core to reference it.
pre-commit run --all-files
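For readers skimming the thread, the change described above can be sketched roughly as follows. This is only an illustration, not the actual dask source: the two store classes are hypothetical stand-ins for zarr.storage.MemoryStore and a disk-backed store, the function names old_check, new_check and rejects are invented here, and the exception type and message are assumptions.

```python
from collections.abc import MutableMapping


class MemoryStore(dict):
    """Hypothetical stand-in for zarr.storage.MemoryStore (data lives in one process)."""


class DirectoryStore(MutableMapping):
    """Hypothetical stand-in for a disk-backed store; real I/O is omitted."""

    def __init__(self, path):
        self.path = path
        self._data = {}  # placeholder for files under `path`

    def __getitem__(self, key):
        return self._data[key]

    def __setitem__(self, key, value):
        self._data[key] = value

    def __delitem__(self, key):
        del self._data[key]

    def __iter__(self):
        return iter(self._data)

    def __len__(self):
        return len(self._data)


def old_check(store, distributed):
    # Assumed previous behaviour: any MutableMapping is rejected under the
    # distributed scheduler, which also catches disk-backed stores.
    if distributed and isinstance(store, MutableMapping):
        raise RuntimeError("cannot store into this zarr store with distributed")


def new_check(store, distributed):
    # Assumed new behaviour: only the in-memory store is rejected, because
    # workers in other processes cannot write into the client's memory.
    if distributed and isinstance(store, MemoryStore):
        raise RuntimeError("cannot store into an in-memory zarr store with distributed")


def rejects(check, store):
    """Return True if `check` raises for `store` under a distributed scheduler."""
    try:
        check(store, distributed=True)
    except RuntimeError:
        return True
    return False


disk = DirectoryStore("out.zarr")  # hypothetical path
mem = MemoryStore()
results = {
    "old_disk": rejects(old_check, disk),
    "new_disk": rejects(new_check, disk),
    "old_mem": rejects(old_check, mem),
    "new_mem": rejects(new_check, mem),
}
```

Under these assumptions, only the disk-backed case changes outcome between the two checks; the in-memory store is rejected either way.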