Supporting Zarr-Python 3 Codec API #502

Open
jhamman opened this issue Jan 24, 2024 · 6 comments · May be fixed by #524

@jhamman
Member

jhamman commented Jan 24, 2024

Over in Zarr-Python, we are working on a new major version (v3). This version will have a slightly revised Codec API and will expose a new set of codec types (ArrayArrayCodec, ArrayBytesCodec, BytesBytesCodec, etc.). These codec classes are not wildly different from the existing Numcodecs API (except perhaps for the partial encode/decode options), but they are not perfectly aligned with it. With this in mind, some questions for discussion:

  1. Can Numcodecs conform to the Zarr-Python API?
    • I don't think this has to be seen as a breaking change, but if it is, how do we weigh the potential costs and benefits?
  2. Would Numcodecs be able to register codecs via the Zarr-Python entrypoint mechanism? (e.g. Extensible codecs for V3 zarr-python#1588)
@jhamman
Member Author

jhamman commented Jan 24, 2024

cc @zarr-developers/python-core-devs

@jni

jni commented Feb 6, 2024

@jhamman the link to the new codec API is 404...

@normanrz

normanrz commented Feb 6, 2024

I fixed the OP.

@jhamman
Member Author

jhamman commented Feb 28, 2024

We discussed this in the Zarr-Python refactor meeting today. The outstanding task is to experiment with the Zarr v3 codec API by exposing this library's compression codecs and pre-compression filters through the BytesBytesCodec and ArrayArrayCodec interfaces, respectively. If that works well, these codecs can be registered through the entrypoint mechanism described above.

This would be a good project for someone interested in getting involved in zarr-python 3's development.
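
A minimal sketch of that experiment, assuming a simplified async bytes-to-bytes interface. `BytesBytesCodec` here is a hypothetical local stand-in, not Zarr-Python's class (whose methods may take extra per-chunk metadata and use different names), and `Zlib` is a stand-in for a Numcodecs compressor built on the standard-library `zlib` module. The adapter only forwards calls, so any object with Numcodecs-style `encode(buf)` / `decode(buf, out=None)` methods could be wrapped without being rewritten.

```python
import asyncio
import zlib
from abc import ABC, abstractmethod


class Zlib:
    """Stand-in for a Numcodecs-style compressor: synchronous encode/decode."""

    codec_id = "zlib"

    def __init__(self, level: int = 1):
        self.level = level

    def encode(self, buf) -> bytes:
        return zlib.compress(bytes(buf), self.level)

    def decode(self, buf, out=None) -> bytes:
        # Numcodecs codecs accept an optional `out` buffer; this sketch ignores it.
        return zlib.decompress(bytes(buf))


class BytesBytesCodec(ABC):
    """Hypothetical stand-in for a Zarr v3 bytes-to-bytes codec base class."""

    @abstractmethod
    async def encode(self, data: bytes) -> bytes: ...

    @abstractmethod
    async def decode(self, data: bytes) -> bytes: ...


class NumcodecsBytesBytesAdapter(BytesBytesCodec):
    """Expose any object with Numcodecs-style encode/decode as a BytesBytesCodec."""

    def __init__(self, codec):
        self.codec = codec

    async def encode(self, data: bytes) -> bytes:
        return bytes(self.codec.encode(data))

    async def decode(self, data: bytes) -> bytes:
        return bytes(self.codec.decode(data))


def _demo() -> None:
    adapter = NumcodecsBytesBytesAdapter(Zlib(level=5))
    payload = b"zarr " * 100
    encoded = asyncio.run(adapter.encode(payload))
    assert asyncio.run(adapter.decode(encoded)) == payload


if __name__ == "__main__":
    _demo()
```

A filter would get an analogous array-to-array adapter; the point of the sketch is that the wrapper, not the codec, carries the v3 interface.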

@martindurant
Member

I would add here that having some fallback support for numcodecs (and other packages whose codecs follow its API) is important for keeping existing datasets readable in v3. We need to decide whether we can assume they work on bytes, which is by far the most common case, or whether we can tell from the signature (or with try/except) if they accept/produce arrays. That doesn't seem too hard.
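
A rough sketch of the try/except fallback suggested above. Both codec classes are hypothetical stand-ins with Numcodecs-style `encode`/`decode` methods: `ZlibLike` wraps the standard-library `zlib`, and `DeltaLike` is a toy delta filter over `array.array` values. `classify_codec` is also a hypothetical helper. Real Numcodecs filters often coerce bytes into arrays rather than raising, so in practice the type of the encoded output (or an explicit declaration on the codec) is probably a more reliable signal than an exception.

```python
import array
import zlib


class ZlibLike:
    """Hypothetical bytes-to-bytes codec with a Numcodecs-style API."""

    def encode(self, buf):
        return zlib.compress(bytes(buf))

    def decode(self, buf, out=None):
        return zlib.decompress(bytes(buf))


class DeltaLike:
    """Hypothetical array-to-array filter: stores differences between neighbours."""

    def encode(self, buf):
        if not isinstance(buf, array.array):
            raise TypeError("DeltaLike expects an array.array")
        out = array.array(buf.typecode, buf)
        # Walk backwards so each element still sees its original predecessor.
        for i in range(len(out) - 1, 0, -1):
            out[i] -= out[i - 1]
        return out

    def decode(self, buf, out=None):
        res = array.array(buf.typecode, buf)
        # A running sum undoes the differencing.
        for i in range(1, len(res)):
            res[i] += res[i - 1]
        return res


def classify_codec(codec, probe: bytes = b"\x00\x01\x02\x03") -> str:
    """Guess whether a legacy codec operates on bytes or on arrays.

    A TypeError on bytes input, or a result that is not bytes-like,
    is taken to mean the codec expects arrays.
    """
    try:
        encoded = codec.encode(probe)
    except TypeError:
        return "array"
    if isinstance(encoded, (bytes, bytearray, memoryview)):
        return "bytes"
    return "array"
```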

Question: codecs are of course CPU-bound and will be run in threads, hoping that the GIL is released. Does the to_thread call live in zarr-python?
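
On the threading question, a minimal sketch of a pipeline that keeps codecs synchronous and does the offloading itself with `asyncio.to_thread` (Python 3.9+). `ZlibLike`, `encode_chunks`, and `decode_chunks` are hypothetical names for illustration. Whether the threads actually run in parallel depends on each codec releasing the GIL while it works.

```python
import asyncio
import zlib


class ZlibLike:
    """Hypothetical synchronous, CPU-bound codec with a Numcodecs-style API."""

    def encode(self, buf) -> bytes:
        return zlib.compress(bytes(buf))

    def decode(self, buf, out=None) -> bytes:
        return zlib.decompress(bytes(buf))


async def encode_chunks(codec, chunks):
    """Encode chunks concurrently, one worker-thread call per chunk.

    The codec itself stays synchronous; the caller owns the threading.
    asyncio.gather returns results in the same order as the input chunks.
    """
    return await asyncio.gather(*(asyncio.to_thread(codec.encode, c) for c in chunks))


async def decode_chunks(codec, blobs):
    """Decode encoded chunks concurrently, preserving input order."""
    return await asyncio.gather(*(asyncio.to_thread(codec.decode, b) for b in blobs))
```

With this split, codec authors only write plain synchronous functions, and the choice of executor stays in one place.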

If all this is true, I don't see any reason to rewrite any codecs for v3, except where we wish to declare whether a codec works on bytes or on arrays.

@normanrz
Copy link

normanrz commented May 8, 2024

There is now a PR that adds the numcodecs.zarr3 module which contains Zarr v3 wrappers for the numcodecs codecs: #524

@normanrz normanrz linked a pull request May 8, 2024 that will close this issue
4 participants