Codec creation: getting the shape of a chunk? #481

Open
csubich opened this issue Oct 13, 2023 · 2 comments

Comments

@csubich

csubich commented Oct 13, 2023

Is there any way for a codec (as applied to encoding/decoding within zarr) to be reliably provided the shape of the chunk it is decoding?

My use case here is to write codecs that apply dynamic scaling and quantization (based on planes of an array with three or more dimensions, normalizing by the local min/max within a chunk) and/or two-dimensional linear prediction (essentially extending numcodecs.Delta).
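For concreteness, the per-plane scaling described above might be sketched as follows. This is only an illustration under assumptions not stated in the issue: the function names `quantize_planes` and `dequantize_planes` are hypothetical, planes are taken to be the trailing two dimensions, and values are quantized to 16 bits.

```python
import numpy as np

LEVELS = 65535  # assumption: quantize to the full uint16 range


def quantize_planes(chunk):
    """Quantize each trailing 2-D plane of `chunk` using that plane's own min/max."""
    chunk = np.asarray(chunk, dtype=np.float64)
    planes = chunk.reshape(-1, *chunk.shape[-2:])  # (n_planes, rows, cols)
    lo = planes.min(axis=(1, 2), keepdims=True)
    hi = planes.max(axis=(1, 2), keepdims=True)
    # Constant planes get a scale of 1 to avoid dividing by zero.
    scale = np.where(hi > lo, (hi - lo) / LEVELS, 1.0)
    q = np.rint((planes - lo) / scale).astype(np.uint16)
    return q.reshape(chunk.shape), lo.ravel(), scale.ravel()


def dequantize_planes(q, lo, scale):
    """Invert quantize_planes; `q` must already have its N-dimensional shape."""
    planes = q.reshape(-1, *q.shape[-2:]).astype(np.float64)
    out = planes * scale[:, None, None] + lo[:, None, None]
    return out.reshape(q.shape)
```

Note that `dequantize_planes` needs `q` in its original shape to know where one plane ends and the next begins, which is exactly the information a decoder working from a flat byte stream lacks.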

When calling Codec.encode() this is not a problem; the buffer supplied is a full array-like unless an earlier filter stage has done something. However, on decoding the codec is only reliably supplied a byte-stream without shape information. The out parameter to decode() seems to be inconsistently supplied.

Obviously, Zarr knows what shape of chunk it is seeking to fill. Without that information, I'll have to encode the array shape information in the output datastream. That's unnecessary redundancy, and more importantly it is aesthetically displeasing.
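The workaround mentioned above, storing the shape in the encoded stream, could look roughly like this. It is a sketch, not a numcodecs implementation: `ShapeHeaderCodec`, its `codec_id`, and the header layout (one byte for the number of dimensions followed by 64-bit little-endian extents) are all assumptions made for illustration. The class only mirrors the `encode(buf)` / `decode(buf, out=None)` method signatures and does not subclass `numcodecs.abc.Codec`, so that the example depends on NumPy alone.

```python
import struct

import numpy as np


class ShapeHeaderCodec:
    """Hypothetical codec-like class that prepends the chunk shape to the payload."""

    codec_id = "shape_header_sketch"  # hypothetical identifier

    def __init__(self, dtype="<f8"):
        self.dtype = np.dtype(dtype)

    def encode(self, buf):
        arr = np.ascontiguousarray(buf, dtype=self.dtype)
        # Header: 1 byte for ndim, then one unsigned 64-bit extent per dimension.
        header = struct.pack("<B", arr.ndim) + struct.pack(f"<{arr.ndim}Q", *arr.shape)
        return header + arr.tobytes()

    def decode(self, buf, out=None):
        raw = memoryview(buf).tobytes()
        ndim = raw[0]
        shape = struct.unpack_from(f"<{ndim}Q", raw, 1)
        arr = np.frombuffer(raw, dtype=self.dtype, offset=1 + 8 * ndim).reshape(shape)
        if out is not None:
            # Assumption: `out`, when given, is a writable NumPy array of matching size.
            np.copyto(out, arr.reshape(np.shape(out)))
            return out
        return arr
```

The cost is `1 + 8 * ndim` bytes per chunk, which is small, but it duplicates information that the array metadata already holds.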

@martindurant
Member

I have previously pushed for the concept of "context", which zarr would pass both to the codec's encode/decode methods and to the storage layer, specifying where in the array we are, the shape, key, ... and other useful pieces of information that are available at call time. Currently, the context (zarr.context.Context) only has meta_array: NDArrayLike; I see no reason not to populate it further.
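To make the idea concrete, here is a purely hypothetical sketch of what a more fully populated context could look like. Only `meta_array` is mentioned above as existing; the `ChunkContext` name, the `chunk_shape`, `chunk_coords`, and `key` fields, and the `decode_with_context` function are invented for illustration and are not part of zarr or numcodecs.

```python
from typing import Any, Tuple, TypedDict

import numpy as np


class ChunkContext(TypedDict, total=False):
    """Hypothetical context; only `meta_array` corresponds to an existing field."""

    meta_array: Any
    chunk_shape: Tuple[int, ...]   # hypothetical: shape of the chunk being decoded
    chunk_coords: Tuple[int, ...]  # hypothetical: position of the chunk in the chunk grid
    key: str                       # hypothetical: storage key of the chunk


def decode_with_context(buf, dtype, context):
    """Rebuild an N-dimensional chunk from flat bytes using the shape in the context."""
    return np.frombuffer(buf, dtype=dtype).reshape(context["chunk_shape"])


# Usage: the decoder no longer needs a shape header in the byte stream.
ctx = ChunkContext(chunk_shape=(2, 3), chunk_coords=(0, 1), key="0.1")
raw = np.arange(6, dtype="<i4").tobytes()
chunk = decode_with_context(raw, "<i4", ctx)
```

With something along these lines, a shape-dependent codec could drop any in-stream header and rely on the caller for the chunk geometry.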

@csubich
Author

csubich commented Oct 18, 2023

Context of the chunk within the larger super-array would also be interesting, since it could allow some special-case encoders that apply data transforms along the way.
