can VLenArray support 2D arrays #199

Open
sofroniewn opened this issue Sep 6, 2019 · 8 comments · May be fixed by #200

Comments

@sofroniewn

Right now I think the VLenArray only supports 1-D arrays. What would it take to extend support to 2-D arrays too? I have a list of 2D numpy arrays that I'd like to save as a zarr file using something like

zarr.array( 'foo', foo, dtype='array:f8')

where foo is my list of 2D numpy arrays (they are all NxD where D is fixed across the list but N is variable), but I currently get an error message that ends with

  File "numcodecs/vlen.pyx", line 382, in numcodecs.vlen.VLenArray.encode
ValueError: only 1-dimensional arrays are supported

If this sounds like a bad idea, I can think of workarounds where I flatten the arrays ahead of time, keep track of that D number, and reshape them on reading, but I thought I'd ask! Thanks!!
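
For reference, the flatten-and-reshape workaround might look roughly like the sketch below. It uses only NumPy. The helper names `flatten_ragged` and `restore_ragged` are hypothetical and not part of zarr or numcodecs, and it assumes a non-empty list in which every array has shape (N, D) with the same D.

```python
import numpy as np

def flatten_ragged(arrays):
    """Flatten a list of (N_i, D) arrays into an object array of 1-D arrays.

    Returns the flattened object array together with D, which has to be
    stored separately so the shapes can be restored on reading.
    """
    d = arrays[0].shape[1]
    for a in arrays:
        if a.ndim != 2 or a.shape[1] != d:
            raise ValueError("every array must have shape (N, D) with the same D")
    # Build the object array element by element so NumPy does not try to
    # stack the inputs into a single regular array.
    flat = np.empty(len(arrays), dtype=object)
    for i, a in enumerate(arrays):
        flat[i] = np.ascontiguousarray(a).reshape(-1)
    return flat, d

def restore_ragged(flat, d):
    """Undo flatten_ragged: reshape each 1-D array back to (N_i, D)."""
    return [np.asarray(f).reshape(-1, d) for f in flat]

foo = [
    np.arange(6, dtype="f8").reshape(3, 2),
    np.arange(10, dtype="f8").reshape(5, 2),
]
flat, d = flatten_ragged(foo)
restored = restore_ragged(flat, d)
```

The `flat` object array holds only 1-D arrays, which is the form the existing variable-length array codec accepts. D would have to be kept somewhere alongside the data, for example in the array's attributes.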

@alimanfoo
Member

alimanfoo commented Sep 6, 2019

Hi @sofroniewn, I haven't thought about this deeply, but I imagine the codec could be modified to be aware of the expected number of dimensions in each array, and then to encode the length of all dimensions. Currently the encode method encodes the data by interleaving the array lengths and the array buffers into a single contiguous buffer. So for 2D arrays you would have to store 2 ints, then data, then 2 ints, then data, etc.

That would be relatively straightforward if all the arrays are 2D. It might get a bit messier if you had a mix of arrays with different numbers of dimensions.
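
As a rough illustration of the "2 ints, then data" idea for the all-2D case, here is a minimal sketch. It is not the actual numcodecs wire format: the 32-bit little-endian header, the function names `encode_2d` and `decode_2d`, and the fixed `<f8` dtype are all assumptions made for the example.

```python
import struct
import numpy as np

# Hypothetical per-item header: two unsigned 32-bit ints (rows, cols).
HEADER = struct.Struct("<2I")

def encode_2d(arrays, dtype="<f8"):
    """Interleave (rows, cols) headers and raw array bytes into one buffer."""
    parts = []
    for a in arrays:
        a = np.ascontiguousarray(a, dtype=dtype)
        if a.ndim != 2:
            raise ValueError("only 2-dimensional arrays are supported")
        parts.append(HEADER.pack(*a.shape))
        parts.append(a.tobytes())
    return b"".join(parts)

def decode_2d(buf, dtype="<f8"):
    """Walk the buffer, reading a header and then rows * cols items each time."""
    dtype = np.dtype(dtype)
    out, offset = [], 0
    while offset < len(buf):
        rows, cols = HEADER.unpack_from(buf, offset)
        offset += HEADER.size
        nbytes = rows * cols * dtype.itemsize
        chunk = np.frombuffer(buf[offset:offset + nbytes], dtype=dtype)
        out.append(chunk.reshape(rows, cols).copy())
        offset += nbytes
    return out

items = [np.arange(6.0).reshape(3, 2), np.arange(10.0).reshape(5, 2)]
buf = encode_2d(items)
decoded = decode_2d(buf)
```

Because every item is known to be 2D, the decoder can always read exactly two ints before the data, which is why this case stays simple.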

You may find it easier to flatten the arrays and keep track of array shapes separately and reshape on reading :-)

@alimanfoo
Member

> You may find it easier to flatten the arrays and keep track of array shapes separately and reshape on reading :-)

Btw don't mean this to sound discouraging, happy to help further if you think it's worth exploring changes to the codec.

@sofroniewn
Author

@alimanfoo thanks for thinking about this. If you could do it in the codec with guaranteed 2D arrays then I'd strongly prefer that compared to having to do the flattening and reshaping on my end. It will make my APIs much simpler and more consistent: sometimes I just have normal arrays, sometimes I have these ragged arrays, and the code will look much more similar if the codec can handle it.

One proposal might be to go all the way to the full general case and support a mix of arrays with different numbers of dimensions too. We would interleave everything into a single contiguous buffer in which the first number is the number of dimensions, say D, the next D numbers are the sizes of each of the D dimensions, and then comes the flattened array.

The change to the existing codec for 1-D arrays would be that an additional 1 appears at the beginning of every block. For my all 2-D arrays there would be an additional 2 at the beginning of every block, but this scheme would support the fully general case of mixing.
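
To make the proposed block layout concrete, here is a small sketch of it in plain NumPy. Everything about the byte-level details is an assumption for illustration (unsigned 32-bit little-endian ints for D and the shape, a fixed `<f8` dtype, and the names `encode_nd` and `decode_nd`); it is not taken from numcodecs.

```python
import math
import struct
import numpy as np

def encode_nd(arrays, dtype="<f8"):
    """Write each array as: ndim, then ndim shape entries, then flattened data."""
    parts = []
    for a in arrays:
        a = np.ascontiguousarray(a, dtype=dtype)
        parts.append(struct.pack("<I", a.ndim))             # D
        parts.append(struct.pack(f"<{a.ndim}I", *a.shape))  # D shape entries
        parts.append(a.tobytes())                           # flattened array
    return b"".join(parts)

def decode_nd(buf, dtype="<f8"):
    """Read blocks back, using the leading D to know how many shape ints follow."""
    dtype = np.dtype(dtype)
    out, offset = [], 0
    while offset < len(buf):
        (ndim,) = struct.unpack_from("<I", buf, offset)
        offset += 4
        shape = struct.unpack_from(f"<{ndim}I", buf, offset)
        offset += 4 * ndim
        nbytes = math.prod(shape) * dtype.itemsize
        chunk = np.frombuffer(buf[offset:offset + nbytes], dtype=dtype)
        out.append(chunk.reshape(shape).copy())
        offset += nbytes
    return out

# A 1-D array and a 2-D array mixed in the same buffer.
mixed = [np.arange(4.0), np.arange(6.0).reshape(3, 2)]
buf = encode_nd(mixed)
decoded = decode_nd(buf)
```

In this sketch the first block starts with a 1 and the second with a 2, so the decoder never needs to be told the dimensionality up front.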

What are your thoughts? I'm new to the concepts in numcodecs so there might be things I'm not considering with this scheme.

@alimanfoo
Member

alimanfoo commented Sep 6, 2019 via email

@sofroniewn
Author

A new codec class VLenNDArray with convenience shorthands makes sense. I'm happy to give this a try myself - though as I said I'm new to the codebase, so any additional tips before I get started would be great if that's ok with you.

@alimanfoo
Member

alimanfoo commented Sep 6, 2019 via email

@sofroniewn sofroniewn linked a pull request Sep 6, 2019 that will close this issue
@NumesSanguis

@alimanfoo Could you take a look at @sofroniewn's work (#200)? He has been pinging people, but there seems to be no response from the Zarr developers, which would be a shame given his hard work.

I'm interested in this functionality to store 2-channel audio recordings of varying length per recording.

@alimanfoo
Member

Hi @NumesSanguis, sorry for the radio silence on this one. I've taken a look at the PR and it seems good; I've added a few small comments.
