Zarrita

Zarrita is an experimental implementation of Zarr v3, including sharding. It is a technical proof of concept meant only for generating sample datasets and is not recommended for production use.

Setup

import zarrita
import numpy as np

store = zarrita.LocalStore('testoutput') # or zarrita.RemoteStore('s3://bucket/test')

testdata = np.arange(0, 16 * 16, dtype='int32').reshape((16, 16))

Create an array

a = zarrita.Array.create(
    store / 'array',
    shape=(16, 16),
    dtype='int32',
    chunk_shape=(2, 8),
    codecs=[
        zarrita.codecs.bytes_codec(),
        zarrita.codecs.blosc_codec(typesize=4),
    ],
    attributes={'question': 'life', 'answer': 42}
)
a[:, :] = testdata
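
To make the effect of `chunk_shape` concrete, the following sketch computes how many chunks the array above is split into. It uses only the standard library and does not call zarrita; `chunk_grid_shape` is a hypothetical helper introduced here for illustration, not part of the zarrita API.

```python
import math


def chunk_grid_shape(shape, chunk_shape):
    """Number of chunks along each dimension.

    Hypothetical helper for illustration; partial chunks at the
    array edge are counted as full chunks.
    """
    return tuple(math.ceil(s / c) for s, c in zip(shape, chunk_shape))


# Same shape and chunk_shape as the array created above
grid = chunk_grid_shape((16, 16), (2, 8))
num_chunks = math.prod(grid)
print(grid, num_chunks)
```

With a (16, 16) array and (2, 8) chunks, the grid is 8 chunks along the first axis and 2 along the second, so writing the full array touches 16 chunks.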

Open an array

a = zarrita.Array.open(store / 'array')
assert np.array_equal(a[:, :], testdata)

Create an array with sharding

a = zarrita.Array.create(
    store / 'sharding',
    shape=(16, 16),
    dtype='int32',
    chunk_shape=(16, 16),
    chunk_key_encoding=('v2', '.'),
    codecs=[
        zarrita.codecs.sharding_codec(
            chunk_shape=(8, 8),
            codecs=[
                zarrita.codecs.bytes_codec(),
                zarrita.codecs.blosc_codec(typesize=4),
            ]
        ),
    ],
)
a[:, :] = testdata
assert np.array_equal(a[:, :], testdata)
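
As a rough illustration of the layout this produces, the sketch below works out how many inner chunks fit in one shard and what the shard's chunk key would look like. It assumes that `chunk_key_encoding=('v2', '.')` selects the `v2` chunk key encoding of the Zarr v3 specification with `.` as separator, and that zarrita follows the specification's key format; `encode_chunk_key` is a hypothetical helper, not a zarrita function.

```python
import math


def encode_chunk_key(coords, name="default", separator=None):
    """Encode chunk grid coordinates as a key (hypothetical helper).

    Follows the two chunk key encodings described in the Zarr v3
    specification: "default" (prefix "c", separator "/") and
    "v2" (no prefix, separator ".").
    """
    if name == "default":
        sep = "/" if separator is None else separator
        return sep.join(["c", *(str(c) for c in coords)])
    if name == "v2":
        sep = "." if separator is None else separator
        # Zero-dimensional arrays use the key "0" in the v2 encoding
        return sep.join(str(c) for c in coords) or "0"
    raise ValueError(f"unknown chunk key encoding: {name!r}")


# Outer chunk (shard) and inner chunk shapes from the example above
shard_shape = (16, 16)
inner_chunk_shape = (8, 8)
inner_grid = tuple(s // c for s, c in zip(shard_shape, inner_chunk_shape))
inner_chunks_per_shard = math.prod(inner_grid)

v2_key = encode_chunk_key((0, 0), "v2", ".")
default_key = encode_chunk_key((0, 0))
print(inner_chunks_per_shard, v2_key, default_key)
```

Because the outer `chunk_shape` equals the array shape, the whole array lives in a single shard that holds a 2 x 2 grid of (8, 8) inner chunks.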

Create a group

g = zarrita.Group.create(store / 'group')
g2 = g.create_group('group2')
a = g2.create_array(
    'array',
    shape=(16, 16),
    dtype='int32',
    chunk_shape=(16, 16),
)
a[:, :] = testdata

Open a group

g = zarrita.Group.open(store / 'group')
g2 = g['group2']
a = g['group2']['array']
assert np.array_equal(a[:, :], testdata)

Resize array

a.resize((10, 10))

Update attributes

a.update_attributes({'question': 'life', 'answer': 0})

Zarr v2

a = zarrita.ArrayV2.create(
    store / 'array',
    shape=(16, 16),
    dtype='int32',
    chunks=(2, 8),
)
a[:, :] = testdata

a3 = a.convert_to_v3()
assert a3.metadata.shape == a.shape

Async methods

# Must run inside a coroutine or a REPL that supports top-level await
a = await zarrita.Array.create_async(
    store / 'array_async',
    shape=(16, 16),
    dtype='int32',
    chunk_shape=(2, 8),
)
await a.async_[:, :].set(testdata)
assert np.array_equal(await a.async_[:, :].get(), testdata)
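
In a plain Python script, one way to drive coroutines like these is to wrap them in an `async def` function and hand it to `asyncio.run`. The sketch below shows only that pattern; `roundtrip` is a hypothetical stand-in for the awaitable zarrita calls and does not touch any store.

```python
import asyncio


async def roundtrip(value):
    # Hypothetical stand-in for awaitable calls such as
    # `await a.async_[:, :].set(...)` followed by `.get()`
    await asyncio.sleep(0)
    return value


async def main():
    # Awaitable calls go here, inside a coroutine
    return await roundtrip([1, 2, 3])


# asyncio.run creates an event loop, runs main() to completion, and closes the loop
result = asyncio.run(main())
print(result)
```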

Credits

This is a largely-rewritten fork of zarrita by @alimanfoo. It implements the Zarr v3 draft specification created by @alimanfoo, @jstriebel, @jbms et al.

Licensed under the MIT License.