Skip to content

Latest commit

 

History

History
97 lines (75 loc) · 3.32 KB

README_CHUNK_FORMAT.rst

File metadata and controls

97 lines (75 loc) · 3.32 KB

Blosc Chunk Format

The chunk is composed by a header and a blocks / splits section:

+---------+--------+---------+
|  header | blocks / splits  |
+---------+--------+---------+

These are described below.

The header section

Blosc (as of Version 1.0.0) has the following 16 byte header that stores information about the compressed buffer:

|-0-|-1-|-2-|-3-|-4-|-5-|-6-|-7-|-8-|-9-|-A-|-B-|-C-|-D-|-E-|-F-|
  ^   ^   ^   ^ |     nbytes    |   blocksize   |    cbytes     |
  |   |   |   |
  |   |   |   +--typesize
  |   |   +------flags
  |   +----------versionlz
  +--------------version

Datatypes of the header entries

All entries are little endian.

version

(uint8) Blosc format version.

versionlz

(uint8) Version of the internal compressor used.

flags and compressor enumeration

(bitfield) The flags of the buffer

bit 0 (0x01)

Whether the byte-shuffle filter has been applied or not.

bit 1 (0x02)

Whether the internal buffer is a pure memcpy or not.

bit 2 (0x04)

Whether the bit-shuffle filter has been applied or not.

bit 3 (0x08)

Reserved, must be zero.

bit 4 (0x10)

If set, the blocks will not be split in sub-blocks during compression.

bit 5 (0x20)

Part of the enumeration for compressors.

bit 6 (0x40)

Part of the enumeration for compressors.

bit 7 (0x80)

Part of the enumeration for compressors.

The last three bits form an enumeration that allows to use alternative compressors.

0

blosclz

1

lz4 or lz4hc

2

snappy

3

zlib

4

zstd

typesize

(uint8) Number of bytes for the atomic type.

nbytes

(uint32) Uncompressed size of the buffer (this header is not included).

blocksize

(uint32) Size of internal blocks.

cbytes

(uint32) Compressed size of the buffer (including this header).

The blocks / splits section

After the header, there come the blocks / splits section. Blocks are equal-sized parts of the chunk, except for the last block that can be shorter or equal than the rest.

At the beginning of the blocks section, there come a list of int32_t bstarts to indicate where the different encoded blocks starts (counting from the end of this bstarts section):

+=========+=========+========+=========+
| bstart0 | bstart1 |   ...  | bstartN |
+=========+=========+========+=========+

Finally, it comes the actual list of compressed blocks / splits data streams. It turns out that a block may optionally (see bit 4 in flags above) be further split in so-called splits which are the actual data streams that are transmitted to codecs for compression. If a block is not split, then the split is equivalent to a whole block. Before each split in the list, there is the compressed size of it, expressed as an `int32_t`:

+========+========+========+========+========+========+========+
| csize0 | split0 | csize1 | split1 |   ...  | csizeN | splitN |
+========+========+========+========+========+========+========+

Note: all the integers are stored in little endian.