Skip to content
Felföldi Zsolt edited this page Sep 28, 2017 · 2 revisions

BloomBits design rationale

The BloomBits data structure optimizes log searching by doing a bitwise transformation that makes it cheaper to retrieve bloom filter data relevant to a specific filter. When searching in a long section of the block history, we are checking three specific bits of each bloom filter per address/topic. In order to do that, currently we read/retrieve a cca. 550 byte block header per filtered block. The implemented structure optimizes this by a "bitwise 90 degree rotation" of the bloom filters. Blocks are grouped into fixed length sections (section size for the LES BloomBits Trie is 32768 blocks), BloomBits[bitIdx][sectionIdx] is a 32768 bit (4096 byte) long bit vector that contains a single bit of each bloom filter from the block range [sectionIdx*SectionSize ... (sectionIdx+1)*SectionSize-1]. (Since bloom filters are usually sparse, a simple data compression makes this structure even more efficient, especially for ODR retrieval.) By reading and binary AND-ing three BloomBits sections, we can filter for an address/topic in 32768 blocks at once ("1" bits in the binary AND result mean bloom matches).

Data compression

The compression algorithm is optimized for sparse input data which contains a lot of zero bytes. Decompression requires knowledge of the decompressed data length.

The algorithm can be described with this pseudocode:

if data only contains zeroes,
    CompressBytes(data) == nil
otherwise if len(data) <= 1,
    CompressBytes(data) == data
otherwise:
    CompressBytes(data) == append(CompressBytes(nonZeroBitset(data)), nonZeroBytes(data)...)
    where
      nonZeroBitset(data) is a bit vector with len(data) bits (MSB first):
          nonZeroBitset(data)[i/8] && (1 << (7-i%8)) != 0  if data[i] != 0
          len(nonZeroBitset(data)) == (len(data)+7)/8
      nonZeroBytes(data) contains the non-zero bytes of data in the same order

This piece of Go code can be considered a reference implementation.

BloomBits Trie

In order to make this data structure retrievable on-demand for the light client, we put the generated vectors in a trie. Parts of this trie can be retrieved with the GetPPTProofs message. Currently the trie root is part of the trusted syncing checkpoint but trustless validation of the BloomBits trie is part of the development plans. The trie consists of the compressed bit vectors as values stored at keys constructed from the the bloom bit index encoded as a 2-byte big endian, followed by the section index encoded as an 8-byte big endian. Since all-zero bit vectors have a zero length when compressed, these vectors are not added to the trie at all.

BloomBits tries are generated for each new section of transformed bloom filter data by adding the vectors belonging to the latest section index to the previous trie.

Clone this wiki locally