Reconsider serializing block data into shares #152

liamsi · 2021-03-31T11:21:32Z

Description

I wanted to bring up everyone's favorite topic again. Namely serialization.

First, I want to note that the way we arrange data into shares is essentially a custom serialzation format. It's a very light-weight and smart one and additionally uses (length delimited) protobuf for the hard parts (which IMO is a good choice).

Despite the theoretical simplicity, we ourselves are not having the best time implementing the interesting parts (correctly). Namely the splitting and merging of shares. This is not optimal because every client that does DA sampling or trying to download particular Txs directly from the p2p network, will have to do the merging part, too.
A custom serialzation format is an additional burden and hurdle for writing clients in other languages (e.g., Rust, python, Javascript, etc). IMO implementing that part should not take longer than a few hours, maybe a day in any language. We now have proof that this takes (much) longer even for those who either invented the format or at least are deeply familiar with it. Of course, we could blame this on go but the reality is that every other language also has its pitfalls and tradeoffs, too.

Proposal

I'm proposing to describe shares in protobuf too. The fields of the share proto message would be the namespace, the startIndex (optional) and the total length of the serlialized message, and finally, the part of the message in that particular share (in bytes) resulting from the splitting operation.
That way the spec really would just need to describe the split and merge algorithms and nothing else. We'd need to be careful that we do the splitting in a way that still results in equally sized protobuf messages.

I know, I've been kinda advocating against going all-in on protobuf for the shares myself. At least in the sense that I wanted to keep out any knowledge of the serialization format used out of the NMT (which IMO should simply take in raw bytes and nothing else). I'm not sure yet if the implication here would be that we'd need to move away from this property.

related: #35

┆Issue is synchronized with this Asana task by Unito

evan-forbes · 2021-04-01T18:11:40Z

That's an interesting idea, and definitely needs more investigation. Provided that we could get exact sized shares, I think it could be a good simplification measure. Although, I'm still not sure how much it would simplify things, as we're already mostly working with raw bytes, and would still have to manually split/merge.

I'm not the most familiar with protobuf, how would we get the namespace before the length delimiter?

We're already using protobuf for evidence, would we first serialize evidence, then create the shares, then serialize the shares?

adlerjohn · 2021-04-01T18:40:34Z

we could blame this on go

Agreed 🦀

the total length of the serlialized message, and finally

The problem with this suggestion is that this length does not repeat each share. So you'd still need at least 2 serialization passes: one to length-delimit the tx/message/whatever, then another to serialize the share struct after splitting.

The issue with using protobuf for this second step is that protobuf doesn't support fixed-length arrays.

liamsi · 2021-04-01T20:29:02Z

we could blame this on go

Agreed 🦀

Good that you bring this up. We should likely try to implement the current scheme in Rust as it is the most interesting language we should target after go anyways. That said, I'd be surprised if the time to finish a fully tested implementation in Rust would be less than it took us to implement this in go (maybe it will actually be faster because we've ironed out a lot of the details now - but it will be the same ballpark).

I'm happy to write up splitting and merging using protobuf messages only and we can try to compare the understandability and ease or difficultness to implement that. But that's for later. Let revisit this issue after the first test-nets.

adlerjohn · 2022-09-19T19:45:59Z

not planned any time soon

liamsi added serialization Serialization definitions disambiguation investigation Investigation required. labels Mar 31, 2021

liamsi added this to To do in Incentivized Testnet via automation Mar 31, 2021

liamsi mentioned this issue Apr 1, 2021

Spec compliant merge shares celestiaorg/celestia-core#261

Merged

3 tasks

liamsi removed this from To do in Incentivized Testnet Dec 8, 2021

adlerjohn closed this as not planned Won't fix, can't repro, duplicate, stale Sep 19, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Reconsider serializing block data into shares #152

Reconsider serializing block data into shares #152

liamsi commented Mar 31, 2021 •

edited by sync-by-unito bot

evan-forbes commented Apr 1, 2021

adlerjohn commented Apr 1, 2021 •

edited

liamsi commented Apr 1, 2021

adlerjohn commented Sep 19, 2022

Reconsider serializing block data into shares #152

Reconsider serializing block data into shares #152

Comments

liamsi commented Mar 31, 2021 • edited by sync-by-unito bot

Description

Proposal

evan-forbes commented Apr 1, 2021

adlerjohn commented Apr 1, 2021 • edited

liamsi commented Apr 1, 2021

adlerjohn commented Sep 19, 2022

liamsi commented Mar 31, 2021 •

edited by sync-by-unito bot

adlerjohn commented Apr 1, 2021 •

edited