Proposal: Reconsider .code / .name fields of BlockEncoder / BlockDecoder #177

Gozala · 2022-04-20T01:38:34Z

I find the .code field and (to a lesser degree) the .name field on the following interfaces troublesome:

Lines 4 to 15 in 9bcd7fe

    
    export interface BlockEncoder<Code extends number, T> {
      name: string
      code: Code
      encode(data: T): ByteView<T>
    }

    /**
     * IPLD decoder part of the codec.
     */
    export interface BlockDecoder<Code extends number, T> {
      code: Code
      decode(bytes: ByteView<T>): T

The problem is that these fields prevent one from defining a codec composition without introducing a subtle footgun. For example, dag-ucan could in theory be a composition of the dag-cbor and raw codecs, meaning it could decode a block in either cbor or raw encoding and, similarly, encode a node in either cbor or raw representation (depending on UCAN-specific nuances).

This double representation is an implementation detail currently hidden under the new 0x78c0 multicodec code (multiformats/multicodec#264).

Given the arguments in that thread, I have considered dropping the new code and making the implementation a UCAN-specialized BlockCodec<0x71|0x55> codec. However, there are some interesting challenges:

.code could be either 0x71 or 0x55. While the type checker would be happy with either option, the value is misleading, because it is common to use that code field when creating CIDs, e.g.:

js-multiformats/src/block.js

Lines 148 to 150 in 9bcd7fe

    
    const bytes = codec.encode(value)
    const hash = await hasher.digest(bytes)
    const cid = CID.create(1, codec.code, hash)
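To make the footgun concrete, here is a minimal sketch of a composed encoder. Everything in it is an assumption made for illustration: the BlockEncoder interface is a simplified stand-in (ByteView<T> replaced by Uint8Array), UcanNode and codeUsedForCid are hypothetical names, and the cbor branch is a placeholder rather than a real dag-cbor encoder.

```typescript
// Simplified stand-in for the BlockEncoder interface quoted in this issue.
interface BlockEncoder<Code extends number, T> {
  name: string
  code: Code
  encode(data: T): Uint8Array
}

const RAW = 0x55
const CBOR = 0x71

// Hypothetical node shape: kept either as raw bytes or as a cbor value.
type UcanNode =
  | { kind: 'raw'; bytes: Uint8Array }
  | { kind: 'cbor'; value: number }

// Hypothetical composed encoder. `code` must be one fixed value even though
// `encode` can produce either representation.
const composed: BlockEncoder<typeof RAW | typeof CBOR, UcanNode> = {
  name: 'dag-ucan',
  code: CBOR,
  encode: (node) =>
    node.kind === 'raw'
      ? node.bytes
      : new Uint8Array([node.value]) // placeholder, NOT a real dag-cbor encoder
}

// Mirrors the block.js pattern: the code is read from the codec,
// not from the result of `encode`.
const codeUsedForCid = <Code extends number, T>(
  codec: BlockEncoder<Code, T>,
  value: T
) => {
  const bytes = codec.encode(value)
  return { code: codec.code, bytes }
}

const rawNode: UcanNode = { kind: 'raw', bytes: new Uint8Array([1, 2, 3]) }
const mislabeled = codeUsedForCid(composed, rawNode)
// mislabeled.code is 0x71 although the bytes are in the raw (0x55) representation.
```

The type checker accepts this, yet any CID built from mislabeled.code would claim cbor for raw bytes.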

I think this is a symptom of a broader problem I've experienced in different contexts: the result of encode carries no information about the codec. That is probably why I find myself resorting to { code, bytes } whenever I want to defer async CID creation.

In retrospect it seems silly that we identified the need for this in MultihashDigest but not here:
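The { code, bytes } workaround could look roughly like the sketch below. EncodedBlock, Digest, toyDigest and toAddress are hypothetical names, the digest is a toy XOR fold rather than a real multihash hasher, and the returned object only stands in for a CID.

```typescript
// Hypothetical shape for the `{ code, bytes }` pair.
interface EncodedBlock<Code extends number = number> {
  code: Code
  bytes: Uint8Array
}

// Hypothetical stand-in for an async hasher; not the multiformats hasher API.
type Digest = (bytes: Uint8Array) => Promise<Uint8Array>

// Toy digest (XOR of all bytes), for illustration only; not cryptographic.
const toyDigest: Digest = async (bytes) =>
  new Uint8Array([bytes.reduce((acc, byte) => acc ^ byte, 0)])

// Synchronous step: encoded blocks keep their codec code next to the bytes.
const pending: EncodedBlock[] = [
  { code: 0x55, bytes: new Uint8Array([1, 2, 3]) }, // raw
  { code: 0x71, bytes: new Uint8Array([0xa0]) } // cbor encoding of an empty map
]

// Deferred async step: the address is derived later, without access to the
// codec that produced the bytes.
const toAddress = async ({ code, bytes }: EncodedBlock, digest: Digest) => ({
  version: 1,
  code,
  digest: await digest(bytes)
})

const addresses = Promise.all(
  pending.map((block) => toAddress(block, toyDigest))
)
```

Because the code travels with the bytes, the hashing step does not need to know which codec was used.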

js-multiformats/src/hashes/interface.ts

Lines 12 to 32 in 9bcd7fe

    
    export interface MultihashDigest<Code extends number = number> {
      /**
       * Code of the multihash
       */
      code: Code

      /**
       * Raw digest (without a hashing algorithm info)
       */
      digest: Uint8Array

      /**
       * byte length of the `this.digest`
       */
      size: number

      /**
       * Binary representation of this multihash digest.
       */
      bytes: Uint8Array
    }

Unfortunately I see no way to address this in a backwards-compatible manner. Maybe we could introduce a MultiblockEncoder alongside BlockEncoder, similar to how we have MultibaseEncoder producing prefixed values and BaseEncoder producing values without a prefix:

js-multiformats/src/bases/interface.ts

Lines 4 to 14 in 9bcd7fe

    
    /**
     * Base encoder just encodes bytes into base encoded string.
     */
    export interface BaseEncoder {
      /**
       * Base encodes to a **plain** (and not a multibase) string. Unlike
       * `encode` no multibase prefix is added.
       * @param bytes
       */
      baseEncode(bytes: Uint8Array): string
    }

js-multiformats/src/bases/interface.ts

Lines 48 to 62 in 9bcd7fe

    
    export interface MultibaseEncoder<Prefix extends string> {
      /**
       * Name of the encoding.
       */
      name: string

      /**
       * Prefix character for that base encoding.
       */
      prefix: Prefix

      /**
       * Encodes binary data into **multibase** string (which will have a
       * prefix added).
       */
      encode(bytes: Uint8Array): Multibase<Prefix>
    }
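By analogy with the two base encoder interfaces, a MultiblockEncoder might be sketched as follows. The issue only names the interface, so its shape here is an assumption, as are the tagged helper, the UcanNode type and the toy raw / cborLike encoders (cborLike is a placeholder, not a real dag-cbor encoder). BlockEncoder is again a simplified stand-in using Uint8Array.

```typescript
// Simplified stand-in for the existing interface.
interface BlockEncoder<Code extends number, T> {
  name: string
  code: Code
  encode(data: T): Uint8Array
}

// Hypothetical: `encode` returns the bytes tagged with the code that was
// actually used, instead of exposing one fixed `code` on the encoder.
interface MultiblockEncoder<Code extends number, T> {
  encode(data: T): { code: Code; bytes: Uint8Array }
}

// Hypothetical helper: lift any single-code BlockEncoder.
const tagged = <Code extends number, T>(
  codec: BlockEncoder<Code, T>
): MultiblockEncoder<Code, T> => ({
  encode: (data) => ({ code: codec.code, bytes: codec.encode(data) })
})

const raw: BlockEncoder<0x55, Uint8Array> = {
  name: 'raw',
  code: 0x55,
  encode: (bytes) => bytes
}

// Placeholder only, NOT a real dag-cbor encoder.
const cborLike: BlockEncoder<0x71, number> = {
  name: 'dag-cbor',
  code: 0x71,
  encode: (value) => new Uint8Array([value])
}

type UcanNode =
  | { kind: 'raw'; bytes: Uint8Array }
  | { kind: 'cbor'; value: number }

// Composition without the footgun: the code is chosen per encoded value.
const ucan: MultiblockEncoder<0x55 | 0x71, UcanNode> = {
  encode: (node) =>
    node.kind === 'raw'
      ? tagged(raw).encode(node.bytes)
      : tagged(cborLike).encode(node.value)
}

const asRaw = ucan.encode({ kind: 'raw', bytes: new Uint8Array([1]) })
const asCbor = ucan.encode({ kind: 'cbor', value: 7 })
// asRaw.code is 0x55 and asCbor.code is 0x71.
```

In this sketch existing BlockEncoder implementations keep working unchanged, and tagged adapts them, much like BaseEncoder and MultibaseEncoder coexist.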

Maybe this is an even broader issue of having multicodecs in the address as opposed to the data itself. E.g. if we tagged the encoded bytes themselves with multihash, all the IR representations would naturally be represented, although that ship has probably sailed long ago.
