Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Proposal: Reconsider .code / .name fields of BlockEncoder / BlockDecoder #177

Open
Gozala opened this issue Apr 20, 2022 · 0 comments
Open

Comments

@Gozala
Copy link
Contributor

Gozala commented Apr 20, 2022

I find .code filed and .name (to lesser degree) fields on following interfaces to be troublesome

export interface BlockEncoder<Code extends number, T> {
name: string
code: Code
encode(data: T): ByteView<T>
}
/**
* IPLD decoder part of the codec.
*/
export interface BlockDecoder<Code extends number, T> {
code: Code
decode(bytes: ByteView<T>): T

Problem is that it prevents one from defining codec composition without introducing subtle footgun. For example dag-ucan in theory could be composition of dag-cbor and raw codecs, meaning it could decode block in either cbor or raw encoding and similarly encode node either in cbor or raw representation (depending on UCAN specific nuances).

This double representation is an implementation detail currently hidden under new 0x78c0 multicodec code multiformats/multicodec#264.

Given the arguments in the thread I have considered dropping new code and make an implementation that is UCAN specialized BlockCodec<0x71|0x55> codec. However there are some interesting challenges:

  1. .code could be either 0x71 or 0x55, while type checker would be happy with either option it is misleading because it is common to use that code field when creating cids e.g.:
    const bytes = codec.encode(value)
    const hash = await hasher.digest(bytes)
    const cid = CID.create(1, codec.code, hash)
  2. I think this is a symptom of a broader problem I've experienced in different contexts. Result of encode carries no information about codec. Probably why I find myself resorting to { code, bytes } whenever I want to defer async CID creation.
    • It retrospect it seems silly that we identified need for this in MultihashDigest but not here
      export interface MultihashDigest<Code extends number = number> {
      /**
      * Code of the multihash
      */
      code: Code
      /**
      * Raw digest (without a hashing algorithm info)
      */
      digest: Uint8Array
      /**
      * byte length of the `this.digest`
      */
      size: number
      /**
      * Binary representation of this multihash digest.
      */
      bytes: Uint8Array
      }

Unfortunately I see no way to address this in backwards compatible manner. Maybe we could introduce MultiblockEncoder along the side of BlockEncoder similar to how we have MultibaseEncoder producing prefixed values and BaseEncoder without prefix:

/**
* Base encoder just encodes bytes into base encoded string.
*/
export interface BaseEncoder {
/**
* Base encodes to a **plain** (and not a multibase) string. Unlike
* `encode` no multibase prefix is added.
* @param bytes
*/
baseEncode(bytes: Uint8Array): string
}

export interface MultibaseEncoder<Prefix extends string> {
/**
* Name of the encoding.
*/
name: string
/**
* Prefix character for that base encoding.
*/
prefix: Prefix
/**
* Encodes binary data into **multibase** string (which will have a
* prefix added).
*/
encode(bytes: Uint8Array): Multibase<Prefix>
}


Maybe this is even broader issue of having multicodes in address as opposed to data itself. E.g if we tagged encoded bytes themself with multihash all the IR representations would naturally be represented although that ship has probably sailed a long ago.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant