
Proposal: Multiblock encoder interface #175

Open
Gozala opened this issue Apr 9, 2022 · 3 comments


@Gozala
Contributor

Gozala commented Apr 9, 2022

We're running into more and more cases where the `BlockEncoder` interface just doesn't fit the bill:

  1. With IPNFT, which is geared towards NFTs, we've discovered that NFT metadata can easily exceed 1 MiB, which would hinder our ability to serve such blocks through gateways, etc.
  2. With the new UnixFS code we basically want to pass in a file and get back a set of blocks along with a root.
  3. Now with UCANs we want to pass in an auth chain and produce a block per link in the chain.

I'm sure I'm forgetting some, and we are likely to encounter more use cases where we want to turn some input into a DAG represented by many blocks. That is why I would like to propose adopting the following interfaces:

```ts
export interface SyncDAGEncoder<Code extends number = number, T extends unknown = unknown> {
  encoder(data: T): IterableIterator<{ code: Code, bytes: Uint8Array }>
}

export interface AsyncDAGEncoder<Code extends number = number, T extends unknown = unknown> {
  encoder(data: T): AsyncIterableIterator<{ code: Code, bytes: Uint8Array }>
}

export type DAGEncoder<Code extends number = number, T extends unknown = unknown> =
  | SyncDAGEncoder<Code, T>
  | AsyncDAGEncoder<Code, T>
```

The last block would be the DAG root block, which is natural given hash linking: a parent can only be encoded once the hashes of its children are known.
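
As a rough illustration of the "root comes last" ordering, here is a minimal sketch of a `SyncDAGEncoder` that splits a byte payload into fixed-size raw chunks and finishes with a JSON root block listing the chunks' SHA-256 digests. The `chunkedEncoder` name, the tiny chunk size, and the use of hex digests as stand-ins for real CIDs are all hypothetical; 0x55 (raw) and 0x0200 (json) are the multicodec codes for those formats.

```typescript
import { createHash } from "node:crypto";

// Local copy of the proposed interface (method name kept as in the proposal).
interface SyncDAGEncoder<Code extends number = number, T = unknown> {
  encoder(data: T): IterableIterator<{ code: Code; bytes: Uint8Array }>;
}

const RAW = 0x55; // multicodec code for raw bytes
const JSON_CODE = 0x0200; // multicodec code for JSON
const CHUNK_SIZE = 4; // hypothetical; tiny so the example stays readable

// Stand-in for a CID: the hex SHA-256 digest of a block's bytes.
const sha256Hex = (bytes: Uint8Array): string =>
  createHash("sha256").update(bytes).digest("hex");

const chunkedEncoder: SyncDAGEncoder<number, Uint8Array> = {
  *encoder(data: Uint8Array) {
    const links: string[] = [];
    for (let offset = 0; offset < data.length; offset += CHUNK_SIZE) {
      const chunk = data.subarray(offset, offset + CHUNK_SIZE);
      links.push(sha256Hex(chunk));
      yield { code: RAW, bytes: chunk };
    }
    // The root can only be built once every child's hash is known,
    // so it is necessarily the last block emitted.
    const root = new TextEncoder().encode(JSON.stringify({ links }));
    yield { code: JSON_CODE, bytes: root };
  },
};
```

For a 6-byte input this yields two raw chunks (4 and 2 bytes) followed by the JSON root.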

Such interfaces would cover all of the above use cases. Additionally, we could make all our block codecs implement these interfaces too, making them compatible.
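
One way existing single-block codecs could satisfy the multiblock interface is a small adapter that treats a single block as a one-node DAG (trivially its own root). This is a sketch: `toDAGEncoder` and the `json` codec object are hypothetical, and `encode` returns a plain `Uint8Array` here rather than a `ByteView<T>` to keep the example self-contained.

```typescript
// Local stand-ins mirroring the shapes discussed in this thread.
interface BlockEncoder<Code extends number, T> {
  name: string;
  code: Code;
  encode(data: T): Uint8Array;
}

interface SyncDAGEncoder<Code extends number = number, T = unknown> {
  encoder(data: T): IterableIterator<{ code: Code; bytes: Uint8Array }>;
}

// Wraps a single-block codec: the resulting "DAG" is exactly one block.
function toDAGEncoder<Code extends number, T>(
  codec: BlockEncoder<Code, T>
): SyncDAGEncoder<Code, T> {
  return {
    *encoder(data: T) {
      yield { code: codec.code, bytes: codec.encode(data) };
    },
  };
}

// Hypothetical JSON codec for illustration (0x0200 is the JSON multicodec code).
const json: BlockEncoder<0x0200, unknown> = {
  name: "json",
  code: 0x0200,
  encode: (data) => new TextEncoder().encode(JSON.stringify(data)),
};

const jsonDAG = toDAGEncoder(json);
```

Consumers could then handle every codec through the multiblock interface, with single-block codecs simply yielding one block.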

@Gozala
Contributor Author

Gozala commented Apr 15, 2022

I'm realizing now that the API proposed above is not great, and it will pretty much never be sync, because CIDs need to be computed in order to build a DAG (and hashers may be async). I think what would make more sense is to represent DAGs with clearly denoted block boundaries; however, that would be difficult to generalize, and maybe it's best not to try. Instead, maybe the encoder interface could be expanded to allow recognizing what needs to be linked, e.g.:

```ts
interface DAGIterator<T extends unknown = unknown> {
  iterate<U>(data: T): IterableIterator<{ encoder: BlockEncoder<number, U>, data: U }>
}
```

Such a thing could be used to:

  1. Pass in a value that needs encoding.
  2. Iterate over the parts that need to be broken out.
  3. After all parts are encoded, continue by encoding the actual value, substituting each part with its corresponding link.
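
The three steps above can be sketched as a small driver. Everything here is hypothetical: `iterateParts` plays the role of `DAGIterator.iterate` for a made-up `Metadata` shape, links are `"<code>-<sha256 hex>"` strings standing in for real CIDs, and links are embedded using the DAG-JSON-style `{ "/": ... }` form. Hashing is synchronous via `node:crypto` for brevity, whereas real hashers may be async.

```typescript
import { createHash } from "node:crypto";

interface BlockEncoder<Code extends number, T> {
  name: string;
  code: Code;
  encode(data: T): Uint8Array;
}

// One part to break out: an encoder paired with the data it accepts.
type Part = { encoder: BlockEncoder<number, unknown>; data: unknown };
type Block = { code: number; bytes: Uint8Array; link: string };

// Stand-in for a CID (hypothetical format).
const linkFor = (code: number, bytes: Uint8Array): string =>
  `${code}-${createHash("sha256").update(bytes).digest("hex")}`;

function encodeBlock(encoder: BlockEncoder<number, unknown>, data: unknown): Block {
  const bytes = encoder.encode(data);
  return { code: encoder.code, bytes, link: linkFor(encoder.code, bytes) };
}

const json: BlockEncoder<number, unknown> = {
  name: "json",
  code: 0x0200,
  encode: (data) => new TextEncoder().encode(JSON.stringify(data)),
};

// Hypothetical value whose attachments should become separate blocks.
type Metadata = { name: string; attachments: string[] };

// Step 2: enumerate the parts that need to be broken out.
function* iterateParts(value: Metadata): IterableIterator<Part> {
  for (const attachment of value.attachments) {
    yield { encoder: json, data: attachment };
  }
}

// Steps 1 to 3: take the value, encode each part, then encode the root
// with every part replaced by the link computed for it.
function encodeDAG(value: Metadata): Block[] {
  const parts: Block[] = [];
  for (const { encoder, data } of iterateParts(value)) {
    parts.push(encodeBlock(encoder, data));
  }
  const links = parts.map((block) => ({ "/": block.link }));
  const root = encodeBlock(json, { name: value.name, attachments: links });
  return [...parts, root]; // root last
}
```

Because the same driver computes each child's link and writes it into the parent, the links in the root cannot drift from the blocks actually produced.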

@rvagg
Member

rvagg commented Apr 19, 2022

@Gozala I'm not quite following you on the last comment there; is it the order that's a problem? I get that that sync API is a problem, but beyond that why are you wanting to have an iterator of encoders? I don't quite see what problem that's solving.

Also what is U in your iterate() generic?

Also, 2: it's kind of amusing to see you here, and in dag-ucan, essentially having to re-invent the whole ADL concept after we went through the dramas of disagreements in the IPLD team regarding their utility. I really think it'd be worth taking another look at whether there's a path to doing something sensible in JS on this front, and perhaps what you're getting at here is part of that (in Go, the write side of ADLs is the least mature part; there are some messy mechanics and plumbing). I started tinkering with a new JS stack to try to better encompass these ideas a while back, but it's been another one of those projects that get lost in the too-many-more-important-things-to-do rush.

@Gozala
Contributor Author

Gozala commented Apr 20, 2022

> @Gozala I'm not quite following you on the last comment there; is it the order that's a problem? I get that that sync API is a problem, but beyond that why are you wanting to have an iterator of encoders? I don't quite see what problem that's solving.

The problem is that `SyncDAGEncoder` / `AsyncDAGEncoder` only emit `{ code, bytes }`, so whatever consumes them may generate different CIDs (e.g. due to a different hashing algorithm) than the ones the parent node uses to reference its children.
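
A minimal sketch of that mismatch, assuming a hypothetical `"<code>-<alg>-<hex digest>"` string as a stand-in for a CID: the encoder linked the child using SHA-256, but a consumer that only received `{ code, bytes }` is free to pick another hasher, and the identifiers no longer agree.

```typescript
import { createHash } from "node:crypto";

// Stand-in for a CID (hypothetical format): codec code, hash algorithm, digest.
const linkFor = (code: number, alg: "sha256" | "sha512", bytes: Uint8Array): string =>
  `${code}-${alg}-${createHash(alg).update(bytes).digest("hex")}`;

const RAW = 0x55;
const child = new TextEncoder().encode("child block");

// The encoder hashed the child with SHA-256 when writing the parent's link...
const linkInParent = linkFor(RAW, "sha256", child);
// ...but a consumer that only sees { code, bytes } chooses its own hasher.
const linkFromConsumer = linkFor(RAW, "sha512", child);

console.log(linkInParent === linkFromConsumer); // false
```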

> Also what is U in your iterate() generic?

Yeah, it's a generic that basically says the type of the `data` field is the same as the type of data the encoder accepts, as per:

```ts
export interface BlockEncoder<Code extends number, T> {
  name: string
  code: Code
  encode(data: T): ByteView<T>
}
```
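
A small sketch of how `U` ties the two fields together: a part is only well-typed when its `data` matches what its encoder accepts. `ByteView<T>` is approximated locally as a `Uint8Array` carrying an optional phantom property (an assumption, not the exact multiformats definition), and `Part`, `utf8`, and `encodePart` are hypothetical names.

```typescript
// Approximation of ByteView<T>: bytes tagged with the type they encode.
interface ByteView<T> extends Uint8Array {
  readonly __phantom?: T;
}

interface BlockEncoder<Code extends number, T> {
  name: string;
  code: Code;
  encode(data: T): ByteView<T>;
}

// U links the encoder's input type to the type of `data`.
type Part<U> = { encoder: BlockEncoder<number, U>; data: U };

// Hypothetical encoder storing strings as raw UTF-8 bytes (0x55 = raw).
const utf8: BlockEncoder<0x55, string> = {
  name: "utf8-raw",
  code: 0x55,
  encode: (data) => new TextEncoder().encode(data),
};

const ok: Part<string> = { encoder: utf8, data: "hello" };
// const bad: Part<string> = { encoder: utf8, data: 42 }; // would not compile: 42 is not a string

// Encoding a part can never pair an encoder with data of the wrong type.
function encodePart<U>(part: Part<U>): ByteView<U> {
  return part.encoder.encode(part.data);
}
```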

> Also, 2: it's kind of amusing to see you here, and in dag-ucan, essentially having to re-invent the whole ADL concept after we went through the dramas of disagreements in the IPLD team regarding their utility. I really think it'd be worth taking another look at whether there's a path to doing something sensible in JS on this front, and perhaps what you're getting at here is part of that (in Go, the write side of ADLs is the least mature part; there are some messy mechanics and plumbing). I started tinkering with a new JS stack to try to better encompass these ideas a while back, but it's been another one of those projects that get lost in the too-many-more-important-things-to-do rush.

Happy to amuse :P More seriously, we really need a generic way to represent things that span multiple blocks so that they can be packed into CAR(s). It does sound like ADLs, but then again ADLs don't seem to have a very concrete definition.
