
Make block size limit part of the BlockEncoder API #223

Open · Gozala opened this issue Nov 8, 2022 · 5 comments
Gozala (Contributor) commented Nov 8, 2022

Currently, the BlockEncoder interface simply encodes the passed input into bytes:

export interface BlockEncoder<Code extends number, T> {
  name: string
  code: Code
  encode: (data: T) => ByteView<T>
}

The problems are:

  1. If you go beyond the block size limit, your block won't be transferable over Bitswap.
  2. You may not even know you've gone past the block size limit.
  3. Most people are not even aware that a block size limit exists.
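For illustration, here is a small sketch of how an oversized block currently slips through unnoticed (treating @ipld/dag-cbor as an example codec and assuming a 1 MiB limit purely for demonstration):

import * as dagCbor from '@ipld/dag-cbor'

// Assumed limit for illustration only; the actual default may differ.
const BLOCK_SIZE_LIMIT = 1024 * 1024

// A payload well past the limit still encodes without complaint.
const bytes = dagCbor.encode({ payload: new Uint8Array(2 * BLOCK_SIZE_LIMIT) })
console.log(bytes.byteLength > BLOCK_SIZE_LIMIT) // true, and nothing warned us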

Proposed solution:

I would like to propose amending our BlockEncoder interface as follows:

export interface BlockEncoder<Code extends number, T> {
  name: string
  code: Code
  /**
   * Encodes the given data. If `buffer` is provided, data is written into it
   * and a `Uint8Array` view of the written bytes is returned. If `buffer` is
   * not passed, a new buffer is allocated with a `byteLength` corresponding to
   * the default block size limit in IPFS. If the encoded data does not fit the
   * `buffer`, a `RangeError` exception is thrown.
   */
  encode: (data: T, buffer?: Uint8Array) => ByteView<T>
}

The idea here is that:

  1. The user can optionally pass in a buffer to write the data into.
  2. If the data does not fit within the block size limit, encode fails.
  3. The user is still able to encode blocks larger than the block size limit by passing in a larger buffer.

This would be a non-breaking change at the API level, but it would be breaking in the sense that errors will occur when a block is larger than the block size limit. Nevertheless, it seems like a better default than silently letting things slip.
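A minimal sketch of what a codec implementing the amended interface could look like (the 1 MiB default, the 0x0129 code, and the serialize helper are placeholders, not a real implementation; the return type is loosened to Uint8Array for brevity where the interface says ByteView<T>):

// Placeholder default; the real value would be the IPFS block size limit.
const DEFAULT_BLOCK_SIZE_LIMIT = 1024 * 1024

// Stand-in for the codec's actual serializer.
declare function serialize (data: unknown): Uint8Array

const codec = {
  name: 'example',
  code: 0x0129,
  encode (data: unknown, buffer: Uint8Array = new Uint8Array(DEFAULT_BLOCK_SIZE_LIMIT)): Uint8Array {
    const bytes = serialize(data)
    if (bytes.byteLength > buffer.byteLength) {
      throw new RangeError('Encoded data does not fit the provided buffer / block size limit')
    }
    buffer.set(bytes)
    // Return a view over just the written bytes.
    return buffer.subarray(0, bytes.byteLength)
  }
}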

Gozala (Contributor, Author) commented Nov 8, 2022

There is some overlap with #222, however this one is much smaller in scope and makes no attempt to reduce computational overhead.

rvagg (Member) commented Dec 6, 2022

I think I'm fine with this, but I'm a little dubious that it's the right way to achieve the stated goal, because it requires allocating the maximum-size byte array for each encode, which may end up being very wasteful if your encodes regularly produce only a fraction of that size. At least now our codecs (mostly?) only request allocation of what they actually need, or close to it; there's some trickery involved, but it's not a very large optimistic allocation.

The change would also be fairly invasive if you want to go changing codecs. dag-pb would probably be the easiest to change to start with, but it's still pretty invasive (off the top of my head). So if you want to experiment with the API and have the time, go ahead and see how it works out!

Gozala (Contributor, Author) commented Dec 7, 2022

> I think I'm fine with this, but I'm a little dubious that it's the right way to achieve the stated goal, because it requires allocating the maximum-size byte array for each encode, which may end up being very wasteful if your encodes regularly produce only a fraction of that size. At least now our codecs (mostly?) only request allocation of what they actually need, or close to it; there's some trickery involved, but it's not a very large optimistic allocation.

This does not match my experience. We find ourselves allocating fixed-size buffers (frames of sorts) that we intend to pack with some data and send off, then repeat until we're done.

With the current API we have two choices:

  1. Collect blocks until we reach the frame size, then allocate the frame and copy all the blocks into it. This causes spikes in memory use, as there are moments where things are double-allocated.
  2. Allocate the frame ahead of time and copy bytes from every block, then drop the block. This does not cause spikes, but we have to release blocks or create replicas that point into a byte range within the frame.

With the proposed API we no longer have to choose between the two; instead, we can just allocate the frame and encode blocks directly into it. That does mean that for each block we'll create a new Uint8Array view into the buffer, starting from the current offset with the max block length, but those views will be short-lived and are a lot cheaper.
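To make the pattern concrete, here is a rough sketch (the frame size, block size limit, and the items / codec bindings are assumed for illustration; codec.encode follows the amended signature proposed above):

// Sizes assumed for illustration.
const FRAME_SIZE = 4 * 1024 * 1024
const MAX_BLOCK_SIZE = 1024 * 1024

const frame = new Uint8Array(FRAME_SIZE)
let offset = 0

// `items` and `codec` are assumed to be in scope.
for (const item of items) {
  // Short-lived view over the remaining space, capped at the block size limit.
  const slot = frame.subarray(offset, Math.min(offset + MAX_BLOCK_SIZE, FRAME_SIZE))
  const bytes = codec.encode(item, slot) // throws RangeError if it doesn't fit
  offset += bytes.byteLength
}

// frame.subarray(0, offset) now holds all encoded blocks, with no double allocation.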

rvagg (Member) commented Dec 8, 2022

Oh, right, so you're allocating a large chunk and then wanting to present a slice of it to the encoder to fill; I was imagining allocating a new large chunk for each one, but that's unnecessary with Uint8Arrays if you have a nice queue lined up. That's fair enough.

BigLep commented Jan 3, 2023

2023-01-03 IPLD triage conversation: @Gozala are you going to take this on?
