carv2/ReadWriteBlockstore: support deferred root CIDs #196

Open
raulk opened this issue Aug 2, 2021 · 5 comments
Labels
P2 Medium: Good to have, but can wait until someone steps up

Comments

@raulk
Member

raulk commented Aug 2, 2021

blockstore.OpenReadWrite requires the root CIDs to be provided when creating a new blockstore. This design prevents a CARv2 blockstore from being used as the target of a streaming Merkle DAG construction such as UnixFS, because the root CID is not known beforehand.

We could work around this situation by supplying a placeholder root CID, and once the blockstore is finalized, we could go back and replace those bytes in the header.

Unfortunately, the library doesn't provide APIs to do that safely and without making assumptions about the underlying format, or breaking abstractions.

Some ideas:

  1. Allow blockstore.OpenReadWrite to take a []struct{cid.Builder, func() (cid.Cid, error)}.
    • Use the cid.Builders to compute the length of each CID, and preallocate those bytes in the header.
    • On Finalize, call each function to get the actual CID to replace it in the header.
  2. Simpler: allow the user to specify a number of bytes to preallocate, and provide primitives to update a CAR header.
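
A minimal sketch of the preallocate-then-replace idea, under stated assumptions: it uses only the Go standard library and a simplified stand-in for the header (a plain byte slice), not the real CARv2 layout or any go-car API. The names cidV1Bytes, cidV1Len, rootSlot, reserve and fill are hypothetical. The point it illustrates is that the length of a binary CIDv1 is determined by the codec, the hash function and the digest length, so the bytes can be reserved before the digest exists.

```go
package main

import (
	"crypto/sha256"
	"encoding/binary"
	"fmt"
)

// Multicodec codes for dag-pb and sha2-256.
const (
	dagPB   = 0x70
	sha2256 = 0x12
)

// cidV1Bytes encodes a binary CIDv1: version, codec, then the multihash
// (hash code, digest length, digest). Each integer is an unsigned varint.
func cidV1Bytes(codec, mhCode uint64, digest []byte) []byte {
	out := binary.AppendUvarint(nil, 1)
	out = binary.AppendUvarint(out, codec)
	out = binary.AppendUvarint(out, mhCode)
	out = binary.AppendUvarint(out, uint64(len(digest)))
	return append(out, digest...)
}

// cidV1Len predicts the encoded length before the digest is known.
func cidV1Len(codec, mhCode uint64, digestLen int) int {
	return len(cidV1Bytes(codec, mhCode, make([]byte, digestLen)))
}

// rootSlot records where a placeholder root lives in the header bytes.
type rootSlot struct {
	offset, length int
}

// reserve appends length zero bytes to header and returns the grown
// header together with the slot that was reserved.
func reserve(header []byte, length int) ([]byte, rootSlot) {
	slot := rootSlot{offset: len(header), length: length}
	return append(header, make([]byte, length)...), slot
}

// fill overwrites the placeholder in place. It refuses a CID whose
// length differs from the reservation, since that would shift the data
// that follows the header.
func fill(header []byte, slot rootSlot, cidBytes []byte) error {
	if len(cidBytes) != slot.length {
		return fmt.Errorf("root is %d bytes, slot reserved %d", len(cidBytes), slot.length)
	}
	copy(header[slot.offset:slot.offset+slot.length], cidBytes)
	return nil
}

func main() {
	// "hdr:" stands in for whatever bytes precede the root in a real header.
	header, slot := reserve([]byte("hdr:"), cidV1Len(dagPB, sha2256, sha256.Size))

	// ... blocks are streamed into the blockstore here ...

	// Only now is the root known; patch it into the reserved bytes.
	digest := sha256.Sum256([]byte("root node bytes"))
	if err := fill(header, slot, cidV1Bytes(dagPB, sha2256, digest[:])); err != nil {
		panic(err)
	}
	fmt.Printf("patched %d bytes at offset %d\n", slot.length, slot.offset)
}
```

The length check in fill is the part that matters for safety: if the final CID uses a different codec or hash than the one assumed at creation time, the replacement has to fail rather than overwrite neighbouring bytes.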
@masih
Member

masih commented Aug 16, 2021

Implementation in lotus here

@rvagg
Member

rvagg commented May 3, 2022

@masih can we close this out? Is what we have enough to cover this for now, or should we label this as an actual TODO for API improvement?

@masih
Member

masih commented May 4, 2022

I think this issue is worth resolving. The API we have right now doesn't quite cover what is asked for here, and I think it would be a useful thing to have.

@BigLep BigLep added P2 Medium: Good to have, but can wait until someone steps up and removed need/author-input Needs input from the original author labels May 10, 2022
@MichaelMure
Contributor

In a completely different direction, have you considered writing the header+index at the end of the file, like the zip format does? This has several benefits:

  • no assumption required about the header/index size
  • way easier to build the index and root CID list while writing the blocks, just append at the end
  • way easier to append new blocks on an already finalized file: read the previous header/index, overwrite those with the new blocks, write the new header/index again
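
To make the trailer idea concrete, here is a rough sketch in Go using only the standard library. It is not the CAR format and not a go-car API: the container layout (length-prefixed blocks, then a trailer holding the root and the index, then a fixed 8-byte footer with the trailer offset), the string keys standing in for CIDs, and the names trailerWriter, put, finalize, readTrailer and readBlock are all assumptions made for illustration.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

var (
	le           = binary.LittleEndian
	errTruncated = errors.New("truncated trailer")
)

// entry maps a block key (a stand-in for a CID) to its offset.
type entry struct {
	key string
	off uint64
}

// trailerWriter appends blocks first and writes root and index last.
type trailerWriter struct {
	data    []byte
	entries []entry
}

// put appends a length-prefixed block and remembers where it starts.
func (w *trailerWriter) put(key string, block []byte) {
	w.entries = append(w.entries, entry{key, uint64(len(w.data))})
	w.data = le.AppendUint32(w.data, uint32(len(block)))
	w.data = append(w.data, block...)
}

func appendString(b []byte, s string) []byte {
	return append(le.AppendUint32(b, uint32(len(s))), s...)
}

// finalize appends the trailer (root, then index entries) and an 8-byte
// footer pointing at the trailer. The root is not needed before this call.
func (w *trailerWriter) finalize(root string) []byte {
	trailerOff := uint64(len(w.data))
	out := appendString(w.data, root)
	out = le.AppendUint32(out, uint32(len(w.entries)))
	for _, e := range w.entries {
		out = le.AppendUint64(appendString(out, e.key), e.off)
	}
	return le.AppendUint64(out, trailerOff)
}

func readString(b []byte) (string, []byte, error) {
	if len(b) < 4 || len(b)-4 < int(le.Uint32(b)) {
		return "", nil, errTruncated
	}
	n := int(le.Uint32(b))
	return string(b[4 : 4+n]), b[4+n:], nil
}

// readTrailer finds the trailer through the footer at the end of the file.
func readTrailer(file []byte) (string, map[string]uint64, error) {
	if len(file) < 8 {
		return "", nil, errTruncated
	}
	end := len(file) - 8
	off := le.Uint64(file[end:])
	if off > uint64(end) {
		return "", nil, errors.New("trailer offset out of range")
	}
	root, t, err := readString(file[off:end])
	if err != nil || len(t) < 4 {
		return "", nil, errTruncated
	}
	n, t := le.Uint32(t), t[4:]
	index := map[string]uint64{}
	for i := uint32(0); i < n; i++ {
		var key string
		if key, t, err = readString(t); err != nil || len(t) < 8 {
			return "", nil, errTruncated
		}
		index[key], t = le.Uint64(t), t[8:]
	}
	return root, index, nil
}

// readBlock returns the block at off. Offsets are trusted in this sketch.
func readBlock(file []byte, off uint64) []byte {
	n := uint64(le.Uint32(file[off:]))
	return file[off+4 : off+4+n]
}

func main() {
	w := &trailerWriter{}
	w.put("leaf-a", []byte("hello"))
	w.put("leaf-b", []byte("world"))
	w.put("root", []byte("links: leaf-a, leaf-b"))
	file := w.finalize("root") // root named only after all blocks exist

	root, index, err := readTrailer(file)
	if err != nil {
		panic(err)
	}
	fmt.Printf("root=%s block=%q\n", root, readBlock(file, index[root]))
}
```

With this layout, appending to a finalized file would amount to reading the trailer, truncating at the trailer offset, writing the new blocks, and calling finalize again. The cost is that a reader must be able to seek to the end before it can learn the root.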

@rvagg
Member

rvagg commented Aug 1, 2022

@MichaelMure yes, that was always an option up front for the CAR format. I even had an early proposal for a DAG storage container that was based on ZIP: https://github.com/rvagg/js-datastore-zipcar / https://github.com/rvagg/go-datastore-zipcar (before we really committed to, and specced, CAR). There are many cases where having it in a trailer would be useful, but there are also many cases where having it in the header is useful. One primary reason is that you can walk and verify a DAG in a well-ordered CAR by picking out the root and progressively loading the blocks in the DAG, so you can do streaming reads fairly nicely. On the flip side, having the root(s) in a trailer is nicer for streaming writes where you may not know the root until you've finished bundling it all up.
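
The streaming-read argument can be sketched as follows. This is a toy model rather than CAR or IPLD code: a SHA-256 digest stands in for a CID, the block encoding (one byte of link count, the links, then the payload) is invented for the example, and the stream is assumed to be in strict depth-first order with one block per link occurrence (no deduplication). The names hash, block, hashOf and verifyStream are hypothetical.

```go
package main

import (
	"crypto/sha256"
	"errors"
	"fmt"
)

// hash is a stand-in for a CID: the SHA-256 digest of the block bytes.
type hash [sha256.Size]byte

func hashOf(raw []byte) hash { return hash(sha256.Sum256(raw)) }

// block is a toy DAG node: child links followed by a payload.
type block struct {
	links   []hash
	payload []byte
}

// encode serializes as: 1 byte link count, the links, then the payload.
func (b block) encode() []byte {
	out := []byte{byte(len(b.links))}
	for _, l := range b.links {
		out = append(out, l[:]...)
	}
	return append(out, b.payload...)
}

func decode(raw []byte) (block, error) {
	if len(raw) < 1 || len(raw) < 1+int(raw[0])*sha256.Size {
		return block{}, errors.New("truncated block")
	}
	n := int(raw[0])
	b := block{links: make([]hash, n), payload: raw[1+n*sha256.Size:]}
	for i := range b.links {
		copy(b.links[i][:], raw[1+i*sha256.Size:])
	}
	return b, nil
}

// verifyStream checks each block as it arrives. Because the root is known
// up front, the reader always knows which hash the next block must have,
// and never has to buffer unverified data.
func verifyStream(root hash, stream [][]byte) error {
	pending := []hash{root} // stack of links still to be satisfied
	for i, raw := range stream {
		if len(pending) == 0 {
			return fmt.Errorf("block %d is not part of the DAG", i)
		}
		want := pending[len(pending)-1]
		pending = pending[:len(pending)-1]
		if hashOf(raw) != want {
			return fmt.Errorf("block %d does not match the expected link", i)
		}
		b, err := decode(raw)
		if err != nil {
			return err
		}
		// Push links in reverse so the first link is expected next.
		for j := len(b.links) - 1; j >= 0; j-- {
			pending = append(pending, b.links[j])
		}
	}
	if len(pending) != 0 {
		return errors.New("stream ended before the DAG was complete")
	}
	return nil
}

func main() {
	leafA := block{payload: []byte("a")}.encode()
	leafB := block{payload: []byte("b")}.encode()
	rootRaw := block{links: []hash{hashOf(leafA), hashOf(leafB)}}.encode()
	root := hashOf(rootRaw)

	fmt.Println("well-ordered:", verifyStream(root, [][]byte{rootRaw, leafA, leafB}))
	fmt.Println("misordered:  ", verifyStream(root, [][]byte{rootRaw, leafB, leafA}))
}
```

With the root in a trailer instead, the same reader would have to either seek to the end first or hold every block unverified until the trailer arrives.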

CARv2 is considered an in-between format that still wraps a CARv1 but adds some other features. Nailing down a CARv3 might be where we open up the option space a bit to allow for more novel layouts like headers vs trailers.
