Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

consider easier ways to let users use a CAR index directly #243

Open
mvdan opened this issue Sep 27, 2021 · 2 comments
Open

consider easier ways to let users use a CAR index directly #243

mvdan opened this issue Sep 27, 2021 · 2 comments
Labels
kind/enhancement A net-new feature or improvement to an existing feature P3 Low: Not priority right now

Comments

@mvdan
Copy link
Contributor

mvdan commented Sep 27, 2021

Right now, an easy way to transparently use the index is to go through the blockstore package. Ideally we don't want that to be the main option, though - many use cases don't need a blockstore and its extra abstraction layer.

It should be possible to efficiently use an index from a CAR file on disk via OpenReader. The best option right now is to go through Reader.IndexReader and index.ReadFrom. Unfortunately, it has some shortcomings:

  1. Always loads the entire index into memory. This is rather wasteful if one just wants to loop over all index entries once, for example. Plus, OpenReader already uses an mmap, which allows for fast sequential or random access.

  2. No ability to inspect information about the index. For example, how do I tell if a CAR file has a multihash sorted index? Right now the API just allows this by loading the entire index into memory.

  3. Has some footguns; for example, it's a bit too easy to call index.ReadFrom straight on Reader.IndexReader, forgetting that it may be nil.

I'm working around these in the indexer provider, but I'll probably backport some of it in the form of reusable APIs.

This issue would enable #222, I think.
It's unclear to me if/how #95 is related.

cc @masih @willscott

@mvdan mvdan added the kind/enhancement A net-new feature or improvement to an existing feature label Sep 27, 2021
@mvdan
Copy link
Contributor Author

mvdan commented Sep 27, 2021

One easy suggestion is to allow inspecting the index codec from Reader:

func (*Reader) IndexCodec() multicodec.Code

The internal reader is a ReaderAt, so we can read the varint without consuming bytes. I don't even think we need to return an error; in the rare case the index varint is corrupted, we can return a bogus multicodec like 0 (i.e. "identity"). The caller should only consider the result valid if it's a known car index multicodec.

@mvdan
Copy link
Contributor Author

mvdan commented Sep 27, 2021

Another suggestion, which could replace IndexCodec above, is a higher level API to obtain the existing Index in an efficient way:

func (*Reader) Index() (index.Index, error)

It would return nil, nil if the CAR file has no index. It would back up the index with the mmap where possible, or otherwise read and decode the entire index into memory. One could still obtain the multicodec via Index.Codec, and one could type assert the index interface to IterableIndex or specific implementations.

@BigLep BigLep added the P3 Low: Not priority right now label Jun 7, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/enhancement A net-new feature or improvement to an existing feature P3 Low: Not priority right now
Projects
None yet
Development

No branches or pull requests

2 participants