fix!: replace dag walkers with generic CID extraction from blocks #447

Open
wants to merge 5 commits into
base: main
achingbrain
Member

Replace the codec-specific .dagWalkers property with a generic internal DAG walker that uses the Block interface from the multiformats module.

  • Removes the .dagWalkers property from the Helia interface
  • Adds getCodec and getHasher to retrieve codecs and hashers by code
  • Adds loadCodec and loadHasher options to allow sync or async loading of extra codecs/hashers in addition to the statically configured ones in the codecs/hashers keys

BREAKING CHANGE: the .dagWalkers property has been removed
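
A minimal sketch of how a by-code lookup with a loader fallback might behave. Only `getCodec`, `loadCodec` and the `codecs` key come from the description above; `CodecLike`, `CodecLoader`, `createGetCodec` and the caching of loaded codecs are illustrative assumptions, not the API added in this PR.

```typescript
// Stand-in for a block codec (hypothetical shape for this sketch only).
interface CodecLike {
  name: string
  code: number
  encode: (value: unknown) => Uint8Array
  decode: (bytes: Uint8Array) => unknown
}

// A loader may answer synchronously or asynchronously.
type CodecLoader = (code: number) => CodecLike | Promise<CodecLike>

// Hypothetical factory: builds a getCodec function from statically
// configured codecs plus an optional loadCodec fallback.
function createGetCodec (codecs: CodecLike[], loadCodec?: CodecLoader): CodecLoader {
  const known = new Map<number, CodecLike>()

  for (const codec of codecs) {
    known.set(codec.code, codec)
  }

  return (code) => {
    const existing = known.get(code)

    if (existing != null) {
      // Statically configured codecs are returned without a promise
      return existing
    }

    if (loadCodec == null) {
      throw new Error(`Could not load codec for code ${code}`)
    }

    const loaded = loadCodec(code)

    if (loaded instanceof Promise) {
      // Assumption: cache the result so the loader runs once per code
      return loaded.then((codec) => {
        known.set(code, codec)
        return codec
      })
    }

    known.set(code, loaded)
    return loaded
  }
}
```

Returning the codec directly when it is already known lets callers that only use statically configured codecs avoid an extra promise per lookup.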

Change checklist

  • I have performed a self-review of my own code
  • I have made corresponding changes to the documentation if necessary (this includes comments as well)
  • I have added tests that prove my fix is effective or that my feature works

@achingbrain achingbrain requested a review from a team as a code owner February 22, 2024 18:21
@achingbrain
Member Author

I think the pinning benchmarks should be re-run with this PR before merging. The multiformats/block .links() function works by deserializing a block into an object, then recursively walking every property and yielding any value that can be turned into a CID.

This could potentially be more expensive than deserializing a block, collecting CIDs during deserialization and then yielding them all as the DAGWalkers do.
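
To make the cost concern concrete, here is a rough sketch of a generic walk over an already-decoded value. It is not the `multiformats` implementation: `FakeCID`, `WalkResult` and `collectLinks` are hypothetical stand-ins so the example runs without dependencies, and the `visited` counter exists only to show how many values are touched to find the links.

```typescript
// Stand-in for a CID so the sketch has no dependencies.
class FakeCID {
  constructor (public readonly id: string) {}
}

interface WalkResult {
  links: FakeCID[]
  visited: number
}

// Recursively visit every value in a decoded block and collect anything
// that is a (fake) CID. Every property is visited, link or not.
function collectLinks (value: unknown, result: WalkResult = { links: [], visited: 0 }): WalkResult {
  result.visited++

  if (value instanceof FakeCID) {
    result.links.push(value)
    return result
  }

  if (Array.isArray(value)) {
    for (const item of value) {
      collectLinks(item, result)
    }
    return result
  }

  // Skip byte arrays so we do not walk them one byte at a time
  if (value != null && typeof value === 'object' && !(value instanceof Uint8Array)) {
    const record = value as Record<string, unknown>

    for (const key of Object.keys(record)) {
      collectLinks(record[key], result)
    }
  }

  return result
}
```

For a node with a handful of links and a lot of non-link metadata, `visited` grows with the size of the whole object rather than with the number of links, which is why re-running the benchmarks seems worthwhile.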

Member

@SgtPooki SgtPooki left a comment

A few things could be changed. I'm not sure whether networked-storage accepting an additional argument was done deliberately, though.


await withBlock(cid, block)
const block = createUnsafe({ bytes, cid, codec })
Member

What's the implication of using createUnsafe? I'm not sure I understand how this change affects things, and https://github.com/multiformats/js-multiformats/blob/5e2159a5126f15f2e16032b29a8bb5e31e619160/src/block.ts#L197 doesn't have much information.

Member Author

We need a Block instance to call .links() on it.

createUnsafe doesn't verify that the block hash matches the block data, which is a (potentially) expensive operation; it just creates the block.
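
The difference can be sketched roughly as follows. This is not the `multiformats` code: `SketchBlock`, `toyDigest`, `createVerified` and `createUnverified` are hypothetical names, and the FNV-1a checksum is a deliberately simple stand-in for a real multihash such as sha2-256.

```typescript
interface SketchBlock {
  bytes: Uint8Array
  digest: number
}

// 32-bit FNV-1a: a toy stand-in for a real hasher. Like a real hasher,
// it has to read every byte of the block.
function toyDigest (bytes: Uint8Array): number {
  let hash = 0x811c9dc5

  for (let i = 0; i < bytes.length; i++) {
    hash ^= bytes[i]
    hash = Math.imul(hash, 0x01000193) >>> 0
  }

  return hash
}

// Verified creation: re-hash the bytes and compare with the expected digest.
function createVerified (bytes: Uint8Array, expectedDigest: number): SketchBlock {
  if (toyDigest(bytes) !== expectedDigest) {
    throw new Error('Digest does not match block bytes')
  }

  return { bytes, digest: expectedDigest }
}

// Unverified creation: trust the caller and skip the hashing pass entirely.
function createUnverified (bytes: Uint8Array, claimedDigest: number): SketchBlock {
  return { bytes, digest: claimedDigest }
}
```

The unverified variant is only safe when the bytes have already been checked against the digest somewhere earlier in the pipeline.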

Member

For the sake of verified-fetch, where do we verify the block hash matches the data after the changes in this PR?

packages/utils/src/index.ts (outdated, resolved)
packages/utils/src/utils/networked-storage.ts (outdated, resolved)
packages/utils/src/utils/networked-storage.ts (outdated, resolved)
packages/utils/src/utils/get-codec.ts (outdated, resolved)
packages/utils/src/utils/get-hasher.ts (outdated, resolved)
packages/utils/src/utils/get-hasher.ts (outdated, resolved)
packages/utils/src/utils/is-promise.ts (outdated, resolved)

await addChildren(subChild, name, level + 1, index + i, depth - 1, children, dag, codec, blocks)
links.push(
await createAndPutBlock(dagCbor.code, dagCbor.encode(child), blocks)
Member

I know this is just a test fixture, but do we want to remove the ability to use any codec?

Member Author

The problem is that creating a layered DAG depends on the codec: different codecs will implement this in different ways.

Previously we were only ever using one codec so it was a bit misleading.

packages/utils/test/utils/networked-storage.spec.ts (outdated, resolved)
achingbrain and others added 2 commits April 3, 2024 12:00
Co-authored-by: Russell Dempsey <1173416+SgtPooki@users.noreply.github.com>
2 participants