Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: getPath with carScope #8

Merged
merged 11 commits into from May 1, 2023
Merged

feat: getPath with carScope #8

merged 11 commits into from May 1, 2023

Conversation

olizilla
Copy link
Contributor

@olizilla olizilla commented Apr 17, 2023

add getPath method as a generator that returns blocks for the targeted dag and all blocks traversed while resolving a cid+path string

supports carScope to specify what blocks to return for the resolved dag

  • 'all': return the entire dag starting at path. (default)
  • 'block': return the block identified by the path.
  • 'file': Mimic gateway semantics: Return All blocks for a multi-block file or just enough blocks to enumerate a dir/map but not the dir contents.

see: web3-storage/freeway#33
see: web3-storage/freeway#34
see: ipfs/specs#402

TODO:

  • find out how to identify the boundaries of a unixfs hamt

...unixfs-exporter seems to define it as "not having an empty or null Link.Name after the first 2 chars are stripped, which seems loose... what happens if the actual dir listing has 2 char long link names? see: https://github.com/ipfs/js-ipfs-unixfs/blob/e853049bd63d6773442e1540ae49b6a443ca8672/packages/ipfs-unixfs-exporter/src/resolvers/unixfs-v1/content/hamt-sharded-directory.ts#L20-L42

License: MIT

add getPath method as a generator that returns blocks for the targeted dag and all blocks traversed while resolving a cid+path string

supports carScope to specify what blocks to return for the resolved dag
- 'all': return the entire dag starting at path. (default)
- 'block': return the block identified by the path.
- 'file': Mimic gateway semantics: Return All blocks for a multi-block file or just enough blocks to enumerate a dir/map but not the dir contents.

see: web3-storage/freeway#33
see: web3-storage/freeway#34

TODO:
- [] find out how to identify the boundaries of a unixfs hamt (unixfs-exported seems to define it as "not having an empty or null Link.Name after the first 2 chars are stripped, which seems risky... what happens if the actual dir listing has 2 char long link names? see: https://github.com/ipfs/js-ipfs-unixfs/blob/e853049bd63d6773442e1540ae49b6a443ca8672/packages/ipfs-unixfs-exporter/src/resolvers/unixfs-v1/content/hamt-sharded-directory.ts#L20-L42

License: MIT
Signed-off-by: Oli Evans <oli@protocol.ai>
License: MIT
Signed-off-by: Oli Evans <oli@protocol.ai>
@olizilla olizilla requested a review from alanshaw April 17, 2023 16:20
@olizilla
Copy link
Contributor Author

@alanshaw plz to sanity check. also if you can point me at the right way to detect the boundary of the current hamt then that would save me some staring at things.

License: MIT
Signed-off-by: Oli Evans <oli@protocol.ai>
License: MIT
Signed-off-by: Oli Evans <oli@protocol.ai>
@@ -49,8 +49,7 @@
"blockstore-core": "^1.0.5",
"ipfs-unixfs": "^11.0.0",
"miniswap": "^2.0.0",
"standard": "^17.0.0",
"uint8arrays": "^3.0.0"
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

turns out the swiss-army-knife has toString and fromString under multiformats/bytes! who knew!

Copy link
Member

@alanshaw alanshaw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

index.js Show resolved Hide resolved
index.d.ts Outdated
@@ -61,7 +75,7 @@ export declare class Dagula implements IDagula {
/**
* Emit nodes for all path segements and get UnixFS files and directories
*/
walkUnixfsPath (path: CID|string, options?: AbortOptions): Promise<UnixFSEntry>
walkUnixfsPath (path: CID|string, options?: AbortOptions): AsyncIterableIterator<UnixFSEntry & Block>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is UnixFSEntry not already a Block?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, it lacks the all important bytes

@alanshaw
Copy link
Member

...unixfs-exporter seems to define it as "not having an empty or null Link.Name after the first 2 chars are stripped, which seems loose... what happens if the actual dir listing has 2 char long link names?

You just strip the first two characters, and if the result is empty then it's a HAMT node, and if it is not empty it is not.

@olizilla
Copy link
Contributor Author

is it the case that there is no edge case here because the user cannot create a dir with an um-named link.

License: MIT
Signed-off-by: Oli Evans <oli@protocol.ai>
License: MIT
Signed-off-by: Oli Evans <oli@protocol.ai>
index.js Outdated
Comment on lines 116 to 119
for await (const item of this.walkUnixfsPath(cidPath, { signal: options.signal })) {
base = item
yield item
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should emit blocks not unixfs entries... where an entry is for a hamt we will only emit one thing, but we want to emit all the blocks that we traverse through the hamt. Also we don't want to emit a mix of unixfs entries and blocks.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apologies I should have caught that in the review!

@olizilla
Copy link
Contributor Author

olizilla commented Apr 28, 2023

@alanshaw ok the bits i missed first time around are now fixed, plz to eyeball

  1. as you pointed out walkPath emits a single entry even for multi-block structures: fixed by passing in a custom blockstore that caches the traversed blocks so we can emit them all.
  2. i was using the multiformats flavour block.links() function in my linkFilter, but that only understands ipld flavour link parsing, so dag-pb nodes link names are Links/<index>/Hash regardless of a Name field being present, so my hamtSearch wasn't working: fixed by special casing dagPB, which it would need to be anyway.

I've added tests for pathing through a sharded dir, as well as targeting a sharded dir as the end of the path as it's the most interesting case for carScope.

I've gone this root to ride on the work in unixfs-exporter for today, and get it working, hopefully without introducing any extra edge cases. I plan to remove this logic from here next week and put it in to https://github.com/web3-storage/ipfs-path which will become a block-centric rather than unixfsEntry-centric lib just for resolving ipfs paths... but first i want to land and ship verifiable paths and carScope to our gateway.

License: MIT
Signed-off-by: Oli Evans <oli@protocol.ai>
@olizilla olizilla requested a review from alanshaw April 28, 2023 20:31
License: MIT
Signed-off-by: Oli Evans <oli@protocol.ai>
Copy link
Member

@alanshaw alanshaw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

index.js Outdated Show resolved Hide resolved
index.js Outdated Show resolved Hide resolved
Co-authored-by: Alan Shaw <alan.shaw@protocol.ai>
@olizilla olizilla merged commit a613b45 into main May 1, 2023
3 checks passed
@olizilla olizilla deleted the get-path branch May 1, 2023 13:38
olizilla pushed a commit that referenced this pull request May 1, 2023
🤖 I have created a release *beep* *boop*
---


##
[6.0.0](v5.0.0...v6.0.0)
(2023-05-01)


### ⚠ BREAKING CHANGES

* support for multiple hash types
([#9](#9))

### Features

* add walkUnixfsPath to emit nodes for each path segment
([#7](#7))
([6c0eed1](6c0eed1))
* getPath with carScope
([#8](#8))
([a613b45](a613b45))
* support for multiple hash types
([#9](#9))
([c147fdb](c147fdb))


### Other Changes

* publish from ci
([#13](#13))
([5df4116](5df4116))

---
This PR was generated with [Release
Please](https://github.com/googleapis/release-please). See
[documentation](https://github.com/googleapis/release-please#release-please).

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants