New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
complete reads of lazy nodes in walks #303
Closed
Closed
Changes from all commits
Commits
File filter
Filter by extension
Conversations
Failed to load comments.
Jump to
Jump to file
Failed to load files.
Diff view
Diff view
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we worried here about maps and lists as well? e.g. if I match on a list or map do I need to iterate through the whole thing so I get all the blocks?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we don't currently have any ADLs that exercise that functionality, so i think we figure out how to better re-factor and not need consumers of traversals to all replicate increasingly complicated logic on how to manage lazy loading once we have an example, like hamts, that we try to actually register as an available reifier
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about UnixFS directories, aren't they map ADLs https://github.com/ipfs/go-unixfsnode/blob/650a6150508a73accb8d8014aa39400ad0657433/hamt/shardeddir.go#L161?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
okay if we file an issue for it? i don't think adding iteration of lists/maps here is going to do anything other than adding another potential form of inconsistency at this point, so i'd like to figure out what we actually need to do in ipld-prime-traversal level for that
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would think we're safe in that you're going to touch all of the blocks involved in getting to your leaves even through map & list ADLs, not necessarily all the blocks, but the right ones according to the selector. If you want the whole map, then your selector should probably be adjusted, and maybe you shouldn't
InterpretAs
. I think even if you're doing an ADL walk and matching-all then you should get all of the list or map contents that are relevant to the ADL so that'd still be covered I think (again, don't do it as an ADL if you want to be sure it's exhaustive on the whole DAG).What about
Slice
though? That seems like a relevant matcher to take into account here doesn't it? If I only want a subset of bytes, I can craft a selector for it and I would expect only the relevant blocks to be included. Maybe that's a niche use-case, but it's still a use-case? (If you think this is relevant then that could just be shunted to TODO in the code and maybe an issue as well).There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@willscott Wont this discard the actual leaf bytes that we are looking for as well ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@aarshkshah1992 the important part of this is that we load the blocks, that's where the list of blocks to include is being catalogued. So we can discard the bytes, but in the process we're forcing it to load all of the blocks involved so they're going to be included.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting idea. Does this mean it doesn't really make sense to "match" on a directory/list rather than the elements anyhow? Or does it only not make sense within block-transfer contexts (CAR files, downloading data over Bitswap/GraphSync, etc.).?
IIUC this case with Slices is the main point of this PR. Will this not cover that? Perhaps once the tests are added (combined with #305) it'll be easier to see. IIUC
Slice
covers String, Bytes (and LargeBytes). Do we have to add code to cover String + Bytes as well as LargeBytes?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Re matching - technically yeah, currently with the usecases we have, you really should only be concerned about the nodes that get you to the leaves that you care about. Perhaps there will be cases (in the future) where you're using an ADL that has hidden metadata that's not touched by a selector but really should be used; so maybe in the "I want all of this" cases, you should avoid doing it at the ADL level and just run an exhaustive selector from the root on the untyped node.
Re
Slice
- you're right - this usesMatcher_Subset
which usesSlice
internally which is doing theLargeBytesNode
@ https://github.com/ipld/go-ipld-prime/blob/651b1cd5be900ae04b703bdff39b7727c26f36a6/traversal/selector/matcher.go#L52While nice, there's only fairly marginal gains to add tests for anything other than
LargeBytesNode
since we're pretty focused on which blocks are being loaded and we currently don't have an interface like that for ADLs for Strings (i.e. without an equivalent ofLargeBytesNode
you're going to get the whole string--all nodes--even if it's an ADL and you want only a subset). This PR (& #305) are focused on block loading and limiting that to what's needed.