Way to get CIDs of intermediate objects when querying with a path #8526

Open
3 tasks done
stbrody opened this issue Oct 26, 2021 · 8 comments
Labels
exp/intermediate (Prior experience is likely helpful), help wanted (Seeking public contribution on this issue), kind/feature (A new feature), P2 (Medium: Good to have, but can wait until someone steps up), status/blocked (Unable to be worked further until needs are met)

Comments

@stbrody

stbrody commented Oct 26, 2021

Checklist

  • My issue is specific & actionable.
  • I am not suggesting a protocol enhancement.
  • I have searched on the issue tracker for my issue.

Description

Summary:
dag.get with a path argument should be able to return an array of CIDs, representing all the intermediate IPFS objects it traversed along the path to eventually reach the object it ultimately returns. That would enable much more efficient sequential iteration over complex IPLD data structures.
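To make the request concrete, here is a minimal sketch of the behaviour being asked for, assuming a toy in-memory block store (a dict from placeholder CID strings to decoded nodes, with links written as dag-json-style {"/": cid} objects). `resolve_with_trail` is a hypothetical helper, not an existing dag.get option: it walks the path the way dag.get does, but also records the CID of every block it loads.

```python
from typing import Any

# Hypothetical toy block store standing in for IPFS: CID string -> decoded node.
# Links use the dag-json convention {"/": "<cid>"}; the CIDs here are placeholders.
Store = dict[str, Any]


def is_link(value: Any) -> bool:
    """True if `value` is a dag-json-style link object."""
    return isinstance(value, dict) and set(value) == {"/"}


def resolve_with_trail(store: Store, root: str, path: str) -> tuple[Any, list[str]]:
    """Resolve `path` from `root` like dag.get, but also return the CID of
    every block loaded along the way (root first, final block last)."""
    trail = [root]
    node = store[root]
    for segment in (s for s in path.split("/") if s):
        # Step into a list by index or into a map by key.
        node = node[int(segment)] if isinstance(node, list) else node[segment]
        if is_link(node):
            # Crossing a link loads a new block: record its CID, then descend.
            trail.append(node["/"])
            node = store[node["/"]]
    return node, trail


store: Store = {
    "root": {"children": [{"/": "mid0"}, {"/": "mid1"}]},
    "mid0": {"children": [{"/": "leaf0"}, {"/": "leaf1"}]},
    "mid1": {"children": [{"/": "leaf2"}, {"/": "leaf3"}]},
    "leaf0": {"value": 0},
    "leaf1": {"value": 1},
    "leaf2": {"value": 2},
    "leaf3": {"value": 3},
}

value, trail = resolve_with_trail(store, "root", "children/0/children/1")
print(value, trail)  # {'value': 1} ['root', 'mid0', 'leaf1']
```

In this toy tree, trail[-2] ("mid0") is the parent of the returned leaf, which is the CID a caller would need in order to query that leaf's siblings directly.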

Use case:
Imagine you are doing an in-order traversal over a tree structure encoded in IPLD. From the number of elements in the tree (which could be stored in the tree's root) and the number of children each intermediate node has, you can deterministically calculate the depth of the tree. That makes it easy to build a path selector from the root of the tree to the left-most leaf node, which can then be passed to ipfs.dag.get to get the data of the first leaf node in the tree.

Now you want to fetch the second leaf node. You could again deterministically build a path selector from the root to the second leaf node, but that path would once again start at the root, so in a large tree many intermediate nodes would be traversed multiple times. Ideally, you would already have the CID of the first leaf node's parent and could issue a new query with just the path from that parent to its second child to get the second leaf node of the overall tree.

The problem is that dag.get with the path to the first leaf node returns only the data of the leaf node, not any information about the intermediate nodes it passed through to get there, so you have no way to know the CID of its parent. If the dag.get call returned not just the data from the first leaf node but also an array of the CIDs it passed through while traversing the path, you could pop CIDs off the back of that list to move back up the tree, and issue new dag queries with paths to other child nodes as you continue to iterate over the tree structure.
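A sketch of the iteration described above, under toy assumptions: an in-memory dict store with placeholder CIDs, every intermediate node stored as {"children": [links]}, leaves stored as {"value": ...}, and nodes filled left to right. `build_tree`, `tree_depth`, `leaf_digits`, `iter_leaves` and `resolve_with_trail` are all hypothetical names for illustration. The depth comes from the element count and fanout alone, the path to leaf i is i written in base fanout, and each query after the first starts from the deepest ancestor it shares with the previous leaf (taken from the trail of CIDs) rather than from the root.

```python
from typing import Any, Iterator

# Hypothetical toy block store: placeholder CID string -> node; links are {"/": cid}.
Store = dict[str, Any]


def resolve_with_trail(store: Store, root: str, path: str) -> tuple[Any, list[str]]:
    """Follow `path` from `root`; return the value and the CIDs of every block loaded."""
    trail, node = [root], store[root]
    for segment in filter(None, path.split("/")):
        node = node[int(segment)] if isinstance(node, list) else node[segment]
        if isinstance(node, dict) and set(node) == {"/"}:  # a link: load the next block
            trail.append(node["/"])
            node = store[node["/"]]
    return node, trail


def build_tree(values: list[Any], fanout: int) -> tuple[Store, str]:
    """Store `values` as leaves, then group nodes bottom-up, `fanout` children each."""
    store: Store = {f"leaf{i}": {"value": v} for i, v in enumerate(values)}
    level, height = list(store), 0
    while len(level) > 1:
        height += 1
        parents = []
        for j in range(0, len(level), fanout):
            cid = f"node{height}-{j // fanout}"
            store[cid] = {"children": [{"/": c} for c in level[j : j + fanout]]}
            parents.append(cid)
        level = parents
    return store, level[0]  # CID of the root


def tree_depth(count: int, fanout: int) -> int:
    """Smallest depth d with fanout**d >= count; needs only the count and fanout."""
    depth = 0
    while fanout**depth < count:
        depth += 1
    return depth


def leaf_digits(index: int, fanout: int, depth: int) -> list[int]:
    """Child index to follow at each level, root first: `index` written in base `fanout`."""
    digits = []
    for _ in range(depth):
        index, digit = divmod(index, fanout)
        digits.append(digit)
    return digits[::-1]


def iter_leaves(store: Store, root: str, count: int, fanout: int) -> Iterator[Any]:
    """Yield leaf values in order, starting each query from the deepest shared ancestor."""
    depth = tree_depth(count, fanout)
    trail: list[str] = [root]  # trail[k] = CID of the current node at depth k
    prev: list[int] = []
    for i in range(count):
        digits = leaf_digits(i, fanout, depth)
        shared = 0  # depth of the deepest ancestor shared with the previous leaf
        while shared < len(prev) and prev[shared] == digits[shared]:
            shared += 1
        base = trail[: shared + 1]  # "pop" the trail back up to that ancestor
        path = "/".join(f"children/{d}" for d in digits[shared:])
        value, sub_trail = resolve_with_trail(store, base[-1], path)
        trail, prev = base + sub_trail[1:], digits
        yield value["value"]


store, root = build_tree([f"item{i}" for i in range(10)], fanout=3)
print(list(iter_leaves(store, root, count=10, fanout=3)))  # item0 ... item9, in order
```

For leaf 3 of this fanout-3 tree, for example, the query starts at the root's first child (the ancestor it shares with leaf 2) and only walks children/1/children/0 from there.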

@stbrody stbrody added the kind/feature A new feature label Oct 26, 2021
@welcome

welcome bot commented Oct 26, 2021

Thank you for submitting your first issue to this repository! A maintainer will be here shortly to triage and review.
In the meantime, please double-check that you have provided all the necessary information to make this process easy! Any information that can help save additional round trips is useful! We currently aim to give initial feedback within two business days. If this does not happen, feel free to leave a comment.
Please keep an eye on how this issue will be labeled, as labels give an overview of priorities, assignments and additional actions requested by the maintainers:

  • "Priority" labels will show how urgent this is for the team.
  • "Status" labels will show if this is ready to be worked on, blocked, or in progress.
  • "Need" labels will indicate if additional input or analysis is required.

Finally, remember to use https://discuss.ipfs.io if you just need general support.

@stbrody
Author

stbrody commented Oct 26, 2021

An alternative that would accomplish the same thing: instead of returning an array of the CIDs that were traversed, return a CAR file containing the CIDs AND data for every IPFS object along the path given in the initial query.
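A rough sketch of this alternative, again over a toy in-memory store with placeholder CIDs. `export_path_blocks` is hypothetical and returns plain Python data shaped like the contents of a CAR (a list of roots plus (CID, block) pairs), not the actual CARv1 binary encoding. It illustrates one consequence of shipping data as well as CIDs: the caller can read a sibling's CID straight out of the parent block without another round trip.

```python
from typing import Any

# Hypothetical toy block store: placeholder CID string -> node; links are {"/": cid}.
Store = dict[str, Any]


def export_path_blocks(store: Store, root: str, path: str) -> dict[str, Any]:
    """Collect every block loaded while resolving `path`, shaped loosely like a CAR:
    a list of roots plus (CID, block) pairs in traversal order.
    This is NOT the CARv1 binary encoding, only the information it would carry."""
    node = store[root]
    blocks = [(root, node)]
    for segment in filter(None, path.split("/")):
        node = node[int(segment)] if isinstance(node, list) else node[segment]
        if isinstance(node, dict) and set(node) == {"/"}:  # a link: load the next block
            cid = node["/"]
            node = store[cid]
            blocks.append((cid, node))
    return {"roots": [root], "blocks": blocks}


store: Store = {
    "root": {"children": [{"/": "mid0"}, {"/": "mid1"}]},
    "mid0": {"children": [{"/": "leaf0"}, {"/": "leaf1"}]},
    "mid1": {"children": [{"/": "leaf2"}, {"/": "leaf3"}]},
    "leaf0": {"value": 0},
    "leaf1": {"value": 1},
    "leaf2": {"value": 2},
    "leaf3": {"value": 3},
}

bundle = export_path_blocks(store, "root", "children/1/children/0")
# The parent's data came along too, so the next sibling's CID can be read
# directly instead of requesting the parent block again.
parent_cid, parent = bundle["blocks"][-2]
next_leaf_cid = parent["children"][1]["/"]
print(parent_cid, next_leaf_cid)  # mid1 leaf3
```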

@aschmahmann aschmahmann added exp/intermediate Prior experience is likely helpful exp/expert Having worked on the specific codebase is important P2 Medium: Good to have, but can wait until someone steps up and removed exp/intermediate Prior experience is likely helpful labels Nov 19, 2021
@aschmahmann
Contributor

@stbrody is this a feature request for something like ipfs resolve --give-cids /ipfs/path... that outputs all the CIDs along the path, or would just doing #8239 be enough?

@BigLep BigLep added exp/intermediate Prior experience is likely helpful help wanted Seeking public contribution on this issue status/blocked Unable to be worked further until needs are met and removed exp/expert Having worked on the specific codebase is important labels Jan 7, 2022
@BigLep
Contributor

BigLep commented Jan 7, 2022

2022-01-07 discussion: this would be a common use-case-specific form of #8239. We'd likely implement this specific use case using the more generic ability to fetch data matching a specific selector.

@stbrody: do you have a sense from Ceramic's perspective as to which of these two is higher priority?

Also, this isn't something the go-ipfs maintainers expect to get to in the short term, but we could certainly direct others on where and how to solve it.

I'm marking this as blocked until #8239 is handled.

@stbrody
Author

stbrody commented Jan 10, 2022

I suppose if #8239 were done in such a way that we could get the entire tree structure loaded onto our local ipfs node, then doing multiple iterative calls over the same paths in the tree wouldn't be nearly as bad. You'd still wind up re-processing the same path multiple times, but with data that's all local so it will be much more performant.
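A toy model of that trade-off, assuming a node that keeps every block it has fetched in a local blockstore, whether the block arrived through a #8239-style prefetch or on first use. `CachingFetcher` and `resolve` are hypothetical names. Re-resolving each leaf's path from the root then sends each block over the network only once, but the repeated path processing still shows up as local lookups.

```python
from typing import Any, Callable

# Hypothetical toy block store: placeholder CID string -> node; links are {"/": cid}.
Store = dict[str, Any]


class CachingFetcher:
    """Toy model of a node with a local blockstore in front of the network."""

    def __init__(self, remote: Store) -> None:
        self.remote = remote    # stands in for blocks held by other peers
        self.local: Store = {}  # stands in for the local blockstore
        self.network_fetches = 0
        self.local_hits = 0

    def get(self, cid: str) -> Any:
        if cid in self.local:
            self.local_hits += 1
        else:
            self.network_fetches += 1
            self.local[cid] = self.remote[cid]
        return self.local[cid]


def resolve(get: Callable[[str], Any], root: str, path: str) -> Any:
    """Plain path resolution (no trail), loading every block through `get`."""
    node = get(root)
    for segment in filter(None, path.split("/")):
        node = node[int(segment)] if isinstance(node, list) else node[segment]
        if isinstance(node, dict) and set(node) == {"/"}:  # a link: load the next block
            node = get(node["/"])
    return node


remote: Store = {
    "root": {"children": [{"/": "mid0"}, {"/": "mid1"}]},
    "mid0": {"children": [{"/": "leaf0"}, {"/": "leaf1"}]},
    "mid1": {"children": [{"/": "leaf2"}, {"/": "leaf3"}]},
    "leaf0": {"value": 0},
    "leaf1": {"value": 1},
    "leaf2": {"value": 2},
    "leaf3": {"value": 3},
}

fetcher = CachingFetcher(remote)
# Re-resolve every leaf starting from the root each time.
for i in range(4):
    resolve(fetcher.get, "root", f"children/{i // 2}/children/{i % 2}")
print(fetcher.network_fetches, fetcher.local_hits)  # 7 5
```

Here the four lookups from the root load 12 blocks in total: the 7 distinct blocks cross the network once each, and the other 5 loads are repeat reads of the root and the intermediate nodes.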

My sense is that both this and #8239 are valuable in different ways, but I'd imagine this one would be easier to implement. There are also cases where this ticket helps more than #8239 does, for example an in-order traversal over only part of a tree structure. If you're only going to process some part of the tree, then pulling the whole tree to your local node is overkill, which is especially bad if the tree is large. It would also be bad to have to wait for all the data matching the selector (in this case the whole tree) to be loaded locally before you can get the result for the first item you want to process.
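A back-of-the-envelope count of this point, assuming a full tree (every intermediate node has exactly `fanout` children, so there are fanout**depth leaves) and a client that fetches each block at most once, for example by reusing the trail of CIDs from the previous step. The function names are hypothetical and the numbers are illustrative arithmetic, not measurements.

```python
def blocks_in_full_tree(fanout: int, depth: int) -> int:
    """Total blocks in a full tree: fanout**level nodes at each level 0..depth."""
    return sum(fanout**level for level in range(depth + 1))


def blocks_for_first_leaves(m: int, fanout: int, depth: int) -> int:
    """Distinct blocks needed to visit leaves 0..m-1 of the same tree (1 <= m <= fanout**depth).
    At each level, those leaves sit under ceil(m / fanout**(depth - level)) nodes."""
    return sum(-(-m // fanout ** (depth - level)) for level in range(depth + 1))


# Example: fanout 16, depth 4 (65,536 leaves), processing only the first 100 items.
print(blocks_in_full_tree(16, 4))           # 69905
print(blocks_for_first_leaves(100, 16, 4))  # 110
# The first item alone needs depth + 1 blocks.
print(blocks_for_first_leaves(1, 16, 4))    # 5
```

Under these assumptions, processing the first 100 of 65,536 items touches 110 blocks instead of 69,905, and the first item is available after 5 blocks rather than after the whole tree has been fetched.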

> is this a feature request for something like ipfs resolve --give-cids /ipfs/path... that outputs all the CIDs along the path

Yes, that's more or less what I'm imagining, though I'd want it exposed via the http-client.

> do you have a sense from Ceramic's perspective as to which of these two is higher priority?

I'll defer to @oed on this one

@stbrody
Author

stbrody commented Jan 10, 2022

> I'm marking this as blocked until #8239 is handled.

FWIW, while I do see these two as related in the use cases they help improve, technically I think they're probably fairly independent.

@BigLep BigLep added this to the Best Effort Track milestone Mar 3, 2022
@BigLep
Contributor

BigLep commented Mar 18, 2022

2022-03-18 conversations: maintainer priority and plan of record is:

  1. (in progress) Add selector support in gateways (https://github.com/ipfs/go-ipfs/issues/8769)
  2. (easy followup) Add support for selectors in dag-export (#8239: dag API should let a user ask for the daemon to fetch data matching a selector)

General:
  • Paths are easy.
  • Selectors are harder to use.

As people have been asking about selectors, we're going to add them to more APIs, but we don't want to overload users with the more complicated selector syntax.

We're treating paths and selectors separately (resolve the path and then apply the selector).
We're starting with CAR files because those users are already more "advanced". It's also possible to write a path as a selector (in most cases).
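A sketch of writing a path as a selector, assuming the keyed dag-json form of the IPLD selector schema ("." for a Matcher, "f" with "f>" for ExploreFields, "i" with "i" and ">" for ExploreIndex); the exact keys should be checked against the selector spec before relying on them. `path_to_selector` is a hypothetical helper, and it has to guess that an all-digit segment is a list index.

```python
import json
from typing import Any


def path_to_selector(path: str) -> dict[str, Any]:
    """Translate a slash-separated IPLD path into a selector (keyed dag-json form)
    that walks the path and matches only the node at its end."""
    selector: dict[str, Any] = {".": {}}  # Matcher on the final node
    # Build inside-out: the last path segment wraps the Matcher first.
    for segment in reversed([s for s in path.split("/") if s]):
        if segment.isdigit():
            # Assumption: an all-digit segment is a list index (ExploreIndex).
            selector = {"i": {"i": int(segment), ">": selector}}
        else:
            # ExploreFields: descend into one named map field.
            selector = {"f": {"f>": {segment: selector}}}
    return selector


# Prints the nested selector as JSON.
print(json.dumps(path_to_selector("children/0/value")))
```

That guess is one reason the conversion works only "in most cases": a segment like 0 could equally be a map key, and only the data (or its schema) can tell which.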

@BigLep
Contributor

BigLep commented Jun 3, 2022

2022-06-03 conversation: this is still blocked per the discussion above. There will be a relevant gateway selector spec in the next month.
