Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Prioritize announcement of pin roots #10365

Closed
lidel opened this issue Mar 7, 2024 · 2 comments · Fixed by #10376
Closed

Prioritize announcement of pin roots #10365

lidel opened this issue Mar 7, 2024 · 2 comments · Fixed by #10376
Assignees
Labels
effort/days Estimated to take multiple days, but less than a week exp/intermediate Prior experience is likely helpful kind/enhancement A net-new feature or improvement to an existing feature P1 High: Likely tackled by core team if no one steps up topic/routing Topic routing

Comments

@lidel
Copy link
Member

lidel commented Mar 7, 2024

Hopefully inexpensive optimization that would help Collab cluster, and content routing based on Amino DHT in general:

  • before announcing everything, make a quick pass and announce only the root blocks for all pins first.

This ensures most important CIDs are out of the door (in the spirit of ipfs/specs#462), and we can slowly announce everything else.

@lidel lidel added kind/enhancement A net-new feature or improvement to an existing feature topic/routing Topic routing P1 High: Likely tackled by core team if no one steps up exp/intermediate Prior experience is likely helpful effort/days Estimated to take multiple days, but less than a week labels Mar 7, 2024
@aschmahmann
Copy link
Contributor

Seems reasonable. The main tradeoff here is that we might over-read data from disk which may/may not be that important.

The reprovider strategies are here

kubo/core/node/provider.go

Lines 130 to 139 in e22f47a

switch reprovideStrategy {
case "all", "":
keyProvider = fx.Provide(provider.NewBlockstoreProvider)
case "roots":
keyProvider = fx.Provide(pinnedProviderStrategy(true))
case "pinned":
keyProvider = fx.Provide(pinnedProviderStrategy(false))
default:
return fx.Error(fmt.Errorf("unknown reprovider strategy %q", reprovideStrategy))
}

Probably the easiest way to do this is:

  • "roots" -> keep doing the same thing
  • "pinned" -> do a first pass using the "roots" function and then do this function (maybe keep something like a bloom filter or map so we don't readvertise the same data twice in exchange for some more memory)
    • If the plan is to use a map here it'd probably be better to rewrite the function to enumerate the roots and then put them into a queue so we don't have to hit the datastore twice
  • "all" -> do a first pass using the "roots" function, then do the blockstore enumeration
    • Could do roots -> pinned -> all but if most of the data is pinned it'll require enumerating the blockstore twice which could be quite expensive. This way the worst case is if the blockstore is lots of tiny objects in which case the datastore and blockstore are both enumerated (might still be annoying, but likely less bad)

@lidel lidel mentioned this issue Mar 19, 2024
11 tasks
@aschmahmann
Copy link
Contributor

aschmahmann commented Mar 19, 2024

From conversation with @lidel:

Something we could also do here is doing a local walk (i.e. don't do any remote fetching) of MFS and announcing the file/folder roots first (along with the other "roots").

Tracked in #10386

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
effort/days Estimated to take multiple days, but less than a week exp/intermediate Prior experience is likely helpful kind/enhancement A net-new feature or improvement to an existing feature P1 High: Likely tackled by core team if no one steps up topic/routing Topic routing
Projects
No open projects
Status: 🎉 Done
Development

Successfully merging a pull request may close this issue.

3 participants