Refactor out routing system #655

Draft pull request: 7 commits to be merged into master

aschmahmann (Contributor) commented May 27, 2020

To support #616 we need more control over how records received from the DHT are processed (i.e. everything we do aside from the Kademlia logic itself).

This PR takes a stab at that by introducing a set of processing functions that operate on the received records and then making those configurable via routing options.

One thing that's currently missing from the PR is the ability to configure which peers the updating puts are sent to when we do a GetValue/SearchValue query.

A next step to examine is extracting the networking/RPC code from the routing + query logic so that they are attached to separate objects (if not packages) to make exposing additional functionality less overwhelming and confusing to new users.

Still a WIP, but looking for some feedback @aarshkshah1992

aschmahmann force-pushed the feat/refactor-routing branch 2 times, most recently from 72f8ab7 to eca6287 (May 28, 2020 23:01)
Comment on lines +77 to +24
// GetClosestPeersSeeded is the Kademlia 'node lookup' operation
func (dht *IpfsDHT) GetClosestPeersSeeded(ctx context.Context, key string, seedPeers []peer.ID, useRTPeers bool) (<-chan peer.ID, error) {
Contributor Author

This is the first of these double functions; it's a bit unfortunate that we have to expose twice as many functions for the same thing. We could move them into a new struct, but I'm not sure it's worth it. Thoughts?

If we're going to leave these functions on the main struct, we should have a consistent naming scheme for them. "Extended" feels pretty general, while something to do with seeding the query or being a continuable query seems a little specific. I'm up for suggestions; otherwise I'll just use [OriginalName]Extended everywhere.

Also, while we're rewriting this it would be great to return []peer.ID instead of chan peer.ID, but some tests may have to be modified.
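For what it's worth, a slice-returning variant could probably be layered on top of the existing channel-returning one with a small draining helper. This is only a sketch: `PeerID` is a stand-in for `peer.ID` so it compiles without libp2p, and `collectPeers` and `closestPeers` are hypothetical names, not anything in this PR.

```go
package main

import (
	"context"
	"fmt"
)

// PeerID is a stand-in for peer.ID so the sketch has no libp2p dependency.
type PeerID string

// collectPeers drains ch into a slice. If ctx is cancelled first, it returns
// whatever has been gathered so far together with the context error.
func collectPeers(ctx context.Context, ch <-chan PeerID) ([]PeerID, error) {
	var peers []PeerID
	for {
		select {
		case p, ok := <-ch:
			if !ok {
				// Channel closed: the lookup finished.
				return peers, nil
			}
			peers = append(peers, p)
		case <-ctx.Done():
			return peers, ctx.Err()
		}
	}
}

// closestPeers is a hypothetical slice-returning lookup. The pre-filled
// channel below fakes the result of a channel-returning query.
func closestPeers() ([]PeerID, error) {
	ch := make(chan PeerID, 3)
	for _, p := range []PeerID{"peerA", "peerB", "peerC"} {
		ch <- p
	}
	close(ch)
	return collectPeers(context.Background(), ch)
}

func main() {
	peers, err := closestPeers()
	fmt.Println(peers, err)
}
```

Tests that currently range over the returned channel would then range over the slice instead, which is likely where most of the test changes would land.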

)

var (
logger = logging.Logger("dht.routing")
Contributor Author

Should we just use the dht logger here?

Comment on lines +10 to +12
type Processor interface {
	Process(interface{}, func()) (interface{}, error)
}
Contributor Author

This is pretty unfortunate and is related to Go's lack of generics. We could also do RecordProcessor and ProviderProcessor here and just duplicate more of the code. Both options seem similarly gross to me, but having a single interface makes some of this a bit simpler (e.g. fewer routing options, sharing the CountStopper, etc.).

Also, once we move to a unified record system this should hopefully go away 🙏
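To illustrate the sharing argument, here is a rough sketch of how one untyped processor can serve both value records and provider records. It assumes the `func()` argument is a callback that asks the query to stop, which is my reading rather than something the interface itself states. `countStopper`, `runPipeline`, and `feed` are hypothetical names, and treating `MaxCount == 0` as "no limit" is also an assumption.

```go
package main

import "fmt"

// Processor mirrors the interface under review.
type Processor interface {
	Process(interface{}, func()) (interface{}, error)
}

// countStopper passes every record through unchanged and calls stop once
// MaxCount records have been seen. MaxCount == 0 means no limit (assumption).
type countStopper struct {
	MaxCount int
	seen     int
}

func (c *countStopper) Process(rec interface{}, stop func()) (interface{}, error) {
	c.seen++
	if c.MaxCount > 0 && c.seen >= c.MaxCount {
		stop()
	}
	return rec, nil
}

// runPipeline feeds rec through procs in order; each processor receives the
// previous processor's output. The first error aborts the pipeline.
func runPipeline(rec interface{}, stop func(), procs []Processor) (interface{}, error) {
	var err error
	for _, p := range procs {
		if rec, err = p.Process(rec, stop); err != nil {
			return nil, err
		}
	}
	return rec, nil
}

// feed pushes inputs through the pipeline until a processor calls stop.
// Records whose pipeline run returns an error are dropped.
func feed(inputs []interface{}, procs []Processor) []interface{} {
	stopped := false
	stop := func() { stopped = true }
	var out []interface{}
	for _, in := range inputs {
		if stopped {
			break
		}
		rec, err := runPipeline(in, stop, procs)
		if err != nil {
			continue
		}
		out = append(out, rec)
	}
	return out
}

func main() {
	// The same processor type handles value records ([]byte)...
	values := []interface{}{[]byte("v1"), []byte("v2"), []byte("v3")}
	fmt.Println(len(feed(values, []Processor{&countStopper{MaxCount: 2}})))

	// ...and provider records (plain strings stand in for peer.AddrInfo).
	provs := []interface{}{"peerA", "peerB", "peerC"}
	fmt.Println(len(feed(provs, []Processor{&countStopper{MaxCount: 2}})))
}
```

The cost is visible too: any processor that needs the concrete type has to do a runtime type assertion, which is the part that typed RecordProcessor and ProviderProcessor interfaces would avoid.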

Comment on lines +62 to +63
// FindProviders searches for the providers corresponding to given Key and streams the results.
func FindProviders(ctx context.Context, key multihash.Multihash, findProvsFn findProvsFn, processors []Processor, cfg *routing.Options) (<-chan peer.AddrInfo, <-chan []peer.ID, error) {
Contributor Author

This is basically just a duplicated version of the SearchValue function, but with added type safety. Not sure if it's worth it, but seems reasonable.

Comment on lines +70 to +74
outChSize := maxRequestedRecords
if outChSize == 0 {
	outChSize = 1
}
out := make(chan peer.AddrInfo, outChSize)
Contributor Author

This code existed previously: for a requested provider record count > 0, the channel buffer size equals the requested record count. I don't know if this is really necessary anymore; it seems like we could just set the channel size to 1.

While this doesn't really hurt too much on the memory front, it does feel a little weird and certainly doesn't align with the SearchValue setup.

Thoughts on just removing this and setting the channel size to 1?
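A quick sketch of why a buffer of 1 should be functionally equivalent, as long as the sender selects on the context: the buffer size only changes how far the producer can run ahead of the consumer, not how many providers get delivered. `AddrInfo` is a stand-in for `peer.AddrInfo`, and `streamProviders` and `countReceived` are hypothetical helpers, not code from this PR.

```go
package main

import (
	"context"
	"fmt"
)

// AddrInfo is a stand-in for peer.AddrInfo.
type AddrInfo struct{ ID string }

// streamProviders sends found on a channel with the given buffer size. The
// sender blocks when the buffer is full and gives up if ctx is cancelled, so
// a small buffer cannot leak the goroutine as long as ctx eventually ends.
func streamProviders(ctx context.Context, found []AddrInfo, bufSize int) <-chan AddrInfo {
	out := make(chan AddrInfo, bufSize)
	go func() {
		defer close(out)
		for _, p := range found {
			select {
			case out <- p:
			case <-ctx.Done():
				return
			}
		}
	}()
	return out
}

// countReceived streams n fake providers through a channel of size bufSize
// and reports how many the consumer actually received.
func countReceived(bufSize, n int) int {
	found := make([]AddrInfo, n)
	for i := range found {
		found[i] = AddrInfo{ID: fmt.Sprintf("peer%d", i)}
	}
	got := 0
	for range streamProviders(context.Background(), found, bufSize) {
		got++
	}
	return got
}

func main() {
	// Same number of providers delivered regardless of buffer size.
	fmt.Println(countReceived(1, 20), countReceived(20, 20))
}
```

The one behavioural difference is for callers that stop reading without cancelling the context: with a buffer equal to the requested count the producer can always finish, while with a buffer of 1 it blocks until the context is done.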

Comment on lines +13 to +14
//
// Deprecated: use github.com/libp2p/go-libp2p-kad-dht/routing.Quorum
Contributor Author

Maybe moving the routing options into their own package was overkill. Thoughts?

case <-ctx.Done():
return false
}
processors = []dhtrouting.Processor{validation, quorum, bestValue}
Contributor Author

The current code actually does quorum -> bestValue. The faithful representation of this here would be quorum -> validation -> bestValue (will fix in the next PR update). It's unfortunate that we end up validating twice (once in the getValues function and once in the pipeline); we can optimize this as well.

We should also figure out what (if anything) we want to do with the quorum function for SearchValue. I think requiring a certain number of the latest records to be equal would be reasonable, and would allow us to throw out invalid records and make things a little easier. Alternatively, we could just drop the quorum function entirely.
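As a strawman for the "N of the latest records are equal" idea, a quorum processor might look roughly like the sketch below. It assumes validation has already happened earlier in the pipeline and that the `func()` argument stops the query. `equalQuorum`, its `Better` comparator (a stand-in for however the best record gets selected), and `recordsUntilQuorum` are hypothetical names; the lexicographic comparison in the demo is purely illustrative.

```go
package main

import (
	"bytes"
	"fmt"
)

// equalQuorum tracks the best value seen so far and stops the query once
// Quorum received records have been equal to it. The count restarts whenever
// a better value shows up.
type equalQuorum struct {
	Quorum int
	// Better reports whether a should replace b as the best value.
	Better func(a, b []byte) bool

	best  []byte
	count int
}

// Process has the same shape as Processor.Process in this PR.
func (q *equalQuorum) Process(rec interface{}, stop func()) (interface{}, error) {
	val, ok := rec.([]byte)
	if !ok {
		return nil, fmt.Errorf("expected []byte, got %T", rec)
	}
	switch {
	case q.count == 0 || q.Better(val, q.best):
		// First record, or a newer one: restart the count.
		q.best, q.count = val, 1
	case bytes.Equal(val, q.best):
		q.count++
	}
	// Older records fall through without affecting the count.
	if q.Quorum > 0 && q.count >= q.Quorum {
		stop()
	}
	return q.best, nil
}

// recordsUntilQuorum feeds vals to an equalQuorum and returns how many
// records were consumed before it asked to stop, plus the best value.
func recordsUntilQuorum(vals []string, quorum int) (int, string) {
	q := &equalQuorum{
		Quorum: quorum,
		// Illustrative only: lexicographically larger means newer.
		Better: func(a, b []byte) bool { return bytes.Compare(a, b) > 0 },
	}
	stopped := false
	consumed := 0
	var best interface{}
	for _, v := range vals {
		if stopped {
			break
		}
		// The error is ignored because the input is always a []byte here.
		best, _ = q.Process([]byte(v), func() { stopped = true })
		consumed++
	}
	b, _ := best.([]byte)
	return consumed, string(b)
}

func main() {
	n, best := recordsUntilQuorum([]string{"v1", "v2", "v1", "v2", "v2"}, 2)
	fmt.Println(n, best)
}
```

In the demo, the stale "v1" that arrives after "v2" does not count toward the quorum, so the query stops on the second "v2" rather than after any two records.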

MaxCount: dhtrouting.GetQuorum(&cfg),
}

processors = []dhtrouting.Processor{newValuesOnly, quorum}
Contributor Author

Note: While SearchValue does quorum -> bestValue, here we do newValuesOnly -> quorum. I'd like some comments explaining the pipeline order for each function.

Comment on lines -552 to -530
// If we have enough peers locally, don't bother with remote RPC
// TODO: is this a DOS vector?
Contributor Author

If people start querying with count = 0 then this isn't a problem, so maybe we should just emphasize that?

Comment on lines +1 to +2
package routing

Contributor Author

More comments needed in this file. Also, is this separate package currently a good idea?

aschmahmann changed the base branch from master to refactor/extract-messages (January 4, 2021 09:17)
aschmahmann marked this pull request as draft (January 4, 2021 09:17)
aschmahmann changed the base branch from refactor/extract-messages to master (January 4, 2021 20:06)
1 participant