Cache Provider Records in the network #386

Open
aarshkshah1992 opened this issue Aug 29, 2019 · 3 comments
aarshkshah1992 commented Aug 29, 2019

Problem

In Kademlia, the K (20 for now) closest nodes to the hash H of some content are responsible for serving the providers for that content. As described quite well in libp2p/specs#163, these nodes get overburdened as the content gets more popular, since all provider lookups converge on them. This creates both performance & availability problems.

Solution

The Kademlia paper suggests caching records like so:

For caching purposes, once a lookup succeeds, the requesting node stores the (key,value)
pair at the closest node it observed to the key that did not return the value.
Because of the unidirectionality of the topology, future searches for the same key are 
likely to hit cached entries before querying the closest node.

Basically, once a node has finished fetching the providers for hash H from the network, it should store the providers on the node N closest to the hash H that did not return the providers when it was traversing the network. This expands the radius at which the providers are available: future queries for H from nodes further from H than N will discover the cached providers before they bother the K closest nodes.
This distributes provider lookups across the network & prevents them from converging on & congesting the same path.
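
To make the selection of N concrete, here is a minimal sketch (hypothetical helpers, not the dht package's API; it assumes peer IDs and keys are both mapped into the key space with SHA-256, as the routing table does) that picks, among the peers queried for H, the closest one that did not return the providers:

```go
package dhtcache

import (
	"bytes"
	"crypto/sha256"
)

// xorDistance maps a peer ID into the key space and XORs it with the
// (32-byte) key hash; comparing the results lexicographically orders peers
// by Kademlia distance to H.
func xorDistance(peerID string, keyHash []byte) []byte {
	h := sha256.Sum256([]byte(peerID))
	d := make([]byte, len(keyHash))
	for i := range keyHash {
		d[i] = h[i] ^ keyHash[i]
	}
	return d
}

// closestNonReturningPeer picks the caching candidate N: the queried peer
// closest to the key that did not already return the providers.
func closestNonReturningPeer(queried []string, returnedProviders map[string]bool, keyHash []byte) string {
	var best string
	var bestDist []byte
	for _, p := range queried {
		if returnedProviders[p] {
			continue // already serves the record, so not a caching candidate
		}
		d := xorDistance(p, keyHash)
		if best == "" || bytes.Compare(d, bestDist) < 0 {
			best, bestDist = p, d
		}
	}
	return best
}
```

The lookup already visits both sets of peers, so the only extra bookkeeping is remembering which of the queried peers answered with the record.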

Cache Eviction

From the kad paper:

To avoid “over-caching,” we make the expiration time of a (key,value) pair in any 
node’s database exponentially inversely proportional to the number of nodes 
between the current node and the node whose ID is closest to the key ID.

Basically, the further a node N is from V (the node that's closest to hash H in the whole network), the sooner the cached providers on N expire. This makes intuitive sense because queries for hash H from any node move closer & closer to V with each hop. So, the closer a node is to V, the more likely it is to lie on a provider lookup path compared to a node that's further away.
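
As a strawman for this rule (hypothetical names; it assumes we can estimate how many nodes sit between N and V), the TTL of the cached record would simply halve for every node of separation:

```go
package dhtcache

import "time"

// cacheExpiry returns a TTL that is exponentially inversely proportional to
// the estimated number of nodes between us and the node closest to the key.
func cacheExpiry(base time.Duration, nodesBetween int) time.Duration {
	if nodesBetween < 0 {
		nodesBetween = 0
	}
	if nodesBetween > 30 {
		nodesBetween = 30 // clamp so the TTL doesn't round down to zero
	}
	return base >> uint(nodesBetween) // i.e. base * 2^-nodesBetween
}
```

For example, with base = 48h, a node estimated to be 4 nodes further out than V would hold the cached providers for 3 hours.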

Question: Can't we simply use the XOR distance between N & H to determine the expiry time, rather than the number of nodes between N & V, since the latter is not trivial to compute in our current implementation?

Testing
libp2p testlab FTW!

@raulk @Stebalien

@aschmahmann

> Can't we simply use the distance from H to determine expiry time rather than distance from V since determining V will need more effort?

Yes and no. According to your quote, the time t that we store the data corresponding to H is c*2^-d, where d is some key-space distance metric and c is some preset coefficient. If we use the distance from H instead of from V, then c will need to be very large in order to make the cache times non-negligible. However, the right value of c then depends on properties like network size (i.e. it's not really a constant).

This leaves us with some options:

  1. Make c a property of the network so if people want to run DHTs aside from the shared global one they'll have to tweak the number appropriately
  2. Have some metric for calculating network size (and therefore c) related to how filled the buckets are (a rough sketch follows below)
  3. Ask the node doing the "put" to give the correct cache duration (since that node has searched for where to put the values already it should have a good idea, but it requires trusting the put node to do that correctly)

I'd prefer option 2 (or even 1) over 3 since 3 makes DoS attacks on particular record updates much easier than 1 or 2 do.
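
For option 2, one rough and purely illustrative heuristic (not something libp2p ships): peers whose IDs share a common prefix of exactly L bits with ours cover roughly a 2^-(L+1) fraction of the network, so a routing-table bucket at CPL L that holds n peers (and isn't capped at k) suggests on the order of n * 2^(L+1) nodes overall, from which c could then be scaled:

```go
package dhtcache

// estimateNetworkSize is an illustrative estimate only: peersAtCPL is how
// many peers we know whose common prefix length with our ID is exactly cpl,
// and such peers make up roughly a 2^-(cpl+1) fraction of the network.
func estimateNetworkSize(cpl, peersAtCPL int) int {
	return peersAtCPL << uint(cpl+1)
}
```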

aarshkshah1992 commented Nov 6, 2019

@aschmahmann I see exactly what you mean: distance(N, H) is potentially >> num_nodes(N, V). Given that d for us will be in [1, 2^32 - 1], 2^-d will be vanishingly small for all but the very closest nodes, so we will have to choose c (based on the network size) carefully to come up with a meaningful expiry time in millis.

Let me come up with a rudimentary TODO.

aarshkshah1992 commented Nov 6, 2019

Hmm, I now realise that we currently do not/should not allow a node to register another node as a provider, which means this caching protocol cannot work for us for Provider Records in its current form.

I think using it for PutValue records first makes more sense.

TODO

  • Pick brains to come up with a constant based on the network size that produces meaningful expiry times for eviction.

  • Currently, we do not support a 'custom expiry time' for a PutValue record. The expiry time is determined by the caller, who periodically sweeps the store (Expire and republish records received via PutValue #388). We need to make the expiry time part of the serialized PutValue record for eviction purposes & ensure that we do not republish cached records as part of #388.

  • Modify the dht.getValues query to keep track of the closest peer it observed that did not return the value & asynchronously call PutValue on that peer after getValues finishes successfully (rough sketch below).
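
A hand-wavy sketch of the last item (hypothetical names throughout; PutValueTo stands in for the eventual per-peer PutValue RPC carrying the cache TTL from the previous item):

```go
package dhtcache

import (
	"context"
	"time"
)

// putter stands in for whatever component ends up sending a PutValue RPC to
// a single peer; none of these names are the dht package's real API.
type putter interface {
	PutValueTo(ctx context.Context, peerID, key string, val []byte, ttl time.Duration) error
}

// cacheAtClosestMiss is the asynchronous cache-write fired after a
// successful getValues: missPeer is the closest queried peer that did not
// return the value, and ttl would come from the eviction rule discussed
// above. Failures are best-effort and must not affect the lookup result.
func cacheAtClosestMiss(ctx context.Context, p putter, missPeer, key string, val []byte, ttl time.Duration) {
	go func() {
		cctx, cancel := context.WithTimeout(ctx, time.Minute)
		defer cancel()
		_ = p.PutValueTo(cctx, missPeer, key, val, ttl)
	}()
}
```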

@aschmahmann @Stebalien Please let me know your thoughts.
