
Expire and republish records received via PUT_VALUE and ADD_PROVIDER #323

Open
raulk opened this issue Apr 18, 2019 · 21 comments
@raulk
Member

raulk commented Apr 18, 2019

We:

  1. Store such data permanently, thus deviating from Kademlia and creating weakness in our system for various reasons.
  2. Don't rebalance/republish these records, allowing these records to disappear from the network as nodes leave and join.

See discussion in https://discuss.libp2p.io/t/dht-hourly-key-value-republish/93/4.

@vyzo
Contributor

vyzo commented Apr 18, 2019

This will be a problem for IPNS, as the original publisher may be offline.
cc @Stebalien

@raulk
Member Author

raulk commented Apr 18, 2019

Note that IPNS does republish the name record every 4 hours:

https://github.com/ipfs/go-ipfs/blob/42e191c0170a8e818c3302dab9b5afbf968df405/namesys/republisher/repub.go#L27

But the way I see it, this is to deal with network churn, not because DHT peers are actively pruning the record. Even if the IPNS record carries an expiration, all the DHT sees is a []byte. A Validator could perhaps verify that a record is not expired, but that would happen only reactively: when processing a PUT_VALUE or GET_VALUE request.

@dirkmc
Contributor

dirkmc commented Apr 18, 2019

I want to make sure I understand the reason that Kademlia requires a node to republish its key/value pairs every hour. Is it so that as peers enter and leave the DHT the key/value is rebalanced onto the nodes closest to the key? If that's the case then it should make IPNS work better, because it will be more likely to find 16 copies of the key/value pair.

@raulk
Member Author

raulk commented Apr 18, 2019

@dirkmc in addition to that, it helps keep storage footprint in check -- although by itself it's insufficient to deter DoS attacks exploiting that vector.

@Stebalien
Member

I want to make sure I understand the reason that Kademlia requires a node to republish its key/value pairs every hour.

The requirement comes from the fact that the network isn't exactly stable.

We should definitely be throwing away old records; we even make sure to record the time received so we can do this. The main issue is that we simply don't sweep the datastore.

@raulk
Member Author

raulk commented Apr 18, 2019

@dirkmc if you're up for making a contribution, we can guide you through it ;-)

@dirkmc
Contributor

dirkmc commented Apr 19, 2019

Sure, I can do that. I've been working on the JavaScript DHT, so I will do the same there.

@jacobheun

Should we also do the same for ADD_PROVIDER? It seems like it will suffer from the same issues if not republished.

@vyzo
Contributor

vyzo commented Apr 23, 2019

Provider records are already collected after 24 hours, and changing that would be a difficult change to deploy.

romanb added a commit to romanb/rust-libp2p that referenced this issue Jun 30, 2019
This commit relates to [libp2p-146] and [libp2p-1089].

  * All records expire (by default, configurable).
  * Provider records are also stored in the RecordStore, and the RecordStore
    API extended.
  * Background jobs for periodic (re-)replication and (re-)publication
    of records. Regular (value-)records are subject to re-replication and
    re-publication as per standard Kademlia. Provider records are only
    subject to re-publication.
  * For standard Kademlia value lookups (quorum = 1), the record is cached
    at the closest peer to the key that did not return the value, as per
    standard Kademlia.
  * Expiration times of regular (value-)records are computed exponentially
    inversely proportional to the number of nodes between the local node
    and the closest node known to the key (beyond the k closest), as per
    standard Kademlia.

The protobuf messages are extended with two fields: `ttl` and `publisher`
in order to implement the different semantics of re-replication (by any
of the k closest peers to the key, not affecting expiry) and re-publication
(by the original publisher, resetting the expiry). This is not done yet in
other libp2p Kademlia implementations, see e.g. [libp2p-go-323]. The new protobuf fields
have been given somewhat unique identifiers to prevent future collision.

Similarly, periodic re-publication of provider records does not seem to
be done yet in other implementations, see e.g. [libp2p-js-98].

[libp2p-146]: libp2p#146
[libp2p-1089]: libp2p#1089
[libp2p-go-323]: libp2p/go-libp2p-kad-dht#323
[libp2p-js-98]: libp2p/js-libp2p-kad-dht#98
romanb added a commit to romanb/rust-libp2p that referenced this issue Jun 30, 2019
romanb added a commit to romanb/rust-libp2p that referenced this issue Jul 3, 2019
romanb added a commit to romanb/rust-libp2p that referenced this issue Jul 3, 2019
romanb added a commit to libp2p/rust-libp2p that referenced this issue Jul 17, 2019
* Somewhat complete the implementation of Kademlia records.

This commit relates to [libp2p-146] and [libp2p-1089].

  * All records expire (by default, configurable).
  * Provider records are also stored in the RecordStore, and the RecordStore
    API extended.
  * Background jobs for periodic (re-)replication and (re-)publication
    of records. Regular (value-)records are subject to re-replication and
    re-publication as per standard Kademlia. Provider records are only
    subject to re-publication.
  * For standard Kademlia value lookups (quorum = 1), the record is cached
    at the closest peer to the key that did not return the value, as per
    standard Kademlia.
  * Expiration times of regular (value-)records is computed exponentially
    inversely proportional to the number of nodes between the local node
    and the closest node known to the key (beyond the k closest), as per
    standard Kademlia.

The protobuf messages are extended with two fields: `ttl` and `publisher`
in order to implement the different semantics of re-replication (by any
of the k closest peers to the key, not affecting expiry) and re-publication
(by the original publisher, resetting the expiry). This is not done yet in
other libp2p Kademlia implementations, see e.g. [libp2p-go-323]. The new protobuf fields
have been given somewhat unique identifiers to prevent future collision.

Similarly, periodic re-publication of provider records does not seem to
be done yet in other implementations, see e.g. [libp2p-js-98].

[libp2p-146]: #146
[libp2p-1089]: #1089
[libp2p-go-323]: libp2p/go-libp2p-kad-dht#323
[libp2p-js-98]: libp2p/js-libp2p-kad-dht#98

* Tweak kad-ipfs example.

* Add missing files.

* Ensure new delays are polled immediately.

To ensure task notification, since `NotReady` is returned right after.

* Fix ipfs-kad example and use wasm_timer.

* Small cleanup.

* Incorporate some feedback.

* Adjustments after rebase.

* Distinguish events further.

In order for a user to easily distinguish the result of e.g.
a `put_record` operation from the result of a later republication,
different event constructors are used. Furthermore, for now,
re-replication and "caching" of records (at the closest peer to
the key that did not return a value during a successful lookup)
do not yield events for now as they are less interesting.

* Speed up tests for CI.

* Small refinements and more documentation.

  * Guard a node against overriding records for which it considers
    itself to be the publisher.

  * Document the jobs module more extensively.

* More inline docs around removal of "unreachable" addresses.

* Remove wildcard re-exports.

* Use NonZeroUsize for the constants.

* Re-add method lost on merge.

* Add missing 'pub'.

* Further increase the timeout in the ipfs-kad example.

* Readd log dependency to libp2p-kad.

* Simplify RecordStore API slightly.

* Some more commentary.

* Change Addresses::remove to return Result<(),()>.

Change the semantics of `Addresses::remove` so that the error case
is unambiguous, instead of the success case. Use the `Result` for
clearer semantics to that effect.

* Add some documentation to .
@Stebalien
Member

Closing #354 in favor of this:

There are two parts to republishing:

  1. The original publisher should republish periodically (4-12 hours) and records should expire every 8-24 hours.
  2. Nodes that have received a record should republish hourly.

See section 2.5 of https://pdos.csail.mit.edu/~petar/papers/maymounkov-kademlia-lncs.pdf.

@aarshkshah1992
Contributor

@Stebalien Do we plan to implement this anytime soon?

@Stebalien
Member

I'd like to run a change like this against the real network for a while before merging it. Luckily, some of the test infrastructure we're working on will allow us to do that.

So yeah, go ahead! Just be warned that there may be a long road to merging it.

@aarshkshah1992
Contributor

@Stebalien When you say test infrastructure, are you talking about the testlab network?

@Stebalien
Member

Yes.

@aarshkshah1992
Contributor

aarshkshah1992 commented Aug 20, 2019

@Stebalien @raulk

Please can you assign this to me ?

There are two parts to republishing:

  1. The original publisher should republish periodically (4-12 hours)

So, after a successful PutValue call to the network, the DHT instance that is the original publisher will start a goroutine that keeps calling PutValue for that key every 4-12 hours?
I see the paper mentions an interval of 24 hours or more. Why do we choose 4-12 hours?

  1. Records should expire every 8-24 hours.

The Kademlia paper simply states that it requires an expiry interval of 24 hours for file-sharing records, but we ALREADY expire provider records after 24 hours. This issue is for PutValue records, and we currently have a MaxRecordAge of 36 hours. Why do we choose an interval of 8-24 hours?

  1. Nodes that have received a record should republish hourly.

See section 2.5 of https://pdos.csail.mit.edu/~petar/papers/maymounkov-kademlia-lncs.pdf.

If we choose a scan interval randomly (say, between 60 and 75 minutes), the first peer that fires a republish resets the received time on the record on all other peers that are still among the K closest to the key. This should minimise the number of republish calls (this optimisation is mentioned in the paper).

@aarshkshah1992
Contributor

aarshkshah1992 commented Sep 2, 2019

TODO

  • Expire non-provider records in the datastore that are older than 12 hours.

  • Original publisher should republish every 6 hours.

  • Nodes that have received a record should republish every 60-75 minutes.

  • Test with libp2p-testlab

@Stebalien
Member

I see the paper mentions an interval of 24 hours or more. Why do we choose 4-12 hours?

You're right, 24 hours should be fine. The 4-12 hours was necessary because we don't currently rebalance.

The Kad paper simply states that they require an expiry interval of 24 hours for file sharing records but we ALREADY expire provider records after 24 hours. This issue is for the PutValue records & we currently have a MaxRecordAge of 36 hours. Why do we choose an interval of 8-24 ?

That's also a good point. File-sharing records make sense (really, the client should choose, but we can handle that later), but IPNS records can last longer (36 hours is reasonable).

If we choose a scan interval randomly (say between 60-75), the first peer that fires a republish resets the received time on the record on all other peers that are still among the K closest to the key. This should minimise the number of re-publish calls(This optimisation is mentioned in the paper).

SGTM.

@aarshkshah1992
Contributor

aarshkshah1992 commented Oct 14, 2019

@bigs @Stebalien @raulk

This PR needs some love. Please take a look when you can :)

@Stebalien Stebalien changed the title Expire and republish records received via PUT_VALUE Expire and republish records received via PUT_VALUE and ADD_PROVIDER May 21, 2020
@Stebalien
Member

While implementing this feature, we've discovered a chain of dependencies. In reverse order:

  1. Implement this feature.
  2. To safely implement this feature, we need to ensure that a peer can't send out provider records on behalf of other peers. At the moment, peers only accept provider records from the peer providing the content. However, once we start supporting re-publishing, peers will need to re-publish records from other peers. To be secure, those records need to be signed by the providing peer.
  3. To ensure these signed records aren't too large, we need to switch to a more space-efficient signature algorithm (ed25519) by default. RSA keys & signatures are at least 256 bytes; ed25519 public keys are 32 bytes and signatures 64 bytes.
  4. Ed25519 peer IDs inline the public key in the peer ID, instead of hashing it. Unfortunately, this makes them slightly larger (4 bytes) than RSA peer IDs (the current default). When encoded in multibase base32, they encode to 65 bytes, which is 2 bytes over the maximum subdomain size. This breaks the go-ipfs subdomain gateway feature, where go-ipfs puts peer IDs in subdomains for IPNS support: https://bafz...ipns.io.
  5. To shrink the peer ID sizes in subdomains, we've decided to switch to encoding them with base36 by default (in subdomains, in go-ipfs). When encoded with base36, ed25519 IPNS keys now just barely fit in a single subdomain.
  6. To switch to base36, we need to define a base36 multibase, implement it in multiple languages, and ideally include this implementation in at least one go-ipfs and js-ipfs release before we start using it.

@dokterbob

@Stebalien Thanks for the summary. If I read it correctly, considering the current state of affairs (at least in go-ipfs), it seems that republishing is blocked by the lack of signatures for provider records. Is this correct?

Is there a current plan/spec/issue to implement signed provider records?

@Stebalien
Member

In reverse order, we've now finished steps 6-3. Unfortunately, the current go-ipfs team is tiny. Fortunately, this is now changing. Unfortunately, people currently moving back to the team are working towards other goals at the moment (@aarshkshah1992 will be working on NAT traversal, packet orientation, and connection establishment speed, for example).

santos227 pushed a commit to santos227/rustlib that referenced this issue Jun 20, 2022