feat: add NetworkCookie option for isolating testnets #2658
This adds a NetworkCookie libp2p option that prevents peers with different NetworkCookies from connecting to each other. Peers without a NetworkCookie can't connect to or receive a connection from a peer with a NetworkCookie. This is not a replacement for private networks, just a safeguard against unintended testnet / mainnet interaction.
This is a massive change. If you want to make this happen, this will need proper specification in the specs repo before we can make any progress here.
@marten-seemann The change is not that big if you don't count the tests, but sure, as there are (backward-compatible) changes on the protocol level I will work on a PR for the specs repo.
This is a wide-reaching spec change and it's not clear that it's the right solution. It definitely requires a spec proposal so we can discuss alternatives. The usual way to fix this is:
Otherwise, it's impossible for peers to connect to both networks at the same time which is actually desirable in some cases. To be clear, private networks were never intended to be used as a way to isolate networks this way, they were designed to provide authentication and privacy.
## Motivation

This PR implements changes needed for spacemeshos/pm#275, except for measurement

## Changes

* Introduce Routing Discovery to contact peers behind NATs
* Introduce dynamic v2 relay discovery which is needed for hole punching. The idea is to have a wider array of circuit-v2 passive relays which should be much safer than old libp2p active relays (which were disabled in e.g. Filecoin due to security concerns)
* Introduce QUIC transport to improve chances at hole punching, with testnet-mainnet "crosstalk" protection based on a transport-level handshake mechanism
  * the handshake is not used on mainnet. That way, connections between mainnet and testnet nodes are still prevented, as testnet peers expect the handshake, but if/when my libp2p changes are merged (libp2p/go-libp2p#2658) or libp2p gets private network support
* Make it possible to listen on multiple addresses and advertise multiple addresses
* Extend DebugService with additional P2P info needed for hole punching diagnostics (needs spacemeshos/api#285)
* Add `ping-peers` config option to facilitate P2P network issue diagnostics
* Add `force-dht-server` config option that is useful during troubleshooting DHT and hole-punching issues

`ping-peers` and `force-dht-server` were initially considered to be temporary features, but I think it might make sense to keep them for various P2P network troubleshooting scenarios.

All of the changes are disabled in the config by default, except for:

* libp2p Ping service is enabled by default to make diagnostics easier
* DHT Values and Providers as these will make DHT Routing Peer discovery work efficiently from the beginning when we enable this feature in the configs
* Bootnodes aren't used as relays by default anymore. v2 relays have very limited capacity by default and bootnode relay servers' reservations are very quickly exhausted. Need to either specify a static relay list or enable routing discovery, which searches for more available relays as needed

## Test Plan

* Tested using several k8s clusters with cone NATs enabled via `bridge` CNI plugin (via Multus) -- backported to v1.2.8
* Added a Mac node for testing

## TODO

- [x] Have spacemeshos/api#285 merged and updated to the new `api` release
- [x] Retest using an image based on this branch (not backport)
- [ ] Decide on whether/how to extend systests to include NAT testing
- [ ] To check: TCP holepunching tends to happen more than QUIC (might be related to the handshake mechanism)
- [ ] ~~To consider: try picking up some % (e.g.: 50%) of non-infra peers during routing discovery~~ (doesn't work too well, need something more involved for that)

Maybe as a follow-up (depending on how soon this gets reviewed):

- Include new metrics / check if they're already present
  - NAT type (UDP / TCP) - Cone / Symmetric / Unknown
  - Reachability - Public / Private / Unknown
  - N of "advertised" peers found via routing discovery
  - N of TCP and UDP (QUIC) peers
  - N of peers reached via relayed connections (these being present for a long time may indicate hole-punching troubles, usually relayed connections go away relatively quickly)
  - N of relay reservations this node managed to obtain
  - Whether routing discovery is active or suspended (e.g. b/c `low-peers` N of peers has been reached)
  - Whether DHT is in the `Server` or `Client` mode
- systests checking NATed connections

Co-authored-by: Ivan Shvedunov <ivan4th@users.noreply.github.com>
@@ -128,6 +128,8 @@ type Config struct {
	DialRanker network.DialRanker

	SwarmOpts []swarm.Option

	NetworkCookie crypto.NetworkCookie
I think you should rebrand this as PNET v2 (and make a specification).
They both seem to achieve the same thing. I haven't read the code enough to know if NetworkCookie
is private information, but it doesn't sound very hard to ensure we never leak the cookie.
This wouldn't solve the problems raised by @Stebalien however.
Motivation: sometimes it's necessary to isolate testnets from the real networks, avoiding some hard-to-diagnose issues. This was previously achievable using the Noise Prologue, but that approach does not work for other transports such as QUIC.
This adds a NetworkCookie libp2p option that prevents peers with different NetworkCookies from connecting to each other. Peers without a NetworkCookie can't connect to or receive a connection from a peer with a NetworkCookie.
This is not a replacement for private networks, just a safeguard against unintended testnet / mainnet interaction.
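The matching rule described above can be sketched in a few lines of Go. This is only an illustration of the rule, not the code from this PR: the `Cookie` type and the `CanConnect` function are hypothetical names made up for the example, and the sketch assumes that a peer with no cookie configured is represented by an empty value.

```go
package main

import (
	"bytes"
	"fmt"
)

// Cookie is a hypothetical stand-in for a NetworkCookie value.
// A nil or empty Cookie means "no cookie configured" (an assumption
// made for this sketch).
type Cookie []byte

// CanConnect reports whether two peers may connect under the rule
// described above: the cookies must be byte-for-byte equal, so a peer
// without a cookie only matches another peer without a cookie.
func CanConnect(local, remote Cookie) bool {
	// bytes.Equal treats a nil slice and an empty slice as equal.
	return bytes.Equal(local, remote)
}

func main() {
	testnetA := Cookie("testnet-a")
	testnetB := Cookie("testnet-b")
	var mainnet Cookie // no cookie configured

	fmt.Println("same cookie:      ", CanConnect(testnetA, testnetA))
	fmt.Println("different cookies:", CanConnect(testnetA, testnetB))
	fmt.Println("cookie vs none:   ", CanConnect(testnetA, mainnet))
	fmt.Println("none vs none:     ", CanConnect(mainnet, mainnet))
}
```

A plain equality check is enough here because the cookie is described as a safeguard against accidental crosstalk rather than a secret.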
This PR replaces #2645, which only worked for QUIC and didn't have a proper "facade". This change works for all transports.
The usage is as follows
Implementation details:
Additional implementation notes: