P2P decentralization improvements (#5329)
## Motivation

This PR implements changes needed for spacemeshos/pm#275, except for measurement

## Changes
* Introduce Routing Discovery to contact peers behind NATs
* Introduce dynamic v2 relay discovery which is needed for hole punching. The idea is to have a wider array of circuit-v2 passive relays which should be much safer than old libp2p active relays (which were disabled in e.g. Filecoin due to security concerns)
* Introduce QUIC transport to improve chances at hole punching, with testnet-mainnet "crosstalk" protection based on a transport-level handshake mechanism
  * the handshake is not used on mainnet. That way, connections between mainnet and testnet nodes are still prevented, as testnet peers expect the handshake, but if/when my libp2p changes are merged (libp2p/go-libp2p#2658) or libp2p gets private network support
* Make it possible to listen on multiple addresses and advertise multiple addresses
* Extend DebugService with additional P2P info needed for hole punching diagnostics (needs spacemeshos/api#285)
* Add `ping-peers` config option to facilitate P2P network issue diagnostics
* Add `force-dht-server` config option that is useful during troubleshooting DHT and hole-punching issues

`ping-peers` and `force-dht-server` were initially considered to be temporary features, but I think it might make sense to keep them for various P2P network troubleshooting scenarios.

All of the changes are disabled in the config by default, except for:
* libp2p Ping service is enabled by default to make diagnostics easier
* DHT Values and Providers as these will make DHT Routing Peer discovery work efficiently from the beginning when we enable this feature in the configs
* Bootnodes aren't used as relays by default anymore. v2 relays have very limited capacity by default and bootnode relay servers' reservations are very quickly exhausted. Need to either specify a static relay list or enable routing discovery, which searches for more available relays as needed

## Test Plan
* Tested using several k8s clusters with cone NATs enabled via `bridge` CNI plugin (via Multus) -- backported to v1.2.8
* Added a Mac node for testing

## TODO
- [x] Have spacemeshos/api#285 merged and updated to the new `api` release
- [x] Retest using an image based on this branch (not backport)
- [ ] Decide on whether/how to extend systests to include NAT testing
- [ ] To check: TCP holepunching tends to happen more than QUIC (might be related to the handshake mechanism)
- [ ] ~~To consider: try picking up some % (e.g.: 50%) of non-infra peers during routing discovery~~ (doesn't work too well, need something more involved for that)

Maybe as a follow-up (depending on how soon this gets reviewed):
- Include new metrics / check if they're already present
  - NAT type (UDP / TCP) - Cone / Symmetric / Unknown
  - Reachability - Public / Private / Unknown
  - N of "advertised" peers found via routing discovery
  - N of TCP and UDP (QUIC) peers
  - N of peers reached via relayed connections (these being present for a long time may indicate hole-punching troubles, usually relayed connections go away relatively quickly)
  - N of relay reservations this node managed to obtain
  - Whether routing discovery is active or suspended (e.g. b/c `low-peers` N of peers has been reached)
  - Whether DHT is in the `Server` or `Client` mode
- systests checking NATed connections
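As a rough illustration of the "N of TCP and UDP (QUIC) peers" and "N of peers reached via relayed connections" metrics listed above, here is a minimal sketch that classifies addresses by their string form. `transportCounts` and `countTransports` are hypothetical names, and the addresses are made up; an actual implementation would more likely inspect live libp2p connections than parse strings.

```go
package main

import (
	"fmt"
	"strings"
)

// transportCounts is a hypothetical container for the proposed counters.
type transportCounts struct {
	TCP     int
	QUIC    int
	Relayed int
}

// countTransports classifies multiaddr strings. Relayed (circuit) addresses
// are counted separately, regardless of the transport used to reach the relay.
func countTransports(addrs []string) transportCounts {
	var c transportCounts
	for _, a := range addrs {
		switch {
		case strings.Contains(a, "/p2p-circuit"):
			c.Relayed++
		case strings.Contains(a, "/quic-v1"):
			c.QUIC++
		case strings.Contains(a, "/tcp/"):
			c.TCP++
		}
	}
	return c
}

func main() {
	// Made-up addresses, for illustration only.
	c := countTransports([]string{
		"/ip4/10.0.0.1/tcp/5000",
		"/ip4/10.0.0.1/udp/5001/quic-v1",
		"/ip4/1.2.3.4/tcp/37670/p2p/12D3KooRelay/p2p-circuit",
	})
	fmt.Printf("tcp=%d quic=%d relayed=%d\n", c.TCP, c.QUIC, c.Relayed)
}
```

A relayed count that stays non-zero for a long time would be the signal described above: hole punching may not be succeeding.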

Co-authored-by: Ivan Shvedunov <ivan4th@users.noreply.github.com>
2 people authored and dsmello committed Dec 28, 2023
1 parent abf6339 commit fd69fb4
Showing 28 changed files with 2,182 additions and 225 deletions.
9 changes: 9 additions & 0 deletions CHANGELOG.md
@@ -89,6 +89,15 @@ for more information on how to configure the node to work with the PoST service.
query rewards by smesherID. Additionally, it does not re-index old data. Rewards will contain smesherID going forward,
but to refresh data for all rewards, a node will have to delete its database and resync from genesis.

* [#5329](https://github.com/spacemeshos/go-spacemesh/pull/5329) P2P decentralization improvements. Added support for QUIC
transport and DHT routing discovery for finding peers and relays. Also, added the `ping-peers` feature which is useful
during connectivity troubleshooting. `static-relays` feature can be used to provide a static list of circuit v2 relay
nodes when automatic relay discovery is not desired. All of the relay server resource settings are now configurable. Most
of the new functionality is disabled by default unless explicitly enabled in the config via `enable-routing-discovery`,
`routing-discovery-advertise`, `enable-quic-transport`, `static-relays` and `ping-peers` options in the `p2p` config
section. The non-conditional changes include values/provides support on all of the nodes, which will enable DHT to
function efficiently for routing discovery.

## Release v1.2.9

### Improvements
43 changes: 43 additions & 0 deletions README.md
@@ -439,6 +439,49 @@ as on UNIX-based systems.
- This is a great way to get a feel for the protocol and the platform and to start hacking on Spacemesh.
- Follow the steps in our [Local Testnet Guide](https://testnet.spacemesh.io/#/README)

### Improved decentralization and P2P diagnostic features

**WARNING! THIS IS EXPERIMENTAL FUNCTIONALITY, USE WITH CARE!**

In order to make the p2p network more decentralized, the following options are provided:
- `"enable-routing-discovery": true`: enables routing discovery for finding new peers, including those behind NAT, and
also for discovering relay nodes which are used for NAT hole punching. Note that hole punching can be done when both
ends of the connection are behind an endpoint-independent ("cone") NAT.
- `"routing-discovery-advertise": true` advertises this node for discovery by other peers, even if it is behind NAT.
- `"enable-quic-transport": true`: enables QUIC transport which, together with TCP transport, heightens the chances of
successful NAT hole punching.
- `"enable-tcp-transport": false` disables TCP transport. This option is intended to be used for debugging purposes
only!
- `"static-relays": ["/dns4/relay.example.com/udp/5000/quic-v1/p2p/...", ...]` provides a static list of relay nodes for
use for NAT hole punching in case routing-discovery-based relay search is not to be used.
- `"ping-peers": ["p2p_id_1", "p2p_id_2", ...]` runs P2P ping against the specified peers, logging the results.

For the purpose of debugging P2P connectivity issues, the following command can also be used:
```console
$ grpcurl -plaintext 127.0.0.1:9093 spacemesh.v1.DebugService.NetworkInfo
{
"id": "12D3Koo...",
"listenAddresses": [
"/ip4/0.0.0.0/tcp/50212",
"/ip4/0.0.0.0/udp/59458/quic-v1",
"/p2p-circuit"
],
"knownAddresses": [
"/ip4/127.0.0.1/tcp/50212",
"/ip4/127.0.0.1/udp/59458/quic-v1",
"/ip4/192.168.33.5/tcp/50212",
"/ip4/192.168.33.5/udp/59458/quic-v1",
"/ip4/.../tcp/37670/p2p/12D3Koo.../p2p-circuit",
"/ip4/.../udp/37659/quic-v1/p2p/12D3Koo.../p2p-circuit",
"/ip4/.../tcp/31960/p2p/12D3Koo.../p2p-circuit",
"/ip4/.../udp/33377/quic-v1/p2p/12D3Koo.../p2p-circuit"
],
"natTypeUdp": "Cone",
"natTypeTcp": "Cone",
"reachability": "Private"
}
```
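
The JSON printed by `grpcurl` can also be post-processed. The sketch below decodes it and prints a one-line summary; the field names follow the sample output above, while `networkInfo`, `summarize`, and the addresses in `main` are made up for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// networkInfo mirrors the JSON fields shown in the grpcurl output above.
type networkInfo struct {
	ID              string   `json:"id"`
	ListenAddresses []string `json:"listenAddresses"`
	KnownAddresses  []string `json:"knownAddresses"`
	NatTypeUDP      string   `json:"natTypeUdp"`
	NatTypeTCP      string   `json:"natTypeTcp"`
	Reachability    string   `json:"reachability"`
}

// summarize decodes a NetworkInfo JSON document and reports reachability,
// the number of relayed (p2p-circuit) known addresses, and both NAT types.
func summarize(raw []byte) (string, error) {
	var info networkInfo
	if err := json.Unmarshal(raw, &info); err != nil {
		return "", fmt.Errorf("decode NetworkInfo: %w", err)
	}
	relayed := 0
	for _, a := range info.KnownAddresses {
		if strings.HasSuffix(a, "/p2p-circuit") {
			relayed++
		}
	}
	return fmt.Sprintf("reachability=%s relayed_addrs=%d udp_nat=%s tcp_nat=%s",
		info.Reachability, relayed, info.NatTypeUDP, info.NatTypeTCP), nil
}

func main() {
	// Shortened, made-up sample in the same shape as the output above.
	raw := []byte(`{
		"knownAddresses": [
			"/ip4/192.168.33.5/tcp/50212",
			"/ip4/1.2.3.4/tcp/37670/p2p/12D3KooRelay/p2p-circuit"
		],
		"natTypeUdp": "Cone",
		"natTypeTcp": "Cone",
		"reachability": "Private"
	}`)
	summary, err := summarize(raw)
	if err != nil {
		panic(err)
	}
	fmt.Println(summary)
}
```

A `Private` node that reports `Cone` NAT types and has at least one relayed address is in the situation where hole punching is expected to be possible.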

#### Next Steps

- Please visit our [wiki](https://github.com/spacemeshos/go-spacemesh/wiki)
46 changes: 42 additions & 4 deletions api/grpcserver/debug_service.go
@@ -3,9 +3,11 @@ package grpcserver
 import (
 	"context"
 	"fmt"
+	"sort"

 	"github.com/grpc-ecosystem/go-grpc-middleware/logging/zap/ctxzap"
 	"github.com/grpc-ecosystem/grpc-gateway/v2/runtime"
+	"github.com/libp2p/go-libp2p/core/network"
 	pb "github.com/spacemeshos/api/release/go/spacemesh/v1"
 	"go.uber.org/zap"
 	"google.golang.org/grpc"
@@ -24,7 +26,7 @@ import (
 type DebugService struct {
 	db       *sql.Database
 	conState conservativeState
-	identity networkIdentity
+	netInfo  networkInfo
 	oracle   oracle
 }

@@ -43,11 +45,11 @@ func (d DebugService) String() string {
 }

 // NewDebugService creates a new grpc service using config data.
-func NewDebugService(db *sql.Database, conState conservativeState, host networkIdentity, oracle oracle) *DebugService {
+func NewDebugService(db *sql.Database, conState conservativeState, host networkInfo, oracle oracle) *DebugService {
 	return &DebugService{
 		db:       db,
 		conState: conState,
-		identity: host,
+		netInfo:  host,
 		oracle:   oracle,
 	}
 }
@@ -91,7 +93,21 @@ func (d DebugService) Accounts(ctx context.Context, in *pb.AccountsRequest) (*pb

 // NetworkInfo query provides NetworkInfoResponse.
 func (d DebugService) NetworkInfo(ctx context.Context, _ *emptypb.Empty) (*pb.NetworkInfoResponse, error) {
-	return &pb.NetworkInfoResponse{Id: d.identity.ID().String()}, nil
+	resp := &pb.NetworkInfoResponse{Id: d.netInfo.ID().String()}
+	for _, a := range d.netInfo.ListenAddresses() {
+		resp.ListenAddresses = append(resp.ListenAddresses, a.String())
+	}
+	sort.Strings(resp.ListenAddresses)
+	for _, a := range d.netInfo.KnownAddresses() {
+		resp.KnownAddresses = append(resp.KnownAddresses, a.String())
+	}
+	sort.Strings(resp.KnownAddresses)
+	udpNATType, tcpNATType := d.netInfo.NATDeviceType()
+	resp.NatTypeUdp = convertNATType(udpNATType)
+	resp.NatTypeTcp = convertNATType(tcpNATType)
+	resp.Reachability = convertReachability(d.netInfo.Reachability())
+	resp.DhtServerEnabled = d.netInfo.DHTServerEnabled()
+	return resp, nil
 }

 // ActiveSet query provides hare active set for the specified epoch.
@@ -158,3 +174,25 @@ func castEventProposal(ev *events.EventProposal) *pb.Proposal {
 	}
 	return proposal
 }
+
+func convertNATType(natType network.NATDeviceType) pb.NetworkInfoResponse_NATType {
+	switch natType {
+	case network.NATDeviceTypeCone:
+		return pb.NetworkInfoResponse_Cone
+	case network.NATDeviceTypeSymmetric:
+		return pb.NetworkInfoResponse_Symmetric
+	default:
+		return pb.NetworkInfoResponse_NATTypeUnknown
+	}
+}
+
+func convertReachability(r network.Reachability) pb.NetworkInfoResponse_Reachability {
+	switch r {
+	case network.ReachabilityPublic:
+		return pb.NetworkInfoResponse_Public
+	case network.ReachabilityPrivate:
+		return pb.NetworkInfoResponse_Private
+	default:
+		return pb.NetworkInfoResponse_ReachabilityUnknown
+	}
+}
36 changes: 33 additions & 3 deletions api/grpcserver/grpcserver_test.go
@@ -17,6 +17,8 @@ import (
 	"testing"
 	"time"

+	"github.com/libp2p/go-libp2p/core/network"
+	ma "github.com/multiformats/go-multiaddr"
 	pb "github.com/spacemeshos/api/release/go/spacemesh/v1"
 	"github.com/spacemeshos/merkle-tree"
 	"github.com/spacemeshos/poet/shared"
@@ -2043,10 +2045,10 @@ func TestMultiService(t *testing.T) {

 func TestDebugService(t *testing.T) {
 	ctrl := gomock.NewController(t)
-	identity := NewMocknetworkIdentity(ctrl)
+	netInfo := NewMocknetworkInfo(ctrl)
 	mOracle := NewMockoracle(ctrl)
 	db := sql.InMemory()
-	svc := NewDebugService(db, conStateAPI, identity, mOracle)
+	svc := NewDebugService(db, conStateAPI, netInfo, mOracle)
 	cfg, cleanup := launchServer(t, svc)
 	t.Cleanup(cleanup)

@@ -2097,13 +2099,33 @@ func TestDebugService(t *testing.T) {

 	t.Run("networkID", func(t *testing.T) {
 		id := p2p.Peer("test")
-		identity.EXPECT().ID().Return(id)
+		netInfo.EXPECT().ID().Return(id)
+		netInfo.EXPECT().ListenAddresses().Return([]ma.Multiaddr{
+			mustParseMultiaddr("/ip4/0.0.0.0/tcp/5000"),
+			mustParseMultiaddr("/ip4/0.0.0.0/udp/5001/quic-v1"),
+		})
+		netInfo.EXPECT().KnownAddresses().Return([]ma.Multiaddr{
+			mustParseMultiaddr("/ip4/10.36.0.221/tcp/5000"),
+			mustParseMultiaddr("/ip4/10.36.0.221/udp/5001/quic-v1"),
+		})
+		netInfo.EXPECT().NATDeviceType().Return(network.NATDeviceTypeCone, network.NATDeviceTypeSymmetric)
+		netInfo.EXPECT().Reachability().Return(network.ReachabilityPrivate)
+		netInfo.EXPECT().DHTServerEnabled().Return(true)
+
 		response, err := c.NetworkInfo(context.Background(), &emptypb.Empty{})
 		require.NoError(t, err)
 		require.NotNil(t, response)
 		require.Equal(t, id.String(), response.Id)
+		require.Equal(t, []string{"/ip4/0.0.0.0/tcp/5000", "/ip4/0.0.0.0/udp/5001/quic-v1"},
+			response.ListenAddresses)
+		require.Equal(t, []string{"/ip4/10.36.0.221/tcp/5000", "/ip4/10.36.0.221/udp/5001/quic-v1"},
+			response.KnownAddresses)
+		require.Equal(t, pb.NetworkInfoResponse_Cone, response.NatTypeUdp)
+		require.Equal(t, pb.NetworkInfoResponse_Symmetric, response.NatTypeTcp)
+		require.Equal(t, pb.NetworkInfoResponse_Private, response.Reachability)
+		require.True(t, response.DhtServerEnabled)
 	})

 	t.Run("ActiveSet", func(t *testing.T) {
 		epoch := types.EpochID(3)
 		activeSet := types.RandomActiveSet(11)
@@ -2445,3 +2467,11 @@ func TestMeshService_EpochStream(t *testing.T) {
 	}
 	require.ElementsMatch(t, expected, got)
 }
+
+func mustParseMultiaddr(s string) ma.Multiaddr {
+	maddr, err := ma.NewMultiaddr(s)
+	if err != nil {
+		panic("can't parse multiaddr: " + err.Error())
+	}
+	return maddr
+}
12 changes: 10 additions & 2 deletions api/grpcserver/interface.go
@@ -4,6 +4,9 @@ import (
 	"context"
 	"time"

+	"github.com/libp2p/go-libp2p/core/network"
+	ma "github.com/multiformats/go-multiaddr"
+
 	"github.com/spacemeshos/go-spacemesh/activation"
 	"github.com/spacemeshos/go-spacemesh/common/types"
 	"github.com/spacemeshos/go-spacemesh/p2p"
@@ -12,9 +15,14 @@ import (

 //go:generate mockgen -typed -package=grpcserver -destination=./mocks.go -source=./interface.go

-// networkIdentity interface.
-type networkIdentity interface {
+// networkInfo interface.
+type networkInfo interface {
 	ID() p2p.Peer
+	ListenAddresses() []ma.Multiaddr
+	KnownAddresses() []ma.Multiaddr
+	NATDeviceType() (udpNATType, tcpNATType network.NATDeviceType)
+	Reachability() network.Reachability
+	DHTServerEnabled() bool
 }

 // conservativeState is an API for reading state and transaction/mempool data.