P2P decentralization improvements (backport of #5329)
This PR implements changes needed for spacemeshos/pm#275, except for measurement

* Introduce Routing Discovery to contact peers behind NATs
* Introduce dynamic v2 relay discovery which is needed for hole punching. The idea is to have a wider array of circuit-v2 passive relays which should be much safer than old libp2p active relays (which were disabled in e.g. Filecoin due to security concerns)
* Introduce QUIC transport to improve chances at hole punching, with testnet-mainnet "crosstalk" protection based on a transport-level handshake mechanism
  * the handshake is not used on mainnet. That way, connections between mainnet and testnet nodes are still prevented, as testnet peers expect the handshake, but if/when my libp2p changes are merged (libp2p/go-libp2p#2658) or libp2p gets private network support
* Make it possible to listen on multiple addresses and advertise multiple addresses
* Extend DebugService with additional P2P info needed for hole punching diagnostics (needs spacemeshos/api#285)
* Add `ping-peers` config option to facilitate P2P network issue diagnostics
* Add `force-dht-server` config option that is useful during troubleshooting DHT and hole-punching issues

`ping-peers` and `force-dht-server` were initially considered to be temporary features, but I think it might make sense to keep them for various P2P network troubleshooting scenarios.

All of the changes are disabled in the config by default, except for:
* libp2p Ping service is enabled by default to make diagnostics easier
* DHT Values and Providers as these will make DHT Routing Peer discovery work efficiently from the beginning when we enable this feature in the configs
* Bootnodes aren't used as relays by default anymore. v2 relays have very limited capacity by default and bootnode relay servers' reservations are very quickly exhausted. Need to either specify a static relay list or enable routing discovery, which searches for more available relays as needed

* Tested using several k8s clusters with cone NATs enabled via the `bridge` CNI plugin (via Multus) -- backported to v1.2.8
* Added a Mac node for testing

- [x] Have spacemeshos/api#285 merged and updated to the new `api` release
- [x] Retest using an image based on this branch (not backport)
- [ ] Decide on whether/how to extend systests to include NAT testing
- [ ] To check: TCP holepunching tends to happen more than QUIC (might be related to the handshake mechanism)
- [ ] ~~To consider: try picking up some % (e.g.: 50%) of non-infra peers during routing discovery~~ (doesn't work too well, need something more involved for that)

Maybe as a follow-up (depending on how soon this gets reviewed):
- Include new metrics / check if they're already present
  - NAT type (UDP / TCP) - Cone / Symmetric / Unknown
  - Reachability - Public / Private / Unknown
  - N of "advertised" peers found via routing discovery
  - N of TCP and UDP (QUIC) peers
  - N of peers reached via relayed connections (these being present for a long time may indicate hole-punching troubles, usually relayed connections go away relatively quickly)
  - N of relay reservations this node managed to obtain
  - Whether routing discovery is active or suspended (e.g. b/c `low-peers` N of peers has been reached)
  - Whether DHT is in the `Server` or `Client` mode
- systests checking NATed connections

Co-authored-by: Ivan Shvedunov <ivan4th@users.noreply.github.com>
ivan4th and ivan4th committed Dec 29, 2023
1 parent 14a36df commit 7e896bb
Showing 28 changed files with 2,186 additions and 227 deletions.
9 changes: 9 additions & 0 deletions CHANGELOG.md
@@ -31,6 +31,15 @@ See [RELEASE](./RELEASE.md) for workflow instructions.

* further increased cache sizes and p2p timeouts to compensate for the increased number of nodes on the network.

* [#5329](https://github.com/spacemeshos/go-spacemesh/pull/5329) P2P decentralization improvements. Added support for QUIC
transport and DHT routing discovery for finding peers and relays. Also, added the `ping-peers` feature which is useful
during connectivity troubleshooting. The `static-relays` feature can be used to provide a static list of circuit v2 relay
nodes when automatic relay discovery is not desired. All of the relay server resource settings are now configurable. Most
of the new functionality is disabled by default unless explicitly enabled in the config via `enable-routing-discovery`,
`routing-discovery-advertise`, `enable-quic-transport`, `static-relays` and `ping-peers` options in the `p2p` config
section. The non-conditional changes include values/provides support on all of the nodes, which will enable DHT to
function efficiently for routing discovery.

## Release v1.2.9

### Improvements
43 changes: 43 additions & 0 deletions README.md
@@ -296,6 +296,49 @@ Close and reopen powershell to load the new PATH. You can then run the command `
- This is a great way to get a feel for the protocol and the platform and to start hacking on Spacemesh.
- Follow the steps in our [Local Testnet Guide](https://testnet.spacemesh.io/#/README)

### Improved decentralization and P2P diagnostic features

**WARNING! THIS IS EXPERIMENTAL FUNCTIONALITY, USE WITH CARE!**

In order to make the p2p network more decentralized, the following options are provided:
- `"enable-routing-discovery": true`: enables routing discovery for finding new peers, including those behind NAT, and
also for discovering relay nodes which are used for NAT hole punching. Note that hole punching can be done when both
ends of the connection are behind an endpoint-independent ("cone") NAT.
- `"routing-discovery-advertise": true` advertises this node for discovery by other peers, even if it is behind NAT.
- `"enable-quic-transport": true`: enables QUIC transport which, together with TCP transport, improves the chances of
successful NAT hole punching.
- `"enable-tcp-transport": false` disables TCP transport. This option is intended to be used for debugging purposes
only!
- `"static-relays": ["/dns4/relay.example.com/udp/5000/quic-v1/p2p/...", ...]` provides a static list of relay nodes for
use in NAT hole punching, in case routing-discovery-based relay search is not to be used.
- `"ping-peers": ["p2p_id_1", "p2p_id_2", ...]` runs P2P ping against the specified peers, logging the results.

For the purpose of debugging P2P connectivity issues, the following command can also be used:
```console
$ grpcurl -plaintext 127.0.0.1:9093 spacemesh.v1.DebugService.NetworkInfo
{
  "id": "12D3Koo...",
  "listenAddresses": [
    "/ip4/0.0.0.0/tcp/50212",
    "/ip4/0.0.0.0/udp/59458/quic-v1",
    "/p2p-circuit"
  ],
  "knownAddresses": [
    "/ip4/127.0.0.1/tcp/50212",
    "/ip4/127.0.0.1/udp/59458/quic-v1",
    "/ip4/192.168.33.5/tcp/50212",
    "/ip4/192.168.33.5/udp/59458/quic-v1",
    "/ip4/.../tcp/37670/p2p/12D3Koo.../p2p-circuit",
    "/ip4/.../udp/37659/quic-v1/p2p/12D3Koo.../p2p-circuit",
    "/ip4/.../tcp/31960/p2p/12D3Koo.../p2p-circuit",
    "/ip4/.../udp/33377/quic-v1/p2p/12D3Koo.../p2p-circuit"
  ],
  "natTypeUdp": "Cone",
  "natTypeTcp": "Cone",
  "reachability": "Private"
}
```

#### Next Steps

- Please visit our [wiki](https://github.com/spacemeshos/go-spacemesh/wiki)
46 changes: 42 additions & 4 deletions api/grpcserver/debug_service.go
@@ -3,8 +3,10 @@ package grpcserver
 import (
 	"context"
 	"fmt"
+	"sort"

 	"github.com/grpc-ecosystem/go-grpc-middleware/logging/zap/ctxzap"
+	"github.com/libp2p/go-libp2p/core/network"
 	pb "github.com/spacemeshos/api/release/go/spacemesh/v1"
 	"go.uber.org/zap"
 	"google.golang.org/grpc/codes"
@@ -22,7 +24,7 @@ import (
 type DebugService struct {
 	db       *sql.Database
 	conState conservativeState
-	identity networkIdentity
+	netInfo  networkInfo
 	oracle   oracle
 }

@@ -32,11 +34,11 @@ func (d DebugService) RegisterService(server *Server) {
 }

 // NewDebugService creates a new grpc service using config data.
-func NewDebugService(db *sql.Database, conState conservativeState, host networkIdentity, oracle oracle) *DebugService {
+func NewDebugService(db *sql.Database, conState conservativeState, host networkInfo, oracle oracle) *DebugService {
 	return &DebugService{
 		db:       db,
 		conState: conState,
-		identity: host,
+		netInfo:  host,
 		oracle:   oracle,
 	}
 }
@@ -78,7 +80,21 @@ func (d DebugService) Accounts(ctx context.Context, in *pb.AccountsRequest) (*pb

 // NetworkInfo query provides NetworkInfoResponse.
 func (d DebugService) NetworkInfo(ctx context.Context, _ *emptypb.Empty) (*pb.NetworkInfoResponse, error) {
-	return &pb.NetworkInfoResponse{Id: d.identity.ID().String()}, nil
+	resp := &pb.NetworkInfoResponse{Id: d.netInfo.ID().String()}
+	for _, a := range d.netInfo.ListenAddresses() {
+		resp.ListenAddresses = append(resp.ListenAddresses, a.String())
+	}
+	sort.Strings(resp.ListenAddresses)
+	for _, a := range d.netInfo.KnownAddresses() {
+		resp.KnownAddresses = append(resp.KnownAddresses, a.String())
+	}
+	sort.Strings(resp.KnownAddresses)
+	udpNATType, tcpNATType := d.netInfo.NATDeviceType()
+	resp.NatTypeUdp = convertNATType(udpNATType)
+	resp.NatTypeTcp = convertNATType(tcpNATType)
+	resp.Reachability = convertReachability(d.netInfo.Reachability())
+	resp.DhtServerEnabled = d.netInfo.DHTServerEnabled()
+	return resp, nil
 }

 // ActiveSet query provides hare active set for the specified epoch.
@@ -145,3 +161,25 @@ func castEventProposal(ev *events.EventProposal) *pb.Proposal {
 	}
 	return proposal
 }
+
+func convertNATType(natType network.NATDeviceType) pb.NetworkInfoResponse_NATType {
+	switch natType {
+	case network.NATDeviceTypeCone:
+		return pb.NetworkInfoResponse_Cone
+	case network.NATDeviceTypeSymmetric:
+		return pb.NetworkInfoResponse_Symmetric
+	default:
+		return pb.NetworkInfoResponse_NATTypeUnknown
+	}
+}
+
+func convertReachability(r network.Reachability) pb.NetworkInfoResponse_Reachability {
+	switch r {
+	case network.ReachabilityPublic:
+		return pb.NetworkInfoResponse_Public
+	case network.ReachabilityPrivate:
+		return pb.NetworkInfoResponse_Private
+	default:
+		return pb.NetworkInfoResponse_ReachabilityUnknown
+	}
+}
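
The conversion helpers above map every unrecognized value to an "unknown" API value. The following is a minimal, self-contained sketch of how each branch of such a mapping, including the default one, could be exercised with a table of cases. It deliberately uses local stand-in types (`natDeviceType`, `apiNATType`) and a stand-in function (`convertNATTypeSketch`) instead of libp2p's `network.NATDeviceType` and the generated `pb` enums, so the names and constant values here are assumptions made for the example only.

```go
package main

import "fmt"

// natDeviceType is a local stand-in for libp2p's network.NATDeviceType.
type natDeviceType int

const (
	natUnknown natDeviceType = iota
	natCone
	natSymmetric
)

// apiNATType is a local stand-in for the generated API enum.
type apiNATType string

const (
	apiNATUnknown apiNATType = "NATTypeUnknown"
	apiCone       apiNATType = "Cone"
	apiSymmetric  apiNATType = "Symmetric"
)

// convertNATTypeSketch follows the same switch-with-default shape as
// convertNATType in the diff, but operates on the stand-in types.
func convertNATTypeSketch(t natDeviceType) apiNATType {
	switch t {
	case natCone:
		return apiCone
	case natSymmetric:
		return apiSymmetric
	default:
		return apiNATUnknown
	}
}

func main() {
	// One case per branch, plus an out-of-range value for the default branch.
	cases := []struct {
		in   natDeviceType
		want apiNATType
	}{
		{natCone, apiCone},
		{natSymmetric, apiSymmetric},
		{natUnknown, apiNATUnknown},
		{natDeviceType(42), apiNATUnknown},
	}
	for _, c := range cases {
		got := convertNATTypeSketch(c.in)
		fmt.Printf("input %d -> %s (matches expectation: %t)\n", c.in, got, got == c.want)
	}
}
```

In the real package, the same table could be placed in a test that calls `convertNATType` and `convertReachability` directly.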
36 changes: 33 additions & 3 deletions api/grpcserver/grpcserver_test.go
@@ -18,6 +18,8 @@ import (
 	"testing"
 	"time"

+	"github.com/libp2p/go-libp2p/core/network"
+	ma "github.com/multiformats/go-multiaddr"
 	pb "github.com/spacemeshos/api/release/go/spacemesh/v1"
 	"github.com/spacemeshos/merkle-tree"
 	"github.com/spacemeshos/poet/shared"
@@ -2394,10 +2396,10 @@ func TestJsonApi(t *testing.T) {

 func TestDebugService(t *testing.T) {
 	ctrl := gomock.NewController(t)
-	identity := NewMocknetworkIdentity(ctrl)
+	netInfo := NewMocknetworkInfo(ctrl)
 	mOracle := NewMockoracle(ctrl)
 	db := sql.InMemory()
-	svc := NewDebugService(db, conStateAPI, identity, mOracle)
+	svc := NewDebugService(db, conStateAPI, netInfo, mOracle)
 	cfg, cleanup := launchServer(t, svc)
 	t.Cleanup(cleanup)

@@ -2448,13 +2450,33 @@ func TestDebugService(t *testing.T) {

 	t.Run("networkID", func(t *testing.T) {
 		id := p2p.Peer("test")
-		identity.EXPECT().ID().Return(id)
+		netInfo.EXPECT().ID().Return(id)
+		netInfo.EXPECT().ListenAddresses().Return([]ma.Multiaddr{
+			mustParseMultiaddr("/ip4/0.0.0.0/tcp/5000"),
+			mustParseMultiaddr("/ip4/0.0.0.0/udp/5001/quic-v1"),
+		})
+		netInfo.EXPECT().KnownAddresses().Return([]ma.Multiaddr{
+			mustParseMultiaddr("/ip4/10.36.0.221/tcp/5000"),
+			mustParseMultiaddr("/ip4/10.36.0.221/udp/5001/quic-v1"),
+		})
+		netInfo.EXPECT().NATDeviceType().Return(network.NATDeviceTypeCone, network.NATDeviceTypeSymmetric)
+		netInfo.EXPECT().Reachability().Return(network.ReachabilityPrivate)
+		netInfo.EXPECT().DHTServerEnabled().Return(true)
+
 		response, err := c.NetworkInfo(context.Background(), &emptypb.Empty{})
 		require.NoError(t, err)
 		require.NotNil(t, response)
 		require.Equal(t, id.String(), response.Id)
+		require.Equal(t, []string{"/ip4/0.0.0.0/tcp/5000", "/ip4/0.0.0.0/udp/5001/quic-v1"},
+			response.ListenAddresses)
+		require.Equal(t, []string{"/ip4/10.36.0.221/tcp/5000", "/ip4/10.36.0.221/udp/5001/quic-v1"},
+			response.KnownAddresses)
+		require.Equal(t, pb.NetworkInfoResponse_Cone, response.NatTypeUdp)
+		require.Equal(t, pb.NetworkInfoResponse_Symmetric, response.NatTypeTcp)
+		require.Equal(t, pb.NetworkInfoResponse_Private, response.Reachability)
+		require.True(t, response.DhtServerEnabled)
 	})

 	t.Run("ActiveSet", func(t *testing.T) {
 		epoch := types.EpochID(3)
 		activeSet := types.RandomActiveSet(11)
@@ -2776,3 +2798,11 @@ func TestMeshService_EpochStream(t *testing.T) {
 	}
 	require.ElementsMatch(t, expected, got)
 }
+
+func mustParseMultiaddr(s string) ma.Multiaddr {
+	maddr, err := ma.NewMultiaddr(s)
+	if err != nil {
+		panic("can't parse multiaddr: " + err.Error())
+	}
+	return maddr
+}
12 changes: 10 additions & 2 deletions api/grpcserver/interface.go
@@ -4,6 +4,9 @@ import (
 	"context"
 	"time"

+	"github.com/libp2p/go-libp2p/core/network"
+	ma "github.com/multiformats/go-multiaddr"
+
 	"github.com/spacemeshos/go-spacemesh/activation"
 	"github.com/spacemeshos/go-spacemesh/common/types"
 	"github.com/spacemeshos/go-spacemesh/p2p"
@@ -12,9 +15,14 @@ import (

 //go:generate mockgen -typed -package=grpcserver -destination=./mocks.go -source=./interface.go

-// networkIdentity interface.
-type networkIdentity interface {
+// networkInfo interface.
+type networkInfo interface {
 	ID() p2p.Peer
+	ListenAddresses() []ma.Multiaddr
+	KnownAddresses() []ma.Multiaddr
+	NATDeviceType() (udpNATType, tcpNATType network.NATDeviceType)
+	Reachability() network.Reachability
+	DHTServerEnabled() bool
 }

 // conservativeState is an API for reading state and transaction/mempool data.