Network topology and peer discovery mechanism

The original peer discovery protocol was designed to support the ETH protocol, which is symmetrical and homogeneous: every node provides and requires the same set of services to and from its peers. This is not the case with LES (the light client protocol), where there are at least two roles: servers and clients. The planned (and soon to be implemented) scalable LES/2 protocol (see below) needs an even finer distinction because it makes it possible for light nodes to become servers of a certain slice of the state trie too. The future "sharding" improvement of Ethereum will also make it necessary for ETH to create more strongly interconnected subnetworks and a higher-level connection structure connecting those subnetworks. Other distributed services could probably use the same structure as well.

Goals

  • A well-structured, searchable network of nodes offering a certain set of services, preventing isolation
  • Support for asymmetrical protocols where some nodes offer different feature sets than others, possibly some are only clients
  • Support for large service hierarchies where the set of services (or the set of data the services operate on) is divided into smaller parts, with some or all of the nodes only offering a subset of them (see "Topic hierarchy")
  • Multi-level Kademlia network/subnetwork structure that makes it possible to search both for subnetwork and address proximity

Beyond our goals

The proposed topology makes it possible to traverse an entire set of nodes providing a certain protocol/service once a node has connected to any of them. The hierarchical structure makes it possible to find a suitable peer providing any sub-topic inside a topic hierarchy. But it does not provide a mechanism for finding boot nodes for a certain protocol. Such a universal advertisement mechanism can be implemented later as a separate, globally spoken protocol. This task can be well separated from the construction of the network topology and traversal mechanism. It should also be separated because it is far from trivial, especially when considering security issues. If any node can advertise itself on any topic, certain topics can be spammed by useless nodes, making the proper ones too hard to find. Only a node speaking a certain protocol can tell whether another node also speaks it properly. This universal node registry should also implement a web-of-trust mechanism and probably a micropayment channel/security deposit system too. Until such a mechanism becomes available and reliable, protocols can work with hard-wired or externally provided boot nodes.

Topic hierarchy

A topic (represented by a uint128) consists of two parts: a topic root (uint96) and a subtopic ID (uint32). The topic root identifies the network of nodes speaking the same protocol (and observing the same blockchain if it is a blockchain-related protocol). The subtopic ID denotes a node of a binary tree called the subtopic tree (see below; 1 is the root, the children of x are 2x and 2x+1). Each subtopic represents a set of services provided by the node that "offers" the given subtopic. The set of available services denoted by a child subtopic is always a subset of the services denoted by its parent. To each subtopic belongs a subnetwork that includes the nodes offering either that subtopic or any descendant of it. It is possible that some subtopics are not offered by anyone while their descendants are. For example, the available nodes may provide 16 disjoint sets of services (like access to 16 equal and disjoint slices of the state), in which case the four highest levels of subtopics are not offered by anyone but the corresponding subnetworks still exist. This also means that the nodes belonging to a subnetwork are a subset of the ones belonging to the parent. The "root network" includes every node offering any topic with the same topic root. A small Go sketch of the topic encoding and tree arithmetic follows the diagram below.

           1
          / \
         2   3
        / \ / \
       4  5 6  7
      / \
     8   9
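
The following Go sketch illustrates the topic encoding and the subtopic-tree arithmetic described above; the type and helper names are illustrative only and not part of any existing go-ethereum API.

    package topic

    // TopicRoot identifies the network of nodes speaking the same protocol
    // (the uint96 part described above).
    type TopicRoot [12]byte

    // Topic is the uint128 described above: a topic root plus a subtopic ID
    // denoting a node of the binary subtopic tree.
    type Topic struct {
        Root     TopicRoot
        Subtopic uint32 // 1 is the tree root, the children of x are 2x and 2x+1
    }

    // Parent returns the parent subtopic of x; 0 means x is the root.
    func Parent(x uint32) uint32 {
        if x <= 1 {
            return 0
        }
        return x / 2
    }

    // Children returns the two child subtopics of x.
    func Children(x uint32) (uint32, uint32) {
        return 2 * x, 2*x + 1
    }

    // IsAncestorOrSelf reports whether a is an ancestor of b or equal to it.
    func IsAncestorOrSelf(a, b uint32) bool {
        for b > a {
            b /= 2
        }
        return a != 0 && a == b
    }

In the 16-slice example above, only the 16 leaf subtopics 16..31 would be offered, while the subnetworks belonging to their ancestors 1..15 would still exist.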

Network topology of service providers

Let's call a subtopic's (or the belonging subnetwork's) ancestors, children (but not grandchildren), sibling, and ancestors' siblings its "relatives". For example, 4's relatives are 1, 2, 3, 5, 8 and 9. Any node offering a certain subtopic T should have connections inside its own subnetworks (T and the ancestors of T) and also to its relative subnetworks. This makes it possible to search for a certain required service (a certain subtopic or any of its ancestors) inside the network hierarchy, even if some of the higher-level subtopics (larger sets of services) are not offered by anyone. The subnetworks are Kademlia structures, which makes it possible to also search for address proximity and ensures the possibility of randomly selecting nodes from an entire subnetwork, keeping the topology healthy.

Every node offering a subtopic T should keep a KAD(T) Kademlia table for its own subnetwork and a KAD(R) table for every relative R. A KAD(T) table consists of several "lines", each holding connection info for a few nodes that offer subtopic T or a descendant of it and whose addresses have a certain length of common prefix with our own address. An element of KAD(T)(i) can also be an element of KAD(A)(i) where A is an ancestor of T, so in practice most of the connections can be shared with the ancestor tables. Also, sibling tables don't need to be populated as much as the "own subnetwork" tables. Still, this means a lot of connections for a big hierarchical network, but they don't need to be kept open, just remembered and later replaced by newer connections as they get old.
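
As a rough illustration, the relatives of a subtopic and a per-subtopic KAD(T) table could be represented as in the sketch below. The 256-bit address length, the NodeInfo contents and all names are assumptions made only for this sketch.

    package topic

    import "time"

    const addressBits = 256 // assumed address length, not fixed by the description above

    // NodeInfo is a placeholder for the remembered connection info of a peer.
    type NodeInfo struct {
        Address  [addressBits / 8]byte
        Subtopic uint32 // the subtopic the peer offers
        LastSeen time.Time
    }

    // KadTable is KAD(T) for one subtopic T: line i holds a few nodes offering
    // T or a descendant of T whose addresses share an i-bit prefix with ours.
    type KadTable struct {
        Subtopic uint32
        Lines    [addressBits][]NodeInfo
    }

    // Relatives returns the relatives of subtopic t as defined above: its
    // children, its sibling, its ancestors and its ancestors' siblings.
    // Relatives(4) yields 8, 9, 5, 2, 3, 1.
    func Relatives(t uint32) []uint32 {
        rel := []uint32{2 * t, 2*t + 1} // children (but not grandchildren)
        if t > 1 {
            rel = append(rel, t^1) // sibling (the root has none)
        }
        for a := t / 2; a >= 1; a /= 2 { // ancestors and their siblings
            rel = append(rel, a)
            if a > 1 {
                rel = append(rel, a^1)
            }
        }
        return rel
    }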

Client-only operation

In the case of an asymmetrical protocol like LES, there can be nodes that do not offer any service topics and act only as clients. They do not need to place themselves inside the Kademlia structures and populate their own KAD tables. The set of services they are looking for and their peer selection strategies are application-specific; node searches are initiated by the protocol implementation. For client-mode operation, the peer manager only needs to remember the recently known connections, without the need to enforce any particular structure.
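
A client-mode peer manager could therefore be as simple as the following sketch, which just keeps the most recently contacted nodes per topic (reusing the Topic and NodeInfo placeholders from the sketches above; the structure and names are assumptions, not an existing implementation).

    // recentPeers remembers the most recently contacted nodes per topic,
    // with no Kademlia structure enforced.
    type recentPeers struct {
        max   int
        known map[Topic][]NodeInfo
    }

    func (r *recentPeers) add(t Topic, n NodeInfo) {
        if r.known == nil {
            r.known = make(map[Topic][]NodeInfo)
        }
        list := append([]NodeInfo{n}, r.known[t]...) // newest first
        if len(list) > r.max {
            list = list[:r.max]
        }
        r.known[t] = list
    }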

Search algorithm

The purpose of searching is to find a node that can provide us with a certain service or (if we are providing services too) to build connections, populate our KAD tables and announce ourselves to those peers so that others will find us too. A search request has four parameters (a sketch of the request structure follows the list):

  • subtopic: find someone who is offering this subtopic or one of its ancestors. If there is none, find the closest ones (those whose offered subtopic has the deepest common ancestor with the searched one). The distance between an offered and a searched subtopic is defined as the number of levels between the offered subtopic and its closest common ancestor with the searched one.
  • address: find someone with an address as close to this one as possible. The binary XOR distance metric is used. Address distance has lower priority than subtopic distance.
  • age: filter out connections with last contact older than this
  • count: return at most this many nodes
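
A minimal Go sketch of such a request and of the two distance comparisons, assuming 256-bit addresses; the field names and types are illustrative only.

    package topic

    import "time"

    // SearchRequest holds the four search parameters listed above.
    type SearchRequest struct {
        Subtopic uint32        // wanted subtopic (or any ancestor of it)
        Address  [32]byte      // prefer nodes XOR-closest to this address
        MaxAge   time.Duration // filter out entries last contacted earlier than this
        Count    int           // return at most this many nodes
    }

    // subtopicDistance returns the number of levels between the offered
    // subtopic and its closest common ancestor with the searched one
    // (0 if the offered subtopic is the searched one or an ancestor of it).
    func subtopicDistance(offered, searched uint32) int {
        d := 0
        for offered > 1 {
            s := searched
            for s > offered { // walk searched up to offered's level or above
                s /= 2
            }
            if s == offered { // offered is an ancestor of (or equal to) searched
                break
            }
            offered /= 2 // go one level up and count it
            d++
        }
        return d
    }

    // closerAddress reports whether a is XOR-closer to target than b; address
    // distance only breaks ties between equal subtopic distances.
    func closerAddress(target, a, b [32]byte) bool {
        for i := range target {
            if da, db := a[i]^target[i], b[i]^target[i]; da != db {
                return da < db
            }
        }
        return false
    }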

The algorithm for finding suitable nodes looks like this (a Go sketch of the loop follows the list):

  • run search locally
  • if suitable nodes are found, return
  • pick the closest node that has not been tried yet, establish contact, send search request
  • add results to local KAD tables
  • repeat
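
A sketch of this loop, building on the types above; the helpers are placeholders for the local table lookup, the network query and the table update, declared as variables only to keep the sketch compilable.

    package topic

    // Placeholders wired in by the real peer manager; names are illustrative.
    var (
        findLocal      func(SearchRequest) []NodeInfo // search our own KAD tables
        isSuitable     func(NodeInfo, SearchRequest) bool // matches subtopic, age, etc.
        closestUntried func([]NodeInfo, SearchRequest, map[[32]byte]bool) (NodeInfo, bool)
        queryPeer      func(NodeInfo, SearchRequest) []NodeInfo // send the request to the peer
        addToTables    func([]NodeInfo) // merge results into our KAD tables
    )

    // lookup is a sketch of the iterative search described above.
    func lookup(req SearchRequest) []NodeInfo {
        tried := make(map[[32]byte]bool)
        for {
            candidates := findLocal(req) // run the search locally
            var suitable []NodeInfo
            for _, n := range candidates {
                if isSuitable(n, req) {
                    suitable = append(suitable, n)
                }
            }
            if len(suitable) > 0 {
                return suitable // suitable nodes found
            }
            // pick the closest candidate that has not been tried yet
            next, ok := closestUntried(candidates, req, tried)
            if !ok {
                return nil // nothing left to ask
            }
            tried[next.Address] = true
            addToTables(queryPeer(next, req)) // establish contact, send the request, store the results
        }
    }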

Service announcement and network topology maintenance

Service providers need to populate the KAD tables of their own and all relative subnetworks, keep them up to date, and also ensure that a sufficient number of other nodes have them in their tables. This is done by randomly selecting nodes from those subnetworks, adding them to our tables (replacing old entries if necessary) and introducing ourselves to them. A rough approximation of "selecting randomly" is running a search for the given subtopic and a random address, finding the closest existing address. This could be further refined by examining the nearby addresses also found during the search and trying to even out the uneven chances caused by the uneven distribution of addresses. Choosing connections randomly with an approximately even chance ensures a healthy and resilient network topology.
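
One maintenance round for a single subnetwork could look roughly like the sketch below, reusing the lookup sketch above; introduceSelf is a hypothetical placeholder for the actual introduction handshake.

    package topic

    import crand "crypto/rand"

    var introduceSelf func(NodeInfo) // placeholder: make the peer add us to its tables

    // refresh approximates "select a random node of the subnetwork" by searching
    // for a random address within the given subtopic, then adds the results to
    // our tables and introduces ourselves to them.
    func refresh(subtopic uint32) {
        var target [32]byte
        crand.Read(target[:]) // a random target address approximates an even selection
        for _, n := range lookup(SearchRequest{Subtopic: subtopic, Address: target, Count: 3}) {
            addToTables([]NodeInfo{n}) // replace old entries if necessary
            introduceSelf(n)           // so that we appear in its tables too
        }
    }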

An example application: the distributed LES/2 protocol

The purpose of the LES/2 protocol is to create a network topology in which a relatively small number of transaction-processing full nodes can spread state data out to a very large number of light nodes, some of which take part in data distribution, preferably avoiding frequent topology changes and lengthy node searches during client operation. It is trivially compatible with the planned state sharding concept. Note that the light client protocol also distributes transaction/receipt data and new transaction forwarding/validation, but these mechanisms are not discussed here since the distribution of state data is the most interesting part.

There are two service types in LES/2: distributor (DIS) and fetcher (FET). DIS provides an "update entire subtree" service that passes all trie nodes within a certain key range that have been changed since a previous query. It also passes the trie nodes up to the root node, providing a Merkle proof for the entire subtree. An ETH node can also offer DIS(00-ff), meaning that the entire key range is accessible there. Light nodes that do distribution are probably only offering and observing a smaller key range, and they are connected either to full nodes or to larger light distributors, updating their state slice with every new block. This way state data can be quickly and safely duplicated and made accessible to a lot of nodes, without one node having to handle too much data.

If the state trie is not too heavily subdivided, light clients can have a contact to one node offering each slice, but further system scaling (especially if shards are further sliced up for light node distribution) might require having a large number of slices. Keeping hundreds of live connections might not be possible or practical, especially in the case of mobile devices. But searching for suitable nodes on demand when a state retrieval request comes in for a certain key range might also not be practical, because finding a node can take several seconds and some operations can require several state reads. In this case, the fetchers can help. A fetcher, even FET(00-ff) which offers the entire key range, does not necessarily have to handle a large amount of data (it depends on the number of clients). It only needs to keep a live connection to a number of smaller fetchers/distributors who together cover its offered range. It offers single Merkle proof retrievals and forwards these requests to one of its peers who can answer them. Practice will show if and when fetchers will be really needed, but in a massively scaled system they could provide a serious performance improvement.

So here is an example topology showing offered subtopics (a hypothetical sketch of the DIS/FET message types follows the diagrams below):

                   <ETH node>
                   DIS(00-ff)
                 /            \
       DIS(00-7f)              DIS(80-ff)
      /          \            /          \
 DIS(00-3f)  DIS(40-7f)  DIS(80-bf)  DIS(c0-ff)
      \          /            \          /
       FET(00-7f)              FET(80-ff)
                 \            /
               no service offered
                  <LES client>

The topic hierarchy would look like this:

                  <topic root>
             (not offered by anyone)
                 /            \
       DIS(00-ff)              FET(00-ff)
      /          \            /          \
 DIS(00-7f)  DIS(80-ff)  FET(00-7f)  FET(80-ff)
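
Since LES/2 is not yet specified, the Go types below are only a hypothetical illustration of the two services described above (the "update entire subtree" service of DIS nodes and the single-proof service of FET nodes); none of the names or fields correspond to an actual wire format.

    package les2sketch

    // KeyRange is a slice of the state trie key space, e.g. 00-7f in the
    // diagrams above (assuming one-byte key prefixes for simplicity).
    type KeyRange struct {
        First, Last byte
    }

    // SubtreeUpdateRequest asks a DIS node for all trie nodes within Range
    // that have changed since a previous query.
    type SubtreeUpdateRequest struct {
        Range     KeyRange
        SinceRoot [32]byte // the state root the requester is already up to date with
    }

    // SubtreeUpdateResponse carries the changed trie nodes plus the nodes up
    // to the state root, i.e. a Merkle proof for the entire subtree.
    type SubtreeUpdateResponse struct {
        NewRoot   [32]byte
        TrieNodes [][]byte // encoded trie nodes: changed subtree plus path to root
    }

    // ProofRequest asks a FET node for a single Merkle proof; the fetcher
    // forwards it to one of its live peers covering the key.
    type ProofRequest struct {
        Root [32]byte // the state root the proof should be anchored to
        Key  [32]byte
    }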