Routing on a Network of Payment channels

In this section we will finally unpack how payment channels can be connected to a network of other payment channels via a process called routing.

Routing vs. Path Finding

It’s important to note that we separate the concept of routing from the concept of path finding. These two concepts are often confused and the term "routing" is often used to describe both concepts. Let’s remove the ambiguity, before we proceed any further.

Path Finding, which is covered in [path_finding], is the process of finding and choosing a contiguous path made of payment channels which connects the sender A to the recipient B. The sender of a payment does the path finding, by examining the channel graph which they have assembled from channel announcements gossiped by other nodes.

Routing refers to the series of interactions across the network that allow a payment to flow from A to B, across the path previously selected by path finding. Routing is the active process of sending a payment on a path, which involves the cooperation of all the intermediary nodes along that path.

An important rule of thumb is that it’s possible for a path to exist between Alice and Bob, yet there may not be an active route on which to send the payment.

One example is the scenario where all the nodes connecting Alice and Bob are currently off-line.

In theory, one can examine the channel graph and connect a series of payment channels from Alice to Bob, hence a path exists. However, as the intermediary nodes are offline, the payment cannot be sent and so no route exists.

Routing a payment

In this section we will examine routing from the perspective of Gloria, a gamer who receives donations from her fans while she livestreams her game sessions.

The innovation of routed payment channels allows Gloria to receive tips without maintaining a separate channel with every one of her fans who want to tip her. As long as there exists a path of well-funded channels from that viewer to Gloria, she will be able to receive payment from that fan. The nodes along the path from the fan to Gloria are intermediaries and are called "routing nodes" in the context of routing a payment.

Any one of Gloria’s fans in the diagram can pay her by routing via the nodes in between them and Gloria


Importantly, the routing nodes are unable to steal the funds while routing a payment from a fan to Gloria. Furthermore, routing nodes cannot lose money while participating in the routing process. Routing nodes can charge a routing fee for acting as an intermediary, although they don’t have to and may choose to route payments for free.

Another important detail is that due to the use of onion routing, intermediary nodes are only explicitly aware of the one node preceding them and the one node following them in the route. They will not necessarily know who is the sender and recipient of the payment. This enables fans to use intermediary nodes to pay Gloria, without leaking private information and without risking theft.

This process of connecting a series of payment channels with end-to-end security, and the incentive structure for nodes to forward payments, is one of the key innovations of the Lightning Network.

In this chapter, we’ll dive into the mechanism of routing in the Lightning Network, detailing the precise manner in which payments flow through the network. First, we will cover the concept of a conditional chained end-to-end secure payment, most commonly referred to as a Hash Time Locked Contract (HTLC). Having learned how payments can be transmitted through the network, we will then discuss the concept of source-based routing and contrast it to the privacy preserving onion routing used in the network today. Finally, we will explore the exact mechanism of payment forwarding. We will discuss how the structure (edges, fees, time-locks, etc.) of the route is determined by the sender, and is then transmitted to each individual node along the route.

Creating a Network of payment channels

Before we dive into the concept of a conditional chained end-to-end secure payment, let’s work through an example. Let us return to Alice who, in previous chapters, purchased a coffee from Bob with whom she has an open channel. Alice now watches a live stream from Gloria the gamer, and wants to send her a tip via the Lightning Network. However, Alice has no direct channel with Gloria. Alice could open a direct channel, but that would require liquidity and on-chain fees which could be more than the value of the tip itself.

Instead, Alice can use her existing open channels to send a tip to Gloria without the need to open a channel directly with Gloria.

This is possible, as long as there exists some path of channels from Alice to Gloria with sufficient capacity to route the tip.

From previous chapters, we know Alice has an open channel with Bob, the coffee shop owner. Bob, in turn, has an open channel with the software developer Wei who helps him with the point of sale system he uses in his coffee shop. Wei is also the owner of a large software company which develops the game that Gloria plays, and they already have an open channel which Gloria uses to pay for the game’s license and in-game items.

If we draw out this series of payment channels, it’s possible to manually trace a path from Alice to Gloria that uses Bob and Wei as intermediary routing nodes. Alice can then craft a route from this outlined path, and use it to send a tip of a few thousand satoshis to Gloria, with the payment being forwarded by Bob and Wei. Essentially, Alice will pay Bob, who will pay Wei, who will pay Gloria. And no direct channel from Alice to Gloria is required.

The network of payment channels of our friends can be seen here:

routing network

The main challenge is to do this in a way that prevents Bob and Wei from stealing the money that Alice wants delivered to Gloria. To understand how the Lightning Network protects the payment while being routed, we can compare to an example of routing physical payments with golden coins in the real world.

Assume Alice wants to give 10 gold coins to Gloria, but does not have direct access to Gloria. However, Alice knows Bob, who knows Wei, who knows Gloria and so she decides to ask Bob and Wei for help. She can pay Bob to pay Wei to pay Gloria, but how does she make sure that Bob or Wei don’t run off with the coins after receiving them? In the physical world contracts could be used for safely carrying out a series of payments.

Alice could negotiate a contract with Bob which reads:

I (Alice) will give you (Bob) 10 gold coins if you pass them on to Wei

While this contract is nice in the abstract, in the real world, Alice runs the risk that Bob might breach the contract and hope to not get caught by law enforcement. Even if Bob gets caught by law enforcement, Alice faces the risk that he might be bankrupt and be unable to return her 10 gold coins. Assuming these issues are magically solved, it’s still unclear how to leverage such a contract to achieve our desired outcome: the coins ultimately being delivered to Gloria.

We thus improve our contract:

I (Alice) will reimburse you (Bob) with 10 gold coins if you can prove to me (for example via a receipt) that you already have delivered 10 gold coins to Wei

You might ask yourself why should Bob sign such a contract. He has to pay Wei but ultimately gets nothing out of the exchange, and he runs the risk that Alice might not reimburse him. Bob could offer Wei a similar contract to pay Gloria, but similarly Wei has no reason to accept it either. Even putting aside the risk, Bob and Wei must already have 10 gold coins to send, otherwise they wouldn’t be able to participate in the contract. Thus Bob and Wei face both risk and opportunity cost for agreeing to this contract, and they would need to be compensated in order for them to accept it.

Alice can make this attractive to both Bob and Wei by offering them fees of 1 gold coin each if they transmit her payment to Gloria. The final contract would instead read:

I (Alice) will reimburse you (Bob) with 12 gold coins if you can prove to me (for example via a receipt) that you already have delivered 11 golden coins to Wei

Alice now promises Bob 12 gold coins. There are 10 to be delivered to Gloria and 2 for the fees. She promises 12 to Bob if he can prove that he has forwarded 11 to Wei. The difference of 1 gold coin is the fee that Bob will earn for helping out with this particular payment.
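The arithmetic behind these numbers can be sketched in a few lines of Python. This is only an illustration of the gold coin example: the function name `amounts_to_offer` and the flat fee of 1 coin per intermediary are assumptions made for this sketch, not part of any Lightning implementation.

```python
def amounts_to_offer(final_amount, hop_fees):
    """Work backwards from the recipient to find what each contract must promise.

    hop_fees lists the fee of each intermediary in path order (Bob, then Wei).
    The result lists the promised amounts in path order:
    Alice->Bob, Bob->Wei, Wei->Gloria.
    """
    amounts = [final_amount]
    # Add the fee of the last intermediary first, then the one before it, and so on.
    for fee in reversed(hop_fees):
        amounts.append(amounts[-1] + fee)
    amounts.reverse()
    return amounts


# Gloria should receive 10 coins; Bob and Wei each charge 1 coin.
contract_amounts = amounts_to_offer(10, [1, 1])
print(contract_amounts)  # [12, 11, 10]
```

The amounts are computed starting from the recipient because each intermediary keeps its fee out of the amount it is promised, so the sender has to know all fees along the path before making the first promise.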

As there is still the issue of trust and the risk that either Alice or Bob don’t honor the contract, all parties decide to use an escrow service. At the start of the exchange, Alice could "lock up" these 12 golden coins in escrow that will only be paid to Bob once he proves that he’s paid 11 golden coins to Wei.

This escrow service is an "ideal functionality", which will later be replaced by a more trust-minimized mechanism. Let’s assume for now that everyone trusts this escrow.

In the Lightning Network, the receipt (proof of payment) could take the form of a secret that only Gloria knows. In practice, this secret would be a random number that is large enough to prevent others from guessing it (typically a very, very large number, encoded using 256 bits!). The secret could then be committed to the contract by including the SHA256 hash of the secret in the contract itself. We call this hash of the payment’s secret the payment hash. The secret which "unlocks" the payment is called the payment pre-image.

For now, we keep things simple and assume that Gloria’s secret is simply the text line: Glorias secret. In order to "commit" to this secret, she computes the SHA256 hash which when encoded in hex, can be displayed as: f23c83babfb0e5f001c5030cf2a06626f8a940af939c1c35bd4526e90f9759f5. [1]
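As a minimal sketch, committing to a secret and later checking a revealed message against the commitment could look as follows in Python. The helper names `payment_hash` and `is_valid_preimage` are made up for this illustration. The exact digest depends on how the text line is encoded into bytes (for example, whether a trailing newline is included), so the printed value is not asserted here.

```python
import hashlib


def payment_hash(secret: bytes) -> str:
    """Return the hex-encoded SHA256 hash of the secret."""
    return hashlib.sha256(secret).hexdigest()


def is_valid_preimage(candidate: bytes, expected_hash_hex: str) -> bool:
    """Check whether a revealed message hashes to the committed value."""
    return payment_hash(candidate) == expected_hash_hex


# Assumption: the text line is encoded as UTF-8 without a trailing newline.
secret = b"Glorias secret"
h = payment_hash(secret)
print(h)
print(is_valid_preimage(secret, h))          # True
print(is_valid_preimage(b"wrong guess", h))  # False
```

Anyone who holds only the hash can verify a revealed secret, but cannot feasibly derive the secret from the hash.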

To facilitate Alice’s payment, Gloria will create the secret and the payment hash and send the payment hash to Alice. Alice doesn’t know the secret but she can rewrite her contract to use the hash of the secret as a proof of payment:

I (Alice) will reimburse you (Bob) with 12 gold coins if you can show me a valid message that hashes to: f23c83... You can acquire this message by setting up a similar contract with Wei, who has to set up a similar contract with Gloria. In order to assure you that you will get reimbursed, I will provide the 12 gold coins to a trusted escrow before you set up your next contract.

This new contract now protects Alice from Bob not forwarding to Wei, protects Bob from not being reimbursed by Alice, and ensures that there will be proof that Gloria was ultimately paid via the hash of Gloria’s secret. This secret message that hashes to the f23c83... is called a pre-image.

After Bob and Alice agree to the contract, and Bob receives the message from the escrow that Alice has deposited the 12 gold coins, Bob can now negotiate a similar contract with Wei.

Note that since Bob is taking a service fee of 1 coin, he will only forward 11 gold coins to Wei once Wei shows proof that he has paid Gloria. Similarly, Wei will also demand a fee and will expect to receive 11 gold coins once he has proved that he has paid Gloria the promised 10 gold coins.

Bob’s contract with Wei will read:

I (Bob) will reimburse you (Wei) with 11 gold coins if you can show me a valid message that hashes to: f23c83... You can acquire this message by setting up a similar contract with Gloria. In order to assure you that you will get reimbursed, I will provide the 11 gold coins to a trusted escrow before you set up your next contract.

Once Wei gets the message from the escrow that Bob has deposited the 11 gold coins, Wei sets up a similar contract with Gloria:

I (Wei) will reimburse you (Gloria) with 10 gold coins if you can show me a valid message that hashes to: f23c83... In order to assure you that you will get reimbursed after revealing the secret, I will provide the 10 gold coins to a trusted escrow.

Everything is now in place. Alice has a contract with Bob and has placed 12 gold coins in escrow. Bob has a contract with Wei and has placed 11 gold coins in escrow. Wei has a contract with Gloria and has placed 10 gold coins in escrow. It is now up to Gloria to reveal the secret, which is the pre-image to the hash she has established as proof of payment.

Gloria now sends "Glorias secret" to Wei.

He checks that "Glorias secret" hashes to f23c83... Wei now has proof of payment and so instructs the escrow service to release the 10 gold coins to Gloria.

Wei now provides the secret to Bob. Bob checks it and instructs the escrow service to release the 11 gold coins to Wei.

Bob now provides the secret to Alice. Alice checks it and instructs the escrow to release 12 gold coins to Bob.

All the contracts are now settled. Alice has paid a total of 12 gold coins, 1 of which was received by Bob, 1 of which was received by Wei, and 10 of which were received by Gloria. With a chain of contracts like this in place, Bob and Wei could not run away with the money because they deposited it in escrow first.
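The whole exchange can be replayed as a small simulation. The following Python sketch is a toy model of the trusted escrow, under the assumption that each party starts with exactly the coins it needs to deposit. The class `EscrowContract` and its `claim` method are invented for this example.

```python
import hashlib


class EscrowContract:
    """A deposit that the escrow releases to the payee once the pre-image is shown."""

    def __init__(self, payer, payee, amount, payment_hash):
        self.payer, self.payee = payer, payee
        self.amount = amount
        self.payment_hash = payment_hash
        self.settled = False

    def claim(self, preimage, balances):
        # The escrow only pays out if the message hashes to the agreed value.
        if self.settled:
            raise ValueError("contract already settled")
        if hashlib.sha256(preimage).hexdigest() != self.payment_hash:
            raise ValueError("pre-image does not match the payment hash")
        balances[self.payee] += self.amount
        self.settled = True


secret = b"Glorias secret"
payment_hash = hashlib.sha256(secret).hexdigest()

# Assumed starting balances: just enough for each deposit.
balances = {"Alice": 12, "Bob": 11, "Wei": 10, "Gloria": 0}
contracts = []

# Contracts are set up from the sender towards the recipient;
# each deposit leaves the payer's balance and sits in escrow.
for payer, payee, amount in [("Alice", "Bob", 12), ("Bob", "Wei", 11), ("Wei", "Gloria", 10)]:
    balances[payer] -= amount
    contracts.append(EscrowContract(payer, payee, amount, payment_hash))

# Settlement runs in the opposite direction, starting with Gloria.
for contract in reversed(contracts):
    contract.claim(secret, balances)

print(balances)  # {'Alice': 0, 'Bob': 12, 'Wei': 11, 'Gloria': 10}
```

Compared with the starting balances, Alice is down 12 coins, Bob and Wei are each up 1 coin, and Gloria is up 10 coins, which matches the outcome described above.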

However, one issue still remains. If Gloria refused to release her secret pre-image, then Wei, Bob, and Alice would all have their coins stuck in escrow but wouldn’t be reimbursed. Similarly, if anyone else along the chain failed to pass on the secret, the same thing would happen. So while no one can steal money from Alice, everyone can still lose money.

Luckily, this can be resolved by adding a deadline to the contract.

We could amend the contract so that if it is not fulfilled by a certain deadline, then the contract expires and the escrow service returns the money to the person who made the original deposit. We call this deadline a "time lock".

The deposit is locked with the escrow service for a certain amount of time, and is eventually released even if no proof of payment was provided.

In order to factor this in, the contract between Alice and Bob is once again amended with a new clause:

Bob has 24 hours to show the secret after the contract was signed. If Bob does not provide the secret by this time, Alice’s deposit will be refunded by the escrow service and the contract becomes invalid.

Bob, of course, now has to make sure he receives the proof of payment within 24 hours. Even if he successfully pays Wei, if he receives the proof of payment later than 24 hours he will not be reimbursed. To remove that risk, Bob must give Wei an even shorter deadline.

In turn, Bob will alter his contract with Wei in the following way:

Wei has 22 hours to show the secret after the contract was signed. If he does not provide the secret by this time, Bob’s deposit will be refunded by the escrow service and the contract becomes invalid.

As you might have guessed, Wei is now incentivized to also alter his contract with Gloria:

Gloria has 20 hours to show the secret after the contract was signed. If she does not provide the secret by this time, Wei’s deposit will be refunded by the escrow service and the contract becomes invalid.

With such a chain of contracts we can ensure that, after 24 hours, the payment will successfully deliver from Alice to Bob to Wei to Gloria, or it will fail and everyone will be refunded. Either the contract fails or succeeds, there’s no middle ground.
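To see why the deadlines must shrink towards the recipient, consider the following toy model in Python. It assumes that passing the secret back by one contract takes a fixed number of hours (`relay_delay_hours`, an assumption of this sketch) and that a contract can only be claimed up to its deadline.

```python
def settle(deadlines, reveal_hour, relay_delay_hours=1):
    """Resolve a chain of contracts given the hour at which the recipient reveals the secret.

    deadlines are in path order (Alice->Bob, Bob->Wei, Wei->Gloria).
    reveal_hour is None if the secret is never revealed.
    """
    results = ["refunded"] * len(deadlines)
    if reveal_hour is None:
        return results
    hour = reveal_hour
    # Settlement starts at the last contract and moves back towards the sender.
    for i in reversed(range(len(deadlines))):
        if hour > deadlines[i]:
            break  # too late: this contract and all earlier ones expire
        results[i] = "paid"
        hour += relay_delay_hours
    return results


print(settle([24, 22, 20], reveal_hour=19))    # ['paid', 'paid', 'paid']
print(settle([24, 22, 20], reveal_hour=None))  # ['refunded', 'refunded', 'refunded']
print(settle([24, 24, 24], reveal_hour=24))    # ['refunded', 'refunded', 'paid']
```

With decreasing deadlines, the outcome is all or nothing as long as the gap between deadlines is at least the relay delay. In the last call all deadlines are equal, and Wei ends up paying Gloria without being reimbursed by Bob, which is exactly the risk the shorter deadlines remove.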

In the context of the Lightning Network, we call this "all or nothing" property atomicity.

As long as the escrow is trustworthy and faithfully performs its duty, then no party will have their coins stolen in the process.

The pre-condition to this route working at all, is that all parties in the path have enough money to satisfy the required series of deposits.

While this seems like a minor detail, we will see later in this chapter that this requirement is actually one of the more difficult issues for Lightning Network nodes. It becomes progressively more difficult as the size of the payment increases. Furthermore, the parties cannot use their money while it is locked in escrow. Thus users forwarding payments face an opportunity cost for locking the money, which is ultimately reimbursed through routing fees, as we saw in the example above.

In the following two sections we will discuss how the Bitcoin scripting language can be used to set up conditional chained end-to-end secure payment contracts without third party escrows, similar to the gold coin contracts described above. These are called Hash Time Locked Contracts (HTLCs). For HTLCs, there are no trusted third parties who act as an escrow; the Bitcoin Network itself becomes the "escrow" service.

Then, we will discuss how users are able to use an HTLC to securely "route" a payment through the Lightning Network.

Currently (in 2020), the Lightning Network uses a routing protocol called source-based onion routing, although it is possible to route payments with other routing protocols.

Finally we will discuss the precise details of forwarding, settling, and canceling HTLCs in the network.

Hash Time Locked Contracts as a Conditional Chained End to End Secure Payment

Our example in the prior section using "golden coins" was intended to lay some base intuition, which we’ll leverage in this section to explain how HTLCs work in practice. HTLC is an acronym that stands for "Hash Time-Locked Contract". An HTLC is a specific instantiation of a Conditional Chained End to End Secure Payment. As we’ll see in the later chapters, given a set of adequate cryptographic constructs, many other instantiations are possible as well.

Before we dive into the specifics of HTLCs, it may be helpful to first build intuition on an abstraction over this concrete concept. First, let’s unpack what it means for something to be a conditional chained end to end secure payment:

Conditional End to End Secure Payments by Construction

Conditional Payments

A payment can be said to be conditional, if the completion of the payment relies on the completion of a certain event. In the golden coins example, this "condition" was the reveal of a hash pre-image. We could feasibly substitute this hash pre-image reveal for any other construct with "hardness" properties. Namely: it should be infeasible for a party that doesn’t know the proper "solution" of the condition to satisfy it, the "description" of the condition shouldn’t give away any information about the true "solution", and once a solution has been chosen and a description created from it, it shouldn’t be possible to "alter" that solution and have it still be a valid condition for the description.

The payment should only be redeemable if a valid solution is revealed. Critically, all conditions need to be timed in order to allow the construct to return the funds to the sender if a solution to the condition isn’t revealed. The combination of the condition and a timeout on the condition gives the payment a trait we commonly refer to as atomicity: either the payment happens, or the sender is refunded.

Conditional Chained Payment

Building upon our conditional payment, it may be possible to chain this payment, allowing it to involve the payer, the payee, and possibly several intermediaries. Each intermediary is able to present a slightly modified version of the condition (without invalidating it altogether), and so on in an iterated manner until the conditional payment reaches the payee. Once it reaches the payee, the payment should be able to be iteratively resolved, starting at the payee all the way back to the payer.

Each chaining creates an "incoming" and an "outgoing" conditional payment. A node receives a conditional payment from a party (incoming condition), and then extends the conditional payment to the next party in the chain (outgoing condition). The payment is extended from payer to payee, but settled from payee to payer, as each of the intermediaries gains the solution to the outgoing condition and uses it (possibly augmenting it) to satisfy the incoming condition.

Typically the payer rewards the intermediaries by sending slightly more than the payment amount, in order to allow the intermediaries to send out less with their outgoing payment than what they received from the incoming payment. The difference between these two payment values makes up the "forwarding fee" collected by the intermediary.

Conditional Chained End to End Secure Payment

With our final addition, we’ll achieve "end to end security". By this we mean that no intermediary is able to "claim" the payment without first obtaining the solution from someone further down from them in the chain. Additionally, we also require that the amount the payer intended to send is fully received by the payee. Finally, we require that none of the intermediaries are able to "contaminate" the payment beyond giving incorrect directions to the party that directly follows them. In other words, an intermediary shouldn’t be able to materially affect the propagation of the payment several hops away from it.

Hash Time Locked Contracts

In this section, we’ll construct a conditional chained end to end payment known as the HTLC. At each step we’ll add a new component, then examine it in light of our original definition to ensure all requirements and security properties are reached.

First, the "condition". For an HTLC, the condition is typically the reveal of a hash pre-image that matches a particular hash. This hash is typically referred to as the "payment hash", with the pre-image being called the "payment pre-image". If the name didn’t give too much away, for an HTLC, we’ll use a cryptographically secure hash function as one part of our condition. By using a cryptographic hash function, we ensure that it’s infeasible for another party to "guess" the solution of our condition, it’s easy for anyone to verify the solution, and there’s only one "solution" to the condition.

In order to implement the "refund" functionality, we rely on the "absolute time lock" functionality of Bitcoin script.

With all that said, a basic Bitcoin script implementing a hash time-locked contract would look something like the following:

OP_SIZE 32 OP_EQUAL

OP_IF
    OP_HASH160 <ripemd160(payHash)> OP_EQUALVERIFY
    <receiver key>
OP_ELSE
    OP_DROP
    <timeout> OP_CHECKLOCKTIMEVERIFY OP_DROP
    <sender key>
OP_ENDIF

OP_CHECKSIG
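The two ways of spending such an output can be modeled in Python. This is a sketch of the intended logic, not an evaluation of Bitcoin script: signature checks are reduced to comparing names, the hash comparison uses SHA256 directly rather than the RIPEMD160 of the payment hash, and the function name `htlc_can_spend` and its parameters are assumptions made for this example.

```python
import hashlib


def htlc_can_spend(payment_hash, timeout, *, signer, receiver, sender,
                   witness_item=b"", current_height=0):
    """Toy model of the two ways the HTLC output can be spent.

    Signature checks are reduced to comparing names; a real script
    verifies a digital signature against a public key.
    """
    if len(witness_item) == 32:
        # Success branch: a 32-byte pre-image plus the receiver's signature.
        return (hashlib.sha256(witness_item).digest() == payment_hash
                and signer == receiver)
    # Refund branch: the sender's signature, valid only once the time lock has passed.
    return current_height >= timeout and signer == sender


preimage = hashlib.sha256(b"example seed").digest()  # stand-in for a random 32-byte secret
payment_hash = hashlib.sha256(preimage).digest()
common = dict(receiver="Bob", sender="Alice")

# Bob claims with the pre-image before the timeout.
print(htlc_can_spend(payment_hash, 700_000, signer="Bob", witness_item=preimage,
                     current_height=699_900, **common))   # True
# Alice cannot take the refund early...
print(htlc_can_spend(payment_hash, 700_000, signer="Alice",
                     current_height=699_900, **common))   # False
# ...but she can once the timeout height is reached.
print(htlc_can_spend(payment_hash, 700_000, signer="Alice",
                     current_height=700_000, **common))   # True
```

The size check on the first item decides which branch is taken, so a spender selects the refund branch simply by providing something that is not 32 bytes long.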

Alice can present this script to Bob in order to kick off the conditional payment. For the chained aspect, Alice needs to be able to communicate the proper payment details to each hop in the route. Recall that each hop will specify a forwarding fee rate, as well as other parameters that express their forwarding policy. In addition to this forwarding fee rate, Alice also needs to be concerned about which time locks to use. In the worst case, each node in the route needs some time to settle the outgoing, then the incoming payment on-chain. As a result, when constructing the final route, we need to give each node some buffer time. We call this buffer time the "time lock delta". Factoring in this time-lock delta, the time-locks of the HTLCs decrease as the route progresses, so that each outgoing HTLC expires before the corresponding incoming HTLC. This set of decrementing time-locks is critical to the operation of the system, as it ensures our atomicity property for each hop, assuming they’re able to get into the chain in time.
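How fees and time-lock deltas accumulate along a route can be sketched as follows. The policy values, block heights, and the names `HopPolicy` and `build_htlcs` are invented for illustration. The fee is modeled as a flat base fee plus a proportional part in parts per million, which is an assumption of this sketch.

```python
from dataclasses import dataclass


@dataclass
class HopPolicy:
    name: str
    base_fee_msat: int      # flat fee per forwarded HTLC
    fee_rate_ppm: int       # proportional fee, in parts per million
    time_lock_delta: int    # buffer, in blocks, between incoming and outgoing HTLC


def build_htlcs(amount_msat, final_expiry, intermediaries):
    """Compute (amount, expiry) for every HTLC in path order.

    The calculation starts at the recipient and walks back to the sender,
    adding each intermediary's fee and time-lock delta on the way.
    """
    htlcs = [(amount_msat, final_expiry)]
    for hop in reversed(intermediaries):
        out_amount, out_expiry = htlcs[0]
        fee = hop.base_fee_msat + out_amount * hop.fee_rate_ppm // 1_000_000
        htlcs.insert(0, (out_amount + fee, out_expiry + hop.time_lock_delta))
    return htlcs


# Hypothetical policies for Bob and Wei.
policies = [HopPolicy("Bob", 1_000, 100, 40), HopPolicy("Wei", 1_000, 200, 144)]
htlcs = build_htlcs(10_000_000, 700_040, policies)
for (amount, expiry), offerer in zip(htlcs, ["Alice", "Bob", "Wei"]):
    print(f"{offerer} offers {amount} msat, expiring at block {expiry}")
```

Both the amounts and the expiry heights decrease from the sender towards the recipient, so every intermediary has its incoming HTLC expire later than its outgoing one.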

In the next section, we’ll go into the exact mechanism of how Alice is able to deliver forwarding details to each hop in the route. In addition, we’ll dive further into proper time-lock construction, as incorrect time-lock set up can violate our atomicity property and lead to a loss of funds.

HTLC Packet Forwarding: Source Based Onion Routing

So far you have learned that payment channels can be connected into a network, which can be used to send a payment from one participant to another through a path of payment channels. You have seen that, with the use of HTLCs, the intermediary nodes along the path are not able to steal any funds that they are supposed to forward, and also how a node can set up and settle an HTLC. With this bare foundation laid, the following questions may have crossed your mind:

  • Who chooses the path for a candidate route?

  • How is a path selected as a candidate to attempt to route the HTLC for a payment?

  • How much information do nodes know about the total path?

  • How exactly does a payment flow through the network at each node?

In the network today, the sender is the one that selects the route and decides nearly all the details of the resulting route.

As for how path finding is done, there is no single approach that all nodes in the network use. Instead, the answer to the second question has a very large solution space, meaning there are several algorithms and heuristics used in the network today. Most commonly, a variation of Dijkstra’s algorithm is used which takes into account additional Lightning Network details such as fees and time-locks. Remember from earlier that a path turns into a route which is used to trigger a payment attempt. As several conditions need to be satisfied for the HTLC to be completely extended, the sender may need to try several routes until one succeeds. However, the user of the wallet typically will not be aware of these failed attempts, just as when we load a web page on the Internet, we don’t learn of any TCP packet retransmissions.
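As a rough illustration of such a variation, the following Python sketch runs Dijkstra’s algorithm over a tiny channel graph, minimizing the total fee and skipping channels that are too small for the payment. It is heavily simplified: fees are flat per forwarding node, time-locks are ignored, and the graph, the node "Chan", and the function name `cheapest_path` are assumptions of this example rather than the behavior of any particular implementation.

```python
import heapq


def cheapest_path(graph, source, target, amount):
    """Return (total_fee, path) for the cheapest usable path, or None.

    graph maps a node to a list of (peer, fee, capacity) tuples, where fee is
    what the node charges for forwarding over that channel.
    """
    queue = [(0, source, [source])]
    best = {source: 0}
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == target:
            return cost, path
        if cost > best.get(node, float("inf")):
            continue  # stale queue entry
        for peer, fee, capacity in graph.get(node, []):
            if capacity < amount:
                continue  # channel too small for this payment
            # The sender does not pay a fee to itself for its own channel.
            new_cost = cost + (0 if node == source else fee)
            if new_cost < best.get(peer, float("inf")):
                best[peer] = new_cost
                heapq.heappush(queue, (new_cost, peer, path + [peer]))
    return None


graph = {
    "Alice": [("Bob", 0, 50_000), ("Chan", 0, 50_000)],
    "Bob": [("Wei", 1, 50_000)],
    "Chan": [("Gloria", 5, 50_000)],
    "Wei": [("Gloria", 1, 8_000)],
}

print(cheapest_path(graph, "Alice", "Gloria", 5_000))    # (2, ['Alice', 'Bob', 'Wei', 'Gloria'])
print(cheapest_path(graph, "Alice", "Gloria", 10_000))   # (5, ['Alice', 'Chan', 'Gloria'])
print(cheapest_path(graph, "Alice", "Gloria", 100_000))  # None
```

The second call shows how the payment amount changes the result: the cheaper path via Bob and Wei is no longer usable, so the search falls back to the more expensive path.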

In the early days of the network, a payment could only utilize a single path as its final route. With the rise of Multi-Path Payments, the sender is able to split the amount into smaller pieces, and use distinct strategies to route all the payment chunks. This splitting behavior is similar to IP packet fragmentation on the IP layer: each node expresses its Maximum Payment Unit, with the sender using this as a guide to adequately split all payments. In later chapters, we’ll discuss further details of payment splitting and combination once we get to advanced path finding.
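The simplest possible splitting strategy can be sketched in Python as follows. The function `split_payment` and the single `max_part` limit are assumptions for illustration; real senders choose part sizes per path.

```python
def split_payment(total, max_part):
    """Split a payment into parts that each respect the maximum part size."""
    if total <= 0 or max_part <= 0:
        raise ValueError("total and max_part must be positive")
    parts = [max_part] * (total // max_part)
    remainder = total % max_part
    if remainder:
        parts.append(remainder)
    return parts


parts = split_payment(25_000, 10_000)
print(parts)  # [10000, 10000, 5000]
```

Each part can then be routed independently, and the recipient is paid in full once all parts have arrived.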

At a high level, each node in the route is only explicitly told: how to validate the incoming HTLC packet (remember all details need to be correct for a payment to flow!), who the next hop in the route is, and how to modify the incoming HTLC packet into a valid outgoing HTLC packet to forward to the next node. Combined with the fact that intermediate forwarding nodes aren’t explicitly given the sender and receiver of a payment, nodes are given the least amount of information they need to successfully forward a payment. In addition to these privacy enhancing attributes, intermediate nodes aren’t able to arbitrarily modify an HTLC packet, as all information is encrypted and cryptographically authenticated, with integrity checks carried out at each hop to ensure the contents haven’t been modified. Readers familiar with onion routing may have realized that we’ll be applying some clever cryptographic techniques to achieve all these traits. We call this series of clever applications of cryptographic techniques source-based onion routing!

Source-based routing (the non-cryptographic portion of onion routing) is distinct from how packets are typically transmitted on the IP layer. On the Internet today, packet switching is widely used to transmit data. Packet switching typically explicitly indicates the sender and receiver of a given packet. Intermediate routing nodes then attempt to deliver the packet on a best effort basis, with great freedom as to exactly how they select the next node in the route. However, the lack of encryption, the lack of end-to-end integrity checks, and the arbitrary choice of routes make this a poor system to use in a payment network.

Source routing instead has the sender select the route entirely (which, as we’ll learn later, is important due to fees and time-locks). The onion routing layer then gives the sender nearly complete control of the route, and allows the sender to only tell the intermediate nodes what they need to successfully forward a payment. Onion routing is used in several popular protocols on the Internet, with the most notable of them being Tor. In the Lightning Network, we use a specific onion routing packet format called Sphinx, with some special modifications made in order to make it more suited to the unique constraints of the Lightning Network.

Note

While the Lightning Network also uses an onion routing scheme, it is actually very different from the onion routing scheme that is used in the Tor network. Aside from the distinct cryptographic techniques they use, the biggest difference is that Tor is used to exchange arbitrary data between two participants, whereas on the Lightning Network the main use case is to pay people and transfer data that encodes monetary value. In the Lightning Network, we’re only concerned with transmitting the details that are needed for a successful payment. On the Lightning Network there is no analog to the exit nodes of the Tor network, as there’s no need to "exit" the network: all payments flow within the network.
Although in an ideal model only a precise amount of information is leaked by a route, in practice several "side channels" exist that may allow an adversary to deduce more information about a route. As an example, information about CLTV deltas, or the set of possible routes in the network, may give away additional information about a given route. Similar to Tor, onion routing in the Lightning Network isn’t secure against a global passive adversary (one that can monitor all links and information flows in the network). Today in the network, every node in the route sees the same payment hash, meaning that if two nodes are "compromised", more details of the route are leaked.
On the Tor network, nodes can theoretically be connected via a full graph, as every node could create an encrypted connection with every other node on top of the Internet Protocol almost instantaneously and at no cost. On the Lightning Network, payments can only flow along existing payment channels. Removing and adding those channels is a slow and expensive process, as it requires on-chain bitcoin transactions. On the Lightning Network, nodes might not be able to forward a payment packet because they do not own enough funds on their side of the payment channel. On the other hand, there are hardly any plausible reasons, other than a wish to act maliciously, why a Tor node might not be able to forward an onion.
Last but not least, the Lightning Network can actually run on Tor and use it as a message transport layer. This means that all connections of a node with its peers, and the resulting communication, will be obfuscated once more through the Tor network.

Let’s stick to our example in which Alice still wants to tip Gloria and has decided to use the path via Bob and Wei. We note that there might have been alternative paths from Alice to Gloria, but for now we will just assume it is this path that Alice has decided to use. In order to kick off the multi-hop transfer, Alice needs to send a special message to Bob. You’ll learn about the specific structure of this message in later chapters, but for now we’ll call it an "HTLC Add" message. Aside from the amount, the payment hash, and the time-lock, this message also contains an opaque field used to store encrypted forwarding information. Today in the network, this field is 1366 bytes, as that’s the fixed size of the onion packet. This onion contains all the information about the path that Alice intends to use to send the payment to Gloria. However, Bob, who receives the onion, cannot read all the information about the path, as most of the onion is hidden from him through a sequence of encryptions. The name onion comes from the analogy to an onion that consists of several layers. In our case every layer corresponds to one round of encryption. Each round of encryption uses different encryption keys. They are chosen by Alice in a way that only the rightful recipient of an onion can peel off (decrypt) the top layer of the onion.

For example, after Bob has received the onion from Alice he will be able to decrypt the first layer, and he will only see the information that he is supposed to forward the onion to Wei by setting up an HTLC with Wei. The HTLC with Wei should use the same payment hash as the incoming HTLC from Alice. The amount of the forwarded HTLC is specified in Bob’s decrypted layer of the onion. It will be slightly smaller than the amount of his incoming HTLC from Alice. The difference between these two amounts has to be at least large enough to cover the routing fees that Bob’s node announced earlier on the gossip protocol.

In order to set up the HTLC Bob will modify the onion a little bit in a deterministic manner. He removes the information that he could read from it and passes it along to Wei.

Wei in turn is only able to see that he is supposed to forward the package to Gloria. Wei knows he received the onion from Bob but has no clue that it was actually Alice who created the onion in the first place. In this way every participant is only able to peel off one layer of the onion by decrypting it. Each participant will only learn the information it has to learn to fulfill the routing request. For example, Bob will explicitly be told that Alice offered him an HTLC and sent him an onion, and that he is supposed to offer an HTLC to Wei and forward a slightly modified onion. Bob isn’t explicitly told whether Alice is the originator of this payment, as she could also just have forwarded the payment to him. Due to the layered encryption he cannot see inside Wei’s and Gloria’s layers. The only thing Bob learns explicitly is that he is part of a path that involves Alice, himself, and Wei.

While the onion is decrypted layer by layer as it travels along the path from Alice via Bob and Wei to Gloria, it is created from the innermost layer to the outermost layer via several rounds of encryption. Being created from the inside means that the construction starts with the plain text of the onion package that Gloria is supposed to receive. Let us now look at the construction of the onion that Alice has to follow and at the exact information that is put inside each layer of the onion.

The onions are a data structure that at every hop consists of four parts:

  1. The version byte

  2. The header consisting of a public key that can be used by the recipient to produce the shared secret for decrypting the outer layer and to derive the public key that has to be put in the header of the modified onion for the next recipient.

  3. The payload

  4. An authentication tag in the form of an HMAC.

For now we will ignore how the public keys are derived and exchanged and focus on the payload of the onion. Only the payload is actually encrypted and will be peeled off layer by layer. The payload consists of a sequence of per hop data. This data can come in two formats: the legacy one and the Type Length Value (TLV) format. While the TLV format offers more flexibility, in both cases the routing information that is encoded into the onion is the same for every hop but the last. For example, with the new TLV format, the sender can actually include the preimage in the payload for the last hop. This is nice as it allows a payer to initiate a payment without having to ask the payee for an invoice and payment hash first. We will discuss this feature, called keysend, in a different chapter.

A node needs three pieces of information to forward the package:

  1. The short channel id of the next channel along which it is supposed to forward the onion by setting up an HTLC with the same payment hash.

  2. The amount that it is supposed to forward and thus use in the HTLC.

  3. The timelock information, encoded as a cltv_delta, which is needed because HTLCs are hashed time-locked contracts.

For easier readability we have used just a small integer as short_channel_ids in the following example and graphics.

per_hop payload of Glorias onion and the encrypted

routing onion 1

We can see that Alice has created some per hop data for David. The short channel id is set to 0, signaling to David that this payment is intended for him. Note that this example is slightly simplified, in that David can also use attributes of the onion packet format itself to know when he’s the final hop. The amount to forward is set to 3000. On the incoming HTLC David should have seen that exact amount. Usually this field says how many satoshis should be forwarded. Since the short channel id was set to zero, in this particular case it is interpreted as the payment amount. Finally, as David is the final hop, the CLTV value that David should use is set to block height 800 (the current height plus David’s CLTV grace delta). These data fields occupy 20 bytes. The Lightning Network protocol permits the use of up to 65 bytes to signal routing information in the onion for every hop:

  • 1 Byte Realm which signals nodes how to decode the following 32 Bytes.

  • 32 Byte for routing payload information (20 of which we have already used).

  • 32 Byte of a Hashed Message Authentication code.
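The 65-byte layout above can be sketched in a few lines of Python. This is a minimal illustration, assuming big-endian integers and the field widths implied by the text (8 bytes for the short channel id, 8 for the amount, 4 for the CLTV value, followed by 12 zero bytes of padding). The function name `pack_hop_data` and the all-zero HMAC in the usage note are hypothetical placeholders, not part of any real implementation.

```python
import struct

REALM_LEGACY = 0   # realm byte 0: fixed-size legacy format
PADDING_LEN = 12   # unused bytes that fill up the 32-byte payload
HMAC_LEN = 32


def pack_hop_data(short_channel_id: int, amt_to_forward: int,
                  outgoing_cltv_value: int, hmac_tag: bytes) -> bytes:
    """Serialize one 65-byte per hop entry: realm + 32-byte payload + HMAC."""
    if len(hmac_tag) != HMAC_LEN:
        raise ValueError("HMAC must be 32 bytes")
    # 8 + 8 + 4 = 20 bytes of actual routing data
    payload = struct.pack(">QQI", short_channel_id, amt_to_forward,
                          outgoing_cltv_value)
    payload += bytes(PADDING_LEN)  # 12 zero bytes
    return bytes([REALM_LEGACY]) + payload + hmac_tag
```

For David’s entry from the example, `pack_hop_data(0, 3000, 800, bytes(32))` yields 65 bytes whose first byte is the realm and whose next 20 bytes carry the three routing fields.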

As we’ll see in later sections, the more modern onion payloads used in Lightning today are much more flexible in that they allow a series of arbitrary key-value pairs. These arbitrary key-value pairs can be used to extend the protocol in an end-to-end manner, as in many cases only the sender and receiver need to know how to interpret the data. In the next diagram we can see what the per hop payload for David looks like.

per_hop payload of Glorias onion and the encrypted

routing onion 2

One important feature to protect privacy is to make sure that onions are always of equal length, independent of their position along the payment path. Thus onions are always expected to contain 20 entries of 65 bytes of per hop data. If this wasn’t the case, and the onion packet shrank as it was being processed, this would leak information about the true path length to nodes in the route, as the packet would be smaller the further down the route we went. Since David is the final recipient of the payment, we only have 65 bytes worth of data to fill with actual content. The remaining bytes are filled with random bytes to pad out the packet in an unpredictable manner.
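The size arithmetic can be checked with a short sketch. It assumes a 33-byte compressed public key in the header, which together with the version byte, the 20 entries of 65 bytes, and the 32-byte HMAC gives the 1366 bytes mentioned earlier. The function name is hypothetical, and `os.urandom` is only a stand-in for however the padding is actually generated.

```python
import os

NUM_HOPS = 20
HOP_DATA_LEN = 65
ROUTING_INFO_LEN = NUM_HOPS * HOP_DATA_LEN  # 1300 bytes of per hop data
VERSION_LEN, PUBKEY_LEN, HMAC_LEN = 1, 33, 32
ONION_PACKET_LEN = VERSION_LEN + PUBKEY_LEN + ROUTING_INFO_LEN + HMAC_LEN


def innermost_routing_info(final_hop_data: bytes) -> bytes:
    """Put the final hop's 65 bytes first and pad the rest with random bytes."""
    if len(final_hop_data) != HOP_DATA_LEN:
        raise ValueError("per hop data must be exactly 65 bytes")
    filler = os.urandom(ROUTING_INFO_LEN - HOP_DATA_LEN)  # stand-in padding
    return final_hop_data + filler
```

Whatever the position of a node in the route, the routing information it sees is 1300 bytes long, so the packet length reveals nothing about how many hops remain.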

Taking a step back, before Alice is able to prepare the remainder of the packet, she needs to generate an ephemeral key (a key only used once). This ephemeral key is then used to generate a series of additional keys, which are themselves used for encryption, authentication, and also as input to a CSPRNG to deterministically generate the set of random filler bytes. In the spirit of onion encryption, Alice will begin encrypting the payload from the last hop, adding a new layer of encryption with each new hop. During processing, each node will authenticate the contents of the payload, then process the packet (decrypting it and shifting around some bytes) to prepare it for processing by the next node in the route. As we want each node to use a new shared secret to authenticate and encrypt its portion of the packet, the Sphinx onion packet format uses a re-randomization scheme that allows Alice to generate a single ephemeral Diffie-Hellman key for the entire route. Rather than occupying space in the routing payload for N public keys, with this little trick we’re able to include only a single public key, which is used for ECDH at each step and randomized in a deterministic manner for the next hop.
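To illustrate how one shared secret can yield several purpose-specific keys, here is a minimal sketch. It assumes the label-based HMAC-SHA256 construction as we understand it from BOLT 04, where a short label serves as the HMAC key and the shared secret is the message. The labels `rho`, `mu`, and `pad` and both function names do not appear in this chapter and should be checked against the specification before being relied on.

```python
import hashlib
import hmac


def generate_key(key_type: bytes, shared_secret: bytes) -> bytes:
    """HMAC-SHA256 keyed with the label; the shared secret is the message."""
    return hmac.new(key_type, shared_secret, hashlib.sha256).digest()


def derive_hop_keys(shared_secret: bytes) -> dict:
    """Derive one 32-byte key per purpose from a hop's shared secret."""
    return {
        "rho": generate_key(b"rho", shared_secret),  # stream key for the payload
        "mu": generate_key(b"mu", shared_secret),    # key for the per hop HMAC
        "pad": generate_key(b"pad", shared_secret),  # key for padding bytes
    }
```

Because the derivation is deterministic, Alice and the hop arrive at the same set of keys as long as they share the same secret, without any further communication.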

per_hop payload of Glorias onion and the encrypted

routing onion 3

You can see that Alice put the encrypted payload inside the full onion package, which contains the public key corresponding to the secret key that she used to derive the shared secret. The full onion packet also has a version byte at the beginning (for future extensibility) and an HMAC for the entire onion. When David receives the onion packet he will extract the public key from the unencrypted part of the onion package. David then uses ECDH with that ephemeral public key to derive the shared secret, which he’ll use to process the packet in full. The properties of ECDH make it such that only Alice and David are able to derive the corresponding shared secret.

After the encrypted Onion for David is created Alice will create the next outer layer by creating the onion for Wei.

She truncates 65 bytes from the end of the encrypted onion and prepends 65 bytes of per hop data for Wei to the truncated onion. The per hop data follows the same structure as the per hop data for David. Thus she starts with the realm byte, which she sets to 0 again. Then comes the short channel id. This is set to 452, as Wei is meant to use the channel with this channel ID as the next outgoing channel. She sets the amount to 3000 satoshis, as this is the amount that David is supposed to receive. Finally she sets the CLTV expiry that Wei should use for the HTLC when he forwards the onion: the current height plus the CLTV delta that was announced for this channel on the gossip protocol. Notice how this CLTV expiry (the current height plus the delta) increases as we travel backwards in the route (towards the sender). As we’ll see later, this series of time-locks, decrementing towards the recipient, must be set carefully in order to avoid time-based race conditions in the created contracts. Again 12 bytes of zeros are padded and an HMAC is computed. Note that she did not have to compute filler this time, as she already has too much data with the encrypted inner onion. That is why the inner onion had to be truncated at the end. This is the plain text version of Wei’s onion payload and can be seen in the following diagram:
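The truncate-and-prepend step keeps the routing information at a constant length. A minimal sketch, with a hypothetical function name and ignoring encryption and HMAC computation entirely:

```python
HOP_DATA_LEN = 65
ROUTING_INFO_LEN = 20 * HOP_DATA_LEN  # 1300 bytes


def prepend_hop(inner_routing_info: bytes, hop_data: bytes) -> bytes:
    """Drop the last 65 bytes of the inner onion and put the new hop in front."""
    if len(inner_routing_info) != ROUTING_INFO_LEN:
        raise ValueError("inner routing info must be 1300 bytes")
    if len(hop_data) != HOP_DATA_LEN:
        raise ValueError("per hop data must be 65 bytes")
    return hop_data + inner_routing_info[:-HOP_DATA_LEN]
```

The result is again 1300 bytes: Wei’s 65 bytes at the front, followed by the first 1235 bytes of the onion that was built for David.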

per_hop payload of Glorias onion and the encrypted

routing onion 4

We emphasize that Wei cannot decrypt the inner part of the onion (that’s still encrypted from his point of view), as he cannot derive the encryption keys. However, the information for Wei should also be protected from others. Thus Alice conducts another ECDH, this time with Wei’s public key and the randomized ephemeral key pair. She uses the shared secret to encrypt the onion payload. She would be able to construct the entire onion for Wei, which is what Bob actually does when he forwards the onion. The onion that Wei would receive can be seen in the following diagram:

per_hop payload of Glorias onion and the encrypted

routing onion 5

Note that the entire onion will contain Wei’s ephemeral public key. We’ve omitted the details here for brevity, but notice how only a single ephemeral key is communicated. During processing each node will re-randomize the ephemeral key for the following node. Luckily the ephemeral key that Alice used for the ECDH with David can be derived from the ephemeral key that she used for Wei. Thus, after Wei decrypts his layer, he can use the shared secret and his ephemeral public key to derive the ephemeral public key that David is supposed to use, and store it in the header of the onion that he forwards to David. The exact process for generating the ephemeral keys for every hop will be explained at the very end of the chapter. Similarly, it is important to recognize that Alice removed data from the end of David’s onion payload to create space for the per hop data in Wei’s onion. Thus, when Wei has received his onion and removed his routing hints and per hop data, the onion would be too short, and he somehow needs to be able to append 65 bytes of filler data in such a way that the HMACs will still be valid. This process of filler generation, as well as the process of deriving the ephemeral keys, is described at the end of this chapter. What is important to know is that every hop can derive the ephemeral public key that is necessary for the next hop, and that the onions save space by always storing only one ephemeral key instead of all the keys for all the hops.
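The re-randomization idea can be demonstrated with plain secp256k1 arithmetic. The sketch below is for illustration only: it is not constant time and must not be used with real keys. It assumes, following our reading of BOLT 04, that the shared secret is the SHA-256 hash of the compressed ECDH point and that the blinding factor is the SHA-256 hash of the ephemeral public key concatenated with the shared secret. All function names are hypothetical.

```python
import hashlib

# secp256k1 domain parameters
P = 2**256 - 2**32 - 977
N = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141
G = (0x79BE667EF9DCBBAC55A06295CE870B07029BFCDB2DCE28D959F2815B16F81798,
     0x483ADA7726A3C4655DA4FBFC0E1108A8FD17B448A68554199C47D08FFB10D4B8)


def point_add(a, b):
    """Add two curve points; None stands for the point at infinity."""
    if a is None:
        return b
    if b is None:
        return a
    if a[0] == b[0] and (a[1] + b[1]) % P == 0:
        return None
    if a == b:
        lam = 3 * a[0] * a[0] * pow(2 * a[1] % P, -1, P) % P
    else:
        lam = (b[1] - a[1]) * pow((b[0] - a[0]) % P, -1, P) % P
    x = (lam * lam - a[0] - b[0]) % P
    return (x, (lam * (a[0] - x) - a[1]) % P)


def point_mul(k, point):
    """Double-and-add scalar multiplication (not constant time)."""
    result = None
    while k:
        if k & 1:
            result = point_add(result, point)
        point = point_add(point, point)
        k >>= 1
    return result


def compress(point):
    """33-byte compressed encoding of a curve point."""
    return bytes([2 + (point[1] & 1)]) + point[0].to_bytes(32, "big")


def shared_secret(priv, pub):
    """ECDH: hash of the compressed point priv * pub."""
    return hashlib.sha256(compress(point_mul(priv, pub))).digest()


def blinding_factor(ephemeral_pub, secret):
    digest = hashlib.sha256(compress(ephemeral_pub) + secret).digest()
    return int.from_bytes(digest, "big")


def next_ephemeral_priv(ephemeral_priv, hop_pub):
    """Sender side: the ephemeral private key to use with the following hop."""
    ephemeral_pub = point_mul(ephemeral_priv, G)
    secret = shared_secret(ephemeral_priv, hop_pub)
    return ephemeral_priv * blinding_factor(ephemeral_pub, secret) % N


def next_ephemeral_pub(hop_priv, ephemeral_pub):
    """Hop side: the ephemeral public key to put into the forwarded onion."""
    secret = shared_secret(hop_priv, ephemeral_pub)
    return point_mul(blinding_factor(ephemeral_pub, secret), ephemeral_pub)
```

Wei never learns Alice’s ephemeral private key, yet the public key he computes with `next_ephemeral_pub` is exactly the one that belongs to the private key Alice obtains from `next_ephemeral_priv`. That is why a single public key in the header is enough for the whole route.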

Finally, after Alice has computed the encrypted version for Wei, she will use the exact same process to compute the encrypted version for Bob. For Bob’s onion she actually computes the header and provides the ephemeral public key herself. Note how Wei was still supposed to forward 3000 satoshis but Bob was supposed to forward a different amount. The difference is the routing fee for Wei. Bob, on the other hand, will only forward the onion if the difference between the amount to forward and the amount of the HTLC that Alice sets up while transferring the onion to him is large enough to cover the fees that he would like to earn.
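The check that a forwarding node performs can be sketched as follows. This assumes the common fee model of a fixed base fee plus a proportional part expressed in millionths of the forwarded amount, with all amounts in millisatoshis. The function names and the fee values in the usage note are hypothetical.

```python
def required_fee_msat(amt_to_forward_msat: int, fee_base_msat: int,
                      fee_proportional_millionths: int) -> int:
    """Base fee plus a proportional fee on the forwarded amount."""
    proportional = amt_to_forward_msat * fee_proportional_millionths // 1_000_000
    return fee_base_msat + proportional


def should_forward(incoming_msat: int, amt_to_forward_msat: int,
                   fee_base_msat: int, fee_proportional_millionths: int) -> bool:
    """Forward only if the incoming HTLC leaves enough room for the fee."""
    offered_fee = incoming_msat - amt_to_forward_msat
    return offered_fee >= required_fee_msat(
        amt_to_forward_msat, fee_base_msat, fee_proportional_millionths)
```

With a base fee of 1000 millisatoshis and 100 millionths, forwarding 3000 satoshis (3,000,000 millisatoshis) requires a fee of 1300 millisatoshis, so the incoming HTLC must carry at least 3,001,300 millisatoshis.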

Note

We have not discussed the exact cryptographic algorithms and schemes that are being used to compute the ciphertext from the plain text. Also we have not discussed how the HMACs are being computed at every step and how everything fits together while the Onions are always being truncated and modified on the outer layer. For readers seeking more details with respect to the cryptographic algorithms used, we invite them to review BOLT 04 itself in full.

per_hop payload of Glorias onion and the encrypted

routing onion 6

Since we use the network itself for transport of these onion packets, Alice is able to construct the entire onion without needing to communicate directly with each node in the route. She only needs a public key from each participant which is the public node_id of the Lightning node and known to Alice. In the network today, Alice learns about the public key via the gossip network, which is described in Chapter N.

CLTV expiry and deltas

Pitfalls with source based Routing and HTLCs

In the first part of the routing chapter you learned that payments flow securely through the network via a path of HTLCs. You saw how a single HTLC is negotiated between two peers and added to the commitment transaction of each peer. In the second part you saw how the information necessary for setting up HTLCs along a path of hops is transferred via onion packets from the sender to the recipient. However, in the above scenarios we only discussed flows where everything goes as expected (the optimistic path). In this section, we’ll turn our attention to the various scenarios where the payment flow across the route breaks down.

First, it’s important to know that once a node sends a fully valid onion packet out to the first hop, they cannot directly influence the course of the route. In other words:

  • You cannot force nodes to forward the onion immediately.

  • You cannot force nodes to send back an error if they cannot forward the onion because of missing liquidity or other reasons.

  • You cannot be sure that the recipient has the preimage to the payment hash or releases it as soon as the HTLCs of the correct amount arrived.

When sending out an HTLC and its corresponding onion packet, you as the sender must be prepared to wait the worst-case CLTV timelock period before funds are returned to you (if the route fails). This explicit awareness of the worst-case delay when sending a payment may be difficult to explain properly from a user experience perspective in end user wallets. You want to quickly pay a person, but the payment path that your node chose has CLTV deltas that quickly add up to several hundred blocks, which is a couple of days. This means that if nodes on the path misbehave, on purpose or maybe just because they have downtime that your node didn’t know about, you will have to wait even though you don’t see a preimage. You must not send out another onion along a different path using the same payment hash, because there is a risk that both payments will eventually settle. While in our experience most payments find a path and settle in far less than 10 seconds, the Lightning Network protocol cannot and does not give any service level agreement that payments will settle or fail within this time.
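A small calculation shows how CLTV deltas accumulate into a worst-case wait. The sketch assumes an average block interval of 10 minutes. The block height and delta values in the usage note are made up for illustration, and the function names are hypothetical.

```python
AVERAGE_BLOCK_MINUTES = 10  # Bitcoin's target block interval


def cltv_expiries(current_height, final_delta, hop_deltas):
    """Absolute expiry heights, from the sender's HTLC to the recipient's.

    hop_deltas lists the CLTV delta of each forwarding node in route order.
    """
    expiries = [current_height + final_delta]
    for delta in reversed(hop_deltas):
        expiries.insert(0, expiries[0] + delta)
    return expiries


def worst_case_wait_days(current_height, first_expiry):
    """Approximate time until the sender's own HTLC times out."""
    blocks = first_expiry - current_height
    return blocks * AVERAGE_BLOCK_MINUTES / (60 * 24)
```

For a route via Bob and Wei with deltas of 144 blocks each and a final delta of 40 blocks at height 700,000, `cltv_expiries(700_000, 40, [144, 144])` gives expiries of 700,328, 700,184, and 700,040. Alice’s HTLC therefore expires 328 blocks in the future, which is a little more than two days.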

Note

There are ideas out there that might solve this issue to some degree by allowing the payer to abort a payment. You can find more about them under the terms cancelable payments or stuckless payments. However, the proposals that exist only reverse the problem, as now the sender can misbehave and the recipient loses control. Another solution is to use many paths in a multipath payment, include some redundancy, and ignore the problem that a path takes longer to complete.

Despite these fundamental problems, there are plausible situations in which the routing process fails and in which honest nodes can and should react. This is why the onion protocol has the ability to send back errors in a fail-fast manner, which allows nodes to remove the HTLC off chain without needing to close channels. Some, but not all, of the reasons for errors could be:

  • A node does not have enough liquidity to set up the next HTLC.

  • The next payment channel does not exist anymore, as it might have been closed while the onion was being routed to the node that was supposed to forward the onion along that channel.

  • While the channel might still be open (the funding transaction was never spent), the other peer might be offline. This of course prevents the node from forwarding the onion.

  • The key exchanges of the sender might have been wrong, so that the decryption of the onion fails or the HMACs do not match (the latter also happens when someone has tampered with the onion).

  • The recipient might not have issued an invoice and does not know the payment details.

  • The amount of the final HTLC is too low and the recipient does not want to release the preimage.

If any of the above errors are encountered, a node will send an encrypted error reply onion back to the sender. The reply onion will be encrypted at each hop with the same shared secrets that were used to construct the onion or decrypt a layer. These shared secrets are all known to the originator of the payment. The innermost onion contains the error message and an HMAC for the error message. This process ensures that the sender of the onion, who is the recipient of the reply, can be sure that the error really originated from the node that the error message names.

Another important step in the process of handling errors is to abort the routing process. We discussed that the sender of a payment cannot just remove the HTLC on the channel along which the sender sent the payment. Recall for example the situation in which Alice sent an onion to Bob, who set up an HTLC with Wei. If Alice wanted to remove the HTLC with Bob, this would put a financial risk on Bob. His HTLC with Wei might still be fulfilled, in which case he could not claim the reimbursement from Alice. Thus Bob would never agree to remove the HTLC with Alice unless he has already removed his HTLC with Wei. If, however, the HTLC between Alice and Bob and the HTLC between Bob and Wei are both set up, but Wei encounters problems with forwarding the onion, Wei has more options than Alice. While sending back the error onion to Bob, Wei could ask him to remove the HTLC. Bob has no risk in removing the HTLC with Wei, and Wei also has no risk as there is no downstream HTLC.

Removing an HTLC is the reverse of adding one in the first place from the point of view of the commitment transaction. In the case of errors, peers signal that they wish to remove the HTLC by sending an update_fail_htlc or update_fail_malformed_htlc message. These messages contain the id of an HTLC that should be removed in the next version of the commitment transaction. In the same handshake-like process that was used to exchange commitment_signed and revoke_and_ack messages, the new state, and thus a new pair of commitment signatures, has to be negotiated and agreed upon. This also means that while the balance of a channel that was involved in a failed routing process will not have changed at the end, the channel will have negotiated two new commitment transactions. Despite having the same balance, it must not go back to the previous commitment transaction which did not include the HTLC, as that commitment transaction was revoked. If it were used to force close the channel, the channel partner would have the ability to create a penalty transaction and take all the funds.
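The layered error reply can be illustrated with a toy model. This sketch deliberately simplifies: it uses SHA-256 in counter mode as a stand-in stream cipher and reuses one secret per hop for both encryption and the HMAC. The real protocol, as far as we understand BOLT 04, uses a proper stream cipher, separately derived keys, and fixed-length padded messages. All function names are hypothetical.

```python
import hashlib
import hmac


def keystream(secret: bytes, length: int) -> bytes:
    """Stand-in stream cipher: SHA-256 in counter mode (illustration only)."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(secret + counter.to_bytes(4, "big")).digest()
        counter += 1
    return out[:length]


def xor_layer(secret: bytes, data: bytes) -> bytes:
    """Add or remove one layer of obfuscation with a hop's shared secret."""
    return bytes(a ^ b for a, b in zip(data, keystream(secret, len(data))))


def create_error(secret: bytes, message: bytes) -> bytes:
    """Failing node: prepend an HMAC over the message, then encrypt."""
    tag = hmac.new(secret, message, hashlib.sha256).digest()
    return xor_layer(secret, tag + message)


def decode_error(hop_secrets, packet: bytes):
    """Sender: peel one layer per hop until an HMAC verifies."""
    for index, secret in enumerate(hop_secrets):
        packet = xor_layer(secret, packet)
        tag, message = packet[:32], packet[32:]
        expected = hmac.new(secret, message, hashlib.sha256).digest()
        if hmac.compare_digest(tag, expected):
            return index, message  # index of the hop that reported the error
    return None
```

If Wei (hop 1) creates the error and Bob (hop 0) adds his layer with `xor_layer` on the way back, Alice, who knows every hop’s shared secret, peels the layers in route order and learns both the message and which hop produced it.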

Settling HTLCs

In the last section you learned about the error cases that can happen with onion routing via the chain of HTLCs. You learned how HTLCs are removed if there is an error. Of course, HTLCs also need to be removed and the balance needs to be updated if the chain of HTLCs was successfully set up to the destination and the preimage is released. Not surprisingly, this process is initiated with another Lightning message called update_fulfill_htlc. You will remember that HTLCs are set up and supposed to be removed, with a new balance for the recipient, in exchange for a secret preimage. Recalling the full-duplex protocol with commitment_signed and revoke_and_ack messages, you might wonder how to make this exchange of preimage for new state atomic. The cool thing is that it doesn’t have to be. Once a channel partner with an accepted incoming HTLC knows the preimage, it can safely just pass it to the other channel partner. That is why the update_fulfill_htlc message contains only the channel_id, the id of the HTLC, and the preimage. You might worry that the channel partner could now refuse to sign a new channel state by withholding the commitment_signed and revoke_and_ack messages. This is not a problem though. In that case the recipient of the offered HTLC can just go on chain by force closing the channel. Once that has happened, the preimage can be used to claim the HTLC output.
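The check a node performs when it receives a preimage is a single hash comparison. The sketch below models the three fields named above in a hypothetical `UpdateFulfillHtlc` class. The field names and types are assumptions for illustration, not the wire format.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class UpdateFulfillHtlc:
    """Hypothetical in-memory view of an update_fulfill_htlc message."""
    channel_id: bytes
    htlc_id: int
    payment_preimage: bytes


def preimage_matches(msg: UpdateFulfillHtlc, payment_hash: bytes) -> bool:
    """The HTLC may be settled only if SHA-256(preimage) equals the hash."""
    return hashlib.sha256(msg.payment_preimage).digest() == payment_hash
```

A node that receives a valid preimage can pass it upstream right away, because the same preimage settles every HTLC along the path that was set up with this payment hash.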

Some Considerations for routing nodes

Accepting an HTLC ties up funds of a peer, which the peer cannot use until the HTLC is removed due to success or failure. Similarly, forwarding an HTLC binds some funds in your node’s payment channel until the HTLC is removed again. As we explained at the very beginning of the chapter, engaging in the forwarding process of HTLCs neither carries a direct risk of losing funds nor offers a chance to gain funds. However, the funds involved could be locked for some time. In the worst case the routing process needs to be resolved on chain, because the payment channel was force closed due to some other circumstances. In that case outstanding HTLCs produce additional onchain footprint and costs. Thus there are two small economic risks involved with participation in the routing process.

  1. Higher onchain fees in case of forced channel closes due to the higher footprint of HTLCs

  2. Opportunity costs of locked funds. While the HTLC is active the funds cannot be used otherwise.

Owners of routing nodes might want to monitor the routing behavior and opportunities and compare them to the onchain costs and the opportunity costs in order to compute their own routing fees that they wish to charge to accept and forward HTLCs.

Also one should notice that HTLCs are outputs in the commitment transaction. The Lightning Network protocol allows users to pay a single satoshi. However, it is impossible to set up HTLCs for this amount. The reason is that the corresponding outputs in the commitment transaction would be below the dust limit. Such cases are solved in practice with the following trick: instead of setting up an HTLC output, the amount is taken from the output of the sender but not added to the output of the recipient. Thus HTLCs that are below the dust limit can be understood as additional fees in the commitment transaction. Most Lightning nodes support the configuration of a minimum accepted HTLC value. Operators have to consider whether they want to risk overpaying fees or losing funds when a channel is force closed, because the amounts of such HTLCs have been added to the fees of the commitment transaction.
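The dust trick can be sketched as a simple partition of pending HTLCs. This is a simplification: the function name is hypothetical, the 546 satoshi dust limit in the usage note is only an example value, and a real implementation would also take the fees of the HTLC transactions into account when deciding whether an output is worth creating.

```python
def split_htlcs(htlc_amounts_sat, dust_limit_sat):
    """Separate HTLCs that get their own output from those added to fees."""
    outputs = [amt for amt in htlc_amounts_sat if amt >= dust_limit_sat]
    added_to_fees = sum(amt for amt in htlc_amounts_sat if amt < dust_limit_sat)
    return outputs, added_to_fees
```

With a dust limit of 546 satoshis, `split_htlcs([1, 300, 5000], 546)` keeps only the 5000 satoshi HTLC as an output, while the other 301 satoshis would go to miners as fees if the commitment transaction were broadcast.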

  • Explain fee and time-lock considerations

  • The “HTLC Switch” analogy compared to regular network switch

  • Circuit map concept, how to handle forwarding

  • Pipeline styles for HTLCs

  • Error handling and encryption for HTLCs

  • Explain “one little trick” of DH re-randomization

  • Explain how we keep the packet size fixed, what’s MAC’d, etc

  • Introduce the new modern payload format which uses TLV

Routing failures


1. You can verify this by typing echo -n "Glorias secret" | sha256sum into your Linux command line shell.