Attributable errors (feature 36/37) #1044

joostjager · 2022-11-21T11:28:26Z

Error attribution is important to properly penalize nodes after a payment failure occurs. The goal of the penalty is to give the next attempt a better chance at succeeding. In the happy failure flow, the sender is able to determine the origin of the failure and penalizes a single node or pair of nodes.

Unfortunately it is possible for nodes on the route to hide themselves. If they return random data as the failure message, the sender won't know where the failure happened.

This PR proposes a new failure message format that lets each node commit to the failure message. If one of the nodes corrupts the failure message, the sender will be able to identify that node.

For more information, see https://lists.linuxfoundation.org/pipermail/lightning-dev/2022-October/003723.html.

LND implementation: lightningnetwork/lnd#7139

Note: the PR is currently authored as a change rather than an addition. Later on it can be converted either to an extension bolt or merged into the main spec with if..then..else sentences.

04-onion-routing.md

thomash-acinq · 2022-12-06T14:53:54Z

I've started implementing it in eclair, do you have some test vectors so we can check that we are compatible?
The design seems good to me, but as I've said previously, I think keeping hop payloads and hmacs for 8 nodes only (instead of 27) is enough for almost all use cases and would give us huge size savings.

joostjager · 2022-12-06T15:41:32Z

I don't have test vectors yet, but I can produce them. Will add them to this PR when ready.

Capping the max hops at a lower number is fine to me, but do you have a scenario in mind where this would really make the difference? Or is it to more generally that everything above 8 is wasteful?

joostjager · 2022-12-06T16:53:09Z

@thomash-acinq added a happy fat error test vector.

04-onion-routing.md

joostjager · 2022-12-07T09:27:19Z

09-features.md

@@ -41,6 +41,7 @@ The Context column decodes as follows:
 | 20/21 | `option_anchor_outputs`          | Anchor outputs                                            | IN       | `option_static_remotekey` | [BOLT #3](03-transactions.md)         |
 | 22/23 | `option_anchors_zero_fee_htlc_tx` | Anchor commitment type with zero fee HTLC transactions   | IN       | `option_static_remotekey` | [BOLT #3][bolt03-htlc-tx], [lightning-dev][ml-sighash-single-harmful]|
 | 26/27 | `option_shutdown_anysegwit`         | Future segwit versions allowed in `shutdown`              | IN       |                   | [BOLT #2][bolt02-shutdown]   |
+| 28/29 | `option_fat_error`               | Can generate/relay fat errors in `update_fail_htlc`       | IN       |                   | [BOLT #4][bolt04-fat-errors]   |


I think this big gap in the bits has emerged here because of tentative spec changes that may or may not make it. Not sure why that is necessary. I thought for unofficial extensions, the custom range is supposed to be used?

I can see that with unofficial features deployed in the wild, it is easier to keep the same bit when something becomes official. But not sure if that is worth creating the gap here? An alternative is to deploy unofficial features in the custom range first, and then later recognize both the official and unofficial bit. Slightly more code, but this feature list remains clean.

joostjager · 2022-12-07T09:27:48Z

Added fat error signaling to the PR.

04-onion-routing.md

thomash-acinq · 2022-12-09T17:12:30Z

I've spent a lot of time trying to make the test vector pass and I've finally found what was wrong:
In the spec you write that the hmac covers

failure_len, failuremsg, pad_len and pad.

The first y+1 payloads in payloads. For example, hmac_0_2 would cover all three payloads.

y downstream hmacs that correspond to downstream node positions relative to x. For example, hmac_0_2 would cover hmac_1_1 and hmac_2_0.

implying that we need to concatenate them in that order. But in your code you follow a different order:

// Include payloads including our own.
_, _ = hash.Write(payloads[:(NumMaxHops-position)*payloadLen])

// Include downstream hmacs.
var hmacsIdx = position + NumMaxHops
for j := 0; j < NumMaxHops-position-1; j++ {
	_, _ = hash.Write(
		hmacs[hmacsIdx*sha256.Size : (hmacsIdx+1)*sha256.Size],
	)

	hmacsIdx += NumMaxHops - j - 1
}

// Include message.
_, _ = hash.Write(message)

I think the order message + hop payloads + hmacs is more intuitive as it matches the order of the fields in the packet.

joostjager · 2022-12-09T17:16:40Z

Oh great catch! Will produce a new vector.

joostjager · 2022-12-12T07:34:33Z

@thomash-acinq updated vector

04-onion-routing.md

joostjager · 2023-01-23T15:38:27Z

Updated LND implementation with sender-picked fat error structure parameters: lightningnetwork/lnd#7139

04-onion-routing.md

thomash-acinq

Eclair has a basic implementation of the latest version ready: ACINQ/eclair#2519
It doesn't signal the feature yet (conflict with dual funding) and does not make use of attributable errors but should relay them correctly (compatible with the test vector).

09-features.md

thomash-acinq · 2023-08-11T14:38:30Z

04-onion-routing.md

+`attributable_error` is a marker record that specifies that the failure message
+in the `update_fail_htlc` response should be structured as an attributable
+error. For the legacy format, this record should not be set.
+
 ### Requirements


We need to add requirements for the reader.
It may also be useful to create a new failure code for when the downstream node sent us a legacy error.

Requirement added for the reader.

It may also be useful to create a new failure code for when the downstream node sent us a legacy error.

I don't think we can know this, can we?

We can't always know for sure but the size of the packet we receive should be a good indicator, legacy error being much shorter than attributable errors.
What do you do if the downstream node returns a legacy error (or any error that's too short)? There are no payloads and hmacs to shift.

Ah I see. In lightningnetwork/lnd#7139, a properly-structured all-zeroes (attributable) failure message is returned in this case. With that, the sender at least has an indication of the offending node's position, but the message itself is not defined.

Good suggestion to make that a specific error.

thomash-acinq · 2023-11-07T13:04:22Z

The spec should say what to do if

the onion is invalid,
the error to relay is too short.

I've updated my implementation to match yours:

default to legacy errors,
pretend we've received all zeros.
This seems like an acceptable solution as it preserves the attributability of the error.

We could also consider an alternative system where the use-attributable-errors bit is in the update_add_htlc message instead of being inside the onion. That way we always have access to it even if the onion is invalid (or if we decide to reject the HTLC before even reading the onion, for instance if the CLTV is too low).
And by adding a new error message to wrap a legacy error, we could even get partial attributability if the first nodes of the route support attributable errors but not the last ones.

joostjager · 2023-11-07T13:38:35Z

We could also consider an alternative system where the use-attributable-errors bit is in the update_add_htlc message instead of being inside the onion. That way we always have access to it even if the onion is invalid (or if we decide to reject the HTLC before even reading the onion, for instance if the CLTV is too low).

The idea would be to let all nodes just carry forward the use-attributable-errors bit to the node downstream? It would save some bytes in the onion too, although at some point in the future those bytes would probably be freed up anyway when it is decided that legacy failures are no longer supported.

I do wonder if this opens up attack vectors. A node forwarding an htlc and flipping use-attributable-errors from false to true, or vice versa.

And by adding a new error message to wrap a legacy error, we could even get partial attributability if the first nodes of the route support attributable errors but not the last ones.

The legacy error is wrapped and the sender can interpret it, even though it went through a few attributable error nodes? I've also been thinking about this, but I doubted that it is worth the extra complexity. Given enough time, most nodes will be upgraded and mixed paths aren't needed any longer. Attributable errors are not a critical hot fix, so I think we've the time for that?

joostjager · 2023-11-08T09:35:12Z

04-onion-routing.md

+
+The field `hmacs` contains truncated authentication codes for each hop, with a
+`um` type key generated using the above process. Regular 32 byte hmacs are
+truncated to the first 4 bytes to save space.


I came across the algorithm for hmac based one time passwords: https://en.wikipedia.org/wiki/HMAC-based_one-time_password#Definition

Interestingly they use a more complicated way to truncate the hash (truncate(MAC)). It isn't clear to me that this makes it any more secure, but perhaps it is defense in depth?

https://datatracker.ietf.org/doc/html/rfc4226#section-5.3

joostjager mentioned this pull request Nov 21, 2022

Lightning Specification Meeting 2022/11/21 #1041

Closed

28 tasks

tnull reviewed Nov 21, 2022

View reviewed changes

04-onion-routing.md Show resolved Hide resolved

joostjager mentioned this pull request Nov 22, 2022

draft: Staking Credentials token issuance/redemption #1043

Draft

t-bast mentioned this pull request Dec 5, 2022

Lightning Specification Meeting 2022/12/05 #1046

Closed

28 tasks

joostjager mentioned this pull request Dec 6, 2022

Attributable errors lightningnetwork/lightning-onion#60

Open

thomash-acinq mentioned this pull request Dec 6, 2022

Attributable errors ACINQ/eclair#2519

Draft

joostjager force-pushed the fat-errors branch 2 times, most recently from 4b48481 to 24b10d5 Compare December 6, 2022 16:52

joostjager force-pushed the fat-errors branch from 24b10d5 to 76dbf21 Compare December 7, 2022 09:14

joostjager commented Dec 7, 2022

View reviewed changes