Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Attributable errors (feature 36/37) #1044

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

joostjager
Copy link
Collaborator

@joostjager joostjager commented Nov 21, 2022

Error attribution is important to properly penalize nodes after a payment failure occurs. The goal of the penalty is to give the next attempt a better chance at succeeding. In the happy failure flow, the sender is able to determine the origin of the failure and penalizes a single node or pair of nodes.

Unfortunately it is possible for nodes on the route to hide themselves. If they return random data as the failure message, the sender won't know where the failure happened.

This PR proposes a new failure message format that lets each node commit to the failure message. If one of the nodes corrupts the failure message, the sender will be able to identify that node.

For more information, see https://lists.linuxfoundation.org/pipermail/lightning-dev/2022-October/003723.html.

LND implementation: lightningnetwork/lnd#7139

Note: the PR is currently authored as a change rather than an addition. Later on it can be converted either to an extension bolt or merged into the main spec with if..then..else sentences.

@thomash-acinq
Copy link

I've started implementing it in eclair, do you have some test vectors so we can check that we are compatible?
The design seems good to me, but as I've said previously, I think keeping hop payloads and hmacs for 8 nodes only (instead of 27) is enough for almost all use cases and would give us huge size savings.

@joostjager
Copy link
Collaborator Author

joostjager commented Dec 6, 2022

I don't have test vectors yet, but I can produce them. Will add them to this PR when ready.

Capping the max hops at a lower number is fine to me, but do you have a scenario in mind where this would really make the difference? Or is it to more generally that everything above 8 is wasteful?

@joostjager joostjager force-pushed the fat-errors branch 2 times, most recently from 4b48481 to 24b10d5 Compare December 6, 2022 16:52
@joostjager
Copy link
Collaborator Author

@thomash-acinq added a happy fat error test vector.

04-onion-routing.md Outdated Show resolved Hide resolved
04-onion-routing.md Outdated Show resolved Hide resolved
09-features.md Outdated
@@ -41,6 +41,7 @@ The Context column decodes as follows:
| 20/21 | `option_anchor_outputs` | Anchor outputs | IN | `option_static_remotekey` | [BOLT #3](03-transactions.md) |
| 22/23 | `option_anchors_zero_fee_htlc_tx` | Anchor commitment type with zero fee HTLC transactions | IN | `option_static_remotekey` | [BOLT #3][bolt03-htlc-tx], [lightning-dev][ml-sighash-single-harmful]|
| 26/27 | `option_shutdown_anysegwit` | Future segwit versions allowed in `shutdown` | IN | | [BOLT #2][bolt02-shutdown] |
| 28/29 | `option_fat_error` | Can generate/relay fat errors in `update_fail_htlc` | IN | | [BOLT #4][bolt04-fat-errors] |
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this big gap in the bits has emerged here because of tentative spec changes that may or may not make it. Not sure why that is necessary. I thought for unofficial extensions, the custom range is supposed to be used?

I can see that with unofficial features deployed in the wild, it is easier to keep the same bit when something becomes official. But not sure if that is worth creating the gap here? An alternative is to deploy unofficial features in the custom range first, and then later recognize both the official and unofficial bit. Slightly more code, but this feature list remains clean.

@joostjager
Copy link
Collaborator Author

Added fat error signaling to the PR.

04-onion-routing.md Outdated Show resolved Hide resolved
@thomash-acinq
Copy link

I've spent a lot of time trying to make the test vector pass and I've finally found what was wrong:
In the spec you write that the hmac covers

  • failure_len, failuremsg, pad_len and pad.

  • The first y+1 payloads in payloads. For example, hmac_0_2 would cover all three payloads.

  • y downstream hmacs that correspond to downstream node positions relative to x. For example, hmac_0_2 would cover hmac_1_1 and hmac_2_0.

implying that we need to concatenate them in that order. But in your code you follow a different order:

// Include payloads including our own.
_, _ = hash.Write(payloads[:(NumMaxHops-position)*payloadLen])

// Include downstream hmacs.
var hmacsIdx = position + NumMaxHops
for j := 0; j < NumMaxHops-position-1; j++ {
	_, _ = hash.Write(
		hmacs[hmacsIdx*sha256.Size : (hmacsIdx+1)*sha256.Size],
	)

	hmacsIdx += NumMaxHops - j - 1
}

// Include message.
_, _ = hash.Write(message)

I think the order message + hop payloads + hmacs is more intuitive as it matches the order of the fields in the packet.

@joostjager
Copy link
Collaborator Author

Oh great catch! Will produce a new vector.

@joostjager
Copy link
Collaborator Author

@thomash-acinq updated vector

04-onion-routing.md Outdated Show resolved Hide resolved
@joostjager
Copy link
Collaborator Author

Updated LND implementation with sender-picked fat error structure parameters: lightningnetwork/lnd#7139

04-onion-routing.md Outdated Show resolved Hide resolved
Copy link

@thomash-acinq thomash-acinq left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Eclair has a basic implementation of the latest version ready: ACINQ/eclair#2519
It doesn't signal the feature yet (conflict with dual funding) and does not make use of attributable errors but should relay them correctly (compatible with the test vector).

09-features.md Outdated Show resolved Hide resolved
`attributable_error` is a marker record that specifies that the failure message
in the `update_fail_htlc` response should be structured as an attributable
error. For the legacy format, this record should not be set.

### Requirements

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to add requirements for the reader.
It may also be useful to create a new failure code for when the downstream node sent us a legacy error.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Requirement added for the reader.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It may also be useful to create a new failure code for when the downstream node sent us a legacy error.

I don't think we can know this, can we?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can't always know for sure but the size of the packet we receive should be a good indicator, legacy error being much shorter than attributable errors.
What do you do if the downstream node returns a legacy error (or any error that's too short)? There are no payloads and hmacs to shift.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah I see. In lightningnetwork/lnd#7139, a properly-structured all-zeroes (attributable) failure message is returned in this case. With that, the sender at least has an indication of the offending node's position, but the message itself is not defined.

Good suggestion to make that a specific error.

@joostjager joostjager reopened this Nov 6, 2023
@thomash-acinq
Copy link

The spec should say what to do if

  • the onion is invalid,
  • the error to relay is too short.

I've updated my implementation to match yours:

  • default to legacy errors,
  • pretend we've received all zeros.
    This seems like an acceptable solution as it preserves the attributability of the error.

We could also consider an alternative system where the use-attributable-errors bit is in the update_add_htlc message instead of being inside the onion. That way we always have access to it even if the onion is invalid (or if we decide to reject the HTLC before even reading the onion, for instance if the CLTV is too low).
And by adding a new error message to wrap a legacy error, we could even get partial attributability if the first nodes of the route support attributable errors but not the last ones.

@joostjager
Copy link
Collaborator Author

joostjager commented Nov 7, 2023

We could also consider an alternative system where the use-attributable-errors bit is in the update_add_htlc message instead of being inside the onion. That way we always have access to it even if the onion is invalid (or if we decide to reject the HTLC before even reading the onion, for instance if the CLTV is too low).

The idea would be to let all nodes just carry forward the use-attributable-errors bit to the node downstream? It would save some bytes in the onion too, although at some point in the future those bytes would probably be freed up anyway when it is decided that legacy failures are no longer supported.

I do wonder if this opens up attack vectors. A node forwarding an htlc and flipping use-attributable-errors from false to true, or vice versa.

And by adding a new error message to wrap a legacy error, we could even get partial attributability if the first nodes of the route support attributable errors but not the last ones.

The legacy error is wrapped and the sender can interpret it, even though it went through a few attributable error nodes? I've also been thinking about this, but I doubted that it is worth the extra complexity. Given enough time, most nodes will be upgraded and mixed paths aren't needed any longer. Attributable errors are not a critical hot fix, so I think we've the time for that?


The field `hmacs` contains truncated authentication codes for each hop, with a
`um` type key generated using the above process. Regular 32 byte hmacs are
truncated to the first 4 bytes to save space.
Copy link
Collaborator Author

@joostjager joostjager Nov 8, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I came across the algorithm for hmac based one time passwords: https://en.wikipedia.org/wiki/HMAC-based_one-time_password#Definition

Interestingly they use a more complicated way to truncate the hash (truncate(MAC)). It isn't clear to me that this makes it any more secure, but perhaps it is defense in depth?

https://datatracker.ietf.org/doc/html/rfc4226#section-5.3

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

9 participants