feat: structured screens for SIGN_MODE_TEXTUAL (cosmos#13434)

## Description Refs: cosmos#11970 Changes target of `SIGN_MODE_TEXTUAL` rendering to be a structured datatype instead of lines of ASCII text. This avoids the complexities of in-band, signaling and allows more capable signing devices not to be hindered by the limitations of those less capable. --- ### Author Checklist *All items are required. Please add a note to the item if the item is not applicable and please add links to any relevant follow up issues.* I have... - [X] included the correct [type prefix](https://github.com/commitizen/conventional-commit-types/blob/v3.0.0/index.json) in the PR title - [X] added `!` to the type prefix if API or client breaking change - [X] targeted the correct branch (see [PR Targeting](https://github.com/cosmos/cosmos-sdk/blob/main/CONTRIBUTING.md#pr-targeting)) - [X] provided a link to the relevant issue or specification - [X] followed the guidelines for [building modules](https://github.com/cosmos/cosmos-sdk/blob/main/docs/building-modules) - [X] included the necessary unit and integration [tests](https://github.com/cosmos/cosmos-sdk/blob/main/CONTRIBUTING.md#testing) - [ ] added a changelog entry to `CHANGELOG.md` - [X] included comments for [documenting Go code](https://blog.golang.org/godoc) - [X] updated the relevant documentation or specification - [X] reviewed "Files changed" and left comments if necessary - [x] confirmed all CI checks have passed NOTE: changelog intentionally omitted - we'll add an entry when cosmos#11970 is complete. ### Reviewers Checklist *All items are required. Please add a note if the item is not applicable and please add your handle next to the items reviewed if you only reviewed selected items.* I have... - [ ] confirmed the correct [type prefix](https://github.com/commitizen/conventional-commit-types/blob/v3.0.0/index.json) in the PR title - [ ] confirmed `!` in the type prefix if API or client breaking change - [ ] confirmed all author checklist items have been addressed - [ ] reviewed state machine logic - [ ] reviewed API design and naming - [ ] reviewed documentation is accurate - [ ] reviewed tests and test coverage - [ ] manually tested (if applicable)
elias-orijtech · Oct 15, 2022 · d9f3eb0 · d9f3eb0
1 parent ebe0f5c
commit d9f3eb0
Show file tree

Hide file tree

Showing 16 changed files with 312 additions and 199 deletions.
diff --git a/docs/architecture/adr-050-sign-mode-textual-annex2.md b/docs/architecture/adr-050-sign-mode-textual-annex2.md
@@ -0,0 +1,122 @@
+# ADR 050: SIGN_MODE_TEXTUAL: Annex 2 XXX
+
+## Changelog
+
+- Oct 3, 2022: Initial Draft
+
+## Status
+
+DRAFT
+
+## Abstract
+
+This annex provides normative guidance on how devices should render a
+`SIGN_MODE_TEXTUAL` document.
+
+## Context
+
+`SIGN_MODE_TEXTUAL` allows a legible version of a transaction to be signed
+on a hardware security devices, such as a Ledger. Early versions of the
+design rendered transactions directly to lines of ASCII text, but this
+proved awkward from its in-band signaling, and for the need to display
+Unicode text within the transaction.
+
+## Decision
+
+`SIGN_MODE_TEXTUAL` renders to an abstract representation, leaving it
+up to device-specific software how to present this representation given the
+capabilities, limitations, and conventions of the deivce.
+
+We offer the following normative guidance:
+
+1. The presentation should be as legible as possible to the user, given
+the capabilities of the device. If legibility could be sacrificed for other
+properties, we would recommend just using some other signing mode.
+Legibility should focus on the common case - it is okay for unusual cases
+to be less legible.
+
+2. The presentation should be invertible if possible without substantial
+sacrifice of legibility.  Any change to the rendered data should result
+in a visible change to the presentation. This extends the integrity of the
+signing to user-visible presentation.
+
+3. The presentation should follow normal conventions of the device,
+without sacrificing legibility or invertibility.
+
+As an illustration of these principles, here is an example algorithm
+for presentation on a device which can display a single 80-character
+line of printable ASCII characters:
+
+- The presentation is broken into lines, and each line is presented in
+sequence, with user controls for going forward or backward a line.
+
+- Expert mode screens are only presented if the device is in expert mode.
+
+- Each line of the screen starts with a number of `>` characters equal
+to the screen's indetation level, followed by a `+` character if this
+isn't the first line of the screen, followed by a space if either a
+`>` or a `+` has been emitted,
+or if this header is followed by a `>`, `+`, or space.
+
+- If the line ends with whitespace or an `@` character, and `@` character
+is appended to the line.
+
+- The following ASCII control characters or backslash (`\`) are converted
+to a backslash followed by a letter code, in the manner of string literals
+in many languages:
+
+    - a: U+0007 alert or bell
+    - b: U+0008 backspace
+    - f: U+000C form feed
+    - n: U+000A line feed
+    - r: U+000D carriage return
+    - t: U+0009 horizontal tab
+    - v: U+000B vertical tab
+    - `\`: U+005C backslash
+
+- All other ASCII control characters, plus non-ASCII Unicode code points,
+are shown as either:
+
+    - `\u` followed by 4 uppercase hex chacters for code points
+    in the basic multilingual plane (BMP).
+
+    - `\U` followed by 8 uppercase hex characters for other code points.
+
+- The screen will be broken into multiple lines to fit the 80-character
+limit, considering the above transformations in a way that attempts to
+minimize the number of lines generated. Expanded control or Unicode characters
+are never split across lines.
+
+Example output:
+
+```
+An introductory line.
+key1: 123456
+key2: a string that ends in whitespace   @
+key3: a string that ends in  a single ampersand - @@
+ >tricky key4<: note the leading space in the presentation
+introducing an aggregate
+> key5: false
+> key6: a very long line of text, please co\u00F6perate and break into
+>+  multiple lines.
+> Can we do further nesting?
+>> You bet we can!
+```
+
+The inverse mapping gives us the only input which could have
+generated this output (JSON notation for string data):
+
+```
+Indent  Text
+------  ----
+0       "An introductory line."
+0       "key1: 123456"
+0       "key2: a string that ends in whitespace   "
+0       "key3: a string that ends in  a single ampersand - @"
+0       ">tricky key4<: note the leading space in the presentation"
+0       "introducing an aggregate"
+1       "key5: false"
+1       "key6: a very long line of text, please coöperate and break into multiple lines."
+1       "Can we do further nesting?"
+2       "You bet we can!"
+```
diff --git a/docs/architecture/adr-050-sign-mode-textual.md b/docs/architecture/adr-050-sign-mode-textual.md
@@ -7,6 +7,7 @@
 - May 16, 2022: Change status to Accepted.
 - Aug 11, 2022: Require signing over tx raw bytes.
 - Sep 07, 2022: Add custom `Msg`-renderers.
+- Sep 18, 2022: Structured format instead of lines of text
 
 ## Status
 
@@ -23,72 +24,119 @@ Protobuf-based SIGN_MODE_DIRECT was introduced in [ADR-020](./adr-020-protobuf-t
 - SIGN_MODE_DIRECT is binary-based and thus not suitable for display to end-users. Technically, hardware wallets could simply display the sign bytes to the user. But this would be considered as blind signing, and is a security concern.
 - hardware cannot decode the protobuf sign bytes due to memory constraints, as the Protobuf definitions would need to be embedded on the hardware device.
 
-In an effort to remove Amino from the SDK, a new sign mode needs to be created for hardware devices. [Initial discussions](https://github.com/cosmos/cosmos-sdk/issues/6513) propose a string-based sign mode, which this ADR formally specifies.
+In an effort to remove Amino from the SDK, a new sign mode needs to be created for hardware devices. [Initial discussions](https://github.com/cosmos/cosmos-sdk/issues/6513) propose a text-based sign mode, which this ADR formally specifies.
 
 ## Decision
 
-We propose to have SIGN_MODE_TEXTUAL’s signing payload `SignDocTextual` to be an array of strings, encoded as a `\n`-delimited string (see point #9). Each string corresponds to one "screen" on the hardware wallet device, with no (or little) additional formatting done by the hardware wallet itself.
+In SIGN_MODE_TEXTUAL, a transaction is rendered into a textual representation,
+which is then sent to a secure device or subsystem for the user to review and sign.
+Unlike `SIGN_MODE_DIRECT`, the transmitted data can be simply decoded into legible text
+even on devices with limited processing and display.
 
-```protobuf
-message SignDocTextual {
-  repeated string screens = 1;
-}
-```
+The textual representation is a sequence of _screens_.
+Each screen is meant to be displayed in its entirety (if possible) even on a small device like a Ledger.
+A screen is roughly equivalent to a short line of text.
+Large screens can be displayed in several pieces,
+much as long lines of text are wrapped,
+so no hard guidance is given, though 40 characters is a good target.
+A screen is used to display a single key/value pair for scalar values
+(or composite values with a compact notation, such as `Coins`)
+or to introduce or conclude a larger grouping.
+
+The text can contain the full range of Unicode code points, including control characters and nul.
+The device is responsible for deciding how to display characters it cannot render natively.
+See [annex 2](./adr-050-sign-mode-textual-annex2.md) for guidance.
 
-The string array MUST follow the specifications below.
+Screens have a non-negative indentation level to signal composite or nested structures.
+Indentation level zero is the top level.
+Indentation is displayed via some device-specific mechanism.
+Message quotation notation is an appropriate model, such as
+leading `>` characters or vertical bars on more capable displays.
 
-### 1. Bijectivity with Protobuf transactions
+Some screens are marked as _expert_ screens,
+meant to be displayed only if the viewer chooses to opt in for the extra detail.
+Expert screens are meant for information that is rarely useful,
+or needs to be present only for signature integrity (see below).
 
-The encoding and decoding operations between a Protobuf transaction (whose definition can be found [here](https://github.com/cosmos/cosmos-sdk/blob/master/proto/cosmos/tx/v1beta1/tx.proto#L13)) and the string array MUST be bijective.
+### Invertible Rendering
 
-We concede that bijectivity is not strictly needed. Avoiding transaction malleability only requires collision resistance on the encoding. Lossless encoding also does not require decodability. However, bijectivity assures both non-malleability and losslessness.
+We require that the rendering of the transaction be invertible:
+there must be a parsing function such that for every transaction,
+when rendered to the textual representation,
+parsing that representation yeilds a proto message equivalent
+to the original under proto equality.
 
-Bijectivity will be tested in two ways:
+Note that this inverse function does not need to perform correct
+parsing or error signaling for the whole domain of textual data.
+Merely that the range of valid transactions be invertible under
+the composition of rendering and parsing.
 
-- by providing a set of test fixtures between a transaction's Proto JSON representation and its TEXTUAL representation, and checking that encoding/decoding in both directions matches the fixtures,
-- by using property testing on the proto transaction itself, and testing that the composition of encoding and decoding yields the original transaction itself.
+Note that the existence of an inverse function ensures that the
+rendered text contains the full information of the original transaction,
+not a hash or subset.
 
-This also prevents users signing over any hashed transaction data (fee, transaction body, `Msg` content that might be hashed etc), which is a security concern for Ledger's security team.
+### Chain State
 
-We propose to maintain functional tests using bijectivity in the SDK.
+The rendering function (and parsing function) may depend on the current chain state.
+This is useful for reading parameters, such as coin display metadata,
+or for reading user-specific preferences such as language or address aliases.
+Note that if the observed state changes between signature generation
+and the transaction's inclusion in a block, the delivery-time rendering
+might differ. If so, the signature will be invalid and the transaction
+will be rejected.
 
-### 2. Only ASCII 32-127 characters allowed
+### Signature and Security
 
-Ledger devices have limited character display capabilities, so all strings MUST only contain ASCII characters in the 32-127 range.
+For security, transaction signatures should have three properties:
 
-In particular, the newline `"\n"` (ASCII: 10) character is forbidden.
+1. Given the transaction, signatures, and chain state, it must be possible to validate that the signatures matches the transaction,
+to verify that the signers must have known their respective secret keys.
 
-### 3. Strings SHOULD have the `<key>: <value>` format
+2. It must be computationally infeasible to find a substantially different transaction for which the given signatures are valid, given the same chain state.
 
-Given the Regex `/^(\* )?(>* )?(.*: )?(.*)$/`, all strings SHOULD match the Regex with capture groups 3 and 4 non-empty. This is helpful for UIs displaying SignDocTextual to users.
+3. The user should be able to give informed consent to the signed data via a simple, secure device with limited display capabilities.
 
-- The initial `*` character is optional and denotes the Ledger Expert mode, see #5.
-- Strings can also include a number of `>` character to denote nesting.
-- In the case where the first Regex capture group is not empty, it represents an indicative key, whose associated value is given in the second capture group. This MAY be used in the Ledger app to perform custom on-screen formatting, for example to break long lines into multiple screens.
+The correctness and security of `SIGN_MODE_TEXTUAL` is guaranteed by demonstrating an inverse function from the rendering to transaction protos.
+This means that it is impossible for a different protocol buffer message to render to the same text.
 
-This Regex is however not mandatory, to allow for some flexibility, for example to display an English sentence to denote end of sections.
+### Transaction Hash Malleability
 
-The `<value>` itself can contain the `": "` characters.
+When client software forms a transaction, the "raw" transaction (`TxRaw`) is serialized as a proto
+and a hash of the resulting byte sequence is computed.
+This is the `TxHash`, and is used by various services to track the submitted transaction through its lifecycle.
+Various misbehavior is possible if one can generate a modified transaction with a different TxHash
+but for which the signature still checks out.
 
-### 4. Values are encoded using Value Renderers
+SIGN_MODE_TEXTUAL prevents this transaction malleability by including the TxHash as an expert screen
+in the rendering.
 
-Value Renderers describe how Protobuf types are encoded to and decoded from a string array. The full specification of Value Renderers can be found in [Annex 1](./adr-050-sign-mode-textual-annex1.md).
+### SignDoc
 
-### 5. Strings starting with `*` are only shown in Expert mode
+The SignDoc for `SIGN_MODE_TEXTUAL` is formed from a data structure like:
 
-Ledger devices have the an Expert mode for advanced users. Expert mode needs to be manually activated by the device holder, inside the device settings. There is currently no official documentation on Expert mode, but according to the [@Ledger_Support twitter account](https://twitter.com/Ledger_Support/status/1364524431800950785),
+```
+type Screen struct {
+  Text string text  // possibly size limited to, e.g. 255 characters
+  Indent uint8  // size limited to something small like 16 or 32
+  Expert bool
+}
 
-> Expert mode enables further, more sophisticated features. This could be useful for advanced users
+type SignDocTextual = []Screen
+```
 
-Strings starting with the `*` character will only be shown in Expert mode. These strings are either hardcoded in the transaction envelope (see point #7).
+We do not plan to use protobuf serialization to form the sequence of bytes
+that will be tranmitted and signed, in order to keep the decoder simple.
+We will use [CBOR](https://cbor.io) ([RFC 8949](https://www.rfc-editor.org/rfc/rfc8949.html)) instead.
 
-For hardware wallets that don't have an expert mode, all strings MUST be shown on the device.
+TODO: specify the details of the CBOR encoding.
 
-### 6. Strings MAY contain `>` characters to denote nesting
+## Details
 
-Protobuf objects can be arbitrarily complex, containing nested arrays and messages. In order to help the Ledger-signing users, we propose to use the `>` symbol in the beginning of strings to represent nested objects, where each additional `>` represents a new level of nesting.
+In the examples that follow, screens will be shown as lines of text,
+indentation is indicated with a leading '>',
+and expert screens are marked with a leading `*`.
 
-### 7. Encoding of the Transaction Envelope
+### Encoding of the Transaction Envelope
 
 We define "transaction envelope" as all data in a transaction that is not in the `TxBody.Messages` field. Transaction envelope includes fee, signer infos and memo, but don't include `Msg`s. `//` denotes comments and are not shown on the Ledger device.
 
@@ -120,7 +168,7 @@ Tip: <string>
 *Hash of raw bytes: <hex_string>                            // Hex encoding of bytes defined in #10, to prevent tx hash malleability.
 ```
 
-### 8. Encoding of the Transaction Body
+### Encoding of the Transaction Body
 
 Transaction Body is the `Tx.TxBody.Messages` field, which is an array of `Any`s, where each `Any` packs a `sdk.Msg`. Since `sdk.Msg`s are widely used, they have a slightly different encoding than usual array of `Any`s (Protobuf: `repeated google.protobuf.Any`) described in Annex 1.
 
@@ -161,7 +209,7 @@ Grantee: cosmos1ghi...jkl
 End of transaction messages
 ```
 
-### 9. Custom `Msg` Renderers
+### Custom `Msg` Renderers
 
 Application developers may choose to not follow default renderer value output for their own `Msg`s. In this case, they can implement their own custom `Msg` renderer. This is similar to [EIP4430](https://github.com/ethereum/EIPs/blob/master/EIPS/eip-4430.md), where the smart contract developer chooses the description string to be shown to the end user.
 
@@ -181,25 +229,9 @@ When this option is set on a `Msg`, a registered function will transform the `Ms
 
 The `<unique algorithm identifier>` is a string convention chosen by the application developer and is used to identify the custom `Msg` renderer. For example, the documentation or specification of this custom algorithm can reference this identifier. This identifier CAN have a versioned suffix (e.g. `_v1`) to adapt for future changes (which would be consensus-breaking). We also recommend adding Protobuf comments to describe in human language the custom logic used.
 
-Moreover, the renderer must provide 2 functions: one for formatting from Protobuf to string, and one for parsing string to Protobuf. These 2 functions are provided by the application developer. To satisfy point #1, these 2 functions MUST be bijective with each other. Bijectivity of these 2 functions will not be checked by the SDK at runtime. However, we strongly recommend the application developer to include a comprehensive suite in their app repo to test bijectivity, as to not introduce security bugs. A simple bijectivity test looks like:
-
-```
-// for renderer, msg, and ctx of the right type
-keyvals, err := renderer.Format(ctx, msg)
-if err != nil {
-    fail_check()
-}
-msg2, err := renderer.Parse(ctx, keyvals)
-if err != nil {
-    fail_check()
-}
-if !proto.Equal(msg, msg2) {
-    fail_check()
-}
-pass_check()
-```
+Moreover, the renderer must provide 2 functions: one for formatting from Protobuf to string, and one for parsing string to Protobuf. These 2 functions are provided by the application developer. To satisfy point #1, these 2 functions MUST be bijective with each other. Bijectivity of these 2 functions will not be checked by the SDK at runtime. However, we strongly recommend the application developer to include a comprehensive suite in their app repo to test bijectivity, as to not introduce security bugs.
 
-### 10. Require signing over the `TxBody` and `AuthInfo` raw bytes
+### Require signing over the `TxBody` and `AuthInfo` raw bytes
 
 Recall that the transaction bytes merklelized on chain are the Protobuf binary serialization of [TxRaw](https://github.com/cosmos/cosmos-sdk/blob/v0.46.0/proto/cosmos/tx/v1beta1/tx.proto#L33), which contains the `body_bytes` and `auth_info_bytes`. Moreover, the transaction hash is defined as the SHA256 hash of the `TxRaw` bytes. We require that the user signs over these bytes in SIGN_MODE_TEXTUAL, more specifically over the following string:
 
@@ -218,17 +250,9 @@ By including this hash in the SIGN_MODE_TEXTUAL signing payload, we keep the sam
 
 These bytes are only shown in expert mode, hence the leading `*`.
 
-### 11. Signing Payload and Wire Format
-
-This string array is encoded as a single `\n`-delimited string before transmitted to the hardware device, and this long string is the signing payload signed by the hardware wallet.
-
 ## Additional Formatting by the Hardware Device
 
-Hardware devices differ in screen sizes and memory capacities. The above specifications are all verified on the protocol level, but we still allow the hardware device to add custom formatting rules that are specific to the device. Rules can include:
-
-- if a string is too long, show it on multiple screens,
-- break line between the `key` and `value` from #3,
-- perform line breaks on a number or a coin values only when necessary. For example, a `sdk.Coins` with multiple denoms would be better shown as one denom per line instead of an coin amount being cut in the middle.
+See [annex 2](./adr-050-sign-mode-textual-annex2.md).
 
 ## Examples