Skip to content

Commit

Permalink
Address review
Browse files Browse the repository at this point in the history
  • Loading branch information
arajasek committed Mar 17, 2022
1 parent e854378 commit abdfdce
Showing 1 changed file with 11 additions and 16 deletions.
27 changes: 11 additions & 16 deletions FIPS/fip-0027.md
Expand Up @@ -16,11 +16,12 @@ spec-sections:

## Simple Summary
<!--"If you can't explain it simply, you don't understand it well enough." Provide a simplified and layman-accessible explanation of the FIP.-->
Makers of deals on Filecoin can specify a "label" for their deals. This label is stored on the Filecoin blockchain, so it is important that there be no risk that a "bad" label can cause issues in nodes on the Filecoin network. Today, Filecoin does not enforce that this label meet UTF-8 encoding, even though it is increasingly the case that systems assume all strings are UTF-8 (see [here](https://en.wikipedia.org/wiki/Popularity_of_text_encodings#Popularity_internally_in_software) for more). This FIP proposes making a change to remove this abnormality.
Makers of deals on Filecoin can specify a "label" for their deals. This label is stored on the Filecoin blockchain, so it is important that there be no risk that a "bad" label can cause issues in nodes on the Filecoin network. Today, Filecoin does not enforce that this label meet UTF-8 encoding, even though valid CBOR-encoded strings must be UTF-8.
It is also increasingly the case that systems assume all strings are UTF-8 (see [here](https://en.wikipedia.org/wiki/Popularity_of_text_encodings#Popularity_internally_in_software) for more). This FIP proposes making a change to remove this abnormality.

## Abstract
<!--A short (~200 word) description of the technical issue being addressed.-->
The market state's `DealProposal`'s `Label` field is currently a Golang `String`, which is not enforced to be UTF-8. This is outside the CBOR string specification, and can also be a source of bugs and difficulties in client implementations due to variations in String libraries. Rather than enforce it as UTF-8 everywhere, this FIP changes the type to be a Union type that can be either Strings or raw bytes. This meets the goal of CBOR compliance and safety, while still allowing for users to have arbitrary bytes as the `Label`.
The market state's `DealProposal`'s `Label` field is currently a `String` that is not enforced to be UTF-8. This is outside the CBOR string specification, and can also be a source of bugs and difficulties in client implementations due to variations in String libraries. Rather than enforce it as UTF-8 everywhere, this FIP changes the type to be a Union type that can be either Strings or raw bytes. This meets the goal of CBOR compliance and safety, while still allowing for users to have arbitrary bytes as the `Label`.

## Change Motivation
<!--The motivation is critical for FIPs that want to change the Filecoin protocol. It should clearly explain why the existing protocol specification is inadequate to address the problem that the FIP solves. FIP submissions without sufficient motivation may be rejected outright.-->
Expand All @@ -32,27 +33,21 @@ Summary: @ec2 noted that the non-UTF8 strings in this field are causing problems
## Specification
<!--The technical specification should describe the syntax and semantics of any new feature. The specification should be detailed enough to allow competing, interoperable implementations for any of the current Filecoin implementations. -->

The proposal is to change the `Label` to be a Union type, something like
The proposal is to change the `Label` to be a Union type, with the following IPLD schema:

```
struct Label {
label_string string
label_bytes []byte
}
type Label union {
| String string
| Bytes bytes
} representation kinded
```

We apply the following rules:

- Assert that exactly one of `label_string` and `label_bytes` is set.
- De/serialize this object as though it were just one field -- "If label_string is set, serialize it as a string, if not, serialize it as bytes" (and the reverse for decoding). This is NOT the default CBOR encoding of such a struct, custom encoding must be written.
- Assert that if `label_string` is set, it is valid UTF-8. This check would automatically happen in a language like (safe) Rust, and can be enforced at the CBOR level for other languages.

An analogous type can be seen [here](https://github.com/filecoin-project/go-hamt-ipld/blob/8cf7cf9309c8f38dbc15d3673b2354c041817884/hamt.go#L92-L145), with its encoding found [here](https://github.com/filecoin-project/go-hamt-ipld/blob/8cf7cf9309c8f38dbc15d3673b2354c041817884/pointer_cbor.go).
String-type labels will be enforced to be UTF-8. An analogous type can be seen [here](https://github.com/filecoin-project/go-hamt-ipld/blob/8cf7cf9309c8f38dbc15d3673b2354c041817884/hamt.go#L92-L145), with its encoding found [here](https://github.com/filecoin-project/go-hamt-ipld/blob/8cf7cf9309c8f38dbc15d3673b2354c041817884/pointer_cbor.go).

## Design Rationale
<!--The rationale fleshes out the specification by describing what motivated the design and why particular design decisions were made. It should describe alternate designs that were considered and related work, e.g. how the feature is supported in other languages. The rationale may also provide evidence of consensus within the community, and should discuss important objections or concerns raised during discussion.-->

Arguably, the ideal design would have just been to use bytes. However, naively changing the `Label` field in the `DealProposal` from a string to bytes changes the serialization of the `DealProposal`. The current dealmaking flow on Filecoin is as follows (some aspects of this flow aren't protocol-specific, but might as well be, because they reflect the reality of dealmaking on Filecoin today):
For simplicity and largest possible user-functionality, the ideal design would have just been to use a byte array. However, naively changing the `Label` field in the `DealProposal` from a string to bytes changes the serialization of the `DealProposal`. The current dealmaking flow on Filecoin is as follows (some aspects of this flow aren't protocol-specific, but might as well be, because they reflect the reality of dealmaking on Filecoin today):

1) Client creates a `DealProposal`, signs it, wraps the proposal and signature into a `ClientDealProposal`, and sends that to the Storage Provider (SP)
2) SP deserializes the `ClientDealProposal`, validates the signature, adds it to a batch, and eventually `PublishStorageDeals` (sending the serialized `ClientDealProposal` to the chain)
Expand All @@ -61,7 +56,7 @@ Arguably, the ideal design would have just been to use bytes. However, naively c
- serialize the `DealProposal`
- validate that the signature is valid for the serialized `DealProposal`

The issue is that all of the de/serializaton steps fail if they were done on opposite sides of the switch from string to bytes. It is not clear when clients should start using bytes instead of strings, or when SPs should start expecting clients to be using bytes instead of strings. Further, all `PublishStorageDeals` messages in node message pools (waiting to be included in a block) at the time of the network upgrade will fail when they land on chain.
The issue is that all the de/serializaton steps fail if they were done on opposite sides of the switch from string to bytes. It is not clear when clients should start using bytes instead of strings, or when SPs should start expecting clients to be using bytes instead of strings. Further, all `PublishStorageDeals` messages in node message pools (waiting to be included in a block) at the time of the network upgrade will fail when they land on chain.

The Union type solves most of the problems described above. A large refactor may still be needed for client implementations if they don't abstract over possible `DealProposal`, but we avoid a messy period over the upgrade where large swathes of `PublishStorageDeals` messages fail on-chain and large numbers of clients are seeing SPs reject their deal proposals as invalid.

Expand Down

0 comments on commit abdfdce

Please sign in to comment.