From 77ed337aa6de030290af7dc2418e2eb0d89ce722 Mon Sep 17 00:00:00 2001 From: Aayush Rajasekaran Date: Thu, 10 Mar 2022 13:48:37 -0500 Subject: [PATCH 1/3] FIP-0027: Use a Union type for the Label field --- FIPS/fip-0027.md | 58 ++++++++++++++++++++++++++++++++++++++++-------- 1 file changed, 49 insertions(+), 9 deletions(-) diff --git a/FIPS/fip-0027.md b/FIPS/fip-0027.md index 47950579f..55d05ada3 100644 --- a/FIPS/fip-0027.md +++ b/FIPS/fip-0027.md @@ -1,7 +1,7 @@ --- fip: "0027" -title: Change type of DealProposal Label field from String to byte array -author: Laudiacay (@laudiacay) +title: Change type of DealProposal Label field from String to a Union +author: Laudiacay (@laudiacay), Steven Allen (@Stebalien), Aayush Rajasekaran (@arajasek) discussions-to: https://github.com/filecoin-project/FIPs/issues/187 Status: Draft type: Technical @@ -16,38 +16,78 @@ spec-sections: ## Simple Summary -DealProposal's Label field is currently a String, but is not enforced to be UTF-8. This is not only outside the CBOR string specification, but can also be a source of bugs and difficulties in client implementations due to variations in String libraries. Rather than enforce it as UTF-8 everywhere, this FIP changes the type to be raw bytes for ease of implementation, bug avoidance, and CBOR compliance. +Makers of deals on Filecoin can specify a "label" for their deals. This label is stored on the Filecoin blockchain, so it is important that there be no risk that a "bad" label can cause issues in nodes on the Filecoin network. Today, Filecoin does not enforce that this label meet UTF-8 encoding, even though it is increasingly the case that systems assume all strings are UTF-8 (see [here](https://en.wikipedia.org/wiki/Popularity_of_text_encodings#Popularity_internally_in_software) for more). This FIP proposes making a change to remove this abnormality. ## Abstract -String libraries differ in implementation of UTF-8. For maximum standardization and ease of bug-free client implementations, the DealProposal label should be raw bytes. Further, non-UTF-8 CBOR strings are technically outside of the CBOR specification, so the current implementation is outside of CBOR spec. +The market state's `DealProposal`'s `Label` field is currently a `String`, but is not enforced to be UTF-8. This is not only outside the CBOR string specification, but can also be a source of bugs and difficulties in client implementations due to variations in String libraries. Rather than enforce it as UTF-8 everywhere, this FIP changes the type to be a Union type that can be either Strings or raw bytes. This meets the goal of CBOR compliance and safety, while still allowing for users to have arbitrary bytes as the `Label`. ## Change Motivation -Discussion is here: https://github.com/filecoin-project/specs-actors/issues/1248 +Discussion is [here](https://github.com/filecoin-project/specs-actors/issues/1248) and [here](https://github.com/filecoin-project/builtin-actors/issues/69) Summary: @ec2 noted that the non-UTF8 strings in this field are causing problems for the client implementations like ChainSafe. @ribasushi noticed that the Label field takes user input more or less directly, so enforcing UTF-7 might be difficult and pointless. Finally, @mikeal posted notes from an IPLD call saying that the source of the problem is that some programming languages enforce UTF-8 on Strings, while others don't, so the most widely compatible type for any on-chain data like this would be to just use a byte array. ## Specification -The Label field of DealProposal should be CBOR bytes with arbitrary data allowed inside. +The proposal is to change the `Label` to be a Union type, something like + +``` +struct Label { + label_string string + label_bytes []byte +} +``` + +We apply the following rules: + +- Assert that exactly one of `label_string` and `label_bytes` is set. +- De/serialize this object as though it were just one field -- "If label_string is set, serialize it as a string, if not, serialize it as bytes" (and the reverse for decoding). This is NOT the default CBOR encoding of such a struct, custom encoding must be written. +- Assert that if `label_string` is set, it is valid UTF-8. This check would automatically happen in a language like (safe) Rust, and can be enforced at the CBOR level for other languages. + +An analagous type can be seen [here](https://github.com/filecoin-project/go-hamt-ipld/blob/8cf7cf9309c8f38dbc15d3673b2354c041817884/hamt.go#L92-L145), with its encoding found [here](https://github.com/filecoin-project/go-hamt-ipld/blob/8cf7cf9309c8f38dbc15d3673b2354c041817884/pointer_cbor.go). ## Design Rationale -Design decisions are in above sections. +Arguably, the ideal design would have just been to use bytes. However, naively changing the `Label` field in the `DealProposal` from a string to bytes changes the serialization of the `DealProposal`. The current dealmaking flow on Filecoin is as follows (some aspects of this flow aren't protocol-specific, but might as well be, because they reflect the reality of dealmaking on Filecoin today): + +1) Client creates a `DealProposal`, signs it, wraps the proposal and signature into a `ClientDealProposal`, and sends that to the Storage Provider (SP) +2) SP deserializes the `ClientDealProposal`, validates the signature, adds it to a batch, and eventually `PublishStorageDeals` (sending the serialized `ClientDealProposal` to the chain) +3) Validators of the Filecoin blockchain: + - deserialize the `ClientDealProposal` into a `DealProposal` and a signature + - serialize the `DealProposal` + - validate that the signature is valid for the serialized `DealProposal` + +The issue is that all of the de/serializaton steps fail if they were done on opposite sides of the switch from string to bytes. It is not clear when clients should start using bytes instead of strings, or when SPs should start expecting clients to be using bytes instead of strings. Further, all `PublishStorageDeals` messages in node message pools (waiting to be included in a block) at the time of the network upgrade will fail when they land on chain. + +The Union type solves most of the problems described above. A large refactor may still be needed for client implementations if they don't abstract over possible `DealProposal`, but we avoid a messy period over the upgrade where large swathes of `PublishStorageDeals` messages fail on-chain and large numbers of clients are seeing SPs reject their deal proposals as invalid. ## Backwards Compatibility -This changes specs-actors. All deal proposals on chain will need to change the encoding of the label field. +A migration is needed in the network upgrade that introduces this FIP. The migration runs over all `DealProposal`s on chain, applying the following rule: + +- If `Label` is UTF-8, use the `string` type of the Union +- Else, use the bytes type. + +Note that this means that most deals will actually not have their encodings change at all, since most deals on Filecoin today _do_ have UTF-8 encoded `Label`s. ## Test Cases -Test that v0-v6 proposals' String labels become CBOR bytes with the same raw byte payload after migration, and that new proposals have labels encoded as CBOR bytes. +Test that: + +- New deals can be created with UTF-8 string labels, and can be encoded and decoded +- New deals can be created with byte string labels, and can be encoded and decoded +- New deals CANNOT be created with non-UTF-8 string labels + +Additionally, migration testing should ensure that: + +- Existing deals with UTF-8 `Label`s are migrated to Unions with strings, and the CBOR encoding of such deals DO NOT change +- Existing deals with non-UTF-8 `Label`s are migrated to Unions with bytes, and the CBOR encoding of such deals do change ## Security Considerations From e854378c768f9884040a0db3467383c01b095226 Mon Sep 17 00:00:00 2001 From: Aayush Date: Sat, 12 Mar 2022 14:14:45 -0500 Subject: [PATCH 2/3] Add more detail to the Union label proposal --- FIPS/fip-0027.md | 12 ++++++++---- 1 file changed, 8 insertions(+), 4 deletions(-) diff --git a/FIPS/fip-0027.md b/FIPS/fip-0027.md index 55d05ada3..580cc5813 100644 --- a/FIPS/fip-0027.md +++ b/FIPS/fip-0027.md @@ -1,6 +1,6 @@ --- fip: "0027" -title: Change type of DealProposal Label field from String to a Union +title: Change type of DealProposal Label field from a (Golang) String to a Union author: Laudiacay (@laudiacay), Steven Allen (@Stebalien), Aayush Rajasekaran (@arajasek) discussions-to: https://github.com/filecoin-project/FIPs/issues/187 Status: Draft @@ -20,7 +20,7 @@ Makers of deals on Filecoin can specify a "label" for their deals. This label is ## Abstract -The market state's `DealProposal`'s `Label` field is currently a `String`, but is not enforced to be UTF-8. This is not only outside the CBOR string specification, but can also be a source of bugs and difficulties in client implementations due to variations in String libraries. Rather than enforce it as UTF-8 everywhere, this FIP changes the type to be a Union type that can be either Strings or raw bytes. This meets the goal of CBOR compliance and safety, while still allowing for users to have arbitrary bytes as the `Label`. +The market state's `DealProposal`'s `Label` field is currently a Golang `String`, which is not enforced to be UTF-8. This is outside the CBOR string specification, and can also be a source of bugs and difficulties in client implementations due to variations in String libraries. Rather than enforce it as UTF-8 everywhere, this FIP changes the type to be a Union type that can be either Strings or raw bytes. This meets the goal of CBOR compliance and safety, while still allowing for users to have arbitrary bytes as the `Label`. ## Change Motivation @@ -47,7 +47,7 @@ We apply the following rules: - De/serialize this object as though it were just one field -- "If label_string is set, serialize it as a string, if not, serialize it as bytes" (and the reverse for decoding). This is NOT the default CBOR encoding of such a struct, custom encoding must be written. - Assert that if `label_string` is set, it is valid UTF-8. This check would automatically happen in a language like (safe) Rust, and can be enforced at the CBOR level for other languages. -An analagous type can be seen [here](https://github.com/filecoin-project/go-hamt-ipld/blob/8cf7cf9309c8f38dbc15d3673b2354c041817884/hamt.go#L92-L145), with its encoding found [here](https://github.com/filecoin-project/go-hamt-ipld/blob/8cf7cf9309c8f38dbc15d3673b2354c041817884/pointer_cbor.go). +An analogous type can be seen [here](https://github.com/filecoin-project/go-hamt-ipld/blob/8cf7cf9309c8f38dbc15d3673b2354c041817884/hamt.go#L92-L145), with its encoding found [here](https://github.com/filecoin-project/go-hamt-ipld/blob/8cf7cf9309c8f38dbc15d3673b2354c041817884/pointer_cbor.go). ## Design Rationale @@ -107,7 +107,11 @@ This should have a positive effect on the ecosystem as a whole, because implemen ## Implementation -[Here](https://github.com/filecoin-project/specs-actors/pull/1496) +A prototype of this implementation on top of the v7 actors release can be seen [here](https://github.com/filecoin-project/specs-actors/commit/c755c0402f62e049c96cc855b46cf123f0958c5c). The actual implementation will have to be on a new version (v8) of actors, but this is useful to test backwards compatibility. + +## Future considerations + +In a subsequent network upgrade, we can consider a FIP that replaces this type with simple byte strings (the ideal design described in the Design Rational section). With sufficient communication, we can announce that support for String labels is being deprecated ahead of time, allowing all users to start using byte strings instead. ## Copyright Copyright and related rights waived via [CC0](https://creativecommons.org/publicdomain/zero/1.0/). From abdfdceaccc45d8e23057d78fcbe8bf4c93d542c Mon Sep 17 00:00:00 2001 From: Aayush Date: Thu, 17 Mar 2022 12:48:21 -0400 Subject: [PATCH 3/3] Address review --- FIPS/fip-0027.md | 27 +++++++++++---------------- 1 file changed, 11 insertions(+), 16 deletions(-) diff --git a/FIPS/fip-0027.md b/FIPS/fip-0027.md index 580cc5813..675500dad 100644 --- a/FIPS/fip-0027.md +++ b/FIPS/fip-0027.md @@ -16,11 +16,12 @@ spec-sections: ## Simple Summary -Makers of deals on Filecoin can specify a "label" for their deals. This label is stored on the Filecoin blockchain, so it is important that there be no risk that a "bad" label can cause issues in nodes on the Filecoin network. Today, Filecoin does not enforce that this label meet UTF-8 encoding, even though it is increasingly the case that systems assume all strings are UTF-8 (see [here](https://en.wikipedia.org/wiki/Popularity_of_text_encodings#Popularity_internally_in_software) for more). This FIP proposes making a change to remove this abnormality. +Makers of deals on Filecoin can specify a "label" for their deals. This label is stored on the Filecoin blockchain, so it is important that there be no risk that a "bad" label can cause issues in nodes on the Filecoin network. Today, Filecoin does not enforce that this label meet UTF-8 encoding, even though valid CBOR-encoded strings must be UTF-8. +It is also increasingly the case that systems assume all strings are UTF-8 (see [here](https://en.wikipedia.org/wiki/Popularity_of_text_encodings#Popularity_internally_in_software) for more). This FIP proposes making a change to remove this abnormality. ## Abstract -The market state's `DealProposal`'s `Label` field is currently a Golang `String`, which is not enforced to be UTF-8. This is outside the CBOR string specification, and can also be a source of bugs and difficulties in client implementations due to variations in String libraries. Rather than enforce it as UTF-8 everywhere, this FIP changes the type to be a Union type that can be either Strings or raw bytes. This meets the goal of CBOR compliance and safety, while still allowing for users to have arbitrary bytes as the `Label`. +The market state's `DealProposal`'s `Label` field is currently a `String` that is not enforced to be UTF-8. This is outside the CBOR string specification, and can also be a source of bugs and difficulties in client implementations due to variations in String libraries. Rather than enforce it as UTF-8 everywhere, this FIP changes the type to be a Union type that can be either Strings or raw bytes. This meets the goal of CBOR compliance and safety, while still allowing for users to have arbitrary bytes as the `Label`. ## Change Motivation @@ -32,27 +33,21 @@ Summary: @ec2 noted that the non-UTF8 strings in this field are causing problems ## Specification -The proposal is to change the `Label` to be a Union type, something like +The proposal is to change the `Label` to be a Union type, with the following IPLD schema: ``` -struct Label { - label_string string - label_bytes []byte -} +type Label union { + | String string + | Bytes bytes +} representation kinded ``` -We apply the following rules: - -- Assert that exactly one of `label_string` and `label_bytes` is set. -- De/serialize this object as though it were just one field -- "If label_string is set, serialize it as a string, if not, serialize it as bytes" (and the reverse for decoding). This is NOT the default CBOR encoding of such a struct, custom encoding must be written. -- Assert that if `label_string` is set, it is valid UTF-8. This check would automatically happen in a language like (safe) Rust, and can be enforced at the CBOR level for other languages. - -An analogous type can be seen [here](https://github.com/filecoin-project/go-hamt-ipld/blob/8cf7cf9309c8f38dbc15d3673b2354c041817884/hamt.go#L92-L145), with its encoding found [here](https://github.com/filecoin-project/go-hamt-ipld/blob/8cf7cf9309c8f38dbc15d3673b2354c041817884/pointer_cbor.go). +String-type labels will be enforced to be UTF-8. An analogous type can be seen [here](https://github.com/filecoin-project/go-hamt-ipld/blob/8cf7cf9309c8f38dbc15d3673b2354c041817884/hamt.go#L92-L145), with its encoding found [here](https://github.com/filecoin-project/go-hamt-ipld/blob/8cf7cf9309c8f38dbc15d3673b2354c041817884/pointer_cbor.go). ## Design Rationale -Arguably, the ideal design would have just been to use bytes. However, naively changing the `Label` field in the `DealProposal` from a string to bytes changes the serialization of the `DealProposal`. The current dealmaking flow on Filecoin is as follows (some aspects of this flow aren't protocol-specific, but might as well be, because they reflect the reality of dealmaking on Filecoin today): +For simplicity and largest possible user-functionality, the ideal design would have just been to use a byte array. However, naively changing the `Label` field in the `DealProposal` from a string to bytes changes the serialization of the `DealProposal`. The current dealmaking flow on Filecoin is as follows (some aspects of this flow aren't protocol-specific, but might as well be, because they reflect the reality of dealmaking on Filecoin today): 1) Client creates a `DealProposal`, signs it, wraps the proposal and signature into a `ClientDealProposal`, and sends that to the Storage Provider (SP) 2) SP deserializes the `ClientDealProposal`, validates the signature, adds it to a batch, and eventually `PublishStorageDeals` (sending the serialized `ClientDealProposal` to the chain) @@ -61,7 +56,7 @@ Arguably, the ideal design would have just been to use bytes. However, naively c - serialize the `DealProposal` - validate that the signature is valid for the serialized `DealProposal` -The issue is that all of the de/serializaton steps fail if they were done on opposite sides of the switch from string to bytes. It is not clear when clients should start using bytes instead of strings, or when SPs should start expecting clients to be using bytes instead of strings. Further, all `PublishStorageDeals` messages in node message pools (waiting to be included in a block) at the time of the network upgrade will fail when they land on chain. +The issue is that all the de/serializaton steps fail if they were done on opposite sides of the switch from string to bytes. It is not clear when clients should start using bytes instead of strings, or when SPs should start expecting clients to be using bytes instead of strings. Further, all `PublishStorageDeals` messages in node message pools (waiting to be included in a block) at the time of the network upgrade will fail when they land on chain. The Union type solves most of the problems described above. A large refactor may still be needed for client implementations if they don't abstract over possible `DealProposal`, but we avoid a messy period over the upgrade where large swathes of `PublishStorageDeals` messages fail on-chain and large numbers of clients are seeing SPs reject their deal proposals as invalid.