FIP-0027: Use a Union type for the Label field #318

arajasek · 2022-03-10T18:49:21Z

Instead of going string -> bytes, go string -> Union(byte,string)

jennijuju · 2022-03-10T22:17:58Z

@ribasushi I cant add you as a reviewer for some reason but would be good if you 👁️

ribasushi · 2022-03-10T22:25:23Z

For golang this will work, but I am not sure what the effect of a double-stringlike-union is in rust-land ( I'd defer to @vmx and @dignifiedquire )
Instinctively this seems way overcomplicated, but I also do not know enough about the nature of difficulties that got us here.

ribasushi · 2022-03-10T22:28:03Z

To be even more specific: I feel that this part of the proposal is a step in the wrong direction in terms of maintainability:

New deals can be created with UTF-8 string labels, and can be encoded and decoded

New deals can be created with byte string labels, and can be encoded and decoded

dignifiedquire · 2022-03-10T23:11:18Z

FIPS/fip-0027.md

@@ -1,7 +1,7 @@
 ---
 fip: "0027"
-title: Change type of DealProposal Label field from String to byte array
-author: Laudiacay (@laudiacay)
+title: Change type of DealProposal Label field from String to a Union


maybe first clarify that the original definition is not actually a String, but rather a golang String, which in turn means a fixed size byte array, with no actual validation for being characters asfaik

That's a good idea, thanks

ZenGround0 · 2022-03-11T00:06:12Z

For golang this will work, but I am not sure what the effect of a double-stringlike-union is in rust-land

This was motivated by a desire to make everything sane in rust. Rust's big "problem" is that it really doesn't want to make invalid CBOR Major Type 3 fields containing non UTF-8 bytes. Implementing a field in DealProposal struct that is either CBOR major type 2 or 3 is not a problem in rust. Rust will force a failure if someone tries to pack non-UTF8 bytes into major type 3 and that will be that.

Instinctively this seems way overcomplicated, but I also do not know enough about the nature of difficulties that got us here.

True it's more complicated than just allowing a random byte string. What it gets us is no collateral damage of deals made across the upgrade boundary. I don't think living with this is significant maintenance burden. As the FIP mentions the HAMT has a field that is a union of cbor types and this has not caused us any maintenance problem over the past year. But if we decide we don't like this we can encourage people to move to major type 2 label fields, by changing go-fil-markets and messaging other deal makers more broadly, and then phase out major type 3 label fields in a later upgrade once the risk to deal making interruption is reduced.

anorth · 2022-03-11T01:26:54Z

This makes sense to me as a workable direction, but that's not a strong endorsement. I'm still keen to hear other feedback.

It could help a lot to see code snippets of how this would manifest in Go and Rust.

vmx · 2022-03-11T08:57:27Z

Disclaimer: I know very little about blockchains or this part of Filecoin. So please read this as a comment from someone who is new to this and kind of an outsider perspective.

The current problem is, that currently invalid non-spec compliant CBOR is produced. This means that third party CBOR tooling might not work with that data. We all agree that this should be fixed. The question is how.

If I'd create something for Filecoin, being it an implementation, some analysis tool or whatever, I'd be more than happy, if I could just keep things simple and have as little moving parts as possible. Ideally I wouldn't need to put extra thought into how to process some label field. So if this field would just have a single CBOR type, that would be great.

Now we come to the lack of my Filecoin knowledge. I don't know how much work/time/processing a migration of label fields take. If I understand correctly, with this proposal, we would need to go over every label field and check whether it is valid utf-8 or not. Then decide whether to keep the CBOR text string type, or change to CBOR byte string type. Would it make much difference to switch everything to the CBOR byte string type without any checking?

Again, I don't know what is involved in this and it might just not be possible/too much effort: what if new deals would only use the CBOR byte string type from a certain point on? We then wait until there are no deals with the CBOR byte string type left and only then do the migration above.

This surely is operational wise more effort. But it's a one time thing. Once done we have nice valid CBOR on-chain and never have to bother about it again. No additional complexity is added to neither protocol implementations, nor third party tools.

arajasek · 2022-03-12T19:07:19Z

Instinctively this seems way overcomplicated, but I also do not know enough about the nature of difficulties that got us here.

@ribasushi Is the Design Rationale section not clear, or are you not convinced that a real issue exists? I'm happy to elaborate (as I indicated here, I may be hallucinating problems that don't actually exist).

Now we come to the lack of my Filecoin knowledge. I don't know how much work/time/processing a migration of label fields take. If I understand correctly, with this proposal, we would need to go over every label field and check whether it is valid utf-8 or not. Then decide whether to keep the CBOR text string type, or change to CBOR byte string type. Would it make much difference to switch everything to the CBOR byte string type without any checking?

Again, I don't know what is involved in this and it might just not be possible/too much effort: what if new deals would only use the CBOR byte string type from a certain point on? We then wait until there are no deals with the CBOR byte string type left and only then do the migration above.

This surely is operational wise more effort. But it's a one time thing. Once done we have nice valid CBOR on-chain and never have to bother about it again. No additional complexity is added to neither protocol implementations, nor third party tools.

@vmx Thanks for the feedback!

Migrating all the labels with checking is not too much work. We have performed much more complicated migrations, so we are not constrained by that.
I think we should definitely have a Phase 2 to this FIP, in which we move to only using byte strings. The challenge is primarily that we cannot do so atomically without breaking deal-making for an extended period. I think there has to be an overlap period in which both UTF-8 strings and bytes are accepted, and then in a future network upgrade, support for strings gets deprecated entirely.

I will add a note about this to the FIP itself, thank you!

arajasek · 2022-03-12T19:21:51Z

@anorth A prototype in go-actors can be found here, as well as its integration in the go-fil-markets module and Lotus client.

ribasushi · 2022-03-12T20:40:54Z

are you not convinced that a real issue exists?

@arajasek no, I am not convinced you are actually solving the issue. Refer to the sentence from the original roadblock

The issue is that all of the de/serializaton steps fail if they were done on opposite sides of the switch from string to bytes. It is not clear when clients should start using bytes instead of strings, or when SPs should start expecting clients to be using bytes instead of strings. Further, all PublishStorageDeals messages in node message pools (waiting to be included in a block) at the time of the network upgrade will fail when they land on chain.

At present, as @vmx very correctly noted, the on-chain protocol encodes an expectation that the client signature within PSD is done over invalid CBOR ( non-utf8 bytes within a type 3 CBOR message ). You will have the problem either way since:

under old proposal
the receiver will de/re-serialize from type 3 to type 2, ending up with different CBOR and invalid signature and different deal cid
under new proposal
the receiver will de/re-serialize from type 3 to the union, ending up with different CBOR and invalid signature and different deal cid

What I am getting at is that what is on chain is not relevant, rather what is a signature verification made against is. Any change there will be equally painful. Moreover: a PSD message is very long lived ( potentially months ), and the CID of the serialization is also very tightly coupled with the FSM ( e.g. lotus-miner storage-deal import-data {{that-very-cid-of-broken-cbor}} )

I again would like to reiterate my original proposal: fix the chain-storage asap to unblock lowest-level rust-tools, but keep everything Sig/CID-affecting using the status quo invalid CBOR representation, until we figure out a much more comprehensive way forward.

arajasek · 2022-03-12T22:14:47Z

@ribasushi Thanks for the reply! So, I agree -- we still have the problem for non-UTF-8 bytes. The chief difference between the old and new proposal is that in the old proposal all deals would fail (even those with empty labels!), whereas in the new proposal only non-UTF-8 labels will cause deals to fail.

We can (and should) broadcast to users that this will be the case. To my knowledge, most tooling does not try to set non-UTF-8 strings.

Does this make more sense now? Please push back if you're unconvinced.

ribasushi · 2022-03-12T23:18:31Z

whereas in the new proposal only non-UTF-8 labels will cause deals to fail

Thank you for clarifying, I missed that in my mental model / the FIP does not make this super clear. Let me rephrase this in riba-speak:

At present a logical inconsistency exists between the expected on-chain type ( 3/ utf8 ) and the main reference implementation of filecoin where the value is effectively a random-byte type ( the programming language underlying said implementation does not possess a checked utf8 type ). While the current active chain state does not contain values where this inconsistency manifests, historic state does, and so could a potential future state.

The most straightforward modification of switching the chain type to ( 2 / byte ) is problematic as all signatures and tracking CIDs of deal proposals / PCD messages are made over the serialized CBOR, thus making type 3 a part of the off-chain spec.

Proposed resolution is to use an IPLD logical union of byte/utf8, which manifests during CBOR encoding and decoding as (exclusively) one of type 2 or type 3 on-wire tuple. As most ( at the time of writing all ) deal proposals in the state carry valid utf8 payloads, their encoding will remain unchanged, thus keeping the signatures/cids valid.

I no longer have strong objections to the proposal, with minor unease being:

This corners us even harder into supporting the IPLD union going forward, potentially complicating non go/rust implementations which need to learn how to parse sluightly differing CBOR
A utf8/byte union is slightly undefined within IPLD as it does not cleanly roundtrip via DAG-JSON. This is unlikely to be a big issue, since signatures are defined exclusively over the DAG-CBOR representation of the proposal. Nevertheless I continue having a bad feeling of using such a dark and unexplored corner of IPLD in such a critical spot ( @rvagg might have some more thoughts on this point )

If neither of these raise alarm:

arajasek · 2022-03-13T16:04:12Z

@ribasushi Yup, Riba-speak aligns well with my understanding here. I will add more detail about the custom CBOR-encoding, and the purpose it serves here (to mimic the existing string labels as much as possible).

This corners us even harder into supporting the IPLD union going forward, potentially complicating non go/rust implementations which need to learn how to parse sluightly differing CBOR

I'm not sure I share your concern here. The union is really a temporary migration-type in my head, that will exist for the span of one network upgrade. In the network upgrade after the one that introduces this, we can migrate everything to bytes (see the Future Considerations section).

Stebalien · 2022-03-14T19:04:17Z

FIPS/fip-0027.md

@@ -16,38 +16,78 @@ spec-sections:

 ## Simple Summary
 <!--"If you can't explain it simply, you don't understand it well enough." Provide a simplified and layman-accessible explanation of the FIP.-->
-DealProposal's Label field is currently a String, but is not enforced to be UTF-8. This is not only outside the CBOR string specification, but can also be a source of bugs and difficulties in client implementations due to variations in String libraries. Rather than enforce it as UTF-8 everywhere, this FIP changes the type to be raw bytes for ease of implementation, bug avoidance, and CBOR compliance.
+Makers of deals on Filecoin can specify a "label" for their deals. This label is stored on the Filecoin blockchain, so it is important that there be no risk that a "bad" label can cause issues in nodes on the Filecoin network. Today, Filecoin does not enforce that this label meet UTF-8 encoding, even though it is increasingly the case that systems assume all strings are UTF-8 (see [here](https://en.wikipedia.org/wiki/Popularity_of_text_encodings#Popularity_internally_in_software) for more). This FIP proposes making a change to remove this abnormality.


I think it's important to keep the note about non-utf8 strings being invalid CBOR.

@Stebalien It's in the abstract below, but I don't think it belongs in the "Simple Summary" ("a simplified and layman-accessible explanation of the FIP")

I agree with @Stebalien here. I don't think the point is

even though it is increasingly the case that systems assume all strings are UTF-8

Even if it wouldn't be the case this should be fixed, because it is not a good idea to create invalid CBOR. It's independent of whether systems tend to use UTF-8 for strings or not.

@vmx Yeah, that's a fair point. I'll add the detail in simple language.

FIPS/fip-0027.md

FIP-0027: Use a Union type for the Label field

77ed337

arajasek requested review from momack2, jennijuju, kaitlin-beegle and anorth as code owners March 10, 2022 18:49

jennijuju requested a review from ZenGround0 March 10, 2022 22:17

dignifiedquire reviewed Mar 10, 2022

View reviewed changes

Add more detail to the Union label proposal

e854378

This was referenced Mar 12, 2022

Change Label to be a Union of strings and bytes filecoin-project/specs-actors#1580

Merged

FIP 0027 integration filecoin-project/go-fil-markets#694

Merged

DO NOT MERGE: Integrate proposed change to make deal Label a Union filecoin-project/lotus#8304

Closed

Stebalien reviewed Mar 14, 2022

View reviewed changes

Stebalien approved these changes Mar 17, 2022

View reviewed changes

Address review

abdfdce

arajasek force-pushed the asr/fip-0027 branch from 4febaaf to abdfdce Compare March 17, 2022 22:45

kaitlin-beegle approved these changes Mar 17, 2022

View reviewed changes

kaitlin-beegle merged commit 227b753 into master Mar 17, 2022

kaitlin-beegle deleted the asr/fip-0027 branch March 17, 2022 23:38

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

FIP-0027: Use a Union type for the Label field #318

FIP-0027: Use a Union type for the Label field #318

arajasek commented Mar 10, 2022 •

edited

jennijuju commented Mar 10, 2022

ribasushi commented Mar 10, 2022

ribasushi commented Mar 10, 2022

dignifiedquire Mar 10, 2022

arajasek Mar 12, 2022

ZenGround0 commented Mar 11, 2022 •

edited

anorth commented Mar 11, 2022

vmx commented Mar 11, 2022

arajasek commented Mar 12, 2022

arajasek commented Mar 12, 2022

ribasushi commented Mar 12, 2022

arajasek commented Mar 12, 2022

ribasushi commented Mar 12, 2022

arajasek commented Mar 13, 2022

Stebalien Mar 14, 2022

arajasek Mar 17, 2022

vmx Mar 17, 2022

arajasek Mar 17, 2022

FIP-0027: Use a Union type for the Label field #318

FIP-0027: Use a Union type for the Label field #318

Conversation

arajasek commented Mar 10, 2022 • edited

jennijuju commented Mar 10, 2022

ribasushi commented Mar 10, 2022

ribasushi commented Mar 10, 2022

dignifiedquire Mar 10, 2022

Choose a reason for hiding this comment

arajasek Mar 12, 2022

Choose a reason for hiding this comment

ZenGround0 commented Mar 11, 2022 • edited

anorth commented Mar 11, 2022

vmx commented Mar 11, 2022

arajasek commented Mar 12, 2022

arajasek commented Mar 12, 2022

ribasushi commented Mar 12, 2022

arajasek commented Mar 12, 2022

ribasushi commented Mar 12, 2022

arajasek commented Mar 13, 2022

Stebalien Mar 14, 2022

Choose a reason for hiding this comment

arajasek Mar 17, 2022

Choose a reason for hiding this comment

vmx Mar 17, 2022

Choose a reason for hiding this comment

arajasek Mar 17, 2022

Choose a reason for hiding this comment

arajasek commented Mar 10, 2022 •

edited

ZenGround0 commented Mar 11, 2022 •

edited