Preferred Serialization and Canonical encoding in CBOR #28

Open
Sekenre opened this issue Oct 23, 2018 · 7 comments

@Sekenre
Collaborator

Sekenre commented Oct 23, 2018

The CBOR mailing list has been discussing the definition of "canonical" in the standard and has been making changes. I wanted to document these and discuss how they might be implemented in cbor2.

Updated draft standard: https://datatracker.ietf.org/doc/draft-ietf-cbor-7049bis

Protocol implementers can redefine at will what constitutes a true canonical encoding; the CBOR standard only provides guidelines.

A general encoder/decoder like cbor2 will need to support a number of variations and validate them.

Constraints that have been discussed on the IETF mailing list and in the updated draft:

  • Shortest possible float representations
  • Fixed float representations
  • Fixed integer representations
  • Maps sorted by lexicographical ordering of encoded value (DRAFT)
  • Maps sorted by ascending length then lexicographic (RFC7049)
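
To make the difference between the last two bullets concrete, here is a minimal sketch of both map key orderings. It is not cbor2 code: `encode_head`, `encode_key`, `sort_bytewise` and `sort_length_first` are hypothetical helpers, and the key encoder only handles integers and text strings so that the example stays self-contained.

```python
import struct


def encode_head(major: int, value: int) -> bytes:
    """Encode a CBOR head with the shortest possible argument."""
    if value < 24:
        return bytes([(major << 5) | value])
    if value < 0x100:
        return bytes([(major << 5) | 24, value])
    if value < 0x10000:
        return bytes([(major << 5) | 25]) + struct.pack(">H", value)
    if value < 0x100000000:
        return bytes([(major << 5) | 26]) + struct.pack(">I", value)
    return bytes([(major << 5) | 27]) + struct.pack(">Q", value)


def encode_key(key) -> bytes:
    """Encode an int or str map key (illustration only, other types rejected)."""
    if isinstance(key, bool):
        raise TypeError("bool keys are not handled in this sketch")
    if isinstance(key, int):
        if key >= 0:
            return encode_head(0, key)       # major type 0: unsigned integer
        return encode_head(1, -1 - key)      # major type 1: negative integer
    if isinstance(key, str):
        data = key.encode("utf-8")
        return encode_head(3, len(data)) + data  # major type 3: text string
    raise TypeError(f"unsupported key type: {type(key).__name__}")


def sort_bytewise(keys):
    """Draft ordering: plain bytewise comparison of the encoded keys."""
    return sorted(keys, key=encode_key)


def sort_length_first(keys):
    """RFC 7049 ordering: shorter encodings first, then bytewise."""
    return sorted(keys, key=lambda k: (len(encode_key(k)), encode_key(k)))


keys = ["aa", 100, "z", -1, 10]
print(sort_bytewise(keys))
print(sort_length_first(keys))
```

The two functions disagree on where `100` and `-1` go: `100` encodes to two bytes starting with 0x18, while `-1` encodes to the single byte 0x20, so the length-first rule puts `-1` ahead of `100` and the bytewise rule does the opposite.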

There may be tagged arrays created for fixed-length binary encodings of float values. (Tag values TBD)

See: https://datatracker.ietf.org/doc/draft-ietf-cbor-array-tags/

Decoders may need to validate these by raising errors if the following conditions are met:

  • Indefinite length types
  • Floating point values not in shortest form
  • Floating point values not in fixed representation
  • Integers not in shortest form
  • Integers not in fixed form
  • Unsorted maps
  • Maps sorted with the wrong algorithm
  • Maps with duplicate keys
  • Incorrect tag type
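
As an illustration of two of the checks above (indefinite lengths and integers not in shortest form), a decoder could inspect the head of each data item roughly as follows. This is a sketch under assumptions, not part of cbor2: `head_is_shortest` is a hypothetical name, and major type 7 (floats and simple values) is deliberately left out.

```python
# Smallest argument value that genuinely needs each additional-info size.
_MINIMUM_FOR_INFO = {24: 24, 25: 0x100, 26: 0x10000, 27: 0x100000000}


def head_is_shortest(data: bytes) -> bool:
    """Return True if the CBOR head at the start of `data` is in shortest form.

    Raises ValueError for indefinite lengths, reserved values, truncated
    input and major type 7, which this sketch does not cover.
    """
    if not data:
        raise ValueError("empty input")
    major = data[0] >> 5
    info = data[0] & 0x1F
    if major == 7:
        raise ValueError("floats and simple values are not handled here")
    if info < 24:
        return True  # argument is stored in the initial byte itself
    if info == 31:
        raise ValueError("indefinite length item")
    if info > 27:
        raise ValueError("reserved additional information value")
    size = 1 << (info - 24)  # 1, 2, 4 or 8 argument bytes
    if len(data) < 1 + size:
        raise ValueError("truncated head")
    argument = int.from_bytes(data[1:1 + size], "big")
    return argument >= _MINIMUM_FOR_INFO[info]


print(head_is_shortest(bytes.fromhex("1864")))    # 100 in one argument byte
print(head_is_shortest(bytes.fromhex("190064")))  # 100 padded to two bytes
```

A validating decoder would turn a `False` result into an error instead of returning the value.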

Instead of a single canonical=True argument, there need to be separate flags for each potential constraint.

For example, if a device expects only 16-bit floating point data, you could create the encoder like this:

encoder = CBOREncoder(f, float_format="binary16")
encoder.encode(data)

Or, for a minimal float encoding and maps sorted by encoded length:

encoder = CBOREncoder(f, float_format="minimal", sort_maps=True, sort_by_length=True)
encoder.encode(data)
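
What a "minimal" float format might do internally can be sketched with the standard `struct` module: try binary16, then binary32, and fall back to binary64 when the narrower formats do not round-trip. `encode_float_shortest` is a hypothetical helper for illustration, not an existing cbor2 function, and encoding every NaN as the half-precision quiet NaN is an assumption of this sketch.

```python
import math
import struct


def encode_float_shortest(value: float) -> bytes:
    """Encode a float in the narrowest IEEE 754 width that preserves it."""
    if math.isnan(value):
        # NaN never compares equal to itself, so handle it up front.
        return b"\xf9\x7e\x00"
    for initial_byte, fmt in ((0xF9, ">e"), (0xFA, ">f")):
        try:
            packed = struct.pack(fmt, value)
        except OverflowError:
            continue  # magnitude too large for this width
        if struct.unpack(fmt, packed)[0] == value:
            return bytes([initial_byte]) + packed
    return b"\xfb" + struct.pack(">d", value)


for sample in (1.5, 100000.0, 1.1):
    print(sample, encode_float_shortest(sample).hex())
```

Here 1.5 fits in binary16, 100000.0 needs binary32, and 1.1 only round-trips as binary64.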

On the decoding side:

decoder = CBORDecoder(f, validate_floats_as="binary16")
result = decoder.decode()

decoder = CBORDecoder(f, validate_floats_as="minimal",
    validate_map_order=True,
    ordered_by_length=True,
    ignore_duplicate_keys=False)
result = decoder.decode()

Of course these argument names and the way they are set up are just intended as an example.

@agronholm
Owner

At this point I think you should become the maintainer of cbor2. How about it?

@Sekenre
Collaborator Author

Sekenre commented Oct 24, 2018

I'd be honoured! I am inexperienced but very keen 😀

@fsssosei

fsssosei commented Nov 2, 2018

The code for long integers seems to have O(n^2) time complexity?
Let's say I have a 10,000-bit integer.

@agronholm
Owner

What code where?

@Sekenre
Collaborator Author

Sekenre commented Nov 5, 2018

The code for long integers seems to have O(n^2) time complexity?
Let's say I have a 10,000-bit integer.

Hi @fsssosei, could you open a new issue for this? I will look at it when I get the chance.

@henryk

henryk commented May 18, 2022

To add some food for thought: Here is an article discussing the different "canonical" encodings in CBOR: https://www.imperialviolet.org/2022/04/17/canonsofcbor.html
It also proposes a naming scheme. RFC 7049 seems to describe "three-step" ordering (but could be read ambiguously), while RFC 8949 describes "one-step" ordering.

Best as I can tell, cbor2 currently implements three-step ordering. For starters, the documentation could point out the different ways a "canonical" CBOR can be canonical, and document the current state in the library.

@Sekenre
Collaborator Author

Sekenre commented May 23, 2022

Absolutely. Thanks @henryk for bringing that article to my attention. I have been playing around with splitting the canonical settings into their own options, i.e. the three options for map ordering, fixed or variable-sized floating point, and so on, while keeping a backwards-compatible default.

Of course it's easy in Python, harder in C. I don't think I will implement validating whether something is canonical, but I will document how someone could do so.
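
One way such validation could be done outside the library is to collect the raw encoded bytes of each map key in the order they appear and check that they are strictly increasing under the chosen rule. The sketch below assumes the caller already has those byte strings; `map_keys_in_order` is a hypothetical helper, not a cbor2 API. Requiring a strict increase also rejects duplicate keys.

```python
from typing import Iterable


def map_keys_in_order(encoded_keys: Iterable[bytes], length_first: bool = False) -> bool:
    """Check that encoded map keys appear in strictly ascending order.

    length_first=False: bytewise order of the encoded keys.
    length_first=True:  shorter encodings first, then bytewise.
    """
    previous = None
    for key in encoded_keys:
        current = (len(key), key) if length_first else key
        if previous is not None and current <= previous:
            return False  # out of order, or a duplicate key
        previous = current
    return True


# Encoded keys 10, 100, -1 in that order.
keys = [bytes.fromhex("0a"), bytes.fromhex("1864"), bytes.fromhex("20")]
print(map_keys_in_order(keys))
print(map_keys_in_order(keys, length_first=True))
```

The same key sequence passes one rule and fails the other, which is why a validator needs to be told which ordering the protocol expects.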
