Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement attribute to disallow deserialization of struct and struct variant with named fields from sequence #2639

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

Toromyx
Copy link

@Toromyx Toromyx commented Oct 28, 2023

Fixes #1587

  • Implement attribute for struct variants
  • Review code
    • Support in_place deserialization?
  • Decide on attribute name
  • Document new attribute

@Toromyx Toromyx marked this pull request as draft October 28, 2023 00:42
@Toromyx Toromyx changed the title Draft: implement attribute to disallow deserialization of struct with named fields from sequence Implement attribute to disallow deserialization of struct with named fields from sequence Oct 28, 2023
@Toromyx
Copy link
Author

Toromyx commented Oct 28, 2023

Documentation probably needs to be added to https://github.com/serde-rs/serde-rs.github.io, anywhere else?

@Mingun
Copy link
Contributor

Mingun commented Oct 28, 2023

I would think, that we should deny deserialization of maps from sequences (at least by default, but I prefer at all). It is responsibility of a deserializer to call the correct Visitor method to deserialize the type. If deserialize_struct / deserialize_map was requested, and the data type stores only field values in form of sequence, it does not mean that deserializer should call visit_seq. visit_seq is intended to provide sequence and deserializer should not expect that map types could be constructed from it. Instead, the deserializer should provide a compatible MapAccess implementation, that could return correct keys from existing data.

The important point that should be explicitly stressed is that this MapAccess implementation should not try to use fields data to decide what key you should return. You can think that it would be a great idea to return fields[0], fields[1] and so on for keys, that that's not. It is worst idea, because of two problems:

  • serde conversion mechanisms can change calls of deserialize_struct to calls of deserialize_map (in case of flatten structs);
  • the fields argument of a deserialize_struct has no guaranties that it contains fields of a struct in order of declaration, and moreover, (in the last serde versions?) it also contains field aliases (i.e. its length even does not reflect the number of fields in a struct).

Fortunately, map keys deserialized using internally created special enums each variant of which represents a field and they can be deserialized from numbers 0, 1, 2 in the order of declaration. Aliases exists only for symbolic names, so each number uniquely represent a field. Moreover, this enum is deserialized by requesting deserialize_identifier, so the deserializer implementation could (and probably must) use different rules from usual to provide data to Visitor::visit_u64 method.

So we need several fixes in various serde crates:

  • serde_derive should stop to generate visit_seq for structs
  • various crates for binary formats should stop to call Visitor::visit_seq when deserialize_struct / deserialize_map was requested. Instead they should call Visitor::visit_map and deserialize keys using special internal deserializer for keys
  • serde_json and serde_yaml probably shouldn't do anything

In the summary:

  1. it is responsibility of a Visitor to return errors. Deserializers should avoid to return they own errors, they should instead call most appropriate Visitor method, which would return an error
  2. it is responsibility of a Deserializer to perform type conversions. For example, in XML is no difference in syntax between numbers or strings, so the XML deserializer will perform conversion when number was requested. That means, that result of deserialize_any can be different from result of deserialize_u32 and that is normal. JSON, for example, has a different syntax for strings and numbers and it MUST NOT parse strings into numbers
  3. it is responsibility of a type to implement only those of Visitor methods that describes it much closely. They should implement the minimal subset to be as much specific as possible. It shouldn't to do conversions from one type to another (such as sequences to maps or strings to numbers). A special types could do such conversions, but that should be done purposely, and not by an automatic derive from serde_derive

@Mingun
Copy link
Contributor

Mingun commented Oct 28, 2023

Some observations for crates from main serde page:

Status Format Links
⚠️ JSON Incorrectly uses visit_seq when deserialize_struct was requested
⚠️ postcard Incorrectly uses visit_seq when deserialize_struct was requested
CBOR uses visit_map
YAML uses visit_map
Msgpack uses visit_map (deserialize_map / deserialize_struct forwarded into deserialize_any)
TOML uses visit_map (deserialize_map / deserialize_struct forwarded into deserialize_any)
⚠️ Pickle can use visit_seq (deserialize_map / deserialize_struct forwarded into deserialize_any)
RON uses visit_map
⚠️ BSON can use visit_seq (deserialize_map / deserialize_struct forwarded into deserialize_any)
Avro uses visit_map
⚠️ JSON5 can use visit_seq (deserialize_map / deserialize_struct forwarded into deserialize_any)
URL uses visit_map
Envy uses visit_map (via serde's MapDeserializer)
Envy Store delegates deserialization to Envy
S-expressions uses visit_map
⚠️ DBUS can use visit_seq
⚠️ FlexBuffers can use visit_seq
Bencode uses visit_map
Token streams uses visit_map
DynamoDB Items uses visit_map (deserialize_struct forwarded into deserialize_map)
Hjson uses visit_map
⚠️ CSV can use visit_seq

@Toromyx
Copy link
Author

Toromyx commented Oct 28, 2023

  • serde_derive should stop to generate visit_seq for structs

I created a new branch and PR which should implement this: #2640

But since this would be a breaking change, an attribute as solution for version 1 still makes sense in my opinion. One could maybe display a deprecation warning when Deserialize is derived without that attribute?

@Toromyx Toromyx changed the title Implement attribute to disallow deserialization of struct with named fields from sequence Implement attribute to disallow deserialization of struct and struct variant with named fields from sequence Oct 28, 2023
@Toromyx Toromyx marked this pull request as ready for review October 29, 2023 13:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

Struct with named fields can be deserialized from sequence
2 participants