Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix incorrect representation of tuples with skipped fields #2520

Open
wants to merge 6 commits into
base: master
Choose a base branch
from

Conversation

Mingun
Copy link
Contributor

@Mingun Mingun commented Jul 21, 2023

This PR fixes #2105 and give some insights about the current situation with skipped fields.

You can see what changed in the generated code by running the following commands:

  • place skip.txt into test_suite/tests folder and change it extension to .rs
  • run the following command:
    ...\serde\test_suite> cargo expand --all-features --test skip > skip-N.rs
    where N is some number.

When commit message contains words "Changes in generated code" that means that output of the command above was changed. I recommend to diff it against the previous result and explore the difference. There will be 4 files -- baseline and three changed.

Overall changes over all commits:

  • Tuple1as0, Tuple1as0Default, Tuple1as0With:
    • removed visit_newtype_struct
    • changed *_newtype_struct -> *_tuple_struct(0)
  • Tuple2as1, Tuple2as1Default, Tuple2as1With:
    • added visit_newtype_struct

The table below summarizes the current behavior and the new behavior, introduced in this PR. Empty cells in the last column means that the behavior wasn't changed.

The behavior columns shows which methods are used to serialize or deserialize a type. The number in parenthesis is a count of fields which passed to the corresponding method.

CodeCurrent behaviorFixed behavior
Unit representations
struct Unit;

serialize_unit_struct
deserialize_unit_struct
visit_unit

Tuple representations
struct Tuple0();

serialize_tuple_struct(0)
deserialize_tuple_struct(0)
visit_seq

ℹ️ It would be better to use Unit representation (see below)

// Can be named
// Newtype(u8)
struct Tuple1(u8);

serialize_newtype_struct
deserialize_newtype_struct
visit_newtype_struct
visit_seq

struct Tuple2(u8, u8);

serialize_tuple_struct(2)
deserialize_tuple_struct(2)
visit_seq

// The same as Tuple0
struct Tuple1as0(
    #[serde(skip)] u8,
);

#2105

serialize_newtype_struct
deserialize_newtype_struct
visit_newtype_struct
visit_seq

serialize_tuple_struct(0)
deserialize_tuple_struct(0)
visit_seq
ℹ️ It would be better to use Unit representation (see below)

// The same as Tuple0
struct Tuple2as0(
    #[serde(skip)] u8,
    #[serde(skip)] u8,
);

serialize_tuple_struct(0)
deserialize_tuple_struct(0)
visit_seq

ℹ️ It would be better to use Unit representation (see below)

// The same as Tuple1
struct Tuple2as1(
    #[serde(skip)] u8,
    u8,
);

serialize_tuple_struct(1)
deserialize_tuple_struct(1)
visit_seq

serialize_tuple_struct(1)
deserialize_tuple_struct(1)
visit_newtype_struct
visit_seq
ℹ️ It would be better to use Newtype representation (see below)

Struct representations
struct Struct0 {}

serialize_struct
deserialize_struct
visit_seq
visit_map

struct Struct1 {
    a: u8,
}

serialize_struct
deserialize_struct
visit_seq
visit_map

struct Struct2as0 {
    #[serde(skip)]
    a: u8,
    #[serde(skip)]
    b: u8,
}

serialize_struct
deserialize_struct
visit_seq
visit_map

struct Struct2as1 {
    #[serde(skip)]
    a: u8,
    b: u8,
}

serialize_struct
deserialize_struct
visit_seq
visit_map

Enum variant representations

The following changes also was implemented by I plan to ship them in separate PR due to some debating changes. I left my finding here to have the overall picture about skipped fields in one place.

The first debating change is to how to represent tuple(0) tuples? Should we represent them as a sequences with zero fields, or as a unit structs? Motivation for the unit structs is that that is the only way to skip all fields in a tuple and still represent it as a unit. This is currently done for Enum::Tuple1as0, but only for it. Enum::Tuple2as0 no longer follows this agreement as well as all tuple structs, which introduces more confusion. I am inclined to this option, since to have an always empty sequence is somewhat strange.

The second debating change, maybe we should make the behavior configurable? Introduce a new attribute like #[serde(tuple)] to produce tuple(0) and tuple(1) forms from TupleXas0 and TupleXas1 tuples?

CodeCurrent behaviorSuggested behavior
Enum::Unit representations
enum Enum {
  Unit,
}

serialize_unit_variant
unit_variant

Enum::Tuple representations
enum Enum {
  Tuple0(),
}

serialize_tuple_variant(0)
tuple_variant(0)
visit_seq

serialize_unit_variant
unit_variant

// Can be called
// Enum::Newtype(u8)
enum Enum {
  Tuple1(u8),
}

serialize_newtype_variant
newtype_variant

enum Enum {
  Tuple2(u8, u8),
}

serialize_tuple_variant(2)
tuple_variant(2)
visit_seq

// The same as
// Enum::Tuple0
enum Enum {
  Tuple1as0(
    #[serde(skip)] u8,
  ),
}

serialize_unit_variant
unit_variant

// The same as
// Enum::Tuple0
enum Enum {
  Tuple2as0(
    #[serde(skip)] u8,
    #[serde(skip)] u8,
  ),
}

serialize_tuple_variant(0)
tuple_variant(0)
visit_seq

serialize_unit_variant
unit_variant

// The same as
// Enum::Tuple1
enum Enum {
  Tuple2as1(
    #[serde(skip)] u8,
    u8,
  ),
}

#2224

serialize_tuple_variant(1)
tuple_variant(1)
visit_seq

serialize_newtype_variant
newtype_variant

Enum::Struct representations
enum Enum {
  Struct0 {},
}

serialize_struct_variant(0)
struct_variant
visit_seq
visit_map

enum Enum {
  Struct1 {
    a: u8,
  },
}

serialize_struct_variant(1)
struct_variant
visit_seq
visit_map

enum Enum {
  // The same as
  // Enum::Struct0
  Struct2as0 {
    #[serde(skip)]
    a: u8,
    #[serde(skip)]
    b: u8,
  },
}

serialize_struct_variant(0)
struct_variant
visit_seq
visit_map

enum Enum {
  // The same as
  // Enum::Struct1
  Struct2as1 {
    #[serde(skip)]
    a: u8,
    b: u8,
  },
}

serialize_struct_variant(1)
struct_variant
visit_seq
visit_map

List of inconsistencies

As I said before, the current behavior contains a bunch of inconsistencies.

For example, an enum tuple variant with zero fields represented as a tuple with zero fields, but tuple variant with one skipped field represented as a unit. At the same time, tuple variant with 2 or more fields, all skipped, again represented as a tuple with zero fields. The same is applied to tuples with one effective field -- sometimes it is represented as a newtype, sometimes as a tuple with one field. I suggest to unify behavior across all cases that I summarized in the following table:

ValueSerialized formSuggested form
Types with zero fields in serialized form

Tuple0()

Represented as a tuple with zero fieldsunit

Tuple1as0(Skipped)

⚠️ Incorrectly represented as a newtype struct (bug #2105)unit

Tuple2as0(Skipped, Skipped)

Represented as a tuple with zero fieldsunit
Types with 1 field in serialized form

Tuple1(x)

Represented as a newtype x

Tuple2as1(Skipped, x)

❗ Represented as a tuple with 1 fieldnewtype
Enum variants with zero fields in serialized form

Enum::Tuple0()

Represented as a tuple with zero fieldsunit

Enum::Tuple1as0(Skipped)

❗ Represented as a unit struct

Enum::Tuple2as0(Skipped, Skipped)

Represented as a tuple with zero fieldsunit
Enum variants with 1 field in serialized form

Enum::Tuple1(x)

Represented as a newtype x

Enum::Tuple2as1(Skipped, x)

❗ Represented as a tuple with 1 field (#2224)newtype

Legend:

  • ⚠️ Bugs that leads to compilation errors
  • ❗ Inconsistency in a block of similar table rows

Copy link
Member

@dtolnay dtolnay left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR and exemplary writeup.

Have you thought about how this direction relates to skip_serializing_if? For example in the following case:

#[derive(serde_derive::Serialize)]
struct Tuple(
    #[serde(skip_serializing_if = "...")]
    Option<usize>,
    #[serde(skip_serializing_if = "...")]
    Option<usize>,
    #[serde(skip_serializing_if = "...")]
    Option<usize>,
);

would you consider that the correct behavior needs to serialize using serialize_newtype_struct if exactly one of the functions returns false, and serialize_unit_struct if all true, and serialize_tuple_struct otherwise?

@Mingun
Copy link
Contributor Author

Mingun commented Jul 31, 2023

It is hard to say. I never used skip_serializing_if. At first glance I think, that we should serialize that as a tuple in all cases. That behavior will also allow to keep ability to serialize tuples with all skipped fields / only one non-skipped field as tuple(0) / tuple(1) by using skip_serializing_if = "always_true", if default behavior by serializing them as units / newtypes would be undesirable.

In any case the behavior of skip should be carefully documented -- current documentation is unclear about expected results.

@dtolnay
Copy link
Member

dtolnay commented Jul 31, 2023

I am definitely on board with the Tuple1as0 change to fix #2105.

Not yet convinced about the Tuple2as1 change however — I think I need to revisit this later to make up my mind about that, especially if we are going to be making serde(skip) intentionally inconsistent with serde(skip_serializing_if = "always_true").

@Mingun
Copy link
Contributor Author

Mingun commented Jul 31, 2023

The problem that right now this changes are coupled and decoupling them requires to replace simple and straightforward code with something more complex. This will be especially noticeable on the part that fixes enum representations (coming soon)

Copy link
Member

@dtolnay dtolnay left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not interested in changing the Tuple2as1 case at this time. Individual data formats can opt to handle that by treating serialize_tuple_struct(1) the same as serialize_newtype_struct, and deserialize_tuple_struct(1) as deserialize_newtype_struct, if they prefer.

Do you see any way to separate that from the Tuple1as0 fix?

@Mingun
Copy link
Contributor Author

Mingun commented Jul 31, 2023

Yes, that should be possible, just should introduce some checks and possible code duplication.

Just to clarify: you don't like changing behavior at all, or visit_newtype_struct could be added (it is side-effect from correct counting of non-skipped fields)?

@Mingun
Copy link
Contributor Author

Mingun commented Aug 2, 2023

  • I dropped the last commit that was changing Tuple2as1 representation. Such structs, however, now supports deserialization using visit_newtype_struct method, which technically a side-effect of fixing bug and will allow to deserializers to handle deserialize_tuple_struct(1) requests as deserialize_newtype_struct, if they prefer
  • corrected tests accordingly
  • add a new commit that construct all skipped fields after all stored fields. So if construction default objects is expensive they will not be constructed until needed

@dtolnay
Copy link
Member

dtolnay commented Aug 7, 2023

I am not convinced about the last "this should increase performance" commit. Interleaving the skipped and non-skipped fields to construct them in the original source order still seems fine to me. Default objects which are expensive to construct seems like a very uncommon case to begin with, and the commit only makes any difference in the error case if I understand correctly, which is not something we micro-optimize like this.

Mingun and others added 6 commits August 7, 2023 15:47
…tion

(review this with whitespace changes ignored)
…n_place`

Those functions too small and used only once

(review this with whitespace changes ignored)
Changes in generated code (see the file attached to PR):
  Tuple1as0:
  Tuple1as0Default:
  Tuple1as0With:
    fixed visit_newtype_struct: use default value instead of deserializing it

This fixes compilation error for Deserialize side of Tuple1as0(Skipped) tuple
Enums out of scope for now, because they have too many problems

failures (1):
    tuple_struct::tuple2as1

compilation error (commented):
    tuple_struct::tuple1as0
…r to Tuple0() struct

A side-effect: Tuple2as1(Skipped, x) now also can be deserialized using visit_newtype_struct

Changes in generated code (see the file attached to PR):
  Tuple1as0:
  Tuple1as0Default:
  Tuple1as0With:
    removed visit_newtype_struct, *_newtype_struct -> *_tuple_struct(0)

  Tuple2as1:
  Tuple2as1Default:
  Tuple2as1With:
    added visit_newtype_struct

This commit fixes compilation error and actually fixes the issue as it was reported in serde-rs#2105
@Mingun
Copy link
Contributor Author

Mingun commented Aug 7, 2023

Ok, I removed that commit. I hope compiler will able to reorder operations.

Default objects which are expensive to construct seems like a very uncommon case to begin with

I think the code is generated also because to shift to generation taking into account all these boring uncommon cases and be sure that everything will implemented efficiently. Shifting this to the user seems to me bad practice, especially since it costs nothing.

@Mingun
Copy link
Contributor Author

Mingun commented Aug 11, 2023

I am not convinced about the last "this should increase performance" commit. Interleaving the skipped and non-skipped fields to construct them in the original source order still seems fine to me.

Actually, this commit just unifies current behavior, because for structs deserialization using visit_map already does this:

use serde_derive::Deserialize;

#[derive(Deserialize)]
struct Skip {
    #[serde(skip_deserializing)]
    skipped: i32,
    #[serde(alias = "c")]
    aliased: i32,
}

will generate

#[inline]
fn visit_seq<__A>(
    self,
    mut __seq: __A,
) -> _serde::__private::Result<Self::Value, __A::Error>
where
    __A: _serde::de::SeqAccess<'de>,
{
    let __field0 = _serde::__private::Default::default();
    let __field1 = //...;
    _serde::__private::Ok(Skip {
        skipped: __field0,
        aliased: __field1,
    })
}
#[inline]
fn visit_map<__A>(
    self,
    mut __map: __A,
) -> _serde::__private::Result<Self::Value, __A::Error>
where
    __A: _serde::de::MapAccess<'de>,
{
    let mut __field1: _serde::__private::Option<i32> = _serde::__private::None;
    //...
    _serde::__private::Ok(Skip {
        skipped: _serde::__private::Default::default(),
        aliased: __field1,
    })
}

@Mingun
Copy link
Contributor Author

Mingun commented Aug 15, 2023

@dtolnay , all your review comments was resolved, could you look again? I just want to remind you that all changes in the generated code are easy to see if you follow the instructions in the first message with comparing the generated code

@Mingun
Copy link
Contributor Author

Mingun commented Sep 3, 2023

@dtolnay, yet another gentleman's reminder...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

#[serde(skip)] is ignored on a 1-field tuple struct
2 participants