Field is not serializable with binary formats #3082

Closed
@bjchambers

Description

This one took some time to track down. I was snapshotting some state in my system, and deserialization kept throwing errors. I eventually determined that the cause is the skip_serializing_if attribute on Field::metadata. Specifically, it seems to be triggering serde-rs/serde#1732.

As best I can tell, the issue is that the binary format needs to have something written to say "this is None" so it can know to move on to other things.
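
To make that concrete, here is a minimal, std-only sketch of a positional (non-self-describing) encoding. It is not serde, bincode, or postcard; `ToyField`, `encode`, `decode`, and the `skip_none` flag are hypothetical names, with `skip_none` standing in for the effect of skip_serializing_if on a `None` value.

```rust
// Hypothetical miniature of a field; NOT arrow's real `Field`.
#[derive(Debug, PartialEq, Clone)]
struct ToyField {
    nullable: bool,
    dict_id: u8,
    metadata: Option<u8>, // stand-in for the optional metadata map
}

// Positional encoding: values are written in field order, with no names.
fn encode(f: &ToyField, skip_none: bool) -> Vec<u8> {
    let mut out = vec![f.nullable as u8, f.dict_id];
    match f.metadata {
        // Tag byte 1 means "a value follows".
        Some(v) => {
            out.push(1);
            out.push(v);
        }
        // Emulates skip_serializing_if: nothing at all is written for `None`.
        None if skip_none => {}
        // Tag byte 0 explicitly says "this is None".
        None => out.push(0),
    }
    out
}

// The reader always expects a tag byte for the optional field.
fn decode(bytes: &[u8]) -> Option<ToyField> {
    let mut it = bytes.iter().copied();
    let nullable = it.next()? != 0;
    let dict_id = it.next()?;
    let metadata = match it.next()? {
        0 => None,
        1 => Some(it.next()?),
        _ => return None,
    };
    Some(ToyField { nullable, dict_id, metadata })
}
```

In this sketch, a field with `metadata: None` round-trips when the tag byte is written, but `decode` runs out of input when the writer skipped it. A field that carries metadata round-trips either way, because the skip only applies to `None`.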

Sure enough, with both bincode and postcard (I didn't test others), two of the following three tests fail:

    #[cfg(feature = "serde")]
    fn assert_binary_serde_round_trip(field: Field) {
        let serialized = postcard::to_stdvec(&field).unwrap();
        let deserialized: Field = postcard::from_bytes(&serialized).unwrap();
        assert_eq!(field, deserialized)
    }

    #[cfg(feature = "serde")]
    #[test]
    fn test_field_without_metadata_serde() {
        let field = Field::new("name", DataType::Boolean, true);
        assert_binary_serde_round_trip(field)
    }

    #[cfg(feature = "serde")]
    #[test]
    fn test_field_with_empty_metadata_serde() {
        // Attach an empty metadata map; this is the field that gets round-tripped.
        let field = Field::new("name", DataType::Boolean, false)
            .with_metadata(Some(BTreeMap::new()));
        assert_binary_serde_round_trip(field)
    }

    #[cfg(feature = "serde")]
    #[test]
    fn test_field_with_nonempty_metadata_serde() {
        // Attach a non-empty metadata map; this is the field that gets round-tripped.
        let mut metadata = BTreeMap::new();
        metadata.insert("hi".to_owned(), "".to_owned());
        let field =
            Field::new("name", DataType::Boolean, false).with_metadata(Some(metadata));
        assert_binary_serde_round_trip(field)
    }

This may be "working as intended" if the only use of serde on Fields is supposed to be JSON and other "self describing" formats. But I thought I would at least file this for discussion to see if it would be better to drop the skip_serializing_if from the metadata (although given that the empty case also fails, that may not be enough).
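
For contrast, here is a std-only sketch of why a self-describing format tolerates the skipped field. It is again a toy model rather than serde or JSON itself; `KeyedField`, `encode_keyed`, and `decode_keyed` are hypothetical names, and a `BTreeMap` keyed by field name stands in for a JSON object.

```rust
use std::collections::BTreeMap;

// Hypothetical miniature of a field; NOT arrow's real `Field`.
#[derive(Debug, PartialEq, Clone)]
struct KeyedField {
    name: String,
    metadata: Option<String>,
}

// Self-describing toy encoding: every value is stored under its field name.
fn encode_keyed(f: &KeyedField) -> BTreeMap<String, String> {
    let mut out = BTreeMap::new();
    out.insert("name".to_owned(), f.name.clone());
    // Emulates skip_serializing_if: a `None` produces no entry at all.
    if let Some(m) = &f.metadata {
        out.insert("metadata".to_owned(), m.clone());
    }
    out
}

fn decode_keyed(map: &BTreeMap<String, String>) -> Option<KeyedField> {
    Some(KeyedField {
        // A required key that is missing is an error.
        name: map.get("name")?.clone(),
        // A missing optional key is unambiguous, so it can default to `None`.
        metadata: map.get("metadata").cloned(),
    })
}
```

Because the reader looks values up by key, an absent "metadata" entry can be told apart from every other field, which is roughly why the skip is harmless for JSON but not for positional formats.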

Activity

tustvold commented on Nov 10, 2022

Removing the skip_serializing_if changes the output from

{"Struct":[{"name":"first_name","data_type":"Utf8","nullable":false,"dict_id":0,"dict_is_ordered":false,"metadata":{"k":"v"}},{"name":"last_name","data_type":"Utf8","nullable":false,"dict_id":0,"dict_is_ordered":false},{"name":"address","data_type":{ ...

To

{"Struct":[{"name":"first_name","data_type":"Utf8","nullable":false,"dict_id":0,"dict_is_ordered":false,"metadata":{"k":"v"}},{"name":"last_name","data_type":"Utf8","nullable":false,"dict_id":0,"dict_is_ordered":false,"metadata":null},{"name":"addres ...

I see no issue with this

tustvold commented on Nov 10, 2022

Also possibly related, filed #3086

added a commit that references this issue on Nov 21, 2022

alamb commented on Nov 25, 2022

label_issue.py automatically added labels {'arrow'} from #3126

Field is not serializable with binary formats · Issue #3082 · apache/arrow-rs