Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tagged enums should support #[serde(other)] #912

Open
zimond opened this issue May 2, 2017 · 37 comments
Open

Tagged enums should support #[serde(other)] #912

zimond opened this issue May 2, 2017 · 37 comments
Labels

Comments

@zimond
Copy link

zimond commented May 2, 2017

I think tagged enums should also allow #[serde(other)] to mark a enum field as default. This will make the tagged enums much more useful.

Or is there already a way to do this that I don know?

@dtolnay
Copy link
Member

dtolnay commented May 9, 2017

Could you give some example of how you would expect this to work? Please show the enum definition, the input data, and the output value you expect to get from deserializing.

@zimond
Copy link
Author

zimond commented May 10, 2017

Say I have an enum

enum Target {
   A(TargetA),
   B(TargetB),
   Others(TargetOthers)
}

When deserializing, { "tag": "A" } will be deserialized to Target::A, { "tag": "blahblah" } should be deserialized to Target::Others

As the deserialization input could be indeed anything, it would be nice to have a default (or fallback) choice.

@dtolnay
Copy link
Member

dtolnay commented May 10, 2017

Cool, I am on board with this. I would love a PR to implement it!

@tikue
Copy link

tikue commented May 11, 2017

This would be really great for reaching feature-parity with protocol buffers. In proto3 Java, unknown variants deserialize to a special UNRECOGNIZED variant. This allows new variants to be added backwards-compatibly. Without this feature, it's pretty dangerous to use enums at all in more complex distributed systems.

Example

enum TrafficClass {
    #[serde(other)]
    Unknown,
    Undesired,
    BestEffort,
    Sensitive,
}

In the future I might want to add TrafficClass::Critical, in which case I wouldn't want to break existing systems that haven't been redeployed. In this case those systems would see TrafficClass::Unknown, and they could then decide to handle that however they need to -- maybe defaulting to TrafficClass::BestEffort, or maybe replying with an application-level error that the new traffic class isn't supported yet.

Open Questions

I would assume the most sensible thing would be to only allow this annotation on data-less variants; can you think of a way it would work otherwise? I don't think it'd be wise to try to deserialize the unknown tag's data into some other variant's data.

@wfraser
Copy link

wfraser commented Jul 25, 2017

I would assume the most sensible thing would be to only allow this annotation on data-less variants

I don't know if it makes sense for all deserializers, but for JSON at least, it would be nice to be able to capture either the raw JSON (as a string), or better yet, the serde_json::value::Value not-strongly-typed but still deserialized data.

@golddranks
Copy link

My use case is something like this:

#[serde(rename_all = "snake_case")]
pub enum FieldType {
    Noop,
    Hash,
    MaskHost,
    MaskAgent,
    DropField,
    Plugin(String),
}

Where, when parsing JSON, the unrecognised variant would be parsed as Plugin, and the text would be the value of the field.

Also, now that I think about it, a stretch goal would be as parsing with many unrecognised variants, deciding between them using similar, structure and type-based logic as #[serde(untagged)] parsing.

@iddm
Copy link

iddm commented Feb 16, 2018

Any news on that? This would be extremely useful. I have needed that for such a long time already.

@galich
Copy link

galich commented Apr 5, 2018

We are using this workaround until it's supported

#[derive(Serialize, Deserialize, Debug)]
pub enum FieldKind {
    date,
    email,
    menu,

    #[serde(skip_deserializing)]
    unknown,
}

fn deserialize_field_kind<'de, D>(deserializer: D) -> Result<FieldKind, D::Error>
where
    D: Deserializer<'de>,
{
    Ok(FieldKind::deserialize(deserializer).unwrap_or(FieldKind::Unknown))
}

#[derive(Serialize, Deserialize, Debug)]
pub struct Field {
    #[serde(deserialize_with = "deserialize_field_kind")]
    kind: FieldKind,
}

hope that helps

@iddm
Copy link

iddm commented Apr 5, 2018

@galich thanks! I'd try implementing it in serde-aux crate.

@braincore
Copy link

@dtolnay: I might have some bandwidth to implement this.

After a quick look through the serde source, some questions:

  1. There's already a serde(other) from cdfd445 that's minimally documented. Could you explain what it's used for along with what field_identifier is for? Document the field_identifier and variant_identifier attributes serde-rs.github.io#45

  2. Since serde(other) is taken, I propose using serde(open) to indicate an open union/enum:

enum E {
    A,
    B(u32),
    #[serde(open)]
    C,
}

Are you onboard with that name?

@dtolnay
Copy link
Member

dtolnay commented Aug 17, 2018

  1. A field identifier is the enum Field shown in https://serde.rs/deserialize-struct.html i.e. some Deserialize impl for deserializing keys in a map when deserializing that map as a struct. When there is an other variant it matches any other map key that is not matched to a field of the struct being deserialized. This is the default behavior of derived Deserialize impls for structs with named fields when not using serde(deny_unknown_fields).

  2. Use serde(other) because there is no conflict, the attribute means the same thing in this context as it does in the existing implemented context.

@kjeremy
Copy link

kjeremy commented Sep 21, 2018

I am fighting this too.

I have:

#[derive(Debug, Eq, PartialEq, Clone, Deserialize, Serialize)]
#[serde(tag = "kind", rename_all="lowercase" )]
pub enum DocumentChangeOperation {
    Create(CreateFile),
    Rename(RenameFile),
    Delete(DeleteFile),

    #[serde(other)]
    Edit(TextDocumentEdit),
}

Where the Edit variant doesn't actually have a kind tag so it fails to deserialize properly.

@timvisee
Copy link

timvisee commented Feb 28, 2019

I am also interested in this feature. I would like this to choose a common default variant when no tag field is available, to only require users to specify a type when using a non-common variant.

I would want the following:

[[binds]]
address = "0.0.0.0:80"

[[binds]]
type = "tls"
address = "0.0.0.0:443"
cert = "/some/file/path"
#[derive(Deserialize)]
#[serde(tag = "type", rename_all = "snake_case")]
pub enum Bind {
    #[serde(other)]
    Http {
        address: SocketAddr,
    },
    
    #[serde(alias = "tls")]
    HttpsTls {
        address: SocketAddr,
        cert: PathBuf,
    }
}

but with #[serde(other)] it doesn't currently compile with the error message: #[serde(other)] must be on a unit variant

Is there any progress on this issue?

@mathstuf
Copy link
Contributor

mathstuf commented Mar 1, 2019

Hmm, maybe #[serde(tag = "type", rename_all = "snake_case", default = "http")] would be a better location for that?

@kunicmarko20
Copy link

kunicmarko20 commented Nov 10, 2019

any suggestion how I would do workaround for a Vec<FieldKind>?

I tried:

fn deserialize_vec_field_kind<'de, D>(deserializer: D) -> Result<Vec<FieldKind>, D::Error>
    where
        D: Deserializer<'de>,
{
    struct VecFieldKind(PhantomData<fn() -> Vec<FieldKind>>);

    impl<'de> Visitor<'de> for VecFieldKind {
        type Value = Vec<FieldKind>;

        fn expecting(&self, formatter: &mut Formatter) -> fmt::Result {
            formatter.write_str("Array of field kind.")
        }

        fn visit_seq<S>(self, mut seq: S) -> Result<Vec<FieldKind>, S::Error>
            where
                S: SeqAccess<'de>,
        {
            let mut field_kinds: Vec<FieldKind> = Vec::new();

            while let element = seq.next_element().unwrap_or(Some(FieldKind::Unknown)) {
                field_kinds.push(element.unwrap())
            }

            Ok(field_kinds)
        }
    }

    deserializer.deserialize_seq(VecFieldKind(PhantomData))
}

but I end up in an infinite loop.

@mathstuf
Copy link
Contributor

            while let element = seq.next_element().unwrap_or(Some(FieldKind::Unknown)) {
                field_kinds.push(element.unwrap())
            }

You should match on the error state, not just the Option-ness. Something like this structure is what you want:

loop {
    match seq.next_element() {
        Ok(Some(element)) => field_kinds.push(element),
        Ok(None) => break, // end of sequence
        Err(_) => field_kinds.push(FieldKind::Unknown),
    }
}

@kunicmarko20
Copy link

My bad didn't read that documentation correctly, but even if I add a break on None I still have an infinite loop

@mathstuf
Copy link
Contributor

Hmm. Well, something is making next_element not end on its own. I suggest debugging what all of the next_element returns are. Something has to change when it gets to the end of the element list (assuming there is one, but then, there's your problem :) ).

@tv42
Copy link

tv42 commented May 13, 2021

I'm just a newbie but it seems to me #912 (comment) will use the default enum even when the input is clearly tagged for another branch of the enum, and just failed to parse, e.g. due to trying to deserialize a string into an integer.

@piegamesde
Copy link

Instead of talking about workarounds: what would be a viable path forward to get this feature into serde itself?

@jplatte
Copy link
Contributor

jplatte commented Aug 10, 2021

I don't think there is one. If you have a look at unmerged PRs and activitiy on those and on recently-closed ones you will see that that serde is effectively feature-frozen.

@TeXitoi
Copy link

TeXitoi commented Nov 29, 2021

Another featureful workaround: https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=aabb4e581775d2f3b82d4725ab85448f

@Qix-
Copy link

Qix- commented Aug 29, 2022

In my case, I'm hitting this same issue with an Option<SomeEnumType> where the deserialized value doesn't match an enum constant. It'd be nice to tell Serde just to default to None.

@michaeljones
Copy link

michaeljones commented Dec 19, 2022

I'm hitting up against this and have tried:

#[derive(Debug, Clone, Serialize, Deserialize)]
#[serde(rename_all = "kebab-case")]
pub enum ValueEntry {
    A(Vec<String>),
    B(Vec<String>),

    // Provide fallback in case we're deserialising from a newer release
    // of a subcommand that include json that we don't recognise
    //
    // 'other' means that we use this variant if none of the others match.
    // 'ignore_contents' is used as 'other' only works if the entry deserializes to
    // a unit value so we override the deserialization to ignore everything and
    // return a unit value.
    #[serde(other, deserialize_with = "ignore_contents")]
    Unknown,
}

fn ignore_contents<'de, D>(deserializer: D) -> Result<(), D::Error>
where
    D: Deserializer<'de>,
{
    // Ignore any content at this part of the json structure
    deserializer.deserialize_ignored_any(serde::de::IgnoredAny);

    // Return unit as our 'Unknown' variant has no args
    Ok(())
}

I am new-ish to rust and serde so I've no reason to have confidence in this but it seems promising from some early tests so I thought I'd record it before I forget. I only have one kind of entry to test it on at the moment so it might fail for other variations.

IgnoredAny only works for structured input like JSON where it is possible to tell where entries end without needing to know the exact details and expected content of the values themselves.

@clarkmcc
Copy link

clarkmcc commented Jan 24, 2023

In my case, I'm getting this problem deserializing to HashMap<MyEnum, f64>. Ideally, if there's not a matching variant for MyEnum, it just wouldn't get loaded into the map.

@axelmagn
Copy link

I just ran into this problem and can confirm that the workaround provided by @michaeljones worked in my case. Many thanks!

@LunarLambda
Copy link

Also running into this exact thing. My configuration format deals with program names and configurations, and certain programs should be detected and deserialized especially, or fallback to a standard representation, like so:

#[derive(Deserialize)]
#[serde(rename_all = "kebab-case")]
enum Program {
    Bash(BashConfig),
    Vim(VimConfig),
    // ...
    // What goes here?
   Other(String, GenericConfig)
}

Not sure what the most appropriate workaround is, if any

@NobodyXu
Copy link

NobodyXu commented Jul 6, 2023

@LunarLambda You can create a newtype ProgramWrapper and manually implement Deserialize for it, with fallback to Other.

@LunarLambda
Copy link

@LunarLambda You can create a newtype ProgramWrapper and manually implement Deserialize for it, with fallback to Other.

Not really sure how to do that easiest. Since I'm using an externally tagged enum, the serialized form I essentially need is a HashMap<String, GenericConfigt> with a single element

e.g. in json:

{
  "vim": { -- VimConfig contents -- },
  "other": { -- GenericConfig contents -- }
}

ideally, the datatype I would like to work with would be something more like HashSet<Program> however. Not sure how this is best served by the Deserializer trait.

@cdbennett
Copy link

Just wanted to mention the serde-enum-str crate here. It seems to solve at least some of this issue.

GitHub: https://github.com/bk-rs/serde-ext
Docs.rs: https://docs.rs/serde-enum-str/latest/serde_enum_str/

example:

use serde_enum_str::{Deserialize_enum_str, Serialize_enum_str};

#[derive(Deserialize_enum_str, Serialize_enum_str, Debug, PartialEq, Eq)]
enum Foo {
    A,
    B,
    #[serde(other)]
    Other(String),
}

@rsethc
Copy link

rsethc commented Sep 5, 2023

Since I ran into this problem I might as well also throw a solution here.

  • Use a separate enum which is untagged.
  • The first variant contains a value of your 'known' enum variants in the inner enum.
  • The last variant could contain a type that catches everything in a generic way (such as serde_json::value), as far as your use case.

Rust Playground example: https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=ad002072c10d3055f57c467972ac7aca

#[derive(Deserialize,Debug)]
#[serde(tag = "event_type", content = "event_data")]
enum AnyEvent {
    Int(IntEvent),
    Bool(BoolEvent),
}

#[derive(Deserialize,Debug)]
#[serde(untagged)]
enum EventOrUnknown {
    Event(AnyEvent),
    // The order matters. The most permissive variant (i.e. the catch-all)
    // should be tried last during deserialization.
    Unknown(serde_json::Value),
}

If you don't care about any of the fields, you could just use an empty struct:

#[derive(Deserialize,Debug)]
struct DontCare {}

...
    Unknown(DontCare),
...

Or a struct that contains certain fields if you expect all of your 'catch-all' cases to still fit some kind of pattern (for example if they are internally tagged and you want to print out the name of the tag that wasn't recognized, that can just be a field in the struct Unknown contains).

Compared to the creative solution with #[serde(other, deserialize_with = "ignore_contents")], this

  • Doesn't require implementing a deserialize_with of your own
  • Allows you to not have to handle the Unknown case in match arms where you use the inner enum
  • Allows you to capture the contents that would have been discarded by that solution in case this is of use for even just printing error log messages etc.

@kurtbuilds
Copy link

kurtbuilds commented Dec 25, 2023

For posterity, this is implemented in serde.

The attribute to use is #[serde(untagged)] on the desired enum variant.

The docs do mention that variants can take the attribute. However, it seems like people are still getting stuck on this problem (myself included), so maybe it'd helpful to update the documentation.
It could be most helpful for end users if the "unit" warning from #[serde(other)] recommended using the #[serde(untagged)] attribute.

use serde::{Deserialize, Serialize};

#[derive(Debug, Serialize, Deserialize, PartialEq)]
pub enum OtherString {
    Foo,
    #[serde(untagged)]
    Other(String)
}

#[derive(Debug, Serialize, Deserialize, PartialEq)]
pub enum OtherValue {
    Bar,
    #[serde(untagged)]
    Other(serde_json::Value)
}

fn main() {
    let s: OtherString = serde_json::from_str(r#""Foo""#).unwrap();
    assert_eq!(s, OtherString::Foo);
    let s: OtherString = serde_json::from_str(r#""not an enum value""#).unwrap();
    assert_eq!(s, OtherString::Other("not an enum value".to_string()));

    let s: OtherValue = serde_json::from_str(r#""Bar""#).unwrap();
    assert_eq!(s, OtherValue::Bar);

    let s: OtherValue = serde_json::from_str(r#"{"a": 5}"#).unwrap();
    assert_eq!(s, OtherValue::Other(serde_json::json!({"a": 5})));
}

I did some quick sleuthing, and it was implementing 6 months ago in 48e5753 , so in the scheme of things it's a quite recent addition.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Development

No branches or pull requests