Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow backtracking in deserialize_option method #2712

Open
kurtbuilds opened this issue Mar 14, 2024 · 0 comments
Open

Allow backtracking in deserialize_option method #2712

kurtbuilds opened this issue Mar 14, 2024 · 0 comments

Comments

@kurtbuilds
Copy link

kurtbuilds commented Mar 14, 2024

I wrote a serde implementation of the X12 file format, which is a enterprise-y text file format somewhat similar to CSVs. A document has a header, footer, and a set of transactions. Transactions contain segments, segments contain elements, and elements can be single-valued or be a composite of multiple values.

Deserializing into a generic Document (which is a vec of segments, which themselves are vecs of elements, which is an enum of values or a composite vec of values) works great. Each segment begins with a "header" element, so a DMG segment has different semantic meaning from a REF segment and also contains different quantities and types of fields, and so on.

Within a document, some segments are required, some can be repeated, so a Vec<DmgSegment>, and some are optional, Option<DmgSegment>.

The Problem

To deserialize into transactions with semantic meaning, ones that contain fields like Vec<DmgSegment>, the deserializer needs to be able to backtrack and stop visiting if it's expecting a different segment (e.g. expects a DmgSegment but encounters a REF header.) For a transaction with an Option<DmgSegment> segment, the visitor don't know if the transaction contains that record until we see that DMG header element.

To implement this, I have deserialize_struct implemented, which calls visit_seq. Because visit_seq can return actual values like Ok(None) for the next element, I can catch an UnexpectedSegment error, and in doing so, implement backtracking:

pub struct BufferDeserializer<'de> {
    pub(crate) buffer: Option<&'de str>,
    pub(crate) formatter: X12Formatter,
    pub(crate) delimiter: u8,
}

impl<'de, 'a> Deserializer<'de> for &'a mut BufferDeserializer<'de> {
    fn deserialize_struct<V>(mut self, name: &'static str, fields: &'static [&'static str], visitor: V) -> Result<V::Value, Self::Error> where V: Visitor<'de> {
        let Some(buf) = self.buffer else {
            return Err(X12DeserializerError::EarlyEndOfDocument);
        };
        let (el, rest) = split_string(buf, self.delimiter as char);
        self.buffer = rest;
        visitor.visit_seq(SizedBufferDeserializer {
            de: &mut BufferDeserializer {
                buffer: Some(el),
                delimiter: self.formatter.next_delimiter(self.delimiter),
                formatter: self.formatter,
            },
            remaining: fields.len(),
        })
    }
}

impl<'de> SeqAccess<'de> for BufferDeserializer<'de> {
    type Error = X12DeserializerError;

    fn next_element_seed<T>(&mut self, seed: T) -> Result<Option<T::Value>, Self::Error> where T: DeserializeSeed<'de> {
        if self.buffer.is_none() {
            return Ok(None);
        }
        match seed.deserialize(&mut *self).map(Some) {
            Ok(value) => Ok(value),
            Err(X12DeserializerError::EarlyEndOfDocument) => Ok(None),
            // this is effectively where backtracking happens
            Err(X12DeserializerError::UnexpectedSegment { .. }) => Ok(None),
            Err(e) => Err(e),
        }
    }
}

The above code works great.

However, in the deserialize_option method, the only way to return the equivalent of Ok(None) is by calling visit_none. But we don't know whether it's Some or None valued until we do the visiting. So this code is broken:

    // This code fails to compile
    fn deserialize_option<V>(self, visitor: V) -> Result<V::Value, Self::Error> where V: Visitor<'de> {
        let mut s = visitor.visit_some(self);
        if s.is_ok() {
            s
        } else {
            // EarlyEndOfDocumentError or a UnexpectedSegment error
            // Error is the next line: we can't call `visit_none` because both visit_none and visit_some take ownership of visitor.
            visitor.visit_none() // Compiler error
        }
    }

I tried using visit_seq from deserialize_option, but that fails with an unexpected type error - apparently serde doesn't treat Option as a seq of length zero or one.

Hacky Solution

I hacked a solution using unsafe, which only works because every visitor I've seen is a ZST. Unfortunately, I don't think there's a way to declare to the compiler that a generic V is ZST.

    // this code actually works
    fn deserialize_option<V>(self, visitor: V) -> Result<V::Value, Self::Error> where V: Visitor<'de> {
        // this only works because all the visitors ive seen are ZSTs
        // if they did have state, this would be a disaster
        let insanely_unsafe_copy_of_visitor = unsafe {
            std::ptr::read(&visitor as *const V)
        };
        let mut s = visitor.visit_some(self);
        if s.is_ok() {
            s
        } else {
            insanely_unsafe_copy_of_visitor.visit_none()
        }
    }

The Question

Is there a better way to solve this problem, not using unsafe? Basically, I need a way to return None from deserialize_option, similar to how I'm able to in SeqAccess::next_element_seed.

The most straight forward & backwards compatible way that I see would be if deserialize_option could call visit_seq - perhaps guaranteeing that it's a seq of max one length. But that feels like more indirection than if deserialize_option could return None somehow.

This issue supercedes #2705.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

No branches or pull requests

1 participant