Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unable to Deserialize Data Serialized with Typeless Resolver after Migrating to Attributed Objects #1806

Closed
mihail-brinza opened this issue Apr 22, 2024 · 16 comments
Assignees

Comments

@mihail-brinza
Copy link

Description:

Previously, I was serializing all my objects using the TypelessContractlessStandardResolver resolver. However, I am looking to migrate all objects to attributed. I've added the attributes [MessagePackObject] and [Key(..)] to all the classes.

Everything works fine, except for the data serialized with typeless that now needs to be deserialized. To check how the data was serialized, I have the following method:

static bool IsTypeless(MemoryStream serialized)
{
    var reader = new MessagePackReader(serialized.ToArray());
    if (reader.NextMessagePackType != MessagePackType.Extension)
    {
        return false;
    }

    return reader.CreatePeekReader().ReadExtensionFormatHeader().TypeCode == 100;
}

If it returns true, I try to deserialize it with MessagePack.MessagePackSerializer.Typeless.Deserialize. However, I always get Unexpected msgpack code 133 (fixmap) encountered. I suspect that this is happening because now the classes are annotated, whereas at the time of the serialization, the classes didn't have any attributes. If I remove the annotations, the deserialization works.

Question:

Is there any way to simply ignore the attributes so I can deserialize the old data?

@AArnott
Copy link
Collaborator

AArnott commented Apr 22, 2024

The only way to ignore the attributes is to explicitly invoke a different formatter. You could maybe do this with a custom resolver, but since the formatter you had been using was also dynamically created by the TypelessContractlessStandardResolver, you'd have to defer to that resolver, and if it has been coded to ignore types that are attributed, it wouldn't work and you'd have to write your own resolver and/or formatter.

Seeing the callstack for the exception would help. And knowing which path you took (IsTypeless or !IsTypeless) too.

@mihail-brinza
Copy link
Author

for data that was previously serialized with Typeless, it always takes the "IsTypeless" path, that part works correctly.
The issue seems to be that if you serialize your data with Typeless, then store the data, add the attributes, you can no longer deserialize it with typeless.

MessagePack.MessagePackSerializationException : Failed to deserialize System.Object value.
  ----> MessagePack.MessagePackSerializationException : Unexpected msgpack code 133 (fixmap) encountered.
   at MessagePack.MessagePackSerializer.Deserialize[T](MessagePackReader& reader, MessagePackSerializerOptions options)
   at MessagePack.MessagePackSerializer.Deserialize[T](ReadOnlyMemory`1 buffer, MessagePackSerializerOptions options, Int32& bytesRead, CancellationToken cancellationToken)
   at MessagePack.MessagePackSerializer.TryDeserializeFromMemoryStream[T](Stream stream, MessagePackSerializerOptions options, CancellationToken cancellationToken, T& result)
   at MessagePack.MessagePackSerializer.Deserialize[T](Stream stream, MessagePackSerializerOptions options, CancellationToken cancellationToken)
   at MessagePack.MessagePackSerializer.Typeless.Deserialize(Stream stream, MessagePackSerializerOptions options, CancellationToken cancellationToken)

@mihail-brinza
Copy link
Author

mihail-brinza commented Apr 23, 2024

I've created the following test case:

    const string PathPrefix = "C:\\SomePath";

    // [MessagePackObject]
    public class TestObject
    {
        // [Key(0)]
        public string Name { get; set; }

        // [Key(1)]
        public int Age { get; set; }
    }

    readonly TestObject _testObject = new()
    {
        Name = "John",
        Age = 42
    };


    [Test]
    public void CreateBinary()
    {
        var bytes = MessagePack.MessagePackSerializer.Typeless.Serialize(_testObject);

        File.WriteAllBytes(Path.Join(PathPrefix, "test.bin"), bytes);
    }

    [Test]
    public void DeserializeBinary()
    {
        var bytes = File.ReadAllBytes(Path.Join(PathPrefix, "test.bin"));
        var obj = MessagePack.MessagePackSerializer.Typeless.Deserialize(bytes);

        obj.Should().BeEquivalentTo(_testObject);
    }

If I run both tests they pass, with and without Attributes. But if I run CreateBinary and then add the attributes and try to deserialize, it fails:

MessagePack.MessagePackSerializationException : Failed to deserialize System.Object value.
  ----> MessagePack.MessagePackSerializationException : Unexpected msgpack code 130 (fixmap) encountered.
   at MessagePack.MessagePackSerializer.Deserialize[T](MessagePackReader& reader, MessagePackSerializerOptions options)
   at MessagePack.MessagePackSerializer.Deserialize[T](ReadOnlyMemory`1 buffer, MessagePackSerializerOptions options, CancellationToken cancellationToken)
   at MessagePack.MessagePackSerializer.Typeless.Deserialize(Memory`1 bytes, MessagePackSerializerOptions options, CancellationToken cancellationToken)

And since I've added attributes to my models, my idea was to deserialize the stored data in the exact same way as before when "IsTypeless" returned true.
I also tried to play around with custom resolvers, but without much success.

@AArnott
Copy link
Collaborator

AArnott commented Apr 24, 2024

Thanks for the added detail.
The callstack surprises me, since the MessagePackSerializer.Deserialize method doesn't have any code that expects any msgpack code at all... I would have expected the exception thrown from a formatter.

As this is a specialized request that will likely take 2-3 hours to investigate, further investigation goes beyond the time I have to offer for free support.
If you're interested in paid support, please let me know.

@AArnott AArnott self-assigned this Apr 26, 2024
@AArnott
Copy link
Collaborator

AArnott commented Apr 26, 2024

💵 Thank you for helping to sponsor development of this project. 💵

The problem turns out to have nothing directly to do with typeless. It has to do with contractless.
In the serialized format, Typeless is the outermost wrapper. It records the full name of the type that was serialized. Peel that layer, and you then have this msgpack (base64 encoded): gqROYW1loWGjQWdlAg==. If you convert this msgpack to json, you get this:

{
    "Name": "a",
    "Age": 2
}

This means your original object that was serialized with typeless+contractless was formatted as a map with named properties.

But when you added attributes to your class, you used ordinal keys in the [Key] attribute, which means the type should be serialized as an array, where each element is a value. Property names are not recorded. Such an attributed type would have a msgpack structure that looks like this (when represented as JSON):

["a", 2]

When you get the "Unexpected msgpack code 130 (fixmap) encountered" error, it's because in deserializing it expected an array but encountered a map instead.

So, you can fix compatibility simply by changing the attributes you're adding:

    [MessagePackObject]
    public class TestObject
    {
-       [Key(0)]
+       [Key("Name")]
        public string Name { get; set; }

-       [Key(1)]
+       [Key("Age")]
        public int Age { get; set; }
    }

By adding these attributes, and continuing to use the Typeless front-end, MessagePackSerializer will correctly parse the typeless header (with the full name, which hasn't changed), and when it finds the TestObject type, it'll see the attributes and invoke the DynamicObjectResolver instead of the ContractlessStandardResolver. And because the schema of the object in msgpack is the same, whether you use contractless or the attributes, it'll deserialize just fine.

So I think that should get you going.
Unless you had your heart set on formatting with arrays instead of maps. You might want this for a smaller save file or faster serialization. If you do, then you have two unique and incompatible schemas. I can imagine a couple possible ways to address this, but it'll take longer to describe and probably warrants a prototype of each way to go so you can evaluate what will fit better.

@AArnott
Copy link
Collaborator

AArnott commented Apr 26, 2024

And BTW, I wrote this test that reproduces the original failure without having to write to a file, recompile the test and rerun:

[MessagePackObject]
public class Issue1806Map
{
    [Key("Name")]
    public string Name { get; set; }

    [Key("Age")]
    public int Age { get; set; }
}

[MessagePackObject]
public class Issue1806Array
{
    [Key(0)]
    public string Name { get; set; }

    [Key(1)]
    public int Age { get; set; }
}

[Fact]
public void DeserializeTypelessWithAttributedTypes()
{
    var mapType = new Issue1806Map { Name = "a", Age = 2 };
    byte[] bytes = MessagePackSerializer.Serialize(mapType);
    this.logger.WriteLine(Convert.ToBase64String(bytes));
    Issue1806Array arrayType = MessagePackSerializer.Deserialize<Issue1806Array>(bytes);
}

So theoretically if you wanted a way to use arrays when serializing but support deserializing both arrays and maps, this code could help you easily test progress toward your goal.

@mihail-brinza
Copy link
Author

Thanks for looking into this issue.

Avoiding keys for names was the goal due to size/speed concerns.
I now realize MessagePack infers deserialization based on the object type, not data content. I initially thought it worked the other way around.

Do you think creating a custom formatter for this behavior would be complex?

@mihail-brinza
Copy link
Author

Ideally this behavior would be enabled in the MessagePackOptions

@AArnott
Copy link
Collaborator

AArnott commented Apr 26, 2024

Ideally this behavior would be enabled in the MessagePackOptions

I agree that would make it super easy to access. But this is a formatter-level incompatibility, and the Options are serializer-wide. What you really need is a custom formatter that will detect whether a map or an array header is the next to be read, and invoke the appropriate formatter based on that. But the formatters you're using (both before and after) are both dynamically created at runtime and aren't directly accessible. Their resolvers are, and you might be able to detect and delegate to them, but only if the contractless resolver isn't "smart enough" to see that the type is attributed and reject it or forward it on to the other resolver.

If the contractless resolver won't accept an attributed type, you'd have to code up the formatter yourself. You could do this manually, or you could use our mpc tool (or the newer, unreleased source generator) to do it for you and then check the generated code in as your own.
Or we could maybe make the contractless resolver configurable so that you can tell it to ignore that a type is attributed and do its thing anyway.

Another option that doesn't require any special MessagePack related code is to have two sets of data types. One is on ice, and uses the maps schema. You'll never change it again. Your other set of data types will use the array schema, and each have a copy constructor that accepts the older data type and copies from it. This way, you can evolve your data type (the one that uses arrays) all you want, and you'll always be able to deserialize the map-based schema so long as you keep the old types around and keep your copy constructors current.

@mihail-brinza
Copy link
Author

Or we could maybe make the contractless resolver configurable so that you can tell it to ignore that a type is attributed and do its thing anyway.

Would the effort for this solution be significant? Eventually other people will run into this limitation, so it would make sense to be a "feature".

The last option, would be the easiest in theory but it would require duplicating a large number of data models.

@AArnott
Copy link
Collaborator

AArnott commented Apr 27, 2024

It looks like you can bypass the check for attributes and specify the DynamicContractlessObjectResolver directly. Here is a very rough and broken demonstration:

[Fact]
public void DeserializeTypelessWithAttributedTypes()
{
    var mapType = new Issue1806Map { Name = "a", Age = 2 };
    byte[] bytes = MessagePackSerializer.Serialize(mapType);
    this.logger.WriteLine(Convert.ToBase64String(bytes));
    Issue1806Array arrayType = MessagePackSerializer.Deserialize<Issue1806Array>(
        bytes,
        MessagePackSerializerOptions.Standard.WithResolver(DynamicContractlessObjectResolver.Instance));
}

Notice how we're deserializing into the array-schema'd type, but using contractless anyway, which lets us deserialize a map instead of an array.
This code fails, but it gets further than the fixmap error. It fails because it needs more resolvers than just this one.
Also, if you want the deserializer to automatically switch between one maps and arrays, you'll need your own formatter front-end for the two dynamic ones. I could whip up a prototype of this with another hour of work. There would be no need for any change to the messagepack library.

@mihail-brinza
Copy link
Author

I managed to get your unit test working, by using the following resolver:

readonly MessagePackSerializerOptions _msgPackOptions = MessagePackSerializerOptions.Standard.WithResolver(
    CompositeResolver.Create([
        NativeGuidFormatter.Instance,
        NativeDateTimeFormatter.Instance,
    ], [
        NativeDateTimeResolver.Instance,                       
        BuiltinResolver.Instance,                             
        AttributeFormatterResolver.Instance, 
        DynamicContractlessObjectResolverAllowPrivate.Instance
    ]));

However, the unit test you proposed has both classes with [MessagePackObject], but in my case, when the data was serialized the classes had no messagepack attribute whatsoever, we serialized the data with MessagePackSerializer.Typeless.Serialize(obj), which will then include the type info in the result. And this scenario does not work with the options that I just put here, returning Unexpected msgpack code 199 (ext 8) encountered..
I assume this is because messagepack finds the unexpected type information.

Do you think a solution for this is still feasible?

@AArnott
Copy link
Collaborator

AArnott commented Apr 27, 2024

Yes, I still think it's feasible. Your attempt is along the right lines as far as chaining resolvers go.
To engage the Typeless formatter, you'll need to call Deserialize with object as the type argument, where in my example I passed in Issue1806Array as the generic type argument.

@AArnott
Copy link
Collaborator

AArnott commented Apr 29, 2024

Following is a fully working sample that meets your requirements, as I understand them.
The TypeSubstitutingOptions class is purely an artifact of the trick I'm using to be able to run a single test that emulates both attributed and non-attributed types. In your code, you should not include the custom Options class, and you should only have the attributed data type.

// The important thing about this resolver is that the 
// custom ContractlessOrAttributedResolver appears instead of (or before) the
// DynamicGenericResolver, DynamicObjectResolver or DynamicContractlessObjectResolver resolvers.
static readonly IFormatterResolver CustomResolver = CompositeResolver.Create(
    new[]
    {
        BuiltinResolver.Instance,
        AttributeFormatterResolver.Instance,
        ImmutableCollectionResolver.Instance,
        CompositeResolver.Create(ExpandoObjectFormatter.Instance),
        ContractlessOrAttributedResolver.Instance,
        TypelessObjectResolver.Instance,
    });

// This represents all your data types, which should have attributes on them
// and should use ordinals rather than strings for their Key attribute arguments.
[MessagePackObject]
public class Issue1806Array
{
    [Key(0)]
    public string Name { get; set; }

    [Key(1)]
    public int Age { get; set; }
}

[Fact]
public void DeserializeTypelessWithAttributedTypes()
{
    MessagePackSerializerOptions options = new TypeSubstitutingOptions(CustomResolver);

    // Serialize in the old way, with maps and the Typeless annotation.
    var mapType = new Issue1806Map { Name = "a", Age = 2 };
    byte[] bytes = MessagePackSerializer.Typeless.Serialize(mapType);
    this.logger.WriteLine(Convert.ToBase64String(bytes));
    this.logger.WriteLine(MessagePackSerializer.ConvertToJson(bytes));

    // Deserialize using the new type, and verify the result.
    Issue1806Array arrayType = MessagePackSerializer.Deserialize<Issue1806Array>(bytes, options);
    Assert.Equal(mapType.Name, arrayType.Name);
    Assert.Equal(mapType.Age, arrayType.Age);

    // Serialize using the new format, and verify that it's a shorter msgpack representation
    // since it's lacking typeless annotations and uses arrays instead of maps.
    byte[] bytes2 = MessagePackSerializer.Serialize(arrayType);
    this.logger.WriteLine(Convert.ToBase64String(bytes2));
    this.logger.WriteLine(MessagePackSerializer.ConvertToJson(bytes2));
    Assert.True(bytes2.Length < bytes.Length);

    // Finally, demonstrate deserialization of the new format works as expected.
    Issue1806Array arrayType2 = MessagePackSerializer.Deserialize<Issue1806Array>(bytes2, options);
    Assert.Equal(arrayType.Name, arrayType2.Name);
    Assert.Equal(arrayType.Age, arrayType2.Age);
}

// This is where most of the magic is.
class ContractlessOrAttributedResolver : IFormatterResolver
{
    internal static readonly ContractlessOrAttributedResolver Instance = new();

    private ContractlessOrAttributedResolver() { }

    public IMessagePackFormatter<T> GetFormatter<T>()
    {
        // If this activates the formatter too much, we can filter here and return null.
        // For example, if our custom formatter were activated for Int32, we could avoid that like this:
        if (typeof(T) == typeof(int))
        {
            return null;
        }

        return ContractlessOrAttributedFormatter<T>.Instance;
    }

    class ContractlessOrAttributedFormatter<T> : IMessagePackFormatter<T>
    {
        internal static readonly ContractlessOrAttributedFormatter<T> Instance = new();

        private static readonly IMessagePackFormatter<T> AttributedFormatter = DynamicGenericResolver.Instance.GetFormatter<T>() ?? DynamicObjectResolver.Instance.GetFormatterWithVerify<T>();

        private static readonly IMessagePackFormatter<T> ContractlessFormatter = DynamicContractlessObjectResolver.Instance.GetFormatterWithVerify<T>();

        private ContractlessOrAttributedFormatter() { }

        public T Deserialize(ref MessagePackReader reader, MessagePackSerializerOptions options)
        {
            if (IsTypeless(reader))
            {
                return (T)TypelessFormatter.Instance.Deserialize(ref reader, options);
            }

            IMessagePackFormatter<T> formatter = reader.NextMessagePackType switch
            {
                MessagePackType.Array => AttributedFormatter,
                MessagePackType.Map => ContractlessFormatter,
                _ => throw new MessagePackSerializationException("Unexpected msgpack code: " + reader.NextMessagePackType),
            };
            return formatter.Deserialize(ref reader, options);
        }

        public void Serialize(ref MessagePackWriter writer, T value, MessagePackSerializerOptions options)
        {
            // Always serialize using the attributed formatter.
            AttributedFormatter.Serialize(ref writer, value, options);
        }

        private static bool IsTypeless(MessagePackReader reader) => reader.NextMessagePackType == MessagePackType.Extension && reader.ReadExtensionFormatHeader().TypeCode == 100;
    }
}

// The rest of the code is only important for the test, which is specially contrived
// to use *two* classes (one attributed, one not) instead of just one attributed class.
// In your own product code, omit all the code below,
// and anywhere that `Issue1806Map` had been referenced above,
// change to `Issue1806Array` (or actually, your own attributed data types).
class TypeSubstitutingOptions : MessagePackSerializerOptions
{
    public TypeSubstitutingOptions(IFormatterResolver resolver)
        : base(resolver)
    {
    }

    protected TypeSubstitutingOptions(MessagePackSerializerOptions copyFrom)
        : base(copyFrom)
    {
    }

    public override Type LoadType(string typeName)
    {
        if (typeName == typeof(Issue1806Map).AssemblyQualifiedName)
        {
            return typeof(Issue1806Array);
        }

        return base.LoadType(typeName);
    }

    protected override MessagePackSerializerOptions Clone() => new TypeSubstitutingOptions(this);
}

public class Issue1806Map
{
    public string Name { get; set; }

    public int Age { get; set; }
}

@AArnott
Copy link
Collaborator

AArnott commented Apr 29, 2024

The output of the above test, BTW, is:

x5Jk2YNNZXNzYWdlUGFjay5UZXN0cy5EeW5hbWljT2JqZWN0UmVzb2x2ZXJUZXN0cytJc3N1ZTE4MDZNYXAsIE1lc3NhZ2VQYWNrLlRlc3RzLCBWZXJzaW9uPTIuNS4wLjAsIEN1bHR1cmU9bmV1dHJhbCwgUHVibGljS2V5VG9rZW49bnVsbIKkTmFtZaFho0FnZQI=
{"$type":"MessagePack.Tests.DynamicObjectResolverTests+Issue1806Map, MessagePack.Tests, Version=2.5.0.0, Culture=neutral, PublicKeyToken=null","Name":"a","Age":2}
kqFhAg==
["a",2]

The first two lines are your 'before' schema and the second two lines are your 'after' schema, demonstrating the changes you want have taken place, and that both formats can be read, but only the newer one written by the serializer.

@mihail-brinza
Copy link
Author

Thank you so much. I've tested your solution and it works.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants