Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow protojson to decode legacy maps into messages with shortcut map notation #1511

Open
advanderveer opened this issue Jan 5, 2023 · 4 comments

Comments

@advanderveer
Copy link

advanderveer commented Jan 5, 2023

Is your feature request related to a problem? Please describe.
We're dealing with a service architecture that describes maps in different ways in different service (for legacy reasons). Some old services emit messages with the "legacy" structure for encoding maps inside of protobuf messages. Other services use modern protobuf definitions with the "shortcut" notation for maps (see: https://developers.google.com/protocol-buffers/docs/proto3#maps).

As mentioned in the documentation the map<T,T> notation is sugar for what is in effect a list of entry messages as it is encoded on the wire. This is good, and works for us so we don't have to care about which service uses what notation. But this breaks when sent it in the canonical json format.

To explain this further, let's consider the following proto file (in reality, for our use case this would be in two different files but the explanation stays the same)

syntax = "proto3";
package examples.v1;

// ShortcutMap message use the map shortcut for our map field
message ShortcutMap {
    map<string, string> map_field = 1;
}

// LegacyMap uses the legacy setup to describe a map
message LegacyMap {
    // We repeat entries to make a map
    message MapFieldEntry {
        string key = 1;
        string value = 2;
    }
    repeated MapFieldEntry map_field = 1;
}

Now let's consider the scenarios that one service is sending the legacy LegacyMap message to a service
that is only aware of the new ShortcutMap definition. Both in JSON and in the proto wire format:

package main

import (
	"fmt"

	examplesv1 "github.com/crewlinker/some-internal-protobuf-go-dir/v1"
	"google.golang.org/protobuf/encoding/protojson"
	"google.golang.org/protobuf/proto"
)

func main() {

	// send messages from our legacy code
	send1 := &examplesv1.LegacyMap{MapField: []*examplesv1.LegacyMap_MapFieldEntry{{Key: "foo", Value: "bar"}}}
	json1, _ := protojson.Marshal(send1)
	fmt.Printf("legacy json: %s\n", json1)
	proto1, _ := proto.Marshal(send1)
	fmt.Printf("legacy wire: %x\n", proto1)

	// receive messages in our new code
	var rcv1, rcv2 examplesv1.ShortcutMap
	proto.Unmarshal(proto1, &rcv1)
	fmt.Println("receive 1:", rcv1.MapField) // prints the map as expected
	protojson.Unmarshal(json1, &rcv2) // proto: syntax error (line 1:13): unexpected token [
	fmt.Println("receive 2:", rcv2.MapField) // prints an empty map (because it couldn't parse)
} 

The inconsistency is that the legacy notation for maps can be decoded transparently into a message with
the shortcut notation for the protowire format. But not when the json format is used.

Describe the solution you'd like
I would like it to be possible that the protojson unmarshal logic allows for decoding maps from the legacy format
into a message with the shortcut notation. For my use case it would be ok if this is an option in the https://pkg.go.dev/google.golang.org/protobuf/encoding/protojson#UnmarshalOptions struct but I do believe it
can also be implemented transparently (without breaking any backwards compatibility)

Describe alternatives you've considered
Some alternatives:

  • Modify the json message before it hits protojson.Unmarshal, but this requires parsing the whole message once, in a lossy format, then encoding it again into the json format that protojson accepts for our map. This is very ugly, and has big performance implications.
  • Implement a custom unmarshal method for protojson to just handle the map type (like https://pkg.go.dev/encoding/json#Unmarshaler). But this is impossible for protojson as far as I'm aware
  • Other options seems to require us mandating that the whole code base follows either one or the other way to define maps. This requires a big development undertaking
  • Disallow protojson encoding for messages that have a map, this defeats the purpose of protojson being a reliable message format.

Additional context
To hopefully give this improvement a bit more weight I just want to emphasise:

@puellanivis
Copy link
Collaborator

This seems to be a request relevant to the protobuf standard itself. This module has been bitten before by going beyond the standards as they are defined, and thus is shy to implement something unilaterally that is not already standardized.

Namely: even if we make this change, will C++/Java/python support the same features? Having the Go implementation alone able to parse [{"key":"foo","value":"bar"}] into a map<string, string> means that the Go implementation will be unexpectedly different from other implementations.

@advanderveer
Copy link
Author

advanderveer commented Jan 7, 2023

Right, that makes sense. But it do think it is fair to raise this with the standard itself (or at least the canonical json encoding)! What would be the right place to create an issue for that?

@dsnet
Copy link
Member

dsnet commented Jan 7, 2023

As @puellanivis mentioned, the Go implementation will only adopt this if the wider protobuf project deems this as standard behavior.

The issue tracker for the wider protobuf project is https://github.com/protocolbuffers/protobuf/issues.

@advanderveer
Copy link
Author

@dsnet I understand. I've opened the following issue over here: protocolbuffers/protobuf#11505

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants