Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Extend the canonical json encoding spec for maps to allow the "non-shortcut" structure #11505

Closed
advdv opened this issue Jan 9, 2023 · 3 comments
Assignees
Labels

Comments

@advdv
Copy link

advdv commented Jan 9, 2023

Protobuf allows for defining map fields using a shortcut notation: map<key_type, value_type> map_field = N;. It is mentioned in the language guide that the on-wire format for this is designed in such a way that it is backwards compatible with the following structure:

message MapFieldEntry {
  key_type key = 1;
  value_type value = 2;
}

repeated MapFieldEntry map_field = N;

It is explicitly mentioned that: "Any protocol buffers implementation that supports maps must both produce and accept data that can be accepted by the above definition."

Unfortunately this is currently not the case when using the canonical json encoding. That is: if an old service sends the non-shortcut map structure to a service that uses the shortcut notation instead, it will break.

This problem is best described with some code. It is tested in Go, not sure about other languages. Given the following proto definitions:

syntax = "proto3";
package examples.v1;

// ShortcutMap message use the map shortcut for our map field
message ShortcutMap {
    map<string, string> map_field = 1;
}

// LegacyMap uses the legacy setup to describe a map
message LegacyMap {
    // We repeat entries to make a map
    message MapFieldEntry {
        string key = 1;
        string value = 2;
    }
    repeated MapFieldEntry map_field = 1;
}

And the following Go code:

package main

import (
	"fmt"

	examplesv1 "github.com/crewlinker/some-internal-protobuf-go-dir/v1"
	"google.golang.org/protobuf/encoding/protojson"
	"google.golang.org/protobuf/proto"
)

func main() {

	// send messages from our legacy code
	send1 := &examplesv1.LegacyMap{MapField: []*examplesv1.LegacyMap_MapFieldEntry{{Key: "foo", Value: "bar"}}}
	json1, _ := protojson.Marshal(send1)
	fmt.Printf("legacy json: %s\n", json1)
	proto1, _ := proto.Marshal(send1)
	fmt.Printf("legacy wire: %x\n", proto1)

	// receive messages in our new code
	var rcv1, rcv2 examplesv1.ShortcutMap
	proto.Unmarshal(proto1, &rcv1)
	fmt.Println("receive 1:", rcv1.MapField) // prints the map as expected
	protojson.Unmarshal(json1, &rcv2) // proto: syntax error (line 1:13): unexpected token [
	fmt.Println("receive 2:", rcv2.MapField) // prints an empty map (because it couldn't parse)
} 

It shows that the Json encoded message will fail to be received while the binary wire format is received successfully.

What language does this apply to?
This applies to any version of the proto spec that supports the shorthand map<T,T> notation.

Describe the problem you are trying to solve.
The summary above describes the problem shortly. For a full description of my use case please check the following issue in the Go project: golang/protobuf#1511 (comment)

Describe the solution you'd like
Ideally, I would like to change the map<K,V> specification for protobuf's canonical json encoding.

Similar to how a float may be encoded as a number or a string. A map should allow encoding as {"k":"v"} OR as [{"key": "k", "value":"v"}].

I'm aware that this is a big change. But as far as I understand it can be added transparently without breaking existing messages. Json decoders in various languages can first check for the current encoding, if it fails it can check for the "non-shortcut" encoding format.

Describe alternatives you've considered
Please see the alternatives in the issue linked above. They are specific to the go implementation.

@advdv advdv added the untriaged auto added to all issues by default when created. label Jan 9, 2023
@jorgbrown jorgbrown self-assigned this Jan 14, 2023
@jorgbrown
Copy link
Contributor

Bringing it up next week with the team.

@bentekkie
Copy link

bentekkie commented Feb 2, 2023

I would also add that this would provide compatibility with other systems like Big Query that already transform proto maps to a schema with lists of [{"key":...,"value":...}] https://github.com/GoogleCloudPlatform/protoc-gen-bq-schema. I am not saying that it should be a driving factor to be compatible with other json formats but it would be a huge bonus

@shaod2 shaod2 assigned mcy and unassigned jorgbrown Feb 2, 2023
@shaod2 shaod2 added json and removed untriaged auto added to all issues by default when created. labels Feb 2, 2023
@shaod2
Copy link
Member

shaod2 commented Feb 2, 2023

Unfortunately, we currently don't support schema evolution for JSON at the same level of wire format. And wire format compatibility isn't guaranteed to extend to text/json format.

@shaod2 shaod2 closed this as completed Feb 2, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

5 participants