Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add option to JSON marshal / unmarshal: StripEnumPrefix / AddEnumPrefix #1598

Open
dorner opened this issue Mar 6, 2024 · 10 comments
Open

Comments

@dorner
Copy link

dorner commented Mar 6, 2024

Is your feature request related to a problem? Please describe.
Currently, all enums are prefixed in order to avoid name collisions:

enum PurchaseMethod {
  PURCHASE_METHOD_UNSPECIFIED = 0;
  PURCHASE_METHOD_ONLINE = 1;
  PURCHASE_METHOD_IN_STORE = 2;
}

When using a project like grpc-gateway and using string enums, this results in additional bytes being sent around when the messages are translated to JSON, as well as "stuttering" since you have the messages littered with patterns like purchaseMethod: "PURCHASE_METHOD_IN_STORE".

Describe the solution you'd like
An option to protojson encoding that will strip the prefix from the enum, and to decoding that will add the prefix back. In this case we can calculate the prefix as the smallest string that is common to all enum value names.

Describe alternatives you've considered
I can create my own marshaller for grpc-gateway, but the only way to do that right now is to copy/paste the majority of the code. Since this isn't really related to gRPC but to Protobuf JSON marshalling, this seems like a better place to implement it long-term.

I'm willing to give it a stab as a PR, but wanted to make sure the approach made sense.

@dorner
Copy link
Author

dorner commented Mar 6, 2024

...I thought protojson was in this project, but it might not be? Should it go here? https://github.com/protocolbuffers/protobuf-go/

Looks like this is the issue tracker for that project I guess?

@puellanivis
Copy link
Collaborator

This is the Issue board for both github.com/golang/proto but also google.golang.org/protobuf as the people maintaining both are the same.

This seems to be the same issue as #513 which is still open.

@dorner
Copy link
Author

dorner commented Mar 6, 2024

#513 would definitely solve the underlying problem entirely. But given that it's been sitting for around six years without any progress, and solving it would result in a larger change (such as generated Go code), I was hoping to use this to bypass the meatiness and limit it just to marshalling and unmarshalling JSON rather than changing the Protobuf semantics / limitations.

@neild
Copy link
Contributor

neild commented Mar 6, 2024

#513 is about the generated code. If I follow, this is about the JSON serialization.

The protojson package implements the standard proto/JSON mapping defined at:
https://protobuf.dev/programming-guides/proto3/#json

Any changes to that mapping need to be coordinated across all protobuf implementations. We do not want the Go implementation to generate JSON that can't be consumed by C++, Java, etc. A primary purpose of protobufs is to enable cross-system transfer of data, and implementation-specific serializations work directly against that goal.

The place to raise cross-implementation issues is https://github.com/protocolbuffers/protobuf/issues, but I doubt a change like this would happen; it requires too much coordination for too little benefit. The usual way to decrease the wire size of JSON data is to compress it. (Or, of course, you can use the protobuf wire format, which can generally represent a small enum field value as ~3 bytes.)

@dorner
Copy link
Author

dorner commented Mar 6, 2024

Ah interesting - didn't know this was a general standard. Thanks for the link.

Obviously if we had the option of using gRPC we wouldn't need to use Gateway :) Not everything can talk gRPC. Generally speaking, if I unmarshal my Protobuf file using this option, the reader isn't going to then marshal it back into Protobuf - it most likely can only speak JSON.

I'm OK if this is unlikely to happen - like I said, I can focus more on the gateway side and look at the marshaller there. I can try getting it to work and see if it makes sense to contribute as an option specifically on the grpc-gateway project. Does that seem more likely to work?

@puellanivis
Copy link
Collaborator

OH, I get what you’re saying now. Sorry for misunderstanding you at first.

Yeah, we run into language interop problems then, as mentioned.

@dsnet
Copy link
Member

dsnet commented Mar 6, 2024

A little bit of history:

  • Enums in protobuf were inspired by C++, where the enum values themselves are in the same namespace as the enum type declaration itself. This was probably a mistake, but every language is inspired by languages before it, often adopting some of it's flaws and hopefully more of its strengths.
  • Since enum values are in the same namespace as the enum type, the convention was developed to always prefix each enum value with the enum type name to avoid a name collision in generated code. (Note that this is just a convention and not a language guarantee.)
  • The above convention has the unfortunate side effect of making text-serialized enums verbose (e.g., "PURCHASE_METHOD_UNSPECIFIED" instead of "UNSPECIFIED"). This is unfortunate since the convention of adding a prefix was to avoid conflicts among various protobuf declarations, but not within the enum itself. When unmarshaling a particular enum value, it is obvious from the context that we're scoped down to just the possible set of enum values for a given enum type.
  • To address this, we'd need consensus from the protobuf project itself. I don't think the Go implementation should do something adhoc that other language implementations may not do. One strawman idea:
    • The protobuf schema language should provide a way to mark enum values as being under the namespace of the enum type itself. (It is the responsibility of per-language implementations for how to avoid conflicts in generated code).
    • With the above marker present, you could theoretically do something like:
      enum PurchaseMethod {
        option features.enum_namespace = UNIQUE;
        UNSPECIFIED = 0;
        ONLINE = 1;
        IN_STORE = 2;
      }
      

Note that #513 is related problem, but one that's specific to just the generated Go code. The generated behavior of having enum values be prefixed with the enum type came about because the convention of always manually prepending enum values with the enum type name had not yet become common practice when the Go generator was first written. Ironically, not having enums being in a unique namespace under the type itself has led to two layers of pragmatic solutions that interacts poorly with each other:

  • If you don't manually prefix enum values in the proto code, then the Go generator does that for you and there's no conflicts in generated Go code, but you might run into problems in other language implementations.
  • If you do manually prefix enum values in the proto code, then the Go generator still does it for you and you're left with two layers of prefixes (e.g., PurchaseMethod_PURCHASE_METHOD_ONLINE) and the JSON enum value also has the prefix (which is what this issue is about).

@dorner
Copy link
Author

dorner commented Mar 6, 2024

Yeah. I worry less about the generated code because autocomplete handles that for you - it's not pretty, but it's not a pain. But on the JSON side, especially when interacting with projects that aren't in Go and don't understand Protobuf, it looks really weird.

Thanks very much for the detailed history lesson - I'd known some of it but not all. Appreciate your taking the time.

@dorner
Copy link
Author

dorner commented Mar 6, 2024

Oof... looks like grpc-gateway just flat out proxies to protojson. So I wouldn't be able to achieve this without forking the project, or doing a second, fairly in-depth pass to tease out the enums and values after the fact. :(

It might take a while, but I think adding an option to the protojson spec to change how it should marshal/unmarshal this might be worth doing. I probably have no idea what I'm getting myself into though. :)

@puellanivis
Copy link
Collaborator

The first step would probably be opening up an issue on the main protobuf issue board, leading at least some discussion.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants