Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

customizible encoder to hide/redact sensitive information #1541

Open
ofen opened this issue May 3, 2023 · 18 comments
Open

customizible encoder to hide/redact sensitive information #1541

ofen opened this issue May 3, 2023 · 18 comments

Comments

@ofen
Copy link

ofen commented May 3, 2023

Is your feature request related to a problem? Please describe.
We need ability to hide sensitive information during pb to json encoding.

Describe the solution you'd like
Customizible encoder that will allow to work with options (see https://go.dev/blog/protobuf-apiv2).

Describe alternatives you've considered
Make most of internal parts importable to minimize copy-paste.

Additional context
None

@lfolger
Copy link
Contributor

lfolger commented May 15, 2023

Hi ofen,

Sorry for the late reply.

I think to design a generic solution for such a problem is a large task and requires a lot of design work to make sure it works properly for all cases. Unfortunately, I don't think we have the capacity to design and implement such a solution at the moment.

It might be possible for you to implement the Redact function in the linked proposal yourself and use it in your code base in a wrapper around the protojson.Marshal.

func marshalToJson(m proto.Message) ([]byte, err) {
  // m = proto.Clone(m) // in case you cannot modify `m`
  redact(m)
  return protojson.Marshal(m)
}

If you do not want to modify the message, you might be able to proto.Clone() it before redacting the information.

I'm not sure I understand what you mean by the alternatives you described. To which internal parts would you need access to implement this?

@ofen
Copy link
Author

ofen commented May 20, 2023

Hi, sorry for late reply.

Protobuf redaction with following marshaling performs poorly due to double iteration over nested elements.

Internal part that could be used for redaction functionality, but lacks public interface to modify it's behaviour: https://github.com/protocolbuffers/protobuf-go/blob/b8fc7706010499f46982c883add4351b12e30c0b/encoding/protojson/encode.go#L259

@lfolger
Copy link
Contributor

lfolger commented May 22, 2023

I agree that the double iteration approach is more expensive than if there were hooks directly in the encoder.
If performance is critical to you, json might not be the best choice for you anyway (although I don't know your complete setup and the constraints you are working with).

Do you have an idea how large the performance overhead is/would be?
Did you do an experiment to find out by how much the performance could be improved if hooks were available?
You could do such an experiment by modifying the package locally and running some benchmarks.

While double iteration can be cause performance issues, it is generally fairly fast. Are the messages you are dealing with deeply nested and have many fields? Are there other reasons that make the iteration expensive?

All that said, I want to be honest and I don't think we have the capacity to design this API extension in the near future.
However, you are welcome to contribute such a feature. Either post your ideas here or provide a proof of concept change with the extensions you would like to see and we can evaluate this during the review.

@Olex1313
Copy link

Olex1313 commented Sep 23, 2023

@lfolger, Hello, I've came across this issue with same questions as the @ofen. My proposal is to extract the visitFunction and make it default visitFunction for MarshalOptions. So it will look like this:

func WriteJsonField(fd protoreflect.FieldDescriptor, v protoreflect.Value) bool {
        // Current logic from line 228
}

type MarshalOptions struct {
	pragma.NoUnkeyedLiterals
        // all-the-current-options fileds
        VisitFn VisitField
}

// Sample usage
func writeAndClear(fd protoreflect.FieldDescriptor, v protoreflect.Value) bool {
    clear(fd, v)
    return WriteJsonField(fd, v)
}
opt := protojson.MarshalOptions{VisitFn: writeAndClear}

If it looks good enough I can make a PR with the implementation.

However if it is too open realisation or too customisable, I can figure out something else

@puellanivis
Copy link
Collaborator

Would writeAndClear even work without mutating the original? Or otherwise, wouldn’t writeAndClear clear the value and then output it? So “clearAndWrite`?

@Olex1313
Copy link

The clearAndWrite sounds better, didn't think about it :)

About the mutating the original object, I think it is up to library user to mutate the values so general usage would look like

clonedMessage := myMessage.clone()
opt := protojson.MarshalOptions{VisitFn: clearAndWrite}

logger.logStructured(opt.Marshall(clonedMessage)

@lfolger
Copy link
Contributor

lfolger commented Sep 25, 2023

@Olex1313 this would require us to exposse the encoder because it is used in the code 228+.

This is a quite large change and I don't think we can do this right now without sufficient evidence that it has a significant impact.

Again, if you can share some pprof profiles or benchmarks that show the difference, it would help us to evaluate such a change.

@Olex1313
Copy link

Olex1313 commented Sep 25, 2023

@lfolger, you are right, I'm writing clone-based solution right now, I will provide benchmarks when I finish to see if it is really that necessary, but for repeated fields or maps I think it would be great to have a cheap masking option

@yonesko
Copy link

yonesko commented Oct 6, 2023

@Olex1313 have you started the project? мне тоже нужно)

@Olex1313
Copy link

Olex1313 commented Oct 9, 2023

@ofen, sorry for late answer, been quiet busy.
Sample gist with proto clearup: https://gist.github.com/Olex1313/88ff52b1f4cad1af15c6733b4553b4cb
As for benchmarks I'll post it today evening, after switching from work to personal laptop, right now can say that clearing takes the same time as the marshalling, where for both the most time-consuming is the iterating over proto fields

@Mahes2
Copy link

Mahes2 commented Dec 12, 2023

Hi @Olex1313 do you mind to share the progress?

Recently i'm looking for a solution of this issue. I also try to do some research and update. If you guys don't mind i can try to update, but need your help to review it.

My solution is that:

  1. adding HideSensitiveMessage bool into MarshalOption struct
  2. updating the RangeFields into:
order.RangeFields(fields, order.IndexNameFieldOrder, func(fd protoreflect.FieldDescriptor, v protoreflect.Value) bool {
		name := fd.JSONName()
		if e.opts.UseProtoNames {
			name = fd.TextName()
		}

		if fd.IsSensitiveField() && e.opts.HideSensitiveMessage {
			return true
		}

		if err = e.WriteName(name); err != nil {
			return false
		}
		if err = e.marshalValue(v, fd); err != nil {
			return false
		}
		return true
	})
  1. this step in on_progress, adding new tag is_sensitive_field for the proto file

please let me know if this update is not suitable or can have issue.
cc @puellanivis

@Olex1313
Copy link

Olex1313 commented Dec 12, 2023

@Mahes2, I can not share the benchmarks (did it for work), but as the @puellanivis said it's very complex design issue in protobuff serialization, and this doesn't go well with the purpose of Proto to JSON, because it works only in one way, but obviously you can't convert json->proto afterwards.

I've shared a gist https://gist.github.com/Olex1313/88ff52b1f4cad1af15c6733b4553b4cb, it contains sample approach to clearing the message for logs/etc.

For implementation in https://github.com/golang/protobuf I think it should be a third-party library, but not the part of it

@Mahes2
Copy link

Mahes2 commented Dec 12, 2023

I see. but is it ok if i give contribution to add new tag and using the approach i have given above?

For implementation in https://github.com/golang/protobuf I think it should be a third-party library, but not the part of it

why it should be a 3rd party library?

@Olex1313
Copy link

Olex1313 commented Dec 12, 2023

@Mahes2

I see. but is it ok if i give contribution to add new tag and using the approach i have given above?

It's up to maintainers, if they are ok with such approach I would like to help w code/review

why it should be a 3rd party library?

As @lfolger said upwards, designing custom deserialization api is out of context of encoder api, because it provides a simple way of ser/des of proto objects. Hiding sensible data is not the single feature that should be provided, and api should be extendable for more than that.
My personal opinion that if you really have to hide data in serialized protos you can even use a encoding/json package with json:"-" annotation, or clear proto message and there are more ways how it can be implemented. The main question is should this really be part of golang/protobuf package, because it is kinda out of context and more specific usage.

@ofen
Copy link
Author

ofen commented Dec 12, 2023

I think implementing redaction logic inside protojson serialiser is bad idea, but it still good idea to have some internal parts accessible/public to let developers build their custom logic upon it.

@puellanivis
Copy link
Collaborator

I see. but is it ok if i give contribution to add new tag and using the approach i have given above?

Unfortunately, this is—as I said in your first issue—not going to be accepted. This package needs to stick fairly strictly to the protobuf standard, and this feature is definitely something that we don’t want to walk into. (Even adding the json tag itself is a well-known mis-step of the package, because it was just so easy to add them, and get JSON, right? And once protobuf actually established a proper JSON mapping, it was entirely incompatible with the new JSON mapping.)

@Mahes2
Copy link

Mahes2 commented Dec 17, 2023

Hi @puellanivis can you share me any doc or blog that has explaination about the protobuf standard? because i still don't know which standard that you are talking about.

Also, is the proper JSON mapping already in progress?

Finally, i thank you guys for giving the tips on how to handle the sensitive message. I did a little update with @Olex1313's suggestion and add it to my libs.

I have made a PR and it's also included with the benchmark
Really appreciate it if you guys @puellanivis @Olex1313 @ofen @lfolger can help me review it as well. thank you

https://github.com/Mahes2/go-libs/pull/9/files

@puellanivis
Copy link
Collaborator

https://protobuf.dev/programming-guides/proto3/ is the proto3 standard.

The proper JSON mapping is already complete, and done. It was done for v1 API with jsonpb and for the v2 API in protojson.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants