Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

switch the json backend to json-iterator #19

Closed
wants to merge 1 commit into from

Conversation

neolit123
Copy link
Member

@neolit123 neolit123 commented Apr 26, 2019

add a couple of json interator configurations:

  1. jsonIterator should be compatible with the old encoding/json
    there is a breaking change here because the "JSONOpt" is now
    a no-op, although the options in encoding/json are not very useful.

  2. caseSensitiveStrictJSONIterator is compatible with
    econding/json WRT to DisallowUnknownFields, but it also has
    CaseSensitive. This is a breaking change for strict unmarshal users
    that previously allowed case-insensitive fields.

Something else to note is that json-iter does not seem to tolerate
field keys called "true", which the one from stdlib does.
Unit tests had to be adapted because of that.

Other changes:

  • add a new public method UnmarshalWithConfig that allows passing
    a json-iter configuration
  • update/add unit tests
  • update go.mod/sum
  • remove yaml_go110*.go
    These files had the purpose to handle DisallowUnknownFields for
    json.Decoder which is no longer needed.

/kind feature
/priority important-longterm

fixes #17
fixes #15

/assign @dims
/cc @sttts @errordeveloper @stealthybox
/hold

@k8s-ci-robot k8s-ci-robot added do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. labels Apr 26, 2019
@k8s-ci-robot
Copy link

@neolit123: GitHub didn't allow me to request PR reviews from the following users: stealthybox, errordeveloper.

Note that only kubernetes-sigs members and repo collaborators can review this PR, and authors cannot review their own PRs.

In response to this:

add a couple of json interator configurations:

  1. jsonIterator should be compatible with the old encoding/json
    there is a breaking change here because the "JSONOpt" is now
    a no-op, although the options in encoding/json are not very useful.

  2. caseSensitiveStrictJSONIterator is compatible with
    econding/json WRT to DisallowUnknownFields, but it also has
    CaseSensitive. This is a breaking change for strict unmarshal users
    that previously allowed case-insensitive fields.

Something else to note is that json-iter does not seem to tolerate
field keys called "true", which the one from stdlib does.
Unit tests had to be adapted because of that.

Other changes:

  • add a new public method UnmarshalWithConfig that allows passing
    a json-iter configuration
  • update/add unit tests
  • update go.mod/sum
  • remove yaml_go110*.go
    These files had the purpose to handle DisallowUnknownFields for
    json.Decoder which is no longer needed.

/kind featute
/priority important-longterm

fixes #17
fixes #15

/assign @dims
/cc @sttts @errordeveloper @stealthybox
/hold

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot requested a review from sttts April 26, 2019 17:02
@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. kind/feature Categorizes issue or PR as related to a new feature. labels Apr 26, 2019
// Marshal marshals the object into JSON then converts JSON to YAML and returns the
// YAML.
func Marshal(o interface{}) ([]byte, error) {
j, err := json.Marshal(o)
j, err := caseSensitiveStrictJSONIterator.Marshal(o)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@smarterclayton, didn't you see performance regressions in json-iter marshaling?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would they really be relevant for YAML encoding? It's not that we do that in critical paths, do we?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, this doesn't affect all of YAML code paths in the current form.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Btw json-iterator isn't fully compatible with the standard library right now, created json-iterator/go#371. PTAL.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

json-iterator/go#371 has merged.

@martina-if
Copy link

Thanks for making these changes! LGTM Is there anything we can do to help get this merged? :)

@errordeveloper
Copy link

Hey @neolit123, as Martina said - we'd be happy to help :)

@neolit123
Copy link
Member Author

@errordeveloper up to the reviewers at this point. :)

@nikhita
Copy link
Member

nikhita commented Jun 3, 2019

@smarterclayton, didn't you see performance regressions in json-iter marshaling?

Ref: kubernetes/kubernetes#70574 (comment), kubernetes/kubernetes#63242 (comment)

The second PR referenced above did have high allocations:

Running go test ./vendor/k8s.io/apiserver/pkg/endpoints/ -test.bench='Benchmark(Get|WatchHTTP)' -test.run=XXX -benchmem -test.memprofile=/tmp/mem2 -test.benchtime 5s

On k/k master right now:

goos: linux
goarch: amd64
pkg: k8s.io/kubernetes/vendor/k8s.io/apiserver/pkg/endpoints
BenchmarkGet-4              	   30000	    256834 ns/op	   15794 B/op	     127 allocs/op
BenchmarkWatchHTTP-4        	  100000	     92040 ns/op	    1523 B/op	      24 allocs/op
BenchmarkWatchHTTP_UTF8-4   	  100000	     96523 ns/op	    1829 B/op	      26 allocs/op
PASS
ok  	k8s.io/kubernetes/vendor/k8s.io/apiserver/pkg/endpoints	31.248s

k/k right now with changes in the second PR (kubernetes/kubernetes#63242):

goos: linux
goarch: amd64
pkg: k8s.io/kubernetes/vendor/k8s.io/apiserver/pkg/endpoints
BenchmarkGet-4                     10000	    691435 ns/op	   19122 B/op	     163 allocs/op
BenchmarkWatchHTTP-4        	  200000	     48193 ns/op	    5495 B/op	      64 allocs/op
BenchmarkWatchHTTP_UTF8-4   	  200000	     53738 ns/op	    5909 B/op	      69 allocs/op
PASS
ok  	k8s.io/kubernetes/vendor/k8s.io/apiserver/pkg/endpoints	28.604s

k/k with changes in this PR:

goos: linux
goarch: amd64
pkg: k8s.io/kubernetes/vendor/k8s.io/apiserver/pkg/endpoints
BenchmarkGet-4              	   50000	    139204 ns/op	   15793 B/op	     127 allocs/op
BenchmarkWatchHTTP-4        	  200000	     56627 ns/op	    1522 B/op	      24 allocs/op
BenchmarkWatchHTTP_UTF8-4   	  100000	     62922 ns/op	    1829 B/op	      26 allocs/op
PASS
ok  	k8s.io/kubernetes/vendor/k8s.io/apiserver/pkg/endpoints	27.378s

So it should probably be fine...

@smarterclayton
Copy link

You should also run the serialization benchmarks in pkg/api/testing/serialization_test.go to get a before and after round trip time (that tests several variants of objects, whereas BenchmarkGet tests a single known object).

@nikhita
Copy link
Member

nikhita commented Jun 4, 2019

You should also run the serialization benchmarks in pkg/api/testing/serialization_test.go to get a before and after round trip time (that tests several variants of objects, whereas BenchmarkGet tests a single known object).

Running existing benchmarks (before and after) yielded similar results...but that isn't surprising because none of the benchmarks actually look for yaml marshaling and unmarshaling.

Created kubernetes/kubernetes#78688 for adding benchmarks for yaml marshaling and unmarshaling as well. Using these benchmarks, incorporating changes in this PR into k/k yield higher allocations... :(

Running go test k8s.io/kubernetes/pkg/api/testing/ -test.bench='BenchmarkEncodeYAMLMarshal|BenchmarkDecodeIntoYAML' -test.run=XXX -benchmem -test.memprofile=/tmp/mem2 -test.benchtime 5s

Before:

goos: linux
goarch: amd64
pkg: k8s.io/kubernetes/pkg/api/testing
BenchmarkEncodeYAMLMarshal-4   	   10000	    781873 ns/op	  256102 B/op	    1438 allocs/op
BenchmarkDecodeIntoYAML-4      	   10000	    627891 ns/op	  167788 B/op	    1485 allocs/op
PASS
ok  	k8s.io/kubernetes/pkg/api/testing	14.354s

After:

goos: linux
goarch: amd64
pkg: k8s.io/kubernetes/pkg/api/testing
BenchmarkEncodeYAMLMarshal-4   	   10000	    867119 ns/op	  260776 B/op	    1443 allocs/op
BenchmarkDecodeIntoYAML-4      	   10000	    692040 ns/op	  181813 B/op	    1609 allocs/op
PASS
ok  	k8s.io/kubernetes/pkg/api/testing	15.842s

@smarterclayton
Copy link

What is the underlying goal of this PR? Use json-iterator because ? Consistency with Kube is important, but it's not all. Was there a discussion about the why? I don't see many comments in #17

@neolit123
Copy link
Member Author

What is the underlying goal of this PR? Use json-iterator because ? Consistency with Kube is important, but it's not all. Was there a discussion about the why? I don't see many comments in #17

main driver was discussed in #15
the current library's strict unmarshal is not case-sensitive.

@neolit123
Copy link
Member Author

/assign @smarterclayton

@dims
Copy link
Member

dims commented Jul 8, 2019

/unassign

@neolit123
Copy link
Member Author

here is an example of this problem in the wild:
kubernetes/kubernetes#80582

kubeadm has strict unmarshaling to show warnings and then perform regular unmarshall (two pass).
however the upper case ClusterConfiguration field was not caught; should have been clusterConfiguration.

@neolit123
Copy link
Member Author

so going back to the OP and the discussion there are at lease 3 concerns here:

  • benchmark results
    the overhead after the change is not high and fixing the bug for StrictUnmarshal users can justify the small decrease in performance. on the other hand there aren't that many StrictUnmarshal users out there, so the decrease in performance is probably making a more apparent tilt of the scale.

  • breaking change 1:

This is a breaking change for strict unmarshal users
that previously allowed case-insensitive fields.

we enabled strict unmarshal in this library not so long ago, and given strict unmarshal is optional it seems that this might not affect that many users. but this is still probably going to break someone, somewhere once they move to the latest SHA of this repository.

  • breaking change 2:

there is a breaking change here because the "JSONOpt" is now
a no-op, although the options in encoding/json are not very useful.

not sure if this will break anyone, probably not.


i will update this PR and leave it for a couple of more weeks.

@k8s-ci-robot
Copy link

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: neolit123
To complete the pull request process, please assign smarterclayton
You can assign the PR to them by writing /assign @smarterclayton in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

add a couple of json interator configurations:

1) jsonIterator should be compatible with the old encoding/json
there is a breaking change here because the "JSONOpt" is now
a no-op, although the options in encoding/json are not very useful.

2) caseSensitiveStrictJSONIterator is compatible with
econding/json WRT to DisallowUnknownFields, but it also has
CaseSensitive. This is a breaking change for strict unmarshal users
that previously allowed case-insensitive fields.

Something else to note is that json-iter does not seem to tolerate
field keys called "true", which the one from stdlib does.
Unit tests had to be adapted because of that.

Other changes:
- add a new public method UnmarshalWithConfig that allows passing
a json-iter configuration
- update/add unit tests
- update go.mod/sum
- remove yaml_go110*.go
These files had the purpose to handle DisallowUnknownFields for
json.Decoder which is no longer needed.
@neolit123
Copy link
Member Author

clearing some pending PRs from my backlog.
if someone feels like they want to take this over i will leave the branch.
thanks!

@neolit123
Copy link
Member Author

we are seeing more cases of this in the wild:
kubernetes/kubeadm#2167

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. kind/feature Categorizes issue or PR as related to a new feature. priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Should this library use json-iterator? Strict unmarshalling should be case-sensitive
9 participants