
Switch to json-iterator for v1 api. #3536

Merged
merged 5 commits into from
Mar 21, 2018

Conversation

brian-brazil
Contributor

This makes queries ~15% faster and cuts CPU
time spent on JSON encoding by ~40%.

@@ -586,6 +586,7 @@ func respond(w http.ResponseWriter, data interface{}) {
w.Header().Set("Content-Type", "application/json")
w.WriteHeader(http.StatusOK)

json := jsoniter.ConfigCompatibleWithStandardLibrary
b, err := json.Marshal(&response{
Contributor


One issue we had here too was serializing into a byte slice and then writing it to w, rather than writing directly into w. With multi-megabyte responses, that's probably quite a bit of slowdown. Can we fix this while we're at it?

Contributor Author


That doesn't appear possible with the APIs.

Contributor

@fabxc fabxc Dec 3, 2017


Mh, I just checked the godoc and am a bit puzzled. The jsoniter.ConfigCompatibleWithStandardLibrary variable is a Config object. From all I can see, this object does not have a Marshal method as it is being used here, so this shouldn't compile. It seems Config.Froze must be called to retrieve an object with the Marshal method. Is the vendored dependency possibly out of date?

The API that Froze returns also has a NewEncoder method, which we can use with the response writer to do this without the extra byte slice.
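To illustrate the buffered-vs-streaming distinction being discussed, here is a minimal standard-library sketch (jsoniter's frozen API exposes the analogous Marshal and NewEncoder pair). The `response` field names are assumptions for illustration, not the PR's exact type:

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// response loosely mirrors the v1 API envelope (field names assumed).
type response struct {
	Status string      `json:"status"`
	Data   interface{} `json:"data,omitempty"`
}

// encodeBoth returns the buffered (Marshal + Write) and streamed
// (NewEncoder straight into the writer) encodings of the same value.
func encodeBoth(resp *response) (string, string) {
	// Buffered: build the whole byte slice in memory, then copy it out.
	b, _ := json.Marshal(resp)
	var buffered bytes.Buffer
	buffered.Write(b)

	// Streamed: encode directly into the writer, no intermediate slice.
	var streamed bytes.Buffer
	_ = json.NewEncoder(&streamed).Encode(resp)

	return buffered.String(), streamed.String()
}

func main() {
	buf, str := encodeBoth(&response{Status: "success", Data: []float64{1, 2, 3}})
	fmt.Println(buf)
	// Encode appends a trailing newline; the payloads are otherwise identical.
	fmt.Print(str)
}
```

With an http.ResponseWriter in place of the bytes.Buffer, the streamed variant avoids holding a second multi-megabyte copy of the response in memory.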

@fabxc
Contributor

fabxc commented Dec 3, 2017

Did you do any manual comparisons on how much speed boost this gives us?

@brian-brazil
Contributor Author

Did you do any manual comparisons on how much speed boost this gives us?

See the PR description :)

@fabxc
Contributor

fabxc commented Dec 3, 2017

Heh, oops, my bad. See my comment regarding streaming. I'd be curious how much extra speed boost that could give.

@brian-brazil
Contributor Author

brian-brazil commented Dec 4, 2017

Streaming actually seems to increase CPU usage ~10% compared to Marshal, which is a little surprising.

@brian-brazil
Contributor Author

@fabxc do you have any further comments?

@jacksontj
Contributor

There are larger performance gains in prometheus/common#112

@brian-brazil
Contributor Author

Have you performed an end-to-end benchmark for a fair comparison?

@jacksontj
Contributor

@brian-brazil end-to-end meaning the HTTP API response? The PR already includes data serializing an entire matrix response (which is pretty much end-to-end). I have posted a complete end-to-end number (3m -> 16s) in my other Prometheus issue (#3601 (comment)).

@jacksontj
Contributor

@brian-brazil I've also submitted a PR (#3608) which has data on full end-to-end API calls.

@brian-brazil brian-brazil force-pushed the json-perf branch 3 times, most recently from dfb9231 to 3bc08cf Compare February 8, 2018 18:16
@brian-brazil
Contributor Author

I've worked a bit more on this:

Before:
BenchmarkRespond-4           100          16317782 ns/op         3020949 B/op      60009 allocs/op

With this PR:
BenchmarkRespond-4           500           3915364 ns/op          692664 B/op      20010 allocs/op

With this PR and improved WriteFloat64:
BenchmarkRespond-4           500           2975517 ns/op          213472 B/op         10 allocs/op

json-iterator/go#234 is the change to improve json-iterator, if that gets in we're looking at a total improvement of ~5.5x on JSON encoding time and an elimination of per-Point allocations.
For very simple PromQL queries that should be a ~40% reduction in latency, as roughly equal time is currently spent on PromQL and encoding.

@jacksontj
Contributor

@brian-brazil I can't seem to find the code for the benchmark BenchmarkRespond that you referenced. Is that in another repository? Or to be added to the PR?

@brian-brazil
Contributor Author

It's in the PR.

@jacksontj
Contributor

jacksontj commented Feb 9, 2018

It seems that the structs involved have moved a decent amount since my last benchmarks. I've created a branch on my fork with an implementation of this in easyjson for comparison.

To start off, I'll show the comparison of baseline and this json iter one:

benchmark              old ns/op     new ns/op     delta
BenchmarkRespond-4     12054905      2277690       -81.11%

benchmark              old allocs     new allocs     delta
BenchmarkRespond-4     90021          20010          -77.77%

benchmark              old bytes     new bytes     delta
BenchmarkRespond-4     4629942       692666        -85.04%

Here are the compared results for baseline and easyjson:

benchmark              old ns/op     new ns/op     delta
BenchmarkRespond-4     12054905      1785230       -85.19%

benchmark              old allocs     new allocs     delta
BenchmarkRespond-4     90021          10120          -88.76%

benchmark              old bytes     new bytes     delta
BenchmarkRespond-4     4629942       42369         -99.08%

And for simpler comparison here is the data comparing jsoniter to easyjson:

benchmark              old ns/op     new ns/op     delta
BenchmarkRespond-4     2277690       1785230       -21.62%

benchmark              old allocs     new allocs     delta
BenchmarkRespond-4     20010          10120          -49.43%

benchmark              old bytes     new bytes     delta
BenchmarkRespond-4     692666        42369         -93.88%

The benchmarks here show a ~21% performance improvement, ~50% fewer allocations, and ~93% fewer bytes allocated.

I then pulled down your jsoniter patch and compared easyjson to jsoniter (with the float improvements):

benchmark              old ns/op     new ns/op     delta
BenchmarkRespond-4     1785230       1552121       -13.06%

benchmark              old allocs     new allocs     delta
BenchmarkRespond-4     10120          10             -99.90%

benchmark              old bytes     new bytes     delta
BenchmarkRespond-4     42369         213474        +403.84%

With this we see that it is ~13% faster. The tradeoff is that it allocates significantly more memory (~5x), albeit in fewer allocations. To be fair, my easyjson implementation here was done in ~30m, so there's definitely room for some improvement. At this point I'd say the performance between the two seems about the same (which is quite impressive!).

The main difference remaining between the two is that jsoniter is still doing a full marshal and then a write (meaning the marshal can't be canceled). IMO this can still be a large concern, as some responses take quite a while to marshal out. That being said, speeding it up dramatically like this will definitely alleviate that concern (even if it doesn't solve it outright).

jsoniter.RegisterTypeEncoderFunc("promql.Point", MarshalPointJSON, MarshalPointJSONIsEmpty)
}

func MarshalPointJSON(ptr unsafe.Pointer, stream *jsoniter.Stream) {
Contributor


@brian-brazil This marshal function seems terribly out of place. The struct is defined in another package (promql), and that's also where all the tests etc. reside. Leaving this function here means that these two packages have to be kept up to date with each other (and that someone making changes in the promql package needs to know about testing over here). It seems a better place for this would be the promql package (where the struct is defined).

Contributor Author


I didn't feel it appropriate to tie jsoniter into the promql package. This is not a struct that is going to change, as that would break all users.

@brian-brazil
Contributor Author

@jacksontj I have discussed this with some of the other developers and while the gains of easyjson are nice, the maintenance overhead is just too high. json-iterator provides us basically with the same gains and no notable maintenance overhead.

@brian-brazil
Contributor Author

Upstream has taken my PR (and then improved upon it), so I've updated my PR to re-vendor. I think this is now ready to be taken in.

@brancz
Member

brancz commented Feb 14, 2018

Do you think this is safe to land in 2.2? I think I would feel more comfortable with this being RCed as well.

This makes queries ~15% faster and cuts CPU
time spent on JSON encoding by ~40%.
Point has a non-standard marshalling, and is also
where the vast majority of CPU time is spent, so
it is worth optimising.
Member

@SuperQ SuperQ left a comment


LGTM

@jacksontj
Contributor

@brian-brazil do you have benchmarks from this PR? I'm doing work over on prometheus/client_golang#570, and from my testing there I'm not seeing this ~40% reduction; in fact, I'm getting on-par performance between encoding/json and jsoniter:

$ go test -run=x -bench=. -benchmem
goos: linux
goarch: amd64
pkg: github.com/prometheus/client_golang/api/prometheus/v1
BenchmarkSerialization/10/10/marshal/encoding/json-8         	    3000	    398908 ns/op	   55704 B/op	    1972 allocs/op
BenchmarkSerialization/10/10/marshal/jsoniter-8              	    3000	    432072 ns/op	   56143 B/op	    1932 allocs/op
BenchmarkSerialization/10/10/unmarshal/encoding/json-8       	    5000	    274671 ns/op	   26698 B/op	     764 allocs/op
BenchmarkSerialization/10/10/unmarshal/jsoniter-8            	    5000	    237565 ns/op	   33371 B/op	    1050 allocs/op
BenchmarkSerialization/10/100/marshal/encoding/json-8        	     300	   3930002 ns/op	  549498 B/op	   19975 allocs/op
BenchmarkSerialization/10/100/marshal/jsoniter-8             	     300	   4278254 ns/op	  554065 B/op	   19934 allocs/op
BenchmarkSerialization/10/100/unmarshal/encoding/json-8      	    1000	   2325803 ns/op	  256444 B/op	    5964 allocs/op
BenchmarkSerialization/10/100/unmarshal/jsoniter-8           	     500	   2332108 ns/op	  328020 B/op	    8950 allocs/op
BenchmarkSerialization/10/1000/marshal/encoding/json-8       	      30	  41352309 ns/op	 6008460 B/op	  200016 allocs/op
BenchmarkSerialization/10/1000/marshal/jsoniter-8            	      30	  46626714 ns/op	 5561751 B/op	  199967 allocs/op
BenchmarkSerialization/10/1000/unmarshal/encoding/json-8     	      50	  24766228 ns/op	 2571484 B/op	   59970 allocs/op
BenchmarkSerialization/10/1000/unmarshal/jsoniter-8          	      50	  23100204 ns/op	 3287492 B/op	   89955 allocs/op
BenchmarkSerialization/100/10/marshal/encoding/json-8        	     300	   4334550 ns/op	  570430 B/op	   19706 allocs/op
BenchmarkSerialization/100/10/marshal/jsoniter-8             	     300	   4557554 ns/op	  561875 B/op	   19305 allocs/op
BenchmarkSerialization/100/10/unmarshal/encoding/json-8      	     500	   2861914 ns/op	  265226 B/op	    7606 allocs/op
BenchmarkSerialization/100/10/unmarshal/jsoniter-8           	     500	   2473652 ns/op	  333875 B/op	   10502 allocs/op
BenchmarkSerialization/100/100/marshal/encoding/json-8       	      30	  43987262 ns/op	 6039315 B/op	  199747 allocs/op
BenchmarkSerialization/100/100/marshal/jsoniter-8            	      30	  46285685 ns/op	 5558747 B/op	  199339 allocs/op
BenchmarkSerialization/100/100/unmarshal/encoding/json-8     	      50	  26428125 ns/op	 2576977 B/op	   59650 allocs/op
BenchmarkSerialization/100/100/unmarshal/jsoniter-8          	     100	  23110238 ns/op	 3283907 B/op	   89514 allocs/op
BenchmarkSerialization/100/1000/marshal/encoding/json-8      	       3	 409665279 ns/op	59081770 B/op	 1999900 allocs/op
BenchmarkSerialization/100/1000/marshal/jsoniter-8           	       3	 430603410 ns/op	58996882 B/op	 1999499 allocs/op
BenchmarkSerialization/100/1000/unmarshal/encoding/json-8    	       5	 233062181 ns/op	26713731 B/op	  658270 allocs/op
BenchmarkSerialization/100/1000/unmarshal/jsoniter-8         	       5	 229206061 ns/op	33464878 B/op	  957805 allocs/op
BenchmarkSerialization/1000/10/marshal/encoding/json-8       	      30	  44881487 ns/op	 6123904 B/op	  197052 allocs/op
BenchmarkSerialization/1000/10/marshal/jsoniter-8            	      30	  45518786 ns/op	 5661321 B/op	  193044 allocs/op
BenchmarkSerialization/1000/10/unmarshal/encoding/json-8     	      50	  26742119 ns/op	 2596950 B/op	   56224 allocs/op
BenchmarkSerialization/1000/10/unmarshal/jsoniter-8          	      50	  23756123 ns/op	 3282841 B/op	   85182 allocs/op
BenchmarkSerialization/1000/100/marshal/encoding/json-8      	       3	 388438656 ns/op	57343797 B/op	 1997168 allocs/op
BenchmarkSerialization/1000/100/marshal/jsoniter-8           	       3	 434228624 ns/op	59020432 B/op	 1993193 allocs/op
BenchmarkSerialization/1000/100/unmarshal/encoding/json-8    	       5	 234770845 ns/op	27129120 B/op	  600611 allocs/op
BenchmarkSerialization/1000/100/unmarshal/jsoniter-8         	       5	 220862249 ns/op	33679800 B/op	  897405 allocs/op
BenchmarkSerialization/1000/1000/marshal/encoding/json-8     	       1	3830350089 ns/op	623496024 B/op	19997237 allocs/op
BenchmarkSerialization/1000/1000/marshal/jsoniter-8          	       1	4229907983 ns/op	689501776 B/op	19993300 allocs/op
BenchmarkSerialization/1000/1000/unmarshal/encoding/json-8   	       1	2277582499 ns/op	311490592 B/op	 6199036 allocs/op
BenchmarkSerialization/1000/1000/unmarshal/jsoniter-8        	       1	2082645407 ns/op	361136920 B/op	 9180019 allocs/op
PASS
ok  	github.com/prometheus/client_golang/api/prometheus/v1	69.265s
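Anyone trying to reconcile these numbers with the earlier BenchmarkRespond results can reproduce a marshal-side measurement with Go's testing harness alone. A minimal sketch with a hypothetical matrix-shaped payload (not the client_golang benchmark itself):

```go
package main

import (
	"encoding/json"
	"fmt"
	"testing"
)

// sample loosely mirrors one series of a range-query (matrix) response.
type sample struct {
	Metric map[string]string `json:"metric"`
	Values [][2]float64      `json:"values"`
}

// benchMarshal measures encoding/json.Marshal over a 100x100 matrix payload.
func benchMarshal() testing.BenchmarkResult {
	data := make([]sample, 100)
	for i := range data {
		data[i] = sample{
			Metric: map[string]string{"__name__": "up", "job": "node"},
			Values: make([][2]float64, 100),
		}
	}
	return testing.Benchmark(func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			if _, err := json.Marshal(data); err != nil {
				b.Fatal(err)
			}
		}
	})
}

func main() {
	res := benchMarshal()
	fmt.Println(res.String()) // numbers are machine-dependent
}
```

Swapping the Marshal call for a jsoniter (or easyjson) equivalent on the same payload gives a like-for-like comparison; differences in payload shape and in whether the custom Point encoder is registered account for much of the divergence between benchmark runs.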
