New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[DON'T MERGE] Try to replace all json4s
with Jackson
#37604
Conversation
json4s
with Jackson
cc @JoshRosen FYI |
@JoshRosen I am not sure whether this draft is helpful for future work, but I hope it is useful to a certain extent. This draft pr does not focus on forward compatibility, so I use Jackson The change involves 5 modules For exporting HTTP APIs(org.apache.spark.deploy.JsonProtocol), A problem found in the rewriting process is that if (!jsonNode.isMissingNode) {
node.set[JsonNode](name, jsonNode)
} mima checks the following forward incompatibilities:
This is similar to the problem described in SPARK-39658, |
It seems that the main problem is that If the above problem is not a problem, I think spark may can replace |
2ea9dcf add a simple benchmark, will update result later |
* Results will be written to "benchmarks/JsonSerDeBenchmark-results.txt". | ||
* }}} | ||
* */ | ||
object JsonSerDeBenchmark extends BenchmarkBase { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OpenJDK 64-Bit Server VM 1.8.0_345-b01 on Linux 5.15.0-1017-azure
Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
Test to Json with out nested: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------------------------------
Use Json4s 111 170 58 0.9 1105.5 1.0X
Use Scala object 175 177 2 0.6 1748.0 0.6X
Use JsonNode 60 61 2 1.7 597.7 1.8X
Use Json Generator 55 56 1 1.8 552.8 2.0X
OpenJDK 64-Bit Server VM 1.8.0_345-b01 on Linux 5.15.0-1017-azure
Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
Test to Json with nested: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------------------------------
Use Json4s 1574 1575 1 0.1 15739.5 1.0X
Use Scala object 395 398 2 0.3 3952.5 4.0X
Use JsonNode 285 288 2 0.4 2854.5 5.5X
Use Json Generator 151 153 2 0.7 1511.1 10.4X
OpenJDK 64-Bit Server VM 1.8.0_345-b01 on Linux 5.15.0-1017-azure
Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
Test from Json without nested: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------------------------------
Use Json4s 103 104 2 1.0 1032.5 1.0X
Use Jackson 95 96 1 1.0 954.2 1.1X
OpenJDK 64-Bit Server VM 1.8.0_345-b01 on Linux 5.15.0-1017-azure
Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
Test from Json with nested: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------------------------------
Use Json4s 1258 1260 2 0.1 12584.5 1.0X
Use Jackson 539 541 2 0.2 5394.5 2.3X
From the results of JsonSerDeBenchmark
:
Jackson
is faster thanJson4s
, andJackson
has greater advantages when the data is nested- Using
JsonGenerator
for serialization is more efficient than building aJsonNode
first and then serializing it
I'm happy to see this PR! Jackson is much safer than json4s @LuciferYang Could you, please, check jsoniter-scala with your benchmarks if it worth to have an official integration with Spark ;) |
@plokhotnyuk Let me learn about jsoniter-scala first |
2566a16
to
9fce83c
Compare
What changes were proposed in this pull request?
Why are the changes needed?
Does this PR introduce any user-facing change?
How was this patch tested?