Implement buffer recycling for CharacterReader #1800

Open · wants to merge 10 commits into master

Conversation

chibenwa commented Jul 1, 2022

Before

Benchmark                                                  Mode  Cnt      Score      Error   Units
JMHBenchmark.benchmarkSmall                                avgt    5      7.382 ±    0.940   us/op
JMHBenchmark.benchmarkSmall:·gc.alloc.rate                 avgt    5   8572.806 ± 1082.658  MB/sec
JMHBenchmark.benchmarkSmall:·gc.alloc.rate.norm            avgt    5  72952.001 ±    0.001    B/op
JMHBenchmark.benchmarkSmall:·gc.churn.G1_Eden_Space        avgt    5   8776.474 ± 1059.021  MB/sec
JMHBenchmark.benchmarkSmall:·gc.churn.G1_Eden_Space.norm   avgt    5  74688.242 ±  767.418    B/op
JMHBenchmark.benchmarkSmall:·gc.churn.G1_Old_Gen           avgt    5      0.195 ±    0.016  MB/sec
JMHBenchmark.benchmarkSmall:·gc.churn.G1_Old_Gen.norm      avgt    5      1.664 ±    0.174    B/op
JMHBenchmark.benchmarkSmall:·gc.count                      avgt    5    523.000             counts
JMHBenchmark.benchmarkSmall:·gc.time                       avgt    5    761.000                 ms
JMHBenchmark.benchmarkMedium                               avgt    5     29.383 ±    1.479   us/op
JMHBenchmark.benchmarkMedium:·gc.alloc.rate                avgt    5   2531.961 ±  127.589  MB/sec
JMHBenchmark.benchmarkMedium:·gc.alloc.rate.norm           avgt    5  85824.002 ±    0.001    B/op
JMHBenchmark.benchmarkMedium:·gc.churn.G1_Eden_Space       avgt    5   2522.477 ±  106.173  MB/sec
JMHBenchmark.benchmarkMedium:·gc.churn.G1_Eden_Space.norm  avgt    5  85506.478 ± 2628.227    B/op
JMHBenchmark.benchmarkMedium:·gc.churn.G1_Old_Gen          avgt    5      0.207 ±    0.036  MB/sec
JMHBenchmark.benchmarkMedium:·gc.churn.G1_Old_Gen.norm     avgt    5      7.031 ±    1.401    B/op
JMHBenchmark.benchmarkMedium:·gc.count                     avgt    5    372.000             counts
JMHBenchmark.benchmarkMedium:·gc.time                      avgt    5    371.000                 ms

After

Benchmark                                                  Mode  Cnt      Score     Error   Units
JMHBenchmark.benchmarkSmall                                avgt    5      2.038 ±   0.102   us/op
JMHBenchmark.benchmarkSmall:·gc.alloc.rate                 avgt    5   3205.214 ± 160.609  MB/sec
JMHBenchmark.benchmarkSmall:·gc.alloc.rate.norm            avgt    5   7536.000 ±   0.001    B/op
JMHBenchmark.benchmarkSmall:·gc.churn.G1_Eden_Space        avgt    5   3175.262 ± 132.074  MB/sec
JMHBenchmark.benchmarkSmall:·gc.churn.G1_Eden_Space.norm   avgt    5   7465.807 ± 139.856    B/op
JMHBenchmark.benchmarkSmall:·gc.churn.G1_Old_Gen           avgt    5      0.194 ±   0.016  MB/sec
JMHBenchmark.benchmarkSmall:·gc.churn.G1_Old_Gen.norm      avgt    5      0.455 ±   0.038    B/op
JMHBenchmark.benchmarkSmall:·gc.count                      avgt    5    439.000            counts
JMHBenchmark.benchmarkSmall:·gc.time                       avgt    5    397.000                ms
JMHBenchmark.benchmarkMedium                               avgt    5     21.085 ±   1.112   us/op
JMHBenchmark.benchmarkMedium:·gc.alloc.rate                avgt    5    833.438 ±  44.777  MB/sec
JMHBenchmark.benchmarkMedium:·gc.alloc.rate.norm           avgt    5  20272.002 ±   0.001    B/op
JMHBenchmark.benchmarkMedium:·gc.churn.G1_Eden_Space       avgt    5    827.537 ±  70.651  MB/sec
JMHBenchmark.benchmarkMedium:·gc.churn.G1_Eden_Space.norm  avgt    5  20126.922 ± 783.745    B/op
JMHBenchmark.benchmarkMedium:·gc.churn.G1_Old_Gen          avgt    5      0.206 ±   0.016  MB/sec
JMHBenchmark.benchmarkMedium:·gc.churn.G1_Old_Gen.norm     avgt    5      5.021 ±   0.162    B/op
JMHBenchmark.benchmarkMedium:·gc.count                     avgt    5    268.000            counts
JMHBenchmark.benchmarkMedium:·gc.time                      avgt    5    193.000                ms

Conclusion

This changeset significantly improves the memory efficiency of jsoup, which translates into substantial performance gains.
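
For context, here is a minimal sketch of the buffer-recycling idea behind the change: reuse the CharacterReader's char[] working buffer across parses instead of allocating a fresh one per parse. The names below (BufferRecycler, borrowBuffer, returnBuffer) are illustrative assumptions, not the PR's actual code.

    import java.lang.ref.SoftReference;

    final class BufferRecycler {
        // The linked issue notes CharacterReader always allocates a 32 KB buffer.
        static final int BUFFER_SIZE = 1024 * 32;

        // One cached buffer per thread; SoftReference lets the GC reclaim it under memory pressure.
        private static final ThreadLocal<SoftReference<char[]>> CACHE = new ThreadLocal<>();

        static char[] borrowBuffer() {
            SoftReference<char[]> ref = CACHE.get();
            char[] buf = ref != null ? ref.get() : null;
            if (buf == null)
                return new char[BUFFER_SIZE]; // nothing to reuse: allocate
            CACHE.remove();                   // hand out the cached buffer; don't share it
            return buf;
        }

        static void returnBuffer(char[] buf) {
            CACHE.set(new SoftReference<>(buf)); // park the buffer for the next parse on this thread
        }
    }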

jhy (Owner) commented Jul 1, 2022

Good stuff, thanks! I may move some things around (not sure if the recycler should be a public contract API, etc).

Enjoy your vacation :)

jhy changed the title from "ISSUE-1773 Implement buffer recycling for CharacterReader" to "Implement buffer recycling for CharacterReader" on Jul 1, 2022
chibenwa (Author) commented Jul 1, 2022

The next big thing is improving the efficiency of the InputStream-based APIs:

    @Benchmark
    public void benchmarkMediumInputStream(Blackhole bh) throws Exception{
        bh.consume(Jsoup.parse(new ByteArrayInputStream(CONTENT_MEDIUM_AS_BYTES), StandardCharsets.UTF_8.name(), ""));
    }

Yields

Benchmark                                                             Mode  Cnt       Score      Error   Units
JMHBenchmark.benchmarkMediumInputStream                               avgt    5      35.285 ±    1.339   us/op
JMHBenchmark.benchmarkMediumInputStream:·gc.alloc.rate                avgt    5    3504.566 ±  133.319  MB/sec
JMHBenchmark.benchmarkMediumInputStream:·gc.alloc.rate.norm           avgt    5  142656.003 ±    0.001    B/op
JMHBenchmark.benchmarkMediumInputStream:·gc.churn.G1_Eden_Space       avgt    5    3497.228 ±  134.669  MB/sec
JMHBenchmark.benchmarkMediumInputStream:·gc.churn.G1_Eden_Space.norm  avgt    5  142359.431 ± 3459.922    B/op
JMHBenchmark.benchmarkMediumInputStream:·gc.churn.G1_Old_Gen          avgt    5       0.379 ±    0.356  MB/sec
JMHBenchmark.benchmarkMediumInputStream:·gc.churn.G1_Old_Gen.norm     avgt    5      15.420 ±   13.903    B/op
JMHBenchmark.benchmarkMediumInputStream:·gc.count                     avgt    5     416.000             counts
JMHBenchmark.benchmarkMediumInputStream:·gc.time                      avgt    5     423.000                 ms

Which, memory-wise, is terrible compared to the String version... I'd be better off creating the String prior to parsing rather than supplying the InputStream...
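
A minimal sketch of that workaround, assuming the whole input fits in memory and is UTF-8 (variable names are illustrative): read the stream fully, decode it to a String, and use the String parse path measured above.

    byte[] bytes = inputStream.readAllBytes();                 // java.io.InputStream, Java 9+
    String html = new String(bytes, StandardCharsets.UTF_8);   // decode once, up front
    Document doc = Jsoup.parse(html, "");                      // String-based parse path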

chibenwa (Author) commented Jul 1, 2022

Benchmark                                                             Mode  Cnt       Score      Error   Units
JMHBenchmark.benchmarkMedium                                          avgt    5      20.314 ±    1.621   us/op
JMHBenchmark.benchmarkMedium:·gc.alloc.rate                           avgt    5     654.901 ±   51.181  MB/sec
JMHBenchmark.benchmarkMedium:·gc.alloc.rate.norm                      avgt    5   15344.002 ±    0.001    B/op
JMHBenchmark.benchmarkMedium:·gc.churn.G1_Eden_Space                  avgt    5     650.845 ±   57.968  MB/sec
JMHBenchmark.benchmarkMedium:·gc.churn.G1_Eden_Space.norm             avgt    5   15248.663 ±  495.411    B/op
JMHBenchmark.benchmarkMedium:·gc.churn.G1_Old_Gen                     avgt    5       0.193 ±    0.031  MB/sec
JMHBenchmark.benchmarkMedium:·gc.churn.G1_Old_Gen.norm                avgt    5       4.533 ±    0.927    B/op
JMHBenchmark.benchmarkMedium:·gc.count                                avgt    5     244.000             counts
JMHBenchmark.benchmarkMedium:·gc.time                                 avgt    5     174.000                 ms
JMHBenchmark.benchmarkMediumInputStream                               avgt    5      32.855 ±    1.372   us/op
JMHBenchmark.benchmarkMediumInputStream:·gc.alloc.rate                avgt    5    3633.766 ±  150.956  MB/sec
JMHBenchmark.benchmarkMediumInputStream:·gc.alloc.rate.norm           avgt    5  137728.003 ±    0.001    B/op
JMHBenchmark.benchmarkMediumInputStream:·gc.churn.G1_Eden_Space       avgt    5    3632.474 ±  189.650  MB/sec
JMHBenchmark.benchmarkMediumInputStream:·gc.churn.G1_Eden_Space.norm  avgt    5  137676.632 ± 2596.732    B/op
JMHBenchmark.benchmarkMediumInputStream:·gc.churn.G1_Old_Gen          avgt    5       0.339 ±    0.368  MB/sec
JMHBenchmark.benchmarkMediumInputStream:·gc.churn.G1_Old_Gen.norm     avgt    5      12.879 ±   14.070    B/op
JMHBenchmark.benchmarkMediumInputStream:·gc.count                     avgt    5     420.000             counts
JMHBenchmark.benchmarkMediumInputStream:·gc.time                      avgt    5     427.000                 ms
JMHBenchmark.benchmarkSmall                                           avgt    5       1.656 ±    0.035   us/op
JMHBenchmark.benchmarkSmall:·gc.alloc.rate                            avgt    5    1699.708 ±   36.050  MB/sec
JMHBenchmark.benchmarkSmall:·gc.alloc.rate.norm                       avgt    5    3248.000 ±    0.001    B/op
JMHBenchmark.benchmarkSmall:·gc.churn.G1_Eden_Space                   avgt    5    1681.110 ±   63.848  MB/sec
JMHBenchmark.benchmarkSmall:·gc.churn.G1_Eden_Space.norm              avgt    5    3212.422 ±   74.478    B/op
JMHBenchmark.benchmarkSmall:·gc.churn.G1_Old_Gen                      avgt    5       0.186 ±    0.015  MB/sec
JMHBenchmark.benchmarkSmall:·gc.churn.G1_Old_Gen.norm                 avgt    5       0.355 ±    0.034    B/op
JMHBenchmark.benchmarkSmall:·gc.count                                 avgt    5     337.000             counts
JMHBenchmark.benchmarkSmall:·gc.time                                  avgt    5     270.000                 ms

chibenwa (Author) commented Jul 1, 2022

(Force-pushed to resolve the conflict.)

chibenwa (Author) commented

Hello @jhy, any status on this work?

Also, would it make sense to add a JVM system property to turn off the pooling behavior, for end users who don't want it?
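
Purely as an illustration of that suggestion (the property name jsoup.recycling.disable and the PoolingConfig name are hypothetical, not part of jsoup):

    final class PoolingConfig {
        // -Djsoup.recycling.disable=true would switch the pool off (hypothetical flag)
        static final boolean DISABLED = Boolean.getBoolean("jsoup.recycling.disable");

        static char[] buffer() {
            return DISABLED
                ? new char[BufferRecycler.BUFFER_SIZE]  // plain per-parse allocation
                : BufferRecycler.borrowBuffer();        // recycled path (see the sketch in the description)
        }
    }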

chibenwa (Author) commented

Hello @jhy

What can I do to help move this topic forward?

jhy (Owner) commented Jan 6, 2023

My apologies for the radio silence @chibenwa, I was pulled away from this.

I have a refactoring of this to make the recyclers more generic, and will push an update for your review shortly.

jhy (Owner) commented Jan 7, 2023

Apologies for the force-push, but I wanted to bring the branch up to date with HEAD.

Here's a WIP of the refactoring to use a generic pool approach. I haven't profiled it much yet. I'd be glad for your review and profiling.

A couple points:

  • I temporarily disabled the benchmark when committing these changes, but please re-enable it for your testing. When we land this, I think I will remove the benchmark from the commit and move it out to a distinct repo (so that versions of jsoup can be compared more readily).
  • I made the char buffer size constant, rather than allocating smaller ephemeral buffers when the input is smaller than the buffer size. I figure that because we are using a constant pool, it is better to recycle somewhat larger objects than to thrash on smaller ones. (A sketch of the generic pool idea follows this list.)
  • I want to make the DataUtil buffers use this too. I think I can just remove the BufferedReader (which internally allocates a new buffer) because the CharacterReader implements its own buffer; having two is redundant.
  • It would be good if the small HTML test had more varied inputs. Otherwise the StringCache in CharacterReader will always have an artificially high hit rate, which might bias the performance tests.
  • I don't know that we need to make these pools optional via a tuning configuration. I would prefer to just get it to work well in all circumstances; one less thing for the developer to worry about.
  • In contrast to the last point, however, I wonder if we should make the StringCache optional, or automatically switch it on/off depending on the platform (server or Android). I have found during perf testing that the overhead of maintaining it when not on Android is worse than the allocation cost.
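
Regarding the generic pool approach mentioned above, here is a minimal sketch of one way such a recycler could look (a thread-local, soft-referenced stack of reusable objects). The names RecyclingPool, borrow, and release are illustrative assumptions, not the identifiers used in this branch.

    import java.lang.ref.SoftReference;
    import java.util.ArrayDeque;
    import java.util.function.Supplier;

    final class RecyclingPool<T> {
        private final ThreadLocal<SoftReference<ArrayDeque<T>>> cache = new ThreadLocal<>();
        private final Supplier<T> factory;
        private final int maxIdle;

        RecyclingPool(Supplier<T> factory, int maxIdle) {
            this.factory = factory;
            this.maxIdle = maxIdle;
        }

        T borrow() {
            T obj = stack().poll();
            return obj != null ? obj : factory.get(); // allocate only on a pool miss
        }

        void release(T obj) {
            ArrayDeque<T> stack = stack();
            if (stack.size() < maxIdle)               // cap the idle objects kept per thread
                stack.push(obj);
        }

        private ArrayDeque<T> stack() {
            SoftReference<ArrayDeque<T>> ref = cache.get();
            ArrayDeque<T> stack = ref != null ? ref.get() : null;
            if (stack == null) {                      // first use, or cleared under memory pressure
                stack = new ArrayDeque<>();
                cache.set(new SoftReference<>(stack));
            }
            return stack;
        }
    }

A CharacterReader buffer pool might then be constructed as new RecyclingPool<>(() -> new char[1024 * 32], 12), matching the constant-size point above; the idle cap is an arbitrary illustrative value.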

jhy (Owner) commented Jan 7, 2023

(I am getting some build failures from the FuzzFixes tests -- some are timing out. Those have been sensitive to the CPU allocation on the workers, so I'm not sure if it's just GitHub Actions running a bit slower, or if these changes caused any slowness. I don't believe it's the latter, as my local perf tests show improvements, but we need to dig in.)

Commit messages from the pushed changes:

  • Perf tests show this performs better when parsing larger docs; small strings still get the speed boost as the buffer is recycled.
  • In CharacterReader: removes the redundant buffer allocation and simplifies the read. For small file reads (same as the small string test), the updated version is ~20% faster than the original.

Code scanning / CodeQL check notice on the deprecated constructor:

    @Deprecated
    public CharacterReader(Reader input, int sz) {

Useless parameter: The parameter 'sz' is never used.
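
The notice is expected if the old signature is kept only for API compatibility. A hedged sketch of how such a deprecated constructor might simply ignore the size hint and delegate (this may differ from the actual code in the branch):

    /**
     * @deprecated the internal buffer is now a fixed size and recycled, so the
     * size hint is ignored; use CharacterReader(Reader) instead.
     */
    @Deprecated
    public CharacterReader(Reader input, int sz) {
        this(input); // sz intentionally unused, hence the CodeQL "Useless parameter" notice
    }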
jhy (Owner) commented Jan 12, 2023

(The Windows builds are failing - it looks like, for HTTP fetches, the connection length is not getting fully read in some cases. Need to investigate.)

Commit message from the pushed changes:

  Refactored so that it consumes input until a combinator is seen after non-combinator content, and returns it all. Also corrected unit tests that were incorrectly relying on the previous behavior.

  Note that a leading combinator will combine against the root element, which is either the Document or the context element.

  Fixes jhy#1707
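
Assuming the behaviour described in that note (and the usual org.jsoup imports), a leading combinator would be evaluated against the query root, roughly like this; the markup and variable names are made up for illustration:

    Document doc = Jsoup.parse("<div><p>One</p><span>Two</span></div>");

    // Against a context element: "> p" selects direct children of that element.
    Element div = doc.selectFirst("div");
    Elements paragraphs = div.select("> p");        // [<p>One</p>]

    // Against the Document root: the leading combinator starts from the Document itself.
    Elements body = doc.select("> html > body");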
Successfully merging this pull request may close these issues:

CharacterReader always allocate a 32 KB buffer that can even exceed document size