CharacterReader always allocates a 32 KB buffer that can even exceed the document size #1773

Open
chibenwa opened this issue May 16, 2022 · 7 comments · May be fixed by #1800
Comments

@chibenwa

This might be overkill: we could likely decrease the size of this buffer when parsing small strings.

That would reduce memory allocation.

Here is a small async profiler memory flame graph showing this:

[Screenshot: async profiler memory flame graph, 2022-05-16]
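
For illustration, a minimal sketch of the sizing idea, not jsoup's actual code: when the input length is known (e.g. parsing an in-memory String), cap the initial buffer at that length instead of always allocating the full 32 KB. The names below are assumptions, not existing jsoup fields.

```java
// Hypothetical sketch: size the read buffer by the known input length, if any.
final class BufferSizing {
    static final int MaxBufferLen = 32 * 1024; // current fixed allocation size

    // knownLength <= 0 means "unknown", e.g. a stream without a usable Content-Length.
    static char[] allocateBuffer(int knownLength) {
        int size = (knownLength > 0) ? Math.min(knownLength, MaxBufferLen) : MaxBufferLen;
        return new char[size];
    }
}
```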

@jhy
Owner

jhy commented May 17, 2022

I think the difficulty will be in knowing, up front, the expected content length of the response to parse. My understanding is that in practice, even when the Content-Length header is set, it is often incorrect, so browsers typically ignore it / handle larger sizes.

Have you actually profiled this to be a performance issue? I picked that number based on a bit of empirical benchmarking and testing and found it worked reasonably.

Another approach would be to reuse the buffer on subsequent reads, so the allocation doesn't need to happen. We use that pattern with StringBuilders, for example.

@chibenwa
Author

Have you actually profiled this to be a performance issue? I picked that number based on a bit of empirical benchmarking and testing and found it worked reasonably.

No, it's not that critical, though you do get many small HTML emails in an email server.

@jhy
Owner

jhy commented May 17, 2022

OK, yes, in that use case I think it would make sense. A pattern of a thread-local reusable buffer would work, similar to the Builders in StringUtil.
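
A sketch of what such a thread-local borrow/release pattern could look like, in the spirit of the reusable StringBuilders in StringUtil; the class and method names (CharBufferPool, borrow, release) are illustrative assumptions, not existing jsoup API.

```java
// Hypothetical per-thread buffer pool: allocate once per thread, then reuse.
final class CharBufferPool {
    private static final int BufferSize = 32 * 1024;
    private static final ThreadLocal<char[]> threadBuffer =
            ThreadLocal.withInitial(() -> new char[BufferSize]);

    // Returns this thread's buffer; callers must not hold it after release.
    static char[] borrow() {
        return threadBuffer.get();
    }

    // Nothing to reset for a char[]; the method keeps the borrow/release
    // discipline explicit at call sites, matching the StringBuilder pattern.
    static void release(char[] buffer) {
        // no-op
    }
}
```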

@chibenwa
Author

Thinking more about this, is there anything that prevents doing (potentially opt-in) buffer recycling, just like the Jackson JSON parser does?

I am working on a similar approach in MIME4J, and so far it has yielded a 40% allocation reduction, which in micro-benchmarks translated into a 10-15% performance improvement.

I would be happy to carry out such a contribution.
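
As a rough illustration of the recycling idea described here: keep a per-thread SoftReference to the last buffer so it is reused across parses but can still be reclaimed under memory pressure. This mirrors the general approach Jackson takes; the class and method names below are made up for the sketch, not Jackson's or jsoup's API.

```java
import java.lang.ref.SoftReference;

// Hypothetical recycler: buffers survive between parses on the same thread,
// but the GC may drop them if memory gets tight.
final class RecyclingBuffers {
    private static final int BufferSize = 32 * 1024;
    private static final ThreadLocal<SoftReference<char[]>> recycled = new ThreadLocal<>();

    static char[] acquire() {
        SoftReference<char[]> ref = recycled.get();
        char[] buffer = (ref != null) ? ref.get() : null;
        recycled.remove();               // the caller now owns the buffer
        return (buffer != null) ? buffer : new char[BufferSize];
    }

    static void release(char[] buffer) {
        recycled.set(new SoftReference<>(buffer));
    }
}
```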

@jhy
Owner

jhy commented Jun 24, 2022

I'm not familiar with the implementation of the Jackson parser. Recycling the buffer makes sense. I don't know that it should be opt-in, as I would expect most people to miss it if it were.

Would be happy to review a PR!

@chibenwa
Author

Question before getting started: are there JMH benchmarks somewhere already? Would you be interested in getting a set of JMH benchmarks as well?

I will be on vacation all of July, so nothing will likely happen before August.
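
A minimal sketch of what such a JMH benchmark could look like, assuming JMH and jsoup are on the classpath; the class name and the HTML sample are made up. It measures parsing throughput for a small, email-sized document, which is the case discussed above; running it with JMH's gc profiler would also show allocation rates.

```java
import java.util.concurrent.TimeUnit;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.OutputTimeUnit;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.State;

// Hypothetical benchmark: how many small documents can be parsed per second.
@State(Scope.Benchmark)
@BenchmarkMode(Mode.Throughput)
@OutputTimeUnit(TimeUnit.SECONDS)
public class SmallHtmlParseBenchmark {
    private final String html = "<html><body><p>Hello, world</p></body></html>";

    @Benchmark
    public Document parseSmall() {
        return Jsoup.parse(html);
    }
}
```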

@chibenwa
Author

chibenwa commented Jul 1, 2022

I managed to put together #1800 before leaving.

Looks like massive gains are ahead ;-)

@jhy linked a pull request (#1800) on Jul 1, 2022 that will close this issue