Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use Default Buffer Size of 8K Bytes #1112

Open
belugabehr opened this issue Sep 22, 2023 · 4 comments
Open

Use Default Buffer Size of 8K Bytes #1112

belugabehr opened this issue Sep 22, 2023 · 4 comments
Labels
need-test-case Needs a test case (unit test or such) to proceed

Comments

@belugabehr
Copy link

belugabehr commented Sep 22, 2023

Default buffer size is 8000 bytes which is not a power of 2. Update size to align with Java's default buffer size: 8192. The smaller buffers should be 2kb: 2048.

I've always heard that this number was chosen to be an multiple of disk sector size (4kb).

private final static int[] BYTE_BUFFER_LENGTHS = new int[] { 8000, 8000, 2000, 2000 };

@cowtowncoder
Copy link
Member

@belugabehr Before making the change I'd want some numbers. General idea that matching to a disk block might be more efficient isn't super convincing. Java memory layout adds couple of extra bytes too so exact size of 0x2000 may not align any better.

That is, while I'm not necessarily against different defaults, I'd want to know there is some actual measurable benefit; somehow to verify we are not changing things just based on vague feelings of them being sub-optimal.

@cowtowncoder cowtowncoder added the need-test-case Needs a test case (unit test or such) to proceed label Sep 23, 2023
@belugabehr
Copy link
Author

Yup. Fair enough.

The idea though is not that it aligns with memory boundaries, but that it aligns with disk boundaries - reading two full sectors is 8192 bytes.

Some discussion here:

I noticed this while stepping through the Jackson code that it doesn't try to determine if the incoming InputStream is already buffered (i.e., ByteArrayInputStream, BufferedInputStream, etc.) and will always copy incoming data from one buffer to another. There is room for real tangible optimization on that front as well.

@cowtowncoder
Copy link
Member

On trying to determine buffering: problem is that JDK types do not really expose access to their internal buffers.
So copying is necessary (byte by byte access has overhead and cannot quite be eliminated by HotSpot compiler) for good performance.

On aligning to disk block size: due to buffering at various level (OS, hardware) anticipating optimal sizes is tricky.
Block sizes are typically bigger than 8kB as well I think (esp. since SSDs took over spinning disks).

One possible way forward would be to make actual sizing more configurable so that developers who know their needs (for example preferring larger buffers when input size is known to be typically large) can tune their usage.

@cowtowncoder
Copy link
Member

Quick note: wrt changing of defaults, I'd accept benchmark runs that shows improvement on reading fromFile (for example).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
need-test-case Needs a test case (unit test or such) to proceed
Projects
None yet
Development

No branches or pull requests

2 participants