Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Performance] Large Object's Input Stream (BlobInputStream) takes a lot of time due to read one by one in application level #2375

Closed
wants to merge 4 commits into from

Conversation

inhabitantNewCity
Copy link

fix performance issue described below
#2374 (comment)

vlan0416 and others added 2 commits December 28, 2021 19:10
…jdbc#2374

change BlobInputStream behaviors to support buffered read. Large Object's Input Stream (BlobInputStream) takes a lot of time due to read one by one in application level

pgjdbc#2374
@vlsi
Copy link
Member

vlsi commented Dec 28, 2021

PS. It would be great if you could create PRs using a feature-branch.
PRs from a master branch do not really work great, especially, if you have multiple PRs.

@vlsi
Copy link
Member

vlsi commented Dec 28, 2021

@inhabitantNewCity , it looks like the PR won't handle the case when the client mixes .read() and .read(byte[]... calls.
Would you please add a test so the stream is processed with various APIs to ensure the buffers work properly?

There's already StrangeInputStream to cover that case, however, it looks like .read() is missing there.

@inhabitantNewCity
Copy link
Author

it is covered by Code (because I reuse the same variables in buffered read)
I will add additional test for check such case. thank you.

about Feather branch, Could I use master for this issue?
I will use separate branch for next features

@vlsi
Copy link
Member

vlsi commented Dec 28, 2021

Keeping this in master is just fine.

@davecramer
Copy link
Member

It's going to be quite complicated to keep both of these implementations.

Scenario 1 . read() is called first and there is data in buffer, next read(b,o,l) is called, one would have to copy the buffer in first and start reading from that point on adjusting the offset and length. (This may be all that is required)

Scenario 2). read(b,o,l) is called first, I guess all that is really necessary is to update apos.

I'm not sure exactly why there is a limit at all, that seems rather pointless.

@inhabitantNewCity
Copy link
Author

inhabitantNewCity commented Dec 28, 2021

yes, I found bug with mixed access, but I want to make it simplified if user call read() that system will redirect to previous approach like:

 if (bpos != 0) {
      return super.read(buf,off,len);
    }

@vlsi @davecramer What do you think?
Because such mixing looks like weird. it is really special case :) and I afraid that such case will bring huge CPU consumption to fully support such case .

about limit, yes I agree, as for me it is over complicated to support partial read and not so useful

vlan0416 added 2 commits December 28, 2021 20:38
change BlobInputStream behaviors to support buffered read. Large Object's Input Stream (BlobInputStream) takes a lot of time due to read one by one in application level

pgjdbc#2374
@davecramer
Copy link
Member

I'm thinking something more like PR #2376

@vlsi
Copy link
Member

vlsi commented Dec 5, 2023

I'm closing the PR as the fix will be available in 42.7.1, see #3044

@vlsi vlsi closed this Dec 5, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants