
PgRowPublisherOperation should consider Subscriber demand #33

Open
mp911de opened this issue Dec 4, 2018 · 5 comments
Comments

@mp911de

mp911de commented Dec 4, 2018

PgRowPublisherOperation accepts a Flow.Subscriber that requests a number of data signals n (typically rows) via Subscription.request(n). PgRowPublisherOperation should consider this demand to:

  • Determine a reasonable fetch size (it would probably be worthwhile to introduce fetchSize(long rows) on RowPublisherOperation in the ADBA spec)
  • Request only n rows from a cursor (when using cursored fetching)
  • Request the next n rows as soon as the Subscriber requests additional rows

Typically it's hard to fetch exactly n rows, but the point here is to fetch roughly n rows to satisfy demand and to pre-buffer some additional rows (smart prefetching) to reduce latency between I/O and processing.
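
A rough sketch of what a demand-driven Subscription could look like (hypothetical, not pgadba's actual code; Row and fetchFromCursor(int) are placeholders for the driver's row type and cursored fetch):

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.concurrent.Flow;

// Hypothetical sketch: a Subscription that maps Subscriber demand onto cursor
// fetches of roughly the requested size and keeps a small prefetch buffer to
// hide network latency. fetchFromCursor(...) stands in for the driver's fetch.
final class DemandDrivenSubscription implements Flow.Subscription {

    private static final int PREFETCH = 32;            // extra rows fetched ahead of demand
    private final Flow.Subscriber<? super Row> subscriber;
    private final Deque<Row> buffer = new ArrayDeque<>();
    private long demand;                                // rows requested but not yet delivered

    DemandDrivenSubscription(Flow.Subscriber<? super Row> subscriber) {
        this.subscriber = subscriber;
    }

    @Override
    public synchronized void request(long n) {
        demand += n;
        drain();
    }

    private void drain() {
        deliverBuffered();
        if (demand > 0) {
            // Fetch roughly the outstanding demand plus a prefetch margin in one roundtrip.
            int fetchSize = (int) Math.min(demand + PREFETCH, Integer.MAX_VALUE);
            for (Row row : fetchFromCursor(fetchSize)) {   // hypothetical driver call
                buffer.add(row);
            }
            deliverBuffered();
        }
    }

    private void deliverBuffered() {
        while (demand > 0 && !buffer.isEmpty()) {
            demand--;
            subscriber.onNext(buffer.poll());
        }
    }

    @Override
    public void cancel() {
        buffer.clear();   // closing the server-side portal/cursor omitted
    }

    private Iterable<Row> fetchFromCursor(int maxRows) {
        throw new UnsupportedOperationException("placeholder for the driver's cursored fetch");
    }

    record Row() {}       // placeholder row type
}
```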

@alexanderkjall
Collaborator

Do you think there is a need to tune the fetch size per request, or could it be a SessionParameter?

@mp911de
Author

mp911de commented Dec 4, 2018

Both, assuming a request maps to a statement execution and not to a single server request/response exchange.

You want to be able to tune defaults for an application, and some queries require additional tuning (see JDBC's fetchSize) to optimize for, e.g., large or small result sets.
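
For illustration, the per-statement knob in the JDBC analogy looks like this (sketch; big_table is made up, and if I recall correctly PgJDBC's connection-level default is the defaultRowFetchSize property):

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

// JDBC analogy for the two tuning levels: a connection-level default
// (in PgJDBC, the defaultRowFetchSize property) plus a per-statement
// override via Statement.setFetchSize(int).
final class JdbcFetchSizeAnalogy {

    static void largeResultSet(Connection connection) throws SQLException {
        connection.setAutoCommit(false);   // PgJDBC only uses cursored fetching inside a transaction

        try (PreparedStatement statement =
                     connection.prepareStatement("SELECT * FROM big_table")) {
            statement.setFetchSize(10_000);            // per-statement tuning for a large result set
            try (ResultSet rows = statement.executeQuery()) {
                while (rows.next()) {
                    // process the current row
                }
            }
        }
    }
}
```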

@alexanderkjall
Collaborator

I have thought some more on this:

  • Specifying a request limit will block the sending of other queries while this one is in flight: adding a limit to the Execute message of the wire protocol puts the protocol in a "chatty" mode where you can't send and receive asynchronously. Maybe not a big problem, but it might impact speed and will require some redesign of the network layer to expose this information to it.
  • The SubmissionPublisher class has an estimateMinimumDemand() method that looks like it would be suitable for this use case (see the sketch below).
  • There has been quite some debate about the need for backpressure on the jdbc-spec-discuss mailing list, and my gut feeling is that there will be more feedback on that subject; the topic is quite complex.

Based on the above points I still think this is a valid feature request, but as it's nontrivial to implement and the specification might change again, I will focus on lower-hanging fruit for now.
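
To illustrate the second point: estimateMinimumDemand() is a real method on java.util.concurrent.SubmissionPublisher; the rest of this sketch (fetchRows, Row) is hypothetical and only shows how the estimated demand could size the next fetch:

```java
import java.util.List;
import java.util.concurrent.SubmissionPublisher;

// Rough sketch: size the next cursor fetch from the subscribers' outstanding
// demand. fetchRows(...) and Row are placeholders, not pgadba API.
final class DemandSizedFetcher {

    private static final int MIN_FETCH = 64;   // lower bound so tiny demand still batches I/O

    private final SubmissionPublisher<Row> publisher = new SubmissionPublisher<>();

    void pump() {
        // Estimate how many rows the slowest current subscriber can still absorb.
        int demand = publisher.estimateMinimumDemand();
        if (demand > 0) {
            for (Row row : fetchRows(Math.max(demand, MIN_FETCH))) {  // hypothetical driver fetch
                publisher.submit(row);   // submit(...) blocks when subscribers are saturated
            }
        }
    }

    private List<Row> fetchRows(int maxRows) {
        throw new UnsupportedOperationException("placeholder for the cursored fetch");
    }

    record Row() {}   // placeholder row type
}
```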

@mp911de
Author

mp911de commented Dec 8, 2018

I'd like to elaborate a bit on your first point:

Specifying a request limit will block the sending of other queries while this one is in flight

That's exactly the case: zero demand stops traffic on the connection, and the connection is unblocked once the command is either canceled or registers additional demand. That is basically what should happen on non-multiplexed connections. At first this seems an odd thought: why would one want this to happen?

If you do not share the connection but use it linearly (send a statement, consume results, send the next statement, …), this pattern makes total sense, as you do not read more data at a time than necessary. Thus, you don't allocate more memory than needed (based on subscriber demand) to process incoming data; you leave all data outside your application/JVM.

If you do share the connection or even want to run interleaved queries, the requirements change a bit. I'm not entirely familiar with the Postgres wire protocol, but I know from older SQL Server versions that a single transport connection was bound to a single query execution and didn't allow for interleaved SQL batches (until MARS was implemented). In such a scenario, you would not want to prevent interleaved queries, and thus you want to unblock the connection (consume a response) to issue the next query while a cursor/result is not fully consumed. The price for this feature is higher memory consumption, as you need to consume a response and keep it in memory until its subscriber asks for more data.

At this point, a driver might want to consider the actual demand and allow for optimizations. You want to reduce the number of roundtrips while aligning the fetch size to the actual demand.
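
To make the wire-protocol side a bit more concrete: as far as I can tell from the protocol docs, the extended query protocol's Execute message already carries a maximum row count, and a suspended portal can be executed again for the next batch. A minimal sketch of encoding such a message, with maxRows derived from subscriber demand (Bind/Sync and PortalSuspended handling omitted):

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Sketch of mapping demand onto the Postgres extended query protocol: Execute
// carries a maximum row count; when the limit is hit the server answers with
// PortalSuspended and the same portal can be executed again for the next batch.
// Layout per https://www.postgresql.org/docs/current/protocol-message-formats.html.
final class ExecuteMessage {

    // Build an Execute message asking for at most `maxRows` rows from `portal`
    // (0 means "no limit"). Sizing maxRows from the subscriber's demand is the
    // alignment discussed above.
    static ByteBuffer encode(String portal, int maxRows) {
        byte[] name = portal.getBytes(StandardCharsets.UTF_8);
        int length = 4 + name.length + 1 + 4;   // length field + portal name + NUL + max rows
        ByteBuffer message = ByteBuffer.allocate(1 + length);
        message.put((byte) 'E');                // message type: Execute
        message.putInt(length);                 // length, excluding the type byte
        message.put(name).put((byte) 0);        // portal name, null-terminated
        message.putInt(maxRows);                // row limit derived from demand
        return message.flip();
    }
}
```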

but as it's nontrivial to implement

I concur. This feature needs additional thought.

@cretz

cretz commented Dec 9, 2018

I'm not entirely familiar with the Postgres wire protocol, but I know from older SQL server versions that a single transport connection was bound to single query execution and didn't allow for interleaved SQL batches

This is the case with the Postgres protocol too: a connection cannot be multiplexed. See https://www.postgresql.org/docs/current/protocol.html.
