make Cursor.stream() work in pipeline mode #409

dlax · 2022-10-10T10:35:50Z

This adds support for the pipeline for Cursor.stream(), aka single-row mode.

Due to how .stream() is implemented, it somehow interrupts the pipeline as it forces results fetch alongside query send so that's a bit suboptimal. But at least, people using the pipeline mode would not need to exit/re-enter it to use .stream().

There is a problem in the libpq when using single-row mode and pipeline mode together; this has been reported and fixed, but is worked around on our side.

Closes #406

We add a first test similar to test_singlerowmode() from Postgres test suite (src/test/modules/libpq_pipeline/libpq_pipeline.c). The second one, XFAIL, as it demonstrates a bug in Postgres.

When sending the query from stream() in pipeline mode, we now use the new pipeline_send() generator to also account for any command previously in the pipeline. Results fetch is done through _stream_fetchone_gen() method of Pipeline, which also possibly accounts for results of prior commands. The single-row query is identified in the results queue by an extra bool item in the tuple of PendingResult. As can be seen in test_cursor_stream_query_fetch_bug(), there is a problem when the pipeline contains a normal query after a streamed one, apparently due to a bug in the libpq. Cursor.stream() somehow interrupts the pipeline as it forces results fetch for the single-row query, and thus for any query in the pipeline queue (emitted before .stream()).

We enqueue an extra Sync() message after sending the streamed query; this will reset the single-row mode flag, which is not done presently on libpq side.

dvarrazzo · 2022-10-12T13:30:03Z

Hi @dlax,

are we sure that the work done here is valuable? It seems to me that streaming and batching go in two different directions, and there are other features that cannot be supported in pipeline mode (e.g. COPY).

Is there a concrete gain in adding this extra complexity and branching in the codebase?

dlax · 2022-10-12T15:18:56Z

Hi @dvarrazzo,

That's a good question and I understand the concern about possible extra maintenance burden.

I do remember that we stated that pipeline and streaming modes were not supposed to be used together in the initial implementation. But then I saw that there was code handling this and documentation in the libpq ,so I thought it could be worth.

Currently, it's a limitation we impose on our side (or an oversight), in contrast with COPY.

In fact, I'm not sure these two modes go in "different directions"; namely the pipeline mode helps when network latency matters, and single-row mode would help when, e.g., memory consumption on the client matters. I would suppose someone might be interested in sending a batch of queries, but receiving the results of some of them row-by-row.

Now, as mentioned above, the implementation proposed here does not decouple query sending from results fetch in .stream(), so this use case cannot be realized as is (unless the streaming query is the last one in the pipeline). But that's a limitation I'd like to remove in the future (if we decide to go further with this idea).

I also remember you raised the "silly" idea that we could make the connection always in pipelining, and I was thinking about the blockers for this. Maybe that's theoretical, maybe not?

dlax · 2023-01-24T16:01:41Z

It's probably worth doing some refactoring before proceeding with this feature so as to limit branching in the code base.

Marking as draft accordingly now; will come back later to this.

test(pq): check pipeline and single-row modes together

3cc2c0d

We add a first test similar to test_singlerowmode() from Postgres test suite (src/test/modules/libpq_pipeline/libpq_pipeline.c). The second one, XFAIL, as it demonstrates a bug in Postgres.

dlax force-pushed the pipeline-stream branch from 679f330 to 228dede Compare October 10, 2022 19:58

dlax added 2 commits October 11, 2022 09:05

fix: work around libpq issue in pipeline+single-row modes

5cb1948

We enqueue an extra Sync() message after sending the streamed query; this will reset the single-row mode flag, which is not done presently on libpq side.

dlax force-pushed the pipeline-stream branch from 228dede to 5cb1948 Compare October 11, 2022 07:08

dlax marked this pull request as ready for review October 12, 2022 09:47

dlax mentioned this pull request Nov 4, 2022

A couple of questions about pipeline communicate #431

Closed

dlax marked this pull request as draft January 24, 2023 16:01

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

make Cursor.stream() work in pipeline mode #409

make Cursor.stream() work in pipeline mode #409

dlax commented Oct 10, 2022 •

edited

dvarrazzo commented Oct 12, 2022

dlax commented Oct 12, 2022 •

edited

dlax commented Jan 24, 2023

make Cursor.stream() work in pipeline mode #409

Are you sure you want to change the base?

make Cursor.stream() work in pipeline mode #409

Conversation

dlax commented Oct 10, 2022 • edited

dvarrazzo commented Oct 12, 2022

dlax commented Oct 12, 2022 • edited

dlax commented Jan 24, 2023

dlax commented Oct 10, 2022 •

edited

dlax commented Oct 12, 2022 •

edited