-
-
Notifications
You must be signed in to change notification settings - Fork 147
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
make Cursor.stream() work in pipeline mode #409
base: master
Are you sure you want to change the base?
Conversation
We add a first test similar to test_singlerowmode() from Postgres test suite (src/test/modules/libpq_pipeline/libpq_pipeline.c). The second one, XFAIL, as it demonstrates a bug in Postgres.
When sending the query from stream() in pipeline mode, we now use the new pipeline_send() generator to also account for any command previously in the pipeline. Results fetch is done through _stream_fetchone_gen() method of Pipeline, which also possibly accounts for results of prior commands. The single-row query is identified in the results queue by an extra bool item in the tuple of PendingResult. As can be seen in test_cursor_stream_query_fetch_bug(), there is a problem when the pipeline contains a normal query after a streamed one, apparently due to a bug in the libpq. Cursor.stream() somehow interrupts the pipeline as it forces results fetch for the single-row query, and thus for any query in the pipeline queue (emitted before .stream()).
We enqueue an extra Sync() message after sending the streamed query; this will reset the single-row mode flag, which is not done presently on libpq side.
Hi @dlax, are we sure that the work done here is valuable? It seems to me that streaming and batching go in two different directions, and there are other features that cannot be supported in pipeline mode (e.g. COPY). Is there a concrete gain in adding this extra complexity and branching in the codebase? |
Hi @dvarrazzo, That's a good question and I understand the concern about possible extra maintenance burden. I do remember that we stated that pipeline and streaming modes were not supposed to be used together in the initial implementation. But then I saw that there was code handling this and documentation in the libpq ,so I thought it could be worth. Currently, it's a limitation we impose on our side (or an oversight), in contrast with In fact, I'm not sure these two modes go in "different directions"; namely the pipeline mode helps when network latency matters, and single-row mode would help when, e.g., memory consumption on the client matters. I would suppose someone might be interested in sending a batch of queries, but receiving the results of some of them row-by-row. Now, as mentioned above, the implementation proposed here does not decouple query sending from results fetch in I also remember you raised the "silly" idea that we could make the connection always in pipelining, and I was thinking about the blockers for this. Maybe that's theoretical, maybe not? |
It's probably worth doing some refactoring before proceeding with this feature so as to limit branching in the code base. Marking as draft accordingly now; will come back later to this. |
This adds support for the pipeline for
Cursor.stream()
, aka single-row mode.Due to how
.stream()
is implemented, it somehow interrupts the pipeline as it forces results fetch alongside query send so that's a bit suboptimal. But at least, people using the pipeline mode would not need to exit/re-enter it to use.stream()
.There is a problem in the libpq when using single-row mode and pipeline mode together; this has been reported and fixed, but is worked around on our side.
Closes #406