
Don't flush at each put_copy_data call, but flush at get_result #462

Merged
Merged 1 commit into ged:master from larskanis:flush on Jun 14, 2022

Conversation

larskanis
Collaborator

This better mimics what libpq does internally.

put_copy_data is significantly faster when it doesn't flush at every call: by a factor of about 4 on Linux and 10 on Windows when sending typical per-row blocks of 60 bytes.

put_copy_end unconditionally calls flush in libpq, so it is not changed here.

Also adjust conn.block to send all enqueued data, to mimic the behavior of get_result in libpq. With the change to put_copy_data, unsent data can remain when large blocks are sent. In this case get_result should catch up on flushing. This is what the newly added spec verifies.

PQgetResult does its flushing based on internal state that we don't have access to. Since conn.block is performance-critical in the case of single_row_mode, we don't flush at every call to conn.block, but only when it's about to wait for IO.
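To illustrate why deferring the flush matters, here is a minimal pure-Ruby model of the two strategies. FakeConnection is a hypothetical stand-in that only counts flushes; the real put_copy_data/get_result in pg talk to a PostgreSQL server, and per-flush cost there is a syscall rather than a counter increment.

```ruby
# Hypothetical stand-in for PG::Connection that counts flushes instead of
# doing I/O, to contrast flush-per-call with flush-at-get_result.
class FakeConnection
  attr_reader :flushes

  def initialize(flush_per_put:)
    @flush_per_put = flush_per_put
    @buffer = +""
    @flushes = 0
  end

  def put_copy_data(row)
    @buffer << row
    flush if @flush_per_put   # old behavior: flush on every call
  end

  def get_result
    flush                     # new behavior: catch up on unsent data here
    :copy_done
  end

  private

  def flush
    @flushes += 1 unless @buffer.empty?
    @buffer.clear
  end
end

rows = Array.new(1000) { "some,csv,row\n" }  # typical small per-row blocks

eager = FakeConnection.new(flush_per_put: true)
rows.each { |r| eager.put_copy_data(r) }
eager.get_result     # eager.flushes => 1000, one flush per row

buffered = FakeConnection.new(flush_per_put: false)
rows.each { |r| buffered.put_copy_data(r) }
buffered.get_result  # buffered.flushes => 1, a single flush at the end
```

The model only demonstrates the call pattern; the measured 4x/10x speedups come from avoiding the real per-call flush I/O, not from anything in this sketch.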

@larskanis larskanis merged commit 8141b8d into ged:master Jun 14, 2022
@larskanis larskanis deleted the flush branch June 14, 2022 19:01
larskanis added a commit to larskanis/ruby-pg that referenced this pull request Aug 8, 2022
We had a blocking flush in pg-1.3.x at every call to put_copy_data.
This ensured that all data had been sent before the next put_copy_data call.
In ged#462 (and pg-1.4.0 to .2) the behaviour was changed to rely on the non-blocking flushes libpq does internally.
This yields a decent performance improvement, especially on Windows.
Unfortunately ged#473 proved that memory bloat can happen when sending the data is slower than the calls to put_copy_data.

As a trade-off, this does a blocking flush only every 100 calls.

If libpq is running in blocking mode (PG::Connection.async_api = false), put_copy_data does a blocking flush every time new memory is allocated.
Unfortunately we don't have access to that information, since we can't see into libpq's PGconn struct and the return codes give no indication of when this happens.
So flushing after a fixed number of calls is a very simple heuristic.

Fixes ged#473
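The every-100-calls heuristic described above can be sketched in plain Ruby. This is an illustrative model, not the actual C implementation in ruby-pg; the class name, counters, and the exact threshold placement are assumptions.

```ruby
# Sketch of the trade-off heuristic: a blocking flush every
# FLUSH_INTERVAL calls to put_copy_data bounds buffered memory
# without paying the flush cost on every single call.
FLUSH_INTERVAL = 100

class CopySender
  attr_reader :blocking_flushes, :pending_bytes

  def initialize
    @calls = 0
    @pending_bytes = 0
    @blocking_flushes = 0
  end

  def put_copy_data(data)
    @pending_bytes += data.bytesize
    @calls += 1
    # Every 100th call, block until the queued data is on the wire,
    # so the buffer cannot grow without bound (the ged#473 bloat).
    blocking_flush if (@calls % FLUSH_INTERVAL).zero?
  end

  def blocking_flush
    @blocking_flushes += 1
    @pending_bytes = 0
  end
end

sender = CopySender.new
1_050.times { sender.put_copy_data("row-data\n") }
# After 1050 calls, the blocking flush fired at calls 100, 200, ..., 1000,
# and at most ~100 rows are ever buffered between flushes.
```

Between the periodic blocking flushes, sending still relies on libpq's internal non-blocking flushes, so the common case keeps the pg-1.4 performance.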
larskanis added a commit to larskanis/ruby-pg that referenced this pull request Aug 9, 2022