
Don't flush at each put_copy_data call, but flush at get_result #462

Merged
Merged 1 commit into ged:master from larskanis:flush on Jun 14, 2022

Conversation

larskanis
Collaborator

This better mimics what libpq does internally.

put_copy_data is significantly faster when it doesn't flush at every call: by a factor of about 4 on Linux and 10 on Windows when sending typical per-row blocks of 60 bytes.

put_copy_end unconditionally calls flush in libpq, so it is not changed here.

Also adjust conn.block to send all enqueued data, to mimic the behavior of get_result in libpq. With the change to put_copy_data, unsent data can remain when large blocks are sent. In this case get_result should catch up on flushing. This is what the newly added spec verifies.

PQgetResult does its flushing based on internal state that we don't have access to. Since conn.block is performance-critical in the case of single_row_mode, we don't flush at every call to conn.block, but only when it's about to wait for IO.
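To illustrate why deferring the flush matters, here is a minimal pure-Ruby model of the two strategies. FakeConnection is a hypothetical stand-in that only counts flushes; the real put_copy_data/get_result in pg talk to a PostgreSQL server, and per-flush cost there is a syscall rather than a counter increment.

```ruby
# Hypothetical stand-in for PG::Connection that counts flushes instead of
# doing I/O, to contrast flush-per-call with flush-at-get_result.
class FakeConnection
  attr_reader :flushes

  def initialize(flush_per_put:)
    @flush_per_put = flush_per_put
    @buffer = +""
    @flushes = 0
  end

  def put_copy_data(row)
    @buffer << row
    flush if @flush_per_put   # old behavior: flush on every call
  end

  def get_result
    flush                     # new behavior: catch up on unsent data here
    :copy_done
  end

  private

  def flush
    @flushes += 1 unless @buffer.empty?
    @buffer.clear
  end
end

rows = Array.new(1000) { "some,csv,row\n" }  # typical small per-row blocks

eager = FakeConnection.new(flush_per_put: true)
rows.each { |r| eager.put_copy_data(r) }
eager.get_result     # eager.flushes => 1000, one flush per row

buffered = FakeConnection.new(flush_per_put: false)
rows.each { |r| buffered.put_copy_data(r) }
buffered.get_result  # buffered.flushes => 1, a single flush at the end
```

The model only demonstrates the call pattern; the measured 4x/10x speedups come from avoiding the real per-call flush I/O, not from anything in this sketch.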

@larskanis larskanis merged commit 8141b8d into ged:master Jun 14, 2022
@larskanis larskanis deleted the flush branch June 14, 2022 19:01
larskanis added a commit to larskanis/ruby-pg that referenced this pull request Aug 8, 2022
We had a blocking flush in pg-1.3.x at every call to put_copy_data.
This ensured that all data had been sent before the next put_copy_data call.
In ged#462 (and pg-1.4.0 to .2) the behaviour was changed to rely on the non-blocking flushes libpq does internally.
This yields a decent performance improvement, especially on Windows.
Unfortunately ged#473 proved that memory bloat can happen when sending the data is slower than the calls to put_copy_data.

As a trade-off, this does a blocking flush only every 100 calls.

If libpq is running in blocking mode (PG::Connection.async_api = false), put_copy_data does a blocking flush every time new memory is allocated.
Unfortunately we don't have access to that information, since we can't see into libpq's PGconn struct and the return codes give no indication of when this happens.
So flushing after a fixed number of calls is a very simple heuristic.

Fixes ged#473
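The every-100-calls heuristic described above can be sketched in plain Ruby. This is an illustrative model, not the actual C implementation in ruby-pg; the class name, counters, and the exact threshold placement are assumptions.

```ruby
# Sketch of the trade-off heuristic: a blocking flush every
# FLUSH_INTERVAL calls to put_copy_data bounds buffered memory
# without paying the flush cost on every single call.
FLUSH_INTERVAL = 100

class CopySender
  attr_reader :blocking_flushes, :pending_bytes

  def initialize
    @calls = 0
    @pending_bytes = 0
    @blocking_flushes = 0
  end

  def put_copy_data(data)
    @pending_bytes += data.bytesize
    @calls += 1
    # Every 100th call, block until the queued data is on the wire,
    # so the buffer cannot grow without bound (the ged#473 bloat).
    blocking_flush if (@calls % FLUSH_INTERVAL).zero?
  end

  def blocking_flush
    @blocking_flushes += 1
    @pending_bytes = 0
  end
end

sender = CopySender.new
1_050.times { sender.put_copy_data("row-data\n") }
# After 1050 calls, the blocking flush fired at calls 100, 200, ..., 1000,
# and at most ~100 rows are ever buffered between flushes.
```

Between the periodic blocking flushes, sending still relies on libpq's internal non-blocking flushes, so the common case keeps the pg-1.4 performance.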
larskanis added a commit to larskanis/ruby-pg that referenced this pull request Aug 9, 2022