Description
Note: If you have a question about Sequel, would like help using
Sequel, want to request a feature, or do anything else other than
submit a bug report, please use the sequel-talk Google Group or IRC.
Complete Description of Issue
The postgresql jdbc driver requires autocommit to be set to false when setFetchSize is used to request results using a cursor. If autocommit is not set to false, then setFetchSize is silently ignored and the entire resultset is loaded into memory. See https://jdbc.postgresql.org/documentation/head/query.html#query-with-cursor
Sequel by default sets autocommit to be true, and as a result the method Sequel::Jdbc::Dataset#with_fetch_size is silently ignored.
Patching Sequel::JDBC::Database seems to work, although I'm not sure what the side-effects of temporarily disabling autocommit could be.
module Sequel
module JDBC
class Database < Sequel::Database
# Yield a new statement object, and ensure that it is closed before returning.
def statement(conn)
# if a transaction is active, and a fetch_size is requested, turn off autocommit
conn.setAutoCommit(false) if fetch_size && in_transaction?
stmt = conn.createStatement
yield stmt
rescue *DATABASE_ERROR_CLASSES => e
raise_error(e)
ensure
stmt.close if stmt
conn.setAutoCommit(true) if !conn.getAutoCommit
end
end
end
end
If autocommit can not be safely disabled during execution, could an error be raised when with_fetch_size is used with the Postgresql JDBC Adaptor to let the user know that this combination is not supported?
Simplest Possible Self-Contained Example Showing the Bug
# will create OutOfMemoryException on JRuby
DB[:huge_table].with_fetch_size(10).each { |r| puts r }
Full Backtrace of Exception (if any)
none
SQL Log (if any)
none
Activity
jeremyevans commentedon Dec 20, 2019
Thanks for filling an issue. I agree that the current state of affairs is suboptimal. I'm not sure what the ramifications of disabling autocommit are, and I don't want to use that approach. Raising an exception could break existing code. I think a warning when
with_fetch_size
is called might be more appropriate here. We could probably recommend usingpaged_each
instead. What do you think?davebyrne commentedon Dec 20, 2019
Thanks for responding. I think a warning when
with_fetch_size
is called would be very helpful from a debugging standpoint to at least avoid any unexpected behavior.Looking at
paged_each
in the JDBC adapter, I see that it is implemented using LIMIT/OFFSET clauses. This has a severe performance impact on large tables (think scanning, sorting, then offsetting 1M+ rows repeatedly). I can see that the adapter using the pg gem explicitly uses a server side cursor, which would certainly solve the issues in the JDBC adapter as well. I'm not familiar enough with the dataset internals to refactor it appropriately, but just translating the libpq calls to JDBC seems to work.For the time being, I can certainly explicitly call server-side cursors, but is it possible to get the cursor support ported to the JDBC adapter (in a less nasty version than below)?
jeremyevans commentedon Dec 20, 2019
Dataset#paged_each
does not have to use LIMIT/OFFSET, it supports a:strategy=>:filter
option that supports a LIMIT and a filter excluding previous rows. It isn't the default behavior because it requires the dataset have an unambiguous order (fairly easy in most production cases, though it can have performance impacts).I'm fine with a pull request that adds support for
Dataset#use_cursor
to the jdbc/postgresql adapter. If this is added, thenDataset#paged_each
can be set to use it automatically instead of LIMIT/OFFSET or LIMIT+filter (that is how the postgres adapter works). We could then havewith_fetch_size
not warn and just calluse_cursor
internally.Please let me know if it that is something you are interested in working on. If not, I can add the warning to
with_fetch_size
in the mean time, and add addingDataset#use_cursor
to the jdbc/postgresql adapter to my todo list.davebyrne commentedon Dec 21, 2019
Great! I'll take a stab at a PR.
jeremyevans commentedon Dec 30, 2019
I'm going to add a warning for this shortly. The PR can remove the warning and replace it with an implementation that uses cursors.
Warn when calling Sequel::JDBC::Postgres::Dataset#with_fetch_size (Fixes