
with_fetch_size not working with postgresql jdbc #1665

Closed

Description

@davebyrne

Note: If you have a question about Sequel, would like help using
Sequel, want to request a feature, or do anything else other than
submit a bug report, please use the sequel-talk Google Group or IRC.

Complete Description of Issue

The PostgreSQL JDBC driver requires autocommit to be set to false when setFetchSize is used to fetch results with a cursor. If autocommit is not set to false, setFetchSize is silently ignored and the entire result set is loaded into memory. See https://jdbc.postgresql.org/documentation/head/query.html#query-with-cursor

Sequel sets autocommit to true by default, and as a result Sequel::JDBC::Dataset#with_fetch_size is silently ignored.
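Per the PostgreSQL JDBC documentation linked above, the driver only streams results through a cursor when several conditions hold at once. The predicate below is purely illustrative (it is not part of Sequel or the driver) and just encodes that checklist:

```ruby
# Conditions under which the PostgreSQL JDBC driver uses a cursor, per
# https://jdbc.postgresql.org/documentation/head/query.html#query-with-cursor
# Illustrative helper only; not part of Sequel or the JDBC driver.
def jdbc_cursor_used?(autocommit:, fetch_size:, forward_only: true, single_statement: true)
  !autocommit &&            # connection must not be in autocommit mode
    fetch_size.to_i > 0 &&  # a positive fetch size must be set
    forward_only &&         # the ResultSet must be TYPE_FORWARD_ONLY
    single_statement        # the query must be a single statement
end

jdbc_cursor_used?(autocommit: true, fetch_size: 10)   # => false (fetch size silently ignored)
jdbc_cursor_used?(autocommit: false, fetch_size: 10)  # => true (rows streamed via cursor)
```

This is why Sequel's default of autocommit=true defeats with_fetch_size: the first condition never holds.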

Patching Sequel::JDBC::Database seems to work, although I'm not sure what the side effects of temporarily disabling autocommit could be.

module Sequel
  module JDBC
    class Database < Sequel::Database
      # Yield a new statement object, and ensure that it is closed before returning.
      def statement(conn)
        # If a fetch size is requested while a transaction is active, disable
        # autocommit so the PostgreSQL JDBC driver will use a cursor.
        disabled_autocommit = false
        if fetch_size && in_transaction? && conn.getAutoCommit
          conn.setAutoCommit(false)
          disabled_autocommit = true
        end
        stmt = conn.createStatement
        yield stmt
      rescue *DATABASE_ERROR_CLASSES => e
        raise_error(e)
      ensure
        stmt.close if stmt
        # Restore autocommit only if this method disabled it.
        conn.setAutoCommit(true) if disabled_autocommit
      end
    end
  end
end

If autocommit cannot be safely disabled during execution, could an error be raised when with_fetch_size is used with the PostgreSQL JDBC adapter, to let the user know that this combination is not supported?

Simplest Possible Self-Contained Example Showing the Bug

# will raise a Java OutOfMemoryError on JRuby
DB[:huge_table].with_fetch_size(10).each { |r| puts r }

Full Backtrace of Exception (if any)

none

SQL Log (if any)

none


jeremyevans (Owner) commented on Dec 20, 2019

Thanks for filing an issue. I agree that the current state of affairs is suboptimal. I'm not sure what the ramifications of disabling autocommit are, and I don't want to use that approach. Raising an exception could break existing code. I think a warning when with_fetch_size is called might be more appropriate here. We could probably recommend using paged_each instead. What do you think?

davebyrne (Author) commented on Dec 20, 2019

Thanks for responding. I think a warning when with_fetch_size is called would be very helpful from a debugging standpoint to at least avoid any unexpected behavior.

Looking at paged_each in the JDBC adapter, I see that it is implemented using LIMIT/OFFSET clauses. This has a severe performance impact on large tables (think scanning, sorting, then offsetting 1M+ rows repeatedly). I can see that the adapter using the pg gem explicitly uses a server side cursor, which would certainly solve the issues in the JDBC adapter as well. I'm not familiar enough with the dataset internals to refactor it appropriately, but just translating the libpq calls to JDBC seems to work.
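As a rough illustration of the cost difference (the helper names here are hypothetical, not Sequel API): OFFSET-based paging makes the server scan and discard every previously skipped row on each page, while a cursor fetches each batch incrementally from where it left off:

```ruby
# Hypothetical helpers showing the SQL each strategy issues per page.
def offset_page_sql(table, page_size, page)
  "SELECT * FROM #{table} ORDER BY id LIMIT #{page_size} OFFSET #{page * page_size}"
end

def cursor_fetch_sql(cursor_name, page_size)
  "FETCH FORWARD #{page_size} FROM #{cursor_name}"
end

# By page 1000 the server must skip 1,000,000 rows before returning any:
offset_page_sql(:huge_table, 1000, 1000)
# => "SELECT * FROM huge_table ORDER BY id LIMIT 1000 OFFSET 1000000"

# whereas every cursor fetch is the same constant-cost statement:
cursor_fetch_sql(:sequel_cursor, 1000)
# => "FETCH FORWARD 1000 FROM sequel_cursor"
```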

For the time being, I can certainly explicitly call server-side cursors, but is it possible to get the cursor support ported to the JDBC adapter (in a less nasty version than below)?

module Sequel
  module JDBC
    class Dataset < Sequel::Dataset
      def fetch_rows(sql)
        return cursor_fetch_rows(sql){|h| yield h} if @opts[:cursor]
        execute(sql){|res| process_result_set(res) { |row| yield row }}
      end
      # Use a cursor for paging.
      def paged_each(opts=OPTS, &block)
        unless block_given?
          return enum_for(:paged_each, opts)
        end
        use_cursor(opts).each(&block)
      end

      def use_cursor(opts=OPTS)
        clone(:cursor=>{:rows_per_fetch=>1000}.merge!(opts))
      end

      # Use a cursor to fetch groups of records at a time, yielding them to the block.
      def cursor_fetch_rows(sql)
        server_opts = {:server=>@opts[:server] || :read_only}
        cursor = @opts[:cursor]
        hold = cursor[:hold]
        cursor_name = quote_identifier(cursor[:cursor_name] || 'sequel_cursor')
        rows_per_fetch = cursor[:rows_per_fetch].to_i

        db.public_send(*(hold ? [:synchronize, server_opts[:server]] : [:transaction, server_opts])) do
          begin
            execute_ddl("DECLARE #{cursor_name} NO SCROLL CURSOR WITH#{'OUT' unless hold} HOLD FOR #{sql}", server_opts)
            rows_per_fetch = 1000 if rows_per_fetch <= 0
            fetch_sql = "FETCH FORWARD #{rows_per_fetch} FROM #{cursor_name}"
            while true
              execute(fetch_sql) do |res|
                count = 0
                process_result_set(res) do |row|
                  count += 1
                  yield row
                end
                return if count < rows_per_fetch
              end
            end
          rescue Exception => e
            raise
          ensure
            begin
              execute_ddl("CLOSE #{cursor_name}", server_opts)
            rescue
              raise e if e
              raise
            end
          end
        end
      end
    end
  end
end
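The termination logic in cursor_fetch_rows above (stop once a fetch returns fewer rows than requested) can be exercised in isolation. The simulator below is purely illustrative; an array stands in for the server-side cursor and each slice stands in for one FETCH FORWARD round trip:

```ruby
# Illustrative simulation of the fetch loop: repeatedly take rows_per_fetch
# rows from an array standing in for the cursor, and stop once a batch
# comes back short, mirroring "return if count < rows_per_fetch" above.
def fetch_all_via_cursor(rows, rows_per_fetch)
  fetched = []
  offset = 0
  loop do
    batch = rows[offset, rows_per_fetch] || []
    offset += batch.length
    batch.each { |row| fetched << row }
    # A short (or empty) batch means the cursor is exhausted.
    return fetched if batch.length < rows_per_fetch
  end
end

fetch_all_via_cursor((1..7).to_a, 3)  # => [1, 2, 3, 4, 5, 6, 7]
```

Note the final round trip when the row count is an exact multiple of the batch size: the loop issues one extra fetch that returns zero rows, just as the real cursor loop would.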

jeremyevans (Owner) commented on Dec 20, 2019

Dataset#paged_each does not have to use LIMIT/OFFSET: it supports a :strategy=>:filter option that uses a LIMIT plus a filter excluding previously seen rows. It isn't the default behavior because it requires the dataset to have an unambiguous order (fairly easy in most production cases, though it can have performance impacts).
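The :strategy=>:filter ("keyset") approach can be sketched in plain Ruby. This is a hypothetical helper, not the actual Sequel implementation: instead of an OFFSET, each page filters on the ordered key of the last row already seen.

```ruby
# Hypothetical sketch of LIMIT+filter paging over an in-memory array
# standing in for a table ordered unambiguously by :id.
def keyset_pages(rows, page_size)
  pages = []
  last_id = nil
  loop do
    # SQL analogue: SELECT ... WHERE id > last_id ORDER BY id LIMIT page_size
    page = rows.select { |r| last_id.nil? || r[:id] > last_id }.first(page_size)
    break if page.empty?
    pages << page
    last_id = page.last[:id]
  end
  pages
end

rows = (1..5).map { |i| {id: i} }
keyset_pages(rows, 2)
# => [[{id: 1}, {id: 2}], [{id: 3}, {id: 4}], [{id: 5}]]
```

Because each page seeks directly to `id > last_id` on an indexed column, the per-page cost stays flat instead of growing with the offset.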

I'm fine with a pull request that adds support for Dataset#use_cursor to the jdbc/postgresql adapter. If this is added, then Dataset#paged_each can be set to use it automatically instead of LIMIT/OFFSET or LIMIT+filter (that is how the postgres adapter works). We could then have with_fetch_size not warn and just call use_cursor internally.

Please let me know if that is something you are interested in working on. If not, I can add the warning to with_fetch_size in the meantime, and add Dataset#use_cursor support in the jdbc/postgresql adapter to my todo list.

davebyrne (Author) commented on Dec 21, 2019

Great! I'll take a stab at a PR.

jeremyevans (Owner) commented on Dec 30, 2019

I'm going to add a warning for this shortly. The PR can remove the warning and replace it with an implementation that uses cursors.

A commit referencing this issue was added on Feb 25, 2025 (3206ce5).