Add option to use ordered relation for count #4911

mtollie · 2024-04-11T19:33:49Z

Is your feature request related to a problem? Please describe.

While paginating through results from an expensive PostgreSQL condition on a single partitioned table with many rows and questionably accurate table statistics, the ungrouped relation count query causes a suboptimal query to be chosen while evaluating hasNextPage.

Describe the solution you'd like

The ability to disable unscope(:order) for certain relations.

Describe alternatives you've considered

We have better table statistics and pg_hint_plan to use known better indexes.

#1911 introduced this optimization in the context of MySQL (and suggested this customizability too!). In any case, it intuitively makes sense that a semantically equivalent but more relaxed query would be more efficient. But for PostgreSQL in particular there's some reference to this situation

The query optimizer takes LIMIT into account when generating query plans, so you are very likely to get different plans (yielding different row orders) [...and...] inconsistent results unless you enforce a predictable result ordering with ORDER BY.

Additional context

This is a lightly anonymized query plan showing that the ORDER BY leads to a more efficient query in this case.

> EXPLAIN ANALYZE SELECT COUNT(*) FROM (SELECT ... LIMIT 1 OFFSET 25) subquery_for_count
QUERY PLAN
Aggregate  (cost=1160.74..1160.75 rows=1 width=8) (actual time=1139.914..1139.916 rows=1 loops=1)
  ->  Limit  (cost=1116.10..1160.72 rows=1 width=4) (actual time=1139.909..1139.911 rows=0 loops=1)
        ->  Append  (cost=0.57..372633.93 rows=8351 width=4) (actual time=226.936..1139.908 rows=1 loops=1)
              ->  Index Scan using idx1 on part1 (cost=0.57..312929.10 rows=5773 width=4) (actual time=226.934..1091.408 rows=1 loops=1)
                    Index Cond: cond1
                    Filter: cond2
                    Rows Removed by Filter: 731442
              ->  Index Scan using idx2 on part2  (cost=0.56..59663.08 rows=2578 width=4) (actual time=48.492..48.492 rows=0 loops=1)
                    Index Cond: cond1
                    Filter: cond2
                    Rows Removed by Filter: 34840
Planning Time: 4.519 ms
Execution Time: 1140.173 ms

> EXPLAIN ANALYZE SELECT COUNT(*) FROM (SELECT ... ORDER BY ? LIMIT 1 OFFSET 25) subquery_for_count
QUERY PLAN
Aggregate  (cost=1161.13..1161.14 rows=1 width=8) (actual time=981.271..981.273 rows=1 loops=1)
  ->  Limit  (cost=1116.51..1161.12 rows=1 width=28) (actual time=981.266..981.267 rows=0 loops=1)
        ->  Append  (cost=1.13..372669.54 rows=8353 width=28) (actual time=796.535..981.265 rows=1 loops=1)
              ->  Index Scan Backward using idx2 on part2  (cost=0.56..59698.68 rows=2580 width=28) (actual time=47.891..47.891 rows=0 loops=1)
                    Index Cond: cond1
                    Filter: cond2
                    Rows Removed by Filter: 34840
              ->  Index Scan Backward using idx1 on part1 (cost=0.57..312929.10 rows=5773 width=28) (actual time=748.640..933.368 rows=1 loops=1)
                    Index Cond: cond1
                    Filter: cond2
                    Rows Removed by Filter: 731442
Planning Time: 4.572 ms
Execution Time: 981.537 ms

The text was updated successfully, but these errors were encountered:

rmosolgo · 2024-04-11T20:08:35Z

Hey, thanks for the suggestion. It looks like we have a helper that unscopes it:

graphql-ruby/lib/graphql/pagination/active_record_relation_connection.rb

Lines 10 to 25 in 8eb9833

    
           def relation_count(relation) 
        
             int_or_hash = if already_loaded?(relation) 
        
               relation.size 
        
             elsif relation.respond_to?(:unscope) 
        
               relation.unscope(:order).count(:all) 
        
             else 
        
               # Rails 3 
        
               relation.count 
        
             end 
        
             if int_or_hash.is_a?(Integer) 
        
               int_or_hash 
        
             else 
        
               # Grouped relations return count-by-group hashes 
        
               int_or_hash.length 
        
             end 
        
           end

But it wasn't being used in has_next_page:

graphql-ruby/lib/graphql/pagination/relation_connection.rb

Line 35 in 8eb9833

if @nodes && @nodes.count < first

I'm opening a PR to use that helper there: #4912.

mtollie · 2024-04-11T20:23:41Z

Hi @rmosolgo good catch I didn't notice that! But my request is the opposite actually: I want it to be ordered (scoped) all the time, even in relation_count.

mtollie · 2024-04-11T20:25:25Z

I updated the title to remove the confusing double negative.

rmosolgo · 2024-04-12T15:48:15Z

Oh -- sorry I misunderstood! One way to accomplish this would be to make a subclass of ActiveRecordRelationConnection and implement relation_count not to unscope the relation, for example:

class OrderedCountActiveRecordRelationConnection < GraphQL::Pagination::ActiveRecordRelationConnection 
  def relation_count(relation)
    # GraphQL-Ruby unscopes relations by default, but we don't always want that: 
    relation.count 
  end 
end

Then, in your resolvers, manually apply that wrapper whenever you want that behavior:

field :things, Thing.connection_type 

def things 
  # don't unscope here because the count actually makes the query faster:
  OrderedCountActiveRecordRelationConnection.new(Thing.all)
end

It's the same idea as "custom connections" (https://graphql-ruby.org/pagination/custom_connections.html#using-a-custom-connection) but in this case, it's just a tweak of built-in behavior. Want to give that approach a try?

mtollie · 2024-04-12T19:06:27Z

I think that suggestion will work great and avoid the game of query regression whac-a-mole these sorts of changes always seem to introduce when done on a generic level.

In out context, this is really an edge case where the fix probably is better done on the database level and a custom connection strategy closer to #4908 allows us to largely eliminate all the count calls anyway. But I thought I would open the issue to see if anyone else was having a similar issue. 😄 Feel free to close for now if you'd like!

rmosolgo · 2024-04-12T19:56:00Z

Sounds good -- I'll check out the other PR soon!

rmosolgo mentioned this issue Apr 11, 2024

Use relation_count to avoid ordered count in has_next_page #4912

Closed

mtollie changed the title ~~Add option to disable relation unordering before counting~~ Add option to use ordered relation for count Apr 11, 2024

rmosolgo closed this as completed Apr 12, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add option to use ordered relation for count #4911

Add option to use ordered relation for count #4911

mtollie commented Apr 11, 2024

rmosolgo commented Apr 11, 2024

mtollie commented Apr 11, 2024

mtollie commented Apr 11, 2024

rmosolgo commented Apr 12, 2024

mtollie commented Apr 12, 2024

rmosolgo commented Apr 12, 2024

Add option to use ordered relation for count #4911

Add option to use ordered relation for count #4911

Comments

mtollie commented Apr 11, 2024

rmosolgo commented Apr 11, 2024

mtollie commented Apr 11, 2024

mtollie commented Apr 11, 2024

rmosolgo commented Apr 12, 2024

mtollie commented Apr 12, 2024

rmosolgo commented Apr 12, 2024