Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add option to use ordered relation for count #4911

Closed
mtollie opened this issue Apr 11, 2024 · 6 comments
Closed

Add option to use ordered relation for count #4911

mtollie opened this issue Apr 11, 2024 · 6 comments

Comments

@mtollie
Copy link

mtollie commented Apr 11, 2024

Is your feature request related to a problem? Please describe.

While paginating through results from an expensive PostgreSQL condition on a single partitioned table with many rows and questionably accurate table statistics, the ungrouped relation count query causes a suboptimal query to be chosen while evaluating hasNextPage.

Describe the solution you'd like

The ability to disable unscope(:order) for certain relations.

Describe alternatives you've considered

We have better table statistics and pg_hint_plan to use known better indexes.

#1911 introduced this optimization in the context of MySQL (and suggested this customizability too!). In any case, it intuitively makes sense that a semantically equivalent but more relaxed query would be more efficient. But for PostgreSQL in particular there's some reference to this situation

The query optimizer takes LIMIT into account when generating query plans, so you are very likely to get different plans (yielding different row orders) [...and...] inconsistent results unless you enforce a predictable result ordering with ORDER BY.

Additional context

This is a lightly anonymized query plan showing that the ORDER BY leads to a more efficient query in this case.

> EXPLAIN ANALYZE SELECT COUNT(*) FROM (SELECT ... LIMIT 1 OFFSET 25) subquery_for_count
QUERY PLAN
Aggregate  (cost=1160.74..1160.75 rows=1 width=8) (actual time=1139.914..1139.916 rows=1 loops=1)
  ->  Limit  (cost=1116.10..1160.72 rows=1 width=4) (actual time=1139.909..1139.911 rows=0 loops=1)
        ->  Append  (cost=0.57..372633.93 rows=8351 width=4) (actual time=226.936..1139.908 rows=1 loops=1)
              ->  Index Scan using idx1 on part1 (cost=0.57..312929.10 rows=5773 width=4) (actual time=226.934..1091.408 rows=1 loops=1)
                    Index Cond: cond1
                    Filter: cond2
                    Rows Removed by Filter: 731442
              ->  Index Scan using idx2 on part2  (cost=0.56..59663.08 rows=2578 width=4) (actual time=48.492..48.492 rows=0 loops=1)
                    Index Cond: cond1
                    Filter: cond2
                    Rows Removed by Filter: 34840
Planning Time: 4.519 ms
Execution Time: 1140.173 ms

> EXPLAIN ANALYZE SELECT COUNT(*) FROM (SELECT ... ORDER BY ? LIMIT 1 OFFSET 25) subquery_for_count
QUERY PLAN
Aggregate  (cost=1161.13..1161.14 rows=1 width=8) (actual time=981.271..981.273 rows=1 loops=1)
  ->  Limit  (cost=1116.51..1161.12 rows=1 width=28) (actual time=981.266..981.267 rows=0 loops=1)
        ->  Append  (cost=1.13..372669.54 rows=8353 width=28) (actual time=796.535..981.265 rows=1 loops=1)
              ->  Index Scan Backward using idx2 on part2  (cost=0.56..59698.68 rows=2580 width=28) (actual time=47.891..47.891 rows=0 loops=1)
                    Index Cond: cond1
                    Filter: cond2
                    Rows Removed by Filter: 34840
              ->  Index Scan Backward using idx1 on part1 (cost=0.57..312929.10 rows=5773 width=28) (actual time=748.640..933.368 rows=1 loops=1)
                    Index Cond: cond1
                    Filter: cond2
                    Rows Removed by Filter: 731442
Planning Time: 4.572 ms
Execution Time: 981.537 ms
@rmosolgo
Copy link
Owner

Hey, thanks for the suggestion. It looks like we have a helper that unscopes it:

def relation_count(relation)
int_or_hash = if already_loaded?(relation)
relation.size
elsif relation.respond_to?(:unscope)
relation.unscope(:order).count(:all)
else
# Rails 3
relation.count
end
if int_or_hash.is_a?(Integer)
int_or_hash
else
# Grouped relations return count-by-group hashes
int_or_hash.length
end
end

But it wasn't being used in has_next_page:

if @nodes && @nodes.count < first

I'm opening a PR to use that helper there: #4912.

@mtollie
Copy link
Author

mtollie commented Apr 11, 2024

Hi @rmosolgo good catch I didn't notice that! But my request is the opposite actually: I want it to be ordered (scoped) all the time, even in relation_count.

@mtollie mtollie changed the title Add option to disable relation unordering before counting Add option to use ordered relation for count Apr 11, 2024
@mtollie
Copy link
Author

mtollie commented Apr 11, 2024

I updated the title to remove the confusing double negative.

@rmosolgo
Copy link
Owner

Oh -- sorry I misunderstood! One way to accomplish this would be to make a subclass of ActiveRecordRelationConnection and implement relation_count not to unscope the relation, for example:

class OrderedCountActiveRecordRelationConnection < GraphQL::Pagination::ActiveRecordRelationConnection 
  def relation_count(relation)
    # GraphQL-Ruby unscopes relations by default, but we don't always want that: 
    relation.count 
  end 
end 

Then, in your resolvers, manually apply that wrapper whenever you want that behavior:

field :things, Thing.connection_type 

def things 
  # don't unscope here because the count actually makes the query faster:
  OrderedCountActiveRecordRelationConnection.new(Thing.all)
end 

It's the same idea as "custom connections" (https://graphql-ruby.org/pagination/custom_connections.html#using-a-custom-connection) but in this case, it's just a tweak of built-in behavior. Want to give that approach a try?

@mtollie
Copy link
Author

mtollie commented Apr 12, 2024

I think that suggestion will work great and avoid the game of query regression whac-a-mole these sorts of changes always seem to introduce when done on a generic level.

In out context, this is really an edge case where the fix probably is better done on the database level and a custom connection strategy closer to #4908 allows us to largely eliminate all the count calls anyway. But I thought I would open the issue to see if anyone else was having a similar issue. 😄 Feel free to close for now if you'd like!

@rmosolgo
Copy link
Owner

Sounds good -- I'll check out the other PR soon!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants