Tweak list of optimization rules #3841

Dandandan · 2022-10-15T14:53:09Z

Which issue does this PR close?

Closes #.

Rationale for this change

I ran all the optimization rules twice and looked at how the output of the TPC-H regression tests changed.
Next, I removed all the rules that had no effect and tried to remove the earlier passes.
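
For illustration, the experiment described above can be sketched with a toy model. This is not DataFusion code: plans are plain strings here, and `Rule`, `effective_rules`, `drop_true`, `noop`, and `sample_rules` are hypothetical names introduced only for this sketch. It runs the rule list several times and records which rules actually changed the plan in each pass.

```rust
/// Toy model: a rule is a named function that rewrites a plan string.
/// (Assumption for this sketch; real optimizer rules work on plan trees.)
type Rule = (&'static str, fn(&str) -> String);

/// Hypothetical rule: removes a redundant `AND true` conjunct.
fn drop_true(plan: &str) -> String {
    plan.replace(" AND true", "")
}

/// Hypothetical rule that never changes the plan.
fn noop(plan: &str) -> String {
    plan.to_string()
}

/// A small rule list used to exercise `effective_rules`.
fn sample_rules() -> [Rule; 2] {
    [("drop_true", drop_true), ("noop", noop)]
}

/// Runs the whole rule list `passes` times. Returns the final plan and,
/// for each pass, the names of the rules that changed the plan.
fn effective_rules(
    plan: &str,
    rules: &[Rule],
    passes: usize,
) -> (String, Vec<Vec<&'static str>>) {
    let mut current = plan.to_string();
    let mut report = Vec::with_capacity(passes);
    for _ in 0..passes {
        let mut changed = Vec::new();
        for &(name, rule) in rules {
            let next = rule(&current);
            if next != current {
                changed.push(name);
                current = next;
            }
        }
        report.push(changed);
    }
    (current, report)
}
```

Calling `effective_rules("Filter: a > b AND true", &sample_rules(), 2)` reports `drop_true` for the first pass and nothing for the second. In this toy setting, a pass with an empty report is a candidate for removal.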

What changes are included in this PR?

Are there any user-facing changes?

Dandandan · 2022-10-15T15:21:42Z

datafusion/optimizer/src/optimizer.rs

@@ -163,6 +162,14 @@ impl Optimizer {
        rules.push(Arc::new(FilterPushDown::new()));
        rules.push(Arc::new(LimitPushDown::new()));
        rules.push(Arc::new(SingleDistinctToGroupBy::new()));
+        rules.push(Arc::new(ProjectionPushDown::new()));


This one can simply be run after the other optimizations.

alamb

The new plans look a lot nicer to me. The only thing I start to worry about is how long our planning will take -- Perhaps it would be good to add some logging to improve the results.

Maybe related to #1160
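
One way to make planning time visible is to time each rule as it runs. The following is a minimal sketch under the same toy assumption that a plan is a string and a rule is a plain function; `timed_optimize` and `strip_true` are hypothetical names, not part of DataFusion.

```rust
use std::time::{Duration, Instant};

/// Hypothetical rule: removes a redundant `AND true` conjunct.
fn strip_true(plan: &str) -> String {
    plan.replace(" AND true", "")
}

/// Applies each rule once, in order, and records how long each one took.
/// Returns the rewritten plan together with one `(rule name, elapsed)`
/// entry per rule.
fn timed_optimize(
    plan: &str,
    rules: &[(&'static str, fn(&str) -> String)],
) -> (String, Vec<(&'static str, Duration)>) {
    let mut current = plan.to_string();
    let mut timings = Vec::with_capacity(rules.len());
    for &(name, rule) in rules {
        let start = Instant::now();
        current = rule(&current);
        timings.push((name, start.elapsed()));
    }
    (current, timings)
}
```

The returned timings could then be written to a debug log, so that slow rules stand out when the rule list changes.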

alamb · 2022-10-17T13:36:16Z

benchmarks/expected-plans/q21.txt

@@ -7,12 +7,12 @@ Sort: numwait DESC NULLS FIRST, supplier.s_name ASC NULLS LAST
            Inner Join: l1.l_orderkey = orders.o_orderkey
              Inner Join: supplier.s_suppkey = l1.l_suppkey
                TableScan: supplier projection=[s_suppkey, s_name, s_nationkey]
-                Filter: l1.l_receiptdate > l1.l_commitdate AND l1.l_receiptdate > l1.l_commitdate
+                Filter: l1.l_receiptdate > l1.l_commitdate


Dandandan · 2022-10-17T13:50:33Z

> The new plans look a lot nicer to me. The only thing I start to worry about is how long our planning will take -- Perhaps it would be good to add some logging to improve the results.
>
> Maybe related to #1160

Yes, good idea. Also, some optimization rules could be improved, e.g. to avoid creating filters that already exist. But in the end we also need to keep the passes as simple as possible - a hard balancing act :)
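
The duplicated predicate in q21 (`l1.l_receiptdate > l1.l_commitdate AND l1.l_receiptdate > l1.l_commitdate`) shows the kind of filter that a rule could avoid creating. As a rough sketch of the idea, assuming a predicate is given as text and its conjuncts are separated by ` AND `, duplicates can be dropped while keeping the first occurrence of each. `dedup_conjuncts` is a hypothetical helper; a real rule would compare expression trees, and this string split would mishandle `BETWEEN ... AND ...`.

```rust
/// Removes repeated conjuncts from a predicate written as `a AND b AND ...`,
/// keeping the first occurrence of each and preserving their order.
/// Assumption: conjuncts are compared as trimmed text, not as parsed
/// expressions.
fn dedup_conjuncts(predicate: &str) -> String {
    let mut seen: Vec<&str> = Vec::new();
    for conjunct in predicate.split(" AND ") {
        let conjunct = conjunct.trim();
        if !seen.contains(&conjunct) {
            seen.push(conjunct);
        }
    }
    seen.join(" AND ")
}
```

Applied to the old q21 filter, this yields the single comparison that appears in the new expected plan.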

ursabot · 2022-10-17T14:01:42Z

Benchmark runs are scheduled for baseline = 743ff28 and contender = 431a412. 431a412 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ec2-t3-xlarge-us-east-2] ec2-t3-xlarge-us-east-2
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on test-mac-arm] test-mac-arm
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ursa-i9-9960x] ursa-i9-9960x
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ursa-thinkcentre-m75q] ursa-thinkcentre-m75q
Buildkite builds:
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java

andygrove · 2022-10-17T19:59:09Z

I've just started reviewing this PR and noticed one difference in one of the queries I tested with that might not be optimal.

before

Table scan filter: CAST(col1 AS Int64) BETWEEN Int64(4) AND Int64(4) + Int64(3)

after

Table scan filter: CAST(col1 AS Int64) >= Int64(4), CAST(col1 AS Int64) <= Int64(7)

It looks like we are now performing the CAST twice.
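
To make the observation concrete, here is a toy expression model (not DataFusion's own `Expr` type; every name in it is an assumption for this sketch). Splitting `x BETWEEN low AND high` into `x >= low` and `x <= high` has to clone the operand `x`, so a cast inside `x` appears twice afterwards, while the bound `4 + 3` folds to `7`.

```rust
/// Toy expression tree, defined only for this sketch.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Column(&'static str),
    Int64(i64),
    CastInt64(Box<Expr>),
    Add(Box<Expr>, Box<Expr>),
    GtEq(Box<Expr>, Box<Expr>),
    LtEq(Box<Expr>, Box<Expr>),
    /// operand, low bound, high bound
    Between(Box<Expr>, Box<Expr>, Box<Expr>),
}

/// Folds an addition of two integer literals into one literal.
/// All other expressions are returned unchanged.
fn fold_constants(expr: Expr) -> Expr {
    match expr {
        Expr::Add(l, r) => match (fold_constants(*l), fold_constants(*r)) {
            (Expr::Int64(a), Expr::Int64(b)) => Expr::Int64(a + b),
            (l, r) => Expr::Add(Box::new(l), Box::new(r)),
        },
        other => other,
    }
}

/// Rewrites `x BETWEEN low AND high` into `[x >= low, x <= high]`.
/// The operand `x` is cloned, so it is evaluated once per comparison.
fn split_between(expr: Expr) -> Vec<Expr> {
    match expr {
        Expr::Between(x, low, high) => vec![
            Expr::GtEq(x.clone(), Box::new(fold_constants(*low))),
            Expr::LtEq(x, Box::new(fold_constants(*high))),
        ],
        other => vec![other],
    }
}

/// Counts the cast nodes in an expression tree.
fn count_casts(expr: &Expr) -> usize {
    match expr {
        Expr::Column(_) | Expr::Int64(_) => 0,
        Expr::CastInt64(inner) => 1 + count_casts(inner),
        Expr::Add(l, r) | Expr::GtEq(l, r) | Expr::LtEq(l, r) => {
            count_casts(l) + count_casts(r)
        }
        Expr::Between(x, low, high) => {
            count_casts(x) + count_casts(low) + count_casts(high)
        }
    }
}
```

In this model the `BETWEEN` form contains one cast and the two split comparisons contain two in total, which matches the before and after filters quoted above.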

andygrove · 2022-10-17T20:07:43Z

Actually, I am testing against the latest DF that includes this PR but also has other changes ... so this change could possibly be unrelated to this PR. I will post an update here soon.

andygrove · 2022-10-17T20:17:27Z

Filed #3863 for this issue. It is not related to this PR after all.

Tweak list of optimization rules

cf86e12

github-actions bot added the optimizer Optimizer rules label Oct 15, 2022

Dandandan added 5 commits October 15, 2022 16:54

Tweak list of optimization rules

b64c6ee

Merge remote-tracking branch 'upstream/master' into tweak_optimization_rules

b0abe56

Merge

a3c7061

Fmt

71c21f2

Just move projection push down

cecca34

Dandandan added 2 commits October 15, 2022 17:23

Just move projection push down

0170654

Fix test

48336d5

github-actions bot added the core Core datafusion crate label Oct 15, 2022

Dandandan marked this pull request as ready for review October 15, 2022 15:59

Dandandan requested review from alamb and andygrove October 15, 2022 18:21

alamb approved these changes Oct 17, 2022

Dandandan merged commit 431a412 into apache:master Oct 17, 2022

alamb mentioned this pull request Oct 18, 2022

Replace Filter: Boolean(false) with EmptyRelation #3864

Closed

waynexia mentioned this pull request Oct 21, 2022

FilterPushdown will generate duplicate exprs #3914

Open
