Internal error when parquet predicate pushdown is enabled "Error evaluating filter predicate:" #4006

Closed
Tracked by #3463 ...
alamb opened this issue Oct 28, 2022 · 1 comment · Fixed by #4021
Labels
bug Something isn't working

Comments

@alamb
Contributor

alamb commented Oct 28, 2022

Describe the bug
DataFusion returns an internal error for some predicates when Parquet predicate pushdown is enabled.

Note that pushdown filtering is not enabled by default (it is still being worked on), so this issue is unlikely to affect most users.

To Reproduce

  1. Download data from
    repro.zip
  2. Run datafusion CLI

The query being run is:

select count(*) from foo where request_duration_ns > 791684060 OR client_addr NOT in ('213.120.214.213');

Expected behavior
The same answer should be produced with and without row filtering enabled. However, with row filtering enabled, an error is produced. Without row filtering:

datafusion-cli -f script.sql 
+-----------------+
| COUNT(UInt8(1)) |
+-----------------+
| 53819           |
+-----------------+
1 row in set. Query took 0.006 seconds.

With pushdown filtering enabled:

DATAFUSION_EXECUTION_PARQUET_PUSHDOWN_FILTERS=true datafusion-cli -f script.sql 
...
1 row in set. Query took 0.002 seconds.
ArrowError(ExternalError(Execution("Arrow error: External error: Arrow: underlying Arrow error: Compute error: Error evaluating filter predicate: Internal(\"Cannot evaluate binary expression Gt with types Utf8 and Int32\")")))

Additional context
Found by the test in #3976.

@tustvold
Contributor

It looks like this has the same underlying cause as #4005 (comment)

Reordering the predicates works:

❯ select count(*) from foo where client_addr NOT in ('213.120.214.213') OR request_duration_ns > 791684060;
+-----------------+
| COUNT(UInt8(1)) |
+-----------------+
| 53819           |
+-----------------+
1 row in set. Query took 0.247 seconds.
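
To illustrate why swapping the two sides of the OR can matter, here is a minimal Python sketch of a positional column mix-up. It assumes (the issue does not state this outright) that the predicate binds columns by position in the order they appear in the expression, while the reader hands columns back in file order. `FILE_SCHEMA`, `ROW`, `decode`, `gt`, and `evaluate` are hypothetical names for illustration, not DataFusion APIs, and the real column order in repro.zip is not shown here.

```python
# Assumed file-schema order; the real layout of the repro file is not shown in the issue.
FILE_SCHEMA = ["client_addr", "request_duration_ns"]

# A made-up row for illustration only.
ROW = {"client_addr": "10.0.0.1", "request_duration_ns": 5}


def decode(requested, row):
    """Return the requested columns in file-schema order, ignoring the request order."""
    return [row[name] for name in FILE_SCHEMA if name in requested]


def gt(left, right):
    """Strictly typed greater-than, loosely mimicking the type check in the error message."""
    if type(left) is not type(right):
        raise TypeError(
            f"Cannot evaluate Gt with types {type(left).__name__} and {type(right).__name__}"
        )
    return left > right


def evaluate(predicate_columns, row):
    """Evaluate the predicate, binding columns by their position in predicate_columns."""
    batch = decode(predicate_columns, row)
    index = {name: i for i, name in enumerate(predicate_columns)}
    duration = batch[index["request_duration_ns"]]
    addr = batch[index["client_addr"]]
    return gt(duration, 791684060) or addr not in ("213.120.214.213",)
```

Under these assumptions, `evaluate(["client_addr", "request_duration_ns"], ROW)` succeeds because the predicate order happens to match the file order, while `evaluate(["request_duration_ns", "client_addr"], ROW)` raises a `TypeError`, since position 0 actually holds the string column.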

tustvold added a commit to tustvold/arrow-datafusion that referenced this issue Oct 29, 2022
tustvold added a commit that referenced this issue Oct 30, 2022
…edicate (#4005) (#4006) (#4021)

* Project columns within DatafusionArrowPredicate (#4005) (#4006)

* Add test

* Format

* Fix merge blunder

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
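
The commit title suggests the fix projects (reorders) the decoded columns before the predicate is evaluated. A rough Python sketch of that idea follows. `project` and both column orders are hypothetical, and this is not the actual DatafusionArrowPredicate code.

```python
def project(batch, batch_order, wanted_order):
    """Reorder `batch` (laid out as `batch_order`) into `wanted_order`, looking columns up by name."""
    return [batch[batch_order.index(name)] for name in wanted_order]


# Assumed layout: the reader returns columns in file order.
file_order = ["client_addr", "request_duration_ns"]
# Order in which the predicate first mentions each column.
predicate_order = ["request_duration_ns", "client_addr"]

# A made-up decoded row, laid out in file order.
batch = ["10.0.0.1", 5]

# After projection, positions line up with what the predicate expects.
projected = project(batch, file_order, predicate_order)
duration, addr = projected
matches = duration > 791684060 or addr not in ("213.120.214.213",)
```

Looking columns up by name keeps the result independent of the order in which the predicate mentions them.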
jimexist pushed a commit to jimexist/arrow-datafusion that referenced this issue Oct 31, 2022
Dandandan pushed a commit to yuuch/arrow-datafusion that referenced this issue Nov 5, 2022