New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix aggregate type coercion bug #3710
Changes from 2 commits
af3322a
b486da7
e71aeed
edc2561
a9d6f8c
e3830a7
f6e8ffa
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -178,16 +178,15 @@ impl Optimizer { | |
F: FnMut(&LogicalPlan, &dyn OptimizerRule), | ||
{ | ||
let mut new_plan = plan.clone(); | ||
debug!("Input logical plan:\n{}\n", plan.display_indent()); | ||
trace!("Full input logical plan:\n{:?}", plan); | ||
log_plan("Optimizer input", plan); | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This is a driveby cleanup to improve logging (specifically, also add |
||
|
||
for rule in &self.rules { | ||
let result = rule.optimize(&new_plan, optimizer_config); | ||
match result { | ||
Ok(plan) => { | ||
new_plan = plan; | ||
observer(&new_plan, rule.as_ref()); | ||
debug!("After apply {} rule:\n", rule.name()); | ||
debug!("Optimized logical plan:\n{}\n", new_plan.display_indent()); | ||
log_plan(rule.name(), &new_plan); | ||
} | ||
Err(ref e) => { | ||
if optimizer_config.skip_failing_rules { | ||
|
@@ -209,12 +208,17 @@ impl Optimizer { | |
} | ||
} | ||
} | ||
debug!("Optimized logical plan:\n{}\n", new_plan.display_indent()); | ||
trace!("Full Optimized logical plan:\n {:?}", new_plan); | ||
log_plan("Optimized plan", &new_plan); | ||
Ok(new_plan) | ||
} | ||
} | ||
|
||
/// Log the plan in debug/tracing mode after some part of the optimizer runs | ||
fn log_plan(description: &str, plan: &LogicalPlan) { | ||
debug!("{description}:\n{}\n", plan.display_indent()); | ||
trace!("{description}::\n{}\n", plan.display_indent_schema()); | ||
} | ||
|
||
#[cfg(test)] | ||
mod tests { | ||
use crate::optimizer::Optimizer; | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -97,7 +97,18 @@ fn optimize(plan: &LogicalPlan) -> Result<LogicalPlan> { | |
let new_exprs = plan | ||
.expressions() | ||
.into_iter() | ||
.map(|expr| expr.rewrite(&mut expr_rewriter)) | ||
.map(|expr| { | ||
let original_name = expr.name()?; | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think the root cause issue is that the Bigger picture there are at least three places that we have rediscovered this same problem when rewriting expressions --#3555 and https://github.com/apache/arrow-datafusion/blob/master/datafusion/optimizer/src/simplify_expressions.rs#L316 I will try and make a follow on PR to clean them all up. In particular, I think this is something There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think this optimizer rule may actually change the name (e.g. from |
||
let expr = expr.rewrite(&mut expr_rewriter)?; | ||
|
||
// Ensure this rewrite doesn't change the name | ||
// https://github.com/apache/arrow-datafusion/issues/3704 | ||
if expr.name()? != original_name { | ||
Ok(expr.alias(&original_name)) | ||
} else { | ||
Ok(expr) | ||
} | ||
}) | ||
.collect::<Result<Vec<_>>>()?; | ||
|
||
from_plan(plan, new_exprs.as_slice(), new_inputs.as_slice()) | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -29,6 +29,12 @@ use std::any::Any; | |
use std::collections::HashMap; | ||
use std::sync::Arc; | ||
|
||
#[cfg(test)] | ||
#[ctor::ctor] | ||
fn init() { | ||
let _ = env_logger::try_init(); | ||
} | ||
|
||
#[test] | ||
fn case_when() -> Result<()> { | ||
let sql = "SELECT CASE WHEN col_int32 > 0 THEN 1 ELSE 0 END FROM test"; | ||
|
@@ -45,6 +51,17 @@ fn case_when() -> Result<()> { | |
Ok(()) | ||
} | ||
|
||
#[test] | ||
fn case_when_aggregate() -> Result<()> { | ||
let sql = "SELECT col_utf8, SUM(CASE WHEN col_int32 > 0 THEN 1 ELSE 0 END) AS n FROM test GROUP BY col_utf8"; | ||
let plan = test_sql(sql)?; | ||
let expected = "Projection: #test.col_utf8, #SUM(CASE WHEN test.col_int32 > Int64(0) THEN Int64(1) ELSE Int64(0) END) AS n\ | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Could you merge latest from master - we should not include There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Will do There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. in e3830a7 |
||
\n Aggregate: groupBy=[[#test.col_utf8]], aggr=[[SUM(CASE WHEN #test.col_int32 > Int32(0) THEN Int64(1) ELSE Int64(0) END) AS SUM(CASE WHEN test.col_int32 > Int64(0) THEN Int64(1) ELSE Int64(0) END)]]\ | ||
\n TableScan: test projection=[col_int32, col_utf8]"; | ||
assert_eq!(expected, format!("{:?}", plan)); | ||
Ok(()) | ||
} | ||
|
||
#[test] | ||
fn unsigned_target_type() -> Result<()> { | ||
let sql = "SELECT * FROM test WHERE col_uint32 > 0"; | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
to be honest, I am not sure why this has changed (aka the filters are no longer simplified). I will look into that in the morning