Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add SQL planner support for Like, ILike and SimilarTo, with optional escape character #3101

Merged
merged 12 commits into from
Sep 9, 2022
16 changes: 15 additions & 1 deletion datafusion/core/src/physical_plan/planner.rs
Original file line number Diff line number Diff line change
Expand Up @@ -1682,6 +1682,8 @@ mod tests {
use datafusion_expr::expr::GroupingSet;
use datafusion_expr::sum;
use datafusion_expr::{col, lit};
use datafusion_optimizer::type_coercion::TypeCoercion;
use datafusion_optimizer::{OptimizerConfig, OptimizerRule};
use fmt::Debug;
use std::collections::HashMap;
use std::convert::TryFrom;
Expand Down Expand Up @@ -1859,7 +1861,19 @@ mod tests {
col("c1").like(col("c2")),
];
for case in cases {
let logical_plan = test_csv_scan().await?.project(vec![case.clone()]);
let logical_plan = test_csv_scan()
.await?
.project(vec![case.clone()])
.and_then(|b| b.build())
.and_then(|plan| {
// this test was expecting type coercion/validation errors before the optimizer
// had run due to the legacy approach of type coercion but now we need to run the
// optimizer here
let type_coercion = TypeCoercion::default();
let mut config = OptimizerConfig::new();
type_coercion.optimize(&plan, &mut config)
});

let message = format!(
"Expression {:?} expected to error due to impossible coercion",
case
Expand Down
33 changes: 25 additions & 8 deletions datafusion/optimizer/src/type_coercion.rs
Original file line number Diff line number Diff line change
Expand Up @@ -18,7 +18,8 @@
//! Optimizer rule for type validation and coercion

use crate::{OptimizerConfig, OptimizerRule};
use datafusion_common::{DFSchema, DFSchemaRef, Result};
use arrow::datatypes::DataType;
use datafusion_common::{DFSchema, DFSchemaRef, DataFusionError, Result};
use datafusion_expr::binary_rule::coerce_types;
use datafusion_expr::expr_rewriter::{ExprRewritable, ExprRewriter, RewriteRecursion};
use datafusion_expr::logical_plan::builder::build_join_schema;
Expand Down Expand Up @@ -86,29 +87,45 @@ impl ExprRewriter for TypeCoercionRewriter {
}

fn mutate(&mut self, expr: Expr) -> Result<Expr> {
match expr {
match &expr {
Expr::BinaryExpr { left, op, right } => {
let left_type = left.get_type(&self.schema)?;
let right_type = right.get_type(&self.schema)?;
let coerced_type = coerce_types(&left_type, &op, &right_type)?;
let coerced_type = coerce_types(&left_type, op, &right_type)?;
Ok(Expr::BinaryExpr {
left: Box::new(left.cast_to(&coerced_type, &self.schema)?),
op,
right: Box::new(right.cast_to(&coerced_type, &self.schema)?),
left: Box::new(
left.as_ref().clone().cast_to(&coerced_type, &self.schema)?,
),
op: *op,
right: Box::new(
right
.as_ref()
.clone()
.cast_to(&coerced_type, &self.schema)?,
),
})
}
Expr::Like { pattern, .. }
| Expr::ILike { pattern, .. }
| Expr::SimilarTo { pattern, .. } => match pattern.get_type(&self.schema)? {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like the wrong place to be type checking the argument to Expr::Like, etc. as the argument types to other exprs are checked so why would we treat Expr::Like differently?

I would expect that to be done in the SQL planner perhaps? or in the physical_expr conversion?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can't check the type of the pattern expression until we have a schema to resolve against (it could be a reference to a column or any other type of expression).

My goal was to do the validation in the logical plan for the benefit of other engines building on DataFusion that don't use the physical plan.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Never mind, we do have the schema in SQL planning ... I will make that change.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@alamb Ok, this PR just got a whole lot smaller. Maybe this really shouldn't have taken me 30 days to do 🤣

DataType::Utf8 => Ok(expr.clone()),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about LargeUtf8?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is for the regular expression pattern, which is typically a literal string but could, in theory, be a column reference. I'm not sure if it would make sense to support LargeUtf8 here?

other => Err(DataFusionError::Plan(format!(
"Expected pattern in Like, ILike, or SimilarTo to be Utf8 but was {}",
other
))),
},
Expr::ScalarUDF { fun, args } => {
let new_expr = coerce_arguments_for_signature(
args.as_slice(),
&self.schema,
&fun.signature,
)?;
Ok(Expr::ScalarUDF {
fun,
fun: fun.clone(),
args: new_expr,
})
}
expr => Ok(expr),
expr => Ok(expr.clone()),
}
}
}
Expand Down
20 changes: 20 additions & 0 deletions datafusion/physical-expr/src/planner.rs
Original file line number Diff line number Diff line change
Expand Up @@ -203,6 +203,26 @@ pub fn create_physical_expr(
}
}
}
Expr::Like {
negated,
expr,
pattern,
escape_char,
} => {
if escape_char.is_some() {
return Err(DataFusionError::Execution(
"LIKE does not support escape_char".to_string(),
));
}
let op = if *negated {
Operator::NotLike
} else {
Operator::Like
};
let bin_expr =
binary_expr(expr.as_ref().clone(), op, pattern.as_ref().clone());
create_physical_expr(&bin_expr, input_dfschema, input_schema, execution_props)
}
Expr::Case {
expr,
when_then_expr,
Expand Down
54 changes: 54 additions & 0 deletions datafusion/proto/src/lib.rs
Original file line number Diff line number Diff line change
Expand Up @@ -929,6 +929,60 @@ mod roundtrip_tests {
roundtrip_expr_test(test_expr, ctx);
}

#[test]
fn roundtrip_like() {
fn like(negated: bool, escape_char: Option<char>) {
let test_expr = Expr::Like {
negated,
expr: Box::new(col("col")),
pattern: Box::new(lit("[0-9]+")),
escape_char,
};
let ctx = SessionContext::new();
roundtrip_expr_test(test_expr, ctx);
}
like(true, Some('X'));
like(false, Some('\\'));
like(true, None);
like(false, None);
}

#[test]
fn roundtrip_ilike() {
fn ilike(negated: bool, escape_char: Option<char>) {
let test_expr = Expr::ILike {
negated,
expr: Box::new(col("col")),
pattern: Box::new(lit("[0-9]+")),
escape_char,
};
let ctx = SessionContext::new();
roundtrip_expr_test(test_expr, ctx);
}
ilike(true, Some('X'));
ilike(false, Some('\\'));
ilike(true, None);
ilike(false, None);
}

#[test]
fn roundtrip_similar_to() {
fn similar_to(negated: bool, escape_char: Option<char>) {
let test_expr = Expr::SimilarTo {
negated,
expr: Box::new(col("col")),
pattern: Box::new(lit("[0-9]+")),
escape_char,
};
let ctx = SessionContext::new();
roundtrip_expr_test(test_expr, ctx);
}
similar_to(true, Some('X'));
similar_to(false, Some('\\'));
similar_to(true, None);
similar_to(false, None);
}

#[test]
fn roundtrip_count() {
let test_expr = Expr::AggregateFunction {
Expand Down
41 changes: 21 additions & 20 deletions datafusion/sql/src/planner.rs
Original file line number Diff line number Diff line change
Expand Up @@ -1939,30 +1939,31 @@ impl<'a, S: ContextProvider> SqlToRel<'a, S> {
}

SQLExpr::Like { negated, expr, pattern, escape_char } => {
match escape_char {
Some(_) => {
// to support this we will need to introduce `Expr::Like` instead
// of treating it like a binary expression
Err(DataFusionError::NotImplemented("LIKE with ESCAPE is not yet supported".to_string()))
},
_ => {
Ok(Expr::BinaryExpr {
left: Box::new(self.sql_expr_to_logical_expr(*expr, schema, ctes)?),
op: if negated { Operator::NotLike } else { Operator::Like },
right: Box::new(self.sql_expr_to_logical_expr(*pattern, schema, ctes)?),
})
}
}
Ok(Expr::Like {
negated,
expr: Box::new(self.sql_expr_to_logical_expr(*expr, schema, ctes)?),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❤️

pattern: Box::new(self.sql_expr_to_logical_expr(*pattern, schema, ctes)?),
escape_char

})
}

SQLExpr::ILike { .. } => {
// https://github.com/apache/arrow-datafusion/issues/3099
Err(DataFusionError::NotImplemented("ILIKE is not yet supported".to_string()))
SQLExpr::ILike { negated, expr, pattern, escape_char } => {
Ok(Expr::ILike {
negated,
expr: Box::new(self.sql_expr_to_logical_expr(*expr, schema, ctes)?),
pattern: Box::new(self.sql_expr_to_logical_expr(*pattern, schema, ctes)?),
escape_char
})
}

SQLExpr::SimilarTo { .. } => {
// https://github.com/apache/arrow-datafusion/issues/3099
Err(DataFusionError::NotImplemented("SIMILAR TO is not yet supported".to_string()))
SQLExpr::SimilarTo { negated, expr, pattern, escape_char } => {
Ok(Expr::SimilarTo {
negated,
expr: Box::new(self.sql_expr_to_logical_expr(*expr, schema, ctes)?),
pattern: Box::new(self.sql_expr_to_logical_expr(*pattern, schema, ctes)?),
escape_char
})
}

SQLExpr::BinaryOp {
Expand Down