Add SQL planner support for `Like`, `ILike` and `SimilarTo`, with optional escape character #3101
Changes from 9 commits
64e8559
d1d6655
bd14a93
41f7db1
718c304
64dbd31
4882a98
df73d16
63bf702
3a86db8
8ea73b1
a18dec8
@@ -18,7 +18,8 @@
 //! Optimizer rule for type validation and coercion

 use crate::{OptimizerConfig, OptimizerRule};
-use datafusion_common::{DFSchema, DFSchemaRef, Result};
+use arrow::datatypes::DataType;
+use datafusion_common::{DFSchema, DFSchemaRef, DataFusionError, Result};
 use datafusion_expr::binary_rule::coerce_types;
 use datafusion_expr::expr_rewriter::{ExprRewritable, ExprRewriter, RewriteRecursion};
 use datafusion_expr::logical_plan::builder::build_join_schema;
@@ -86,29 +87,45 @@ impl ExprRewriter for TypeCoercionRewriter {
     }

     fn mutate(&mut self, expr: Expr) -> Result<Expr> {
-        match expr {
+        match &expr {
             Expr::BinaryExpr { left, op, right } => {
                 let left_type = left.get_type(&self.schema)?;
                 let right_type = right.get_type(&self.schema)?;
-                let coerced_type = coerce_types(&left_type, &op, &right_type)?;
+                let coerced_type = coerce_types(&left_type, op, &right_type)?;
                 Ok(Expr::BinaryExpr {
-                    left: Box::new(left.cast_to(&coerced_type, &self.schema)?),
-                    op,
-                    right: Box::new(right.cast_to(&coerced_type, &self.schema)?),
+                    left: Box::new(
+                        left.as_ref().clone().cast_to(&coerced_type, &self.schema)?,
+                    ),
+                    op: *op,
+                    right: Box::new(
+                        right
+                            .as_ref()
+                            .clone()
+                            .cast_to(&coerced_type, &self.schema)?,
+                    ),
                 })
             }
+            Expr::Like { pattern, .. }
+            | Expr::ILike { pattern, .. }
+            | Expr::SimilarTo { pattern, .. } => match pattern.get_type(&self.schema)? {
+                DataType::Utf8 => Ok(expr.clone()),

Review comment: What about

Reply: This is for the regular expression pattern, which is typically a literal string but could, in theory, be a column reference. I'm not sure if it would make sense to support LargeUtf8 here?

+                other => Err(DataFusionError::Plan(format!(
+                    "Expected pattern in Like, ILike, or SimilarTo to be Utf8 but was {}",
+                    other
+                ))),
+            },
             Expr::ScalarUDF { fun, args } => {
                 let new_expr = coerce_arguments_for_signature(
                     args.as_slice(),
                     &self.schema,
                     &fun.signature,
                 )?;
                 Ok(Expr::ScalarUDF {
-                    fun,
+                    fun: fun.clone(),
                     args: new_expr,
                 })
             }
-            expr => Ok(expr),
+            expr => Ok(expr.clone()),
         }
     }
 }
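The pattern check in the hunk above can be tried in isolation with stand-in types. The following is a minimal sketch, not the DataFusion code: `DataType`, `Expr`, the schema-free `get_type`, and `check_like_pattern` are all hypothetical simplifications introduced here, and errors are plain `String`s rather than `DataFusionError::Plan`.

```rust
// Stand-in types for illustration only; these are NOT the DataFusion definitions.
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Utf8,
    Int64,
    Boolean,
}

#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    StringLiteral(String),
    IntLiteral(i64),
    // `expr [NOT] LIKE pattern [ESCAPE escape_char]`
    Like {
        negated: bool,
        expr: Box<Expr>,
        pattern: Box<Expr>,
        escape_char: Option<char>,
    },
}

impl Expr {
    // Hypothetical, schema-free type lookup; the real code resolves types against a schema.
    fn get_type(&self) -> DataType {
        match self {
            Expr::StringLiteral(_) => DataType::Utf8,
            Expr::IntLiteral(_) => DataType::Int64,
            Expr::Like { .. } => DataType::Boolean,
        }
    }
}

// Accept a LIKE expression only when its pattern is Utf8; pass other expressions through.
fn check_like_pattern(expr: &Expr) -> Result<Expr, String> {
    match expr {
        Expr::Like { pattern, .. } => match pattern.get_type() {
            DataType::Utf8 => Ok(expr.clone()),
            other => Err(format!(
                "Expected pattern in Like, ILike, or SimilarTo to be Utf8 but was {:?}",
                other
            )),
        },
        other => Ok(other.clone()),
    }
}
```

In this sketch a `Like` whose pattern is `IntLiteral(1)` is rejected with a message naming `Int64`, while a string-literal pattern is returned unchanged.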
@@ -1939,30 +1939,31 @@ impl<'a, S: ContextProvider> SqlToRel<'a, S> {
             }

             SQLExpr::Like { negated, expr, pattern, escape_char } => {
-                match escape_char {
-                    Some(_) => {
-                        // to support this we will need to introduce `Expr::Like` instead
-                        // of treating it like a binary expression
-                        Err(DataFusionError::NotImplemented("LIKE with ESCAPE is not yet supported".to_string()))
-                    },
-                    _ => {
-                        Ok(Expr::BinaryExpr {
-                            left: Box::new(self.sql_expr_to_logical_expr(*expr, schema, ctes)?),
-                            op: if negated { Operator::NotLike } else { Operator::Like },
-                            right: Box::new(self.sql_expr_to_logical_expr(*pattern, schema, ctes)?),
-                        })
-                    }
-                }
+                Ok(Expr::Like {
+                    negated,
+                    expr: Box::new(self.sql_expr_to_logical_expr(*expr, schema, ctes)?),

Review comment: ❤️

+                    pattern: Box::new(self.sql_expr_to_logical_expr(*pattern, schema, ctes)?),
+                    escape_char
+
+                })
             }

-            SQLExpr::ILike { .. } => {
-                // https://github.com/apache/arrow-datafusion/issues/3099
-                Err(DataFusionError::NotImplemented("ILIKE is not yet supported".to_string()))
+            SQLExpr::ILike { negated, expr, pattern, escape_char } => {
+                Ok(Expr::ILike {
+                    negated,
+                    expr: Box::new(self.sql_expr_to_logical_expr(*expr, schema, ctes)?),
+                    pattern: Box::new(self.sql_expr_to_logical_expr(*pattern, schema, ctes)?),
+                    escape_char
+                })
             }

-            SQLExpr::SimilarTo { .. } => {
-                // https://github.com/apache/arrow-datafusion/issues/3099
-                Err(DataFusionError::NotImplemented("SIMILAR TO is not yet supported".to_string()))
+            SQLExpr::SimilarTo { negated, expr, pattern, escape_char } => {
+                Ok(Expr::SimilarTo {
+                    negated,
+                    expr: Box::new(self.sql_expr_to_logical_expr(*expr, schema, ctes)?),
+                    pattern: Box::new(self.sql_expr_to_logical_expr(*pattern, schema, ctes)?),
+                    escape_char
+                })
             }

             SQLExpr::BinaryOp {
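The planner change above only carries `negated` and `escape_char` through to the logical expression; it does not evaluate anything. To illustrate what those two fields mean at evaluation time, here is a small stand-alone sketch of `LIKE` matching with an optional escape character. It is not taken from the PR or from DataFusion: `Token`, `tokenize`, `matches`, and `like_match` are hypothetical names, NULL handling is omitted, and as a simplification any character following the escape character is treated as a literal.

```rust
// Illustrative sketch only; not DataFusion's LIKE implementation.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Token {
    Literal(char),
    AnyOne,  // `_` matches exactly one character
    AnyMany, // `%` matches zero or more characters
}

// Split a LIKE pattern into tokens, honouring the optional escape character.
fn tokenize(pattern: &str, escape_char: Option<char>) -> Result<Vec<Token>, String> {
    let mut tokens = Vec::new();
    let mut chars = pattern.chars();
    while let Some(c) = chars.next() {
        if Some(c) == escape_char {
            // The character after the escape is taken literally (simplification).
            match chars.next() {
                Some(next) => tokens.push(Token::Literal(next)),
                None => return Err("pattern ends with escape character".to_string()),
            }
        } else if c == '%' {
            tokens.push(Token::AnyMany);
        } else if c == '_' {
            tokens.push(Token::AnyOne);
        } else {
            tokens.push(Token::Literal(c));
        }
    }
    Ok(tokens)
}

// Simple recursive matcher; fine for a demo, but exponential in the worst case.
fn matches(tokens: &[Token], value: &[char]) -> bool {
    match tokens.split_first() {
        None => value.is_empty(),
        Some((Token::AnyMany, rest)) => (0..=value.len()).any(|i| matches(rest, &value[i..])),
        Some((Token::AnyOne, rest)) => !value.is_empty() && matches(rest, &value[1..]),
        Some((Token::Literal(c), rest)) => {
            value.first() == Some(c) && matches(rest, &value[1..])
        }
    }
}

// `negated` flips the result, mirroring `NOT LIKE`.
fn like_match(
    value: &str,
    pattern: &str,
    escape_char: Option<char>,
    negated: bool,
) -> Result<bool, String> {
    let tokens = tokenize(pattern, escape_char)?;
    let chars: Vec<char> = value.chars().collect();
    Ok(matches(&tokens, &chars) != negated)
}
```

With `escape_char = Some('\\')`, the pattern `100\%` matches only the literal string `100%`, whereas without an escape character `100%` also matches `1000`.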
This seems like the wrong place to be type checking the argument to `Expr::Like`, etc. as the argument types to other exprs are checked so why would we treat `Expr::Like` differently? I would expect that to be done in the SQL planner perhaps? or in the physical_expr conversion?
We can't check the type of the `pattern` expression until we have a schema to resolve against (it could be a reference to a column or any other type of expression). My goal was to do the validation in the logical plan for the benefit of other engines building on DataFusion that don't use the physical plan.
Never mind, we do have the schema in SQL planning ... I will make that change.
@alamb Ok, this PR just got a whole lot smaller. Maybe this really shouldn't have taken me 30 days to do 🤣
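For readers following the thread, the idea of validating the pattern type during SQL planning, where a schema is already available, might look roughly like the sketch below. Everything in it is an assumption made for illustration: the toy `Schema` (a `HashMap` from column name to type), `resolve_type`, and `plan_like` are hypothetical and do not reflect the change that was eventually made in the PR.

```rust
use std::collections::HashMap;

// Stand-in types for illustration only; these are NOT the DataFusion definitions.
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Utf8,
    Int64,
}

#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Column(String),
    StringLiteral(String),
    Like {
        negated: bool,
        expr: Box<Expr>,
        pattern: Box<Expr>,
        escape_char: Option<char>,
    },
}

// Toy schema: column name -> data type.
type Schema = HashMap<String, DataType>;

// Resolve an expression's type against the schema, as the planner could do.
fn resolve_type(expr: &Expr, schema: &Schema) -> Result<DataType, String> {
    match expr {
        Expr::Column(name) => schema
            .get(name)
            .cloned()
            .ok_or_else(|| format!("unknown column {}", name)),
        Expr::StringLiteral(_) => Ok(DataType::Utf8),
        Expr::Like { .. } => Err("nested LIKE is not handled in this sketch".to_string()),
    }
}

// Build a LIKE expression at planning time, rejecting non-Utf8 patterns up front.
fn plan_like(
    negated: bool,
    expr: Expr,
    pattern: Expr,
    escape_char: Option<char>,
    schema: &Schema,
) -> Result<Expr, String> {
    let pattern_type = resolve_type(&pattern, schema)?;
    if pattern_type != DataType::Utf8 {
        return Err(format!(
            "Invalid pattern in LIKE expression: expected Utf8 but was {:?}",
            pattern_type
        ));
    }
    Ok(Expr::Like {
        negated,
        expr: Box::new(expr),
        pattern: Box::new(pattern),
        escape_char,
    })
}
```

Because the pattern may be a column reference, the check needs the schema; in this sketch a pattern that resolves to an `Int64` column is rejected before any logical expression is built.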