Support "User defined coercion" rules #10423

alamb · 2024-05-08T14:01:13Z

Is your feature request related to a problem or challenge?

DataFusion automatically "coerces" (see docs here) input argument types to match the types required by operations or functions.

For functions, the required types are described by a TypeSignature:

pub enum TypeSignature {
    Variadic(Vec<DataType>),
    VariadicEqual,
    VariadicAny,
    Uniform(usize, Vec<DataType>),
    Exact(Vec<DataType>),
    Any(usize),
    OneOf(Vec<TypeSignature>),
    ArraySignature(ArrayFunctionSignature),
}

However, some functions, such as sum and count (TODO link) and some array functions like make_array, have special hard-coded coercion logic. We started down the path of encoding the special array semantics into TypeSignature (see ArrayFunctionSignature).

As we continue to find other examples of different desired rules (most recently in sum and count), TypeSignature will grow and become more and more specialized.

Describe the solution you'd like

@jayzhan211 had a great suggestion in #10268 (comment): in addition to encoding common coercion behaviors in TypeSignature, we can also add a variant of TypeSignature that permits user-defined coercion rules.
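To make the idea concrete, here is a minimal, self-contained sketch of what a user-defined variant could look like. It does not use the real DataFusion types: `DataType`, `TypeSignature`, `FunctionImpl`, `coerced_types`, and `MySum` below are simplified stand-ins invented for illustration, and errors are plain `String`s rather than DataFusion's error type.

```rust
// Stand-in for arrow's DataType so the sketch compiles on its own.
#[derive(Debug, Clone, PartialEq)]
pub enum DataType {
    Int32,
    Int64,
    Float64,
    Utf8,
}

// Trimmed-down signature enum; `UserDefined` is the hypothetical new variant.
#[derive(Debug, Clone, PartialEq)]
pub enum TypeSignature {
    Exact(Vec<DataType>),
    UserDefined,
}

// Hypothetical function trait with an opt-in coercion hook.
pub trait FunctionImpl {
    fn name(&self) -> &str;
    fn signature(&self) -> &TypeSignature;

    // Only consulted when the signature is `TypeSignature::UserDefined`.
    fn coerce_types(&self, _arg_types: &[DataType]) -> Result<Vec<DataType>, String> {
        Err(format!(
            "Function {} does not implement coerce_types",
            self.name()
        ))
    }
}

// Decide which types the arguments should be cast to.
pub fn coerced_types(
    func: &dyn FunctionImpl,
    arg_types: &[DataType],
) -> Result<Vec<DataType>, String> {
    match func.signature() {
        // Built-in rule: the declared types win if the arity matches.
        TypeSignature::Exact(expected) if expected.len() == arg_types.len() => {
            Ok(expected.clone())
        }
        TypeSignature::Exact(expected) => Err(format!(
            "{} expects {} arguments, got {}",
            func.name(),
            expected.len(),
            arg_types.len()
        )),
        // User-defined rule: delegate to the function itself.
        TypeSignature::UserDefined => func.coerce_types(arg_types),
    }
}

// Toy sum-like function: widens integers to Int64 and keeps floats as-is.
pub struct MySum {
    signature: TypeSignature,
}

impl MySum {
    pub fn new() -> Self {
        Self {
            signature: TypeSignature::UserDefined,
        }
    }
}

impl FunctionImpl for MySum {
    fn name(&self) -> &str {
        "my_sum"
    }

    fn signature(&self) -> &TypeSignature {
        &self.signature
    }

    fn coerce_types(&self, arg_types: &[DataType]) -> Result<Vec<DataType>, String> {
        match arg_types {
            [DataType::Int32] | [DataType::Int64] => Ok(vec![DataType::Int64]),
            [DataType::Float64] => Ok(vec![DataType::Float64]),
            other => Err(format!("my_sum does not support {:?}", other)),
        }
    }
}
```

With this shape, the special cases live next to the function that needs them, and the signature enum does not have to grow a new variant for each one.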

Describe alternatives you've considered

No response

Additional context

No response

jayzhan211 · 2024-05-09T00:17:45Z

I found that to have coerce_types for both ScalarUDF and AggregateUDF, I would need to introduce a more general trait shared by both kinds of function:

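trait UDFImpl {
    fn name(&self) -> &str;

    // Default implementation: a function opts in by overriding this method.
    fn coerce_types(&self, data_types: &[DataType]) -> Result<Vec<DataType>> {
        not_impl_err!("Function {} does not implement coerce_types", self.name())
    }
}

// The existing traits would become sub-traits of the common base trait.
trait ScalarUDFImpl: UDFImpl { /* existing methods */ }
trait AggregateUDFImpl: UDFImpl { /* existing methods */ }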

Then we can have

fn coerce_arguments_for_signature(
    expressions: Vec<Expr>,
    schema: &DFSchema,
    signature: &Signature,
    func: Arc<dyn UDFImpl>,
) -> Result<Vec<Expr>> {}

Another alternative is to duplicate the function, with one version for scalar functions and one for aggregate functions:

fn coerce_arguments_for_signature(
    expressions: Vec<Expr>,
    schema: &DFSchema,
    signature: &Signature,
    func: &ScalarUDF,
) -> Result<Vec<Expr>> {}

fn coerce_arguments_for_signature(
    expressions: Vec<Expr>,
    schema: &DFSchema,
    signature: &Signature,
    func: &AggregateUDF,
) -> Result<Vec<Expr>> {}

I think the first option is potentially beneficial in the long run (?), but the user would then need to implement two traits. The second option only increases the maintenance cost.
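The "two traits" cost of the first option can be seen in a small standalone sketch. This is not DataFusion code: `DataType` is a stand-in enum, errors are plain `String`s, and `coerce_with`, `Widen`, and `Total` are hypothetical names used only for illustration.

```rust
// Stand-in for arrow's DataType so the sketch compiles on its own.
#[derive(Debug, Clone, PartialEq)]
pub enum DataType {
    Int32,
    Int64,
}

// Hypothetical shared base trait (the first option).
pub trait UDFImpl {
    fn name(&self) -> &str;

    fn coerce_types(&self, _arg_types: &[DataType]) -> Result<Vec<DataType>, String> {
        Err(format!(
            "Function {} does not implement coerce_types",
            self.name()
        ))
    }
}

// Both function traits build on the base trait.
pub trait ScalarUDFImpl: UDFImpl {
    fn return_type(&self, arg_types: &[DataType]) -> Result<DataType, String>;
}

pub trait AggregateUDFImpl: UDFImpl {
    fn return_type(&self, arg_types: &[DataType]) -> Result<DataType, String>;
}

// A single helper serves scalar and aggregate functions alike.
pub fn coerce_with(func: &dyn UDFImpl, arg_types: &[DataType]) -> Result<Vec<DataType>, String> {
    func.coerce_types(arg_types)
}

// A scalar function now needs two impl blocks: one per trait.
pub struct Widen;

impl UDFImpl for Widen {
    fn name(&self) -> &str {
        "widen"
    }

    fn coerce_types(&self, arg_types: &[DataType]) -> Result<Vec<DataType>, String> {
        // Cast every argument to Int64.
        Ok(vec![DataType::Int64; arg_types.len()])
    }
}

impl ScalarUDFImpl for Widen {
    fn return_type(&self, _arg_types: &[DataType]) -> Result<DataType, String> {
        Ok(DataType::Int64)
    }
}

// An aggregate that keeps the default (unimplemented) coercion.
pub struct Total;

impl UDFImpl for Total {
    fn name(&self) -> &str {
        "total"
    }
}

impl AggregateUDFImpl for Total {
    fn return_type(&self, _arg_types: &[DataType]) -> Result<DataType, String> {
        Ok(DataType::Int64)
    }
}
```

The helper is written once, but every function author pays with an extra `impl UDFImpl` block, even when the default coercion is kept.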

What do you think about this, @alamb?
I am also asking whether there is a Rust solution for this at https://users.rust-lang.org/t/inheritance-like-with-rust-trait/111102

alamb · 2024-05-09T10:40:47Z

It is a good observation that ScalarUDFImpl and AggregateUDFImpl don't share a common base trait, and thus adding functionality that affects both requires duplicating code.

I would need to introduce a more general trait for both function
Another alternative is having a duplicate function for scalar and aggregate for a related function

I agree with your analysis of the tradeoffs: a common base trait would result in less duplication in DataFusion

However, I personally prefer duplicating coerce_arguments_for_signature in each trait rather than introducing a common base trait because:

It is backwards compatible (not an API change for the existing library of functions)
It makes implementing ScalarUDF and AggregateUDF slightly easier (especially for those new to Rust): rather than two impls for your function, you only need one
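A rough sketch of the duplicated approach, assuming the hook is added to each trait as a method with a default body. The types here are simplified stand-ins rather than the real DataFusion definitions, and `coerce_scalar_args`, `coerce_aggregate_args`, `LegacyAbs`, and `MyCount` are hypothetical names. Note that the two helpers need distinct names, since Rust has no function overloading.

```rust
// Stand-in for arrow's DataType so the sketch compiles on its own.
#[derive(Debug, Clone, PartialEq)]
pub enum DataType {
    Int32,
    Int64,
}

// Each trait gets its own copy of the hook; there is no shared base trait.
pub trait ScalarUDFImpl {
    fn name(&self) -> &str;

    fn coerce_types(&self, _arg_types: &[DataType]) -> Result<Vec<DataType>, String> {
        Err(format!(
            "Function {} does not implement coerce_types",
            self.name()
        ))
    }
}

pub trait AggregateUDFImpl {
    fn name(&self) -> &str;

    fn coerce_types(&self, _arg_types: &[DataType]) -> Result<Vec<DataType>, String> {
        Err(format!(
            "Function {} does not implement coerce_types",
            self.name()
        ))
    }
}

// The helper is duplicated as well, once per function kind.
pub fn coerce_scalar_args(
    func: &dyn ScalarUDFImpl,
    arg_types: &[DataType],
) -> Result<Vec<DataType>, String> {
    func.coerce_types(arg_types)
}

pub fn coerce_aggregate_args(
    func: &dyn AggregateUDFImpl,
    arg_types: &[DataType],
) -> Result<Vec<DataType>, String> {
    func.coerce_types(arg_types)
}

// An impl written before the hook existed still compiles unchanged.
pub struct LegacyAbs;

impl ScalarUDFImpl for LegacyAbs {
    fn name(&self) -> &str {
        "legacy_abs"
    }
}

// A new function opts in with a single impl block.
pub struct MyCount;

impl AggregateUDFImpl for MyCount {
    fn name(&self) -> &str {
        "my_count"
    }

    fn coerce_types(&self, arg_types: &[DataType]) -> Result<Vec<DataType>, String> {
        // Cast every argument to Int64.
        Ok(vec![DataType::Int64; arg_types.len()])
    }
}
```

Because the new method has a default body, `LegacyAbs` needs no changes, which is the backwards-compatibility point; the price is the near-identical code in the two traits and helpers.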

alamb added the enhancement New feature or request label May 8, 2024

alamb changed the title ~~Support "User defined coercsion" function~~ Support "User defined coercion" rules May 8, 2024

alamb mentioned this issue May 8, 2024

Fix Coalesce casting logic to follows what Postgres and DuckDB do. Introduce signature that do non-comparison coercion #10268

Open

jayzhan211 self-assigned this May 8, 2024

jayzhan211 mentioned this issue May 10, 2024

Introduce user-defined signature #10439

Merged

jayzhan211 closed this as completed in #10439 May 11, 2024
