Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix hash join non compat issues #3168

Closed
wants to merge 2 commits into from

Conversation

comphead
Copy link
Contributor

Which issue does this PR close?

Closes #2877 .

Rationale for this change

Currently hash join fails if two arms has different but castable avlues.

What changes are included in this PR?

The entire idea is to try to cast automatically values if possible.

Are there any user-facing changes?

@github-actions github-actions bot added the core Core datafusion crate label Aug 15, 2022
@comphead
Copy link
Contributor Author

@alamb I'm trying to create a block that can coerce and cast left and right ArrayRefs returning casted ones. This can be useful in different places where 2 castable but not same datatypes involved. Example i32 and i8. Without cast the downcast_ref will eventually fail on unwrap. I tried to put cast logic to function, but there are many lifetimes limitations in Rust, appreciate if you can help

@Dandandan
Copy link
Contributor

I think a good approach might be to have the cast part of the logical planner, so the join expression becomes in the form col1::TEXT = col2 AND col2::. That shouldn't be really very join specific

I believe we should already have most of the logic there for casting to a common type for expressions.

@comphead
Copy link
Contributor Author

I think a good approach might be to have the cast part of the logical planner, so the join expression becomes in the form col1::TEXT = col2 AND col2::. That shouldn't be really very join specific

I believe we should already have most of the logic there for casting to a common type for expressions.

Yes, that is something we discussed with @alamb, that other sql engines have cast as part of planner if join arms are not the same datatype but still castable.

So what your idea @Dandandan, to try move cast into the planner?

@comphead
Copy link
Contributor Author

Oh I see, there is a fn binary_cast which already does the same but on physical expr level, let me try to reuse it for joins in planner

@alamb
Copy link
Contributor

alamb commented Aug 17, 2022

Thanks @Dandandan -- yes I think binary_rule (and the coercion logic in general) is probably what would work best here

@comphead
Copy link
Contributor Author

comphead commented Aug 20, 2022

@Dandandan @alamb is anywhere documentation on planner? I have implemented the cast in planner in LogicalPlan::Join branch, and now can see physical expression contains TryCast but in this case the join still fails with old error, looks like real cast is lazy and gets computed after hash_join computes its indexes (which is in ExecutionPlan phase I believe). Please ping me where its possible to find in what sequence plans are executed

@alamb
Copy link
Contributor

alamb commented Aug 21, 2022

@Dandandan @alamb is anywhere documentation on planner?

I don't know of any real documentation on the planner

Note there is a discussion about the current location of inserting type coercion, which is currently in the physical planner rather than the logical planner) on #3031 with @andygrove and @liukun4515

Perhaps this is another example / rationale to move calling the type coercion logic from the physical planner to the logical planner 🤔

@liukun4515
Copy link
Contributor

@Dandandan @alamb is anywhere documentation on planner?

I don't know of any real documentation on the planner

Note there is a discussion about the current location of inserting type coercion, which is currently in the physical planner rather than the logical planner) on #3031 with @andygrove and @liukun4515

Perhaps this is another example / rationale to move calling the type coercion logic from the physical planner to the logical planner 🤔

I will take a look this pr and issue later today or tomorrow

@comphead
Copy link
Contributor Author

comphead commented Aug 22, 2022

Thanks @alamb @liukun4515
Imho its preferable to coerce in logical plan, and we probably have to avoid any kind of downcast before coercion. That looks like a beefy change. I'm happy to give a hand if needed

@liukun4515
Copy link
Contributor

Thanks @alamb @liukun4515 Imho its preferable to coerce in logical plan, and we probably have to avoid any kind of downcast before coercion. That looks like a beefy change. I'm happy to give a hand if needed

Thank you @comphead , We discussed about the migration the coercion rule from physical phase to logical phase #3031 and other issue.

But I need to take look at this issue and pr first, how and where to migrate the coercion is another thing.

@alamb
Copy link
Contributor

alamb commented Nov 28, 2023

Closing as this PR is over a year old. Please feel free to reopen it / rebase it if you plan to keep working on it

@alamb alamb closed this Nov 28, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core Core datafusion crate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

hash_join panics when join keys have different data types
4 participants