Upgrade to DataFusion 13 (784f10bb) / Arrow 25.0.0 #176
Commits on Oct 26, 2022
Upgrade DataFusion to 13.0.0, Arrow to 25.0.0
The actual 13.0.0 DF release uses Arrow 24.0.0, but we need to pick up 25.0.0, since it brings back the Arrow Schema/Field-to-JSON serialization code (albeit in a different crate for integration tests). apache/arrow-rs#2868 apache/arrow-rs#2724
Commit 0cb2ed9
It's now the default HashMap implementation and DF's planner uses it as well, so we can use `std::collections::HashMap` everywhere.
Commit b2d78ef
Commit 94dc4d4
Commit efb6187
Fix some expected output change tests
Arrow file hash changes and minor changes in the query plan output
Commit 2635b05
Commit 38c0568
Include `UPDATE`/`DELETE` in the query optimizer
Make the `Update`/`Delete` nodes expose `inputs` and `expressions` in order to let the DF query optimizer work on the `WHERE ...` / `SET col = expr` expressions. This is slightly hacky:
- as an "input", we return a `TableScan` node that we don't use after that (this is just so that the optimizer knows the input schema for all the expressions)
- return the expressions used by the node and add code to pack/unpack them into a list
The point of this is to let DataFusion run the `TypeCoercion` optimization, without which something like `WHERE float_col > 42` will raise an error (as after DF 13 these type coercions got removed from other places and moved into optimizations).
(NB this doesn't work yet, we still get type coercion errors)
Commit 7a61a7c
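The pack/unpack step described in this commit can be sketched roughly as follows. This is a toy illustration only: the `Expr` and `Update` types, the `expressions` / `with_expressions` method names, and the "predicate first, then `SET` expressions" ordering are all assumptions made for the example, not the project's or DataFusion's actual definitions.

```rust
/// Toy stand-in for a query expression (hypothetical, not DataFusion's `Expr`).
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Column(String),
    Literal(i64),
    Gt(Box<Expr>, Box<Expr>),
}

/// Hypothetical `Update` node: `SET col = expr` pairs plus an optional `WHERE`.
#[derive(Debug, Clone, PartialEq)]
struct Update {
    assignments: Vec<(String, Expr)>,
    selection: Option<Expr>,
}

impl Update {
    /// Pack every expression into one flat list for an optimizer to rewrite:
    /// the predicate first (if present), then the `SET` expressions in order.
    fn expressions(&self) -> Vec<Expr> {
        self.selection
            .iter()
            .cloned()
            .chain(self.assignments.iter().map(|(_, e)| e.clone()))
            .collect()
    }

    /// Unpack a rewritten list of the same shape back into a new node.
    /// Returns `None` if the list length does not match this node.
    fn with_expressions(&self, exprs: Vec<Expr>) -> Option<Update> {
        let expected = self.assignments.len() + usize::from(self.selection.is_some());
        if exprs.len() != expected {
            return None;
        }
        let mut iter = exprs.into_iter();
        // The predicate, if any, was packed first, so take it off the front.
        let selection = if self.selection.is_some() { iter.next() } else { None };
        // The remaining expressions line up with the assignment targets.
        let assignments = self
            .assignments
            .iter()
            .map(|(name, _)| name.clone())
            .zip(iter)
            .collect();
        Some(Update { assignments, selection })
    }
}
```

Keeping the order fixed is what makes the round trip possible: the optimizer only sees a flat list, so the node has to know which position holds the predicate and which hold the `SET` expressions.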
Commits on Oct 27, 2022
Run the query optimizer for UPDATE/DELETE
(normally it's run only by DataFusion's `create_physical_plan`, but we don't run that, so we have to execute it manually to get auto type coercion working)
Commit 0c6ece6
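To illustrate what automatic type coercion does to a predicate such as `WHERE float_col > 42`, here is a minimal, self-contained sketch of such a rewrite pass. It does not call DataFusion; the `DataType` and `Expr` types and the `coerce` function are hypothetical stand-ins, and a real optimizer rule handles far more cases.

```rust
/// Toy type system (hypothetical, for illustration only).
#[derive(Debug, Clone, Copy, PartialEq)]
enum DataType {
    Int64,
    Float64,
    Boolean,
}

/// Toy expression tree (hypothetical, not DataFusion's `Expr`).
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Column { name: String, data_type: DataType },
    Int(i64),
    Cast { expr: Box<Expr>, to: DataType },
    Gt(Box<Expr>, Box<Expr>),
}

/// Result type of an expression.
fn type_of(expr: &Expr) -> DataType {
    match expr {
        Expr::Column { data_type, .. } => *data_type,
        Expr::Int(_) => DataType::Int64,
        Expr::Cast { to, .. } => *to,
        Expr::Gt(..) => DataType::Boolean,
    }
}

fn cast(expr: Expr, to: DataType) -> Expr {
    Expr::Cast { expr: Box::new(expr), to }
}

/// Rewrite comparisons so both sides have the same type,
/// widening an Int64 operand to Float64 where the other side is Float64.
fn coerce(expr: Expr) -> Expr {
    match expr {
        Expr::Gt(left, right) => {
            let (left, right) = (coerce(*left), coerce(*right));
            match (type_of(&left), type_of(&right)) {
                (DataType::Float64, DataType::Int64) => {
                    Expr::Gt(Box::new(left), Box::new(cast(right, DataType::Float64)))
                }
                (DataType::Int64, DataType::Float64) => {
                    Expr::Gt(Box::new(cast(left, DataType::Float64)), Box::new(right))
                }
                _ => Expr::Gt(Box::new(left), Box::new(right)),
            }
        }
        Expr::Cast { expr, to } => Expr::Cast { expr: Box::new(coerce(*expr)), to },
        leaf => leaf,
    }
}
```

In this sketch, `float_col > 42` becomes `float_col > CAST(42 AS Float64)`. If no such pass runs, the comparison between a float column and an integer literal stays mismatched, which is the kind of error the commit message describes.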
Commit 2a27fc2
Add more verbose plan output to `Update`/`Delete`
Include `SET` expressions and the predicate, if it exists, to aid debugging.
Commit b4cfc90
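One way to get this kind of verbose plan line is a `Display` implementation on the node. The sketch below is an assumption about the general shape only: the `Update` struct, its fields, and the exact output format are hypothetical and are not taken from the project.

```rust
use std::fmt;

/// Hypothetical `Update` node with expressions already rendered as strings.
struct Update {
    table: String,
    assignments: Vec<(String, String)>,
    selection: Option<String>,
}

impl fmt::Display for Update {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "Update: {}", self.table)?;
        // List every `SET col = expr` pair, if there are any.
        let sets: Vec<String> = self
            .assignments
            .iter()
            .map(|(col, expr)| format!("{} = {}", col, expr))
            .collect();
        if !sets.is_empty() {
            write!(f, ", SET: {}", sets.join(", "))?;
        }
        // Only print the predicate when one exists.
        if let Some(predicate) = &self.selection {
            write!(f, " WHERE {}", predicate)?;
        }
        Ok(())
    }
}
```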
Remove aliases from optimized `Update`/`Delete`s
These expressions are similar to what DataFusion uses in the `Filter` node, and not doing this seems to break partition pruning (perhaps it stops at the `Alias` node and doesn't prune anything; didn't investigate in depth). Copy the `ExprRewriter` visitor from https://github.com/apache/arrow-datafusion/blob/c50573939d21de40e591c04915d41f7c46a51d0d/datafusion/expr/src/utils.rs#L384-L428 and adapt it to remove aliases from all expressions that the query optimizer gives back to `Update`/`Delete` nodes.
Commit 32eabcc
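The idea of stripping aliases from an expression tree can be sketched with a small recursive rewrite. This is a simplified stand-in rather than the `ExprRewriter` code referenced above: the `Expr` enum and the `unalias` function are hypothetical and cover only a few node kinds.

```rust
/// Toy expression tree (hypothetical, not DataFusion's `Expr`).
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Column(String),
    Literal(i64),
    Alias(Box<Expr>, String),
    Gt(Box<Expr>, Box<Expr>),
}

/// Recursively drop every `Alias` wrapper, keeping the aliased expression.
fn unalias(expr: Expr) -> Expr {
    match expr {
        // Replace the alias node with its (also unaliased) child.
        Expr::Alias(inner, _) => unalias(*inner),
        // Rebuild interior nodes from rewritten children.
        Expr::Gt(left, right) => {
            Expr::Gt(Box::new(unalias(*left)), Box::new(unalias(*right)))
        }
        // Leaves have nothing to strip.
        leaf => leaf,
    }
}
```

After this pass, a predicate that came back from the optimizer wrapped in aliases has the same shape as one that never had them, so later steps that inspect the predicate do not have to handle an `Alias` node at all.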
Assert the query plan in update/delete tests
Make sure the constants are correctly cast and let us detect changes to the optimizer faster with new DF updates.
Commit 7877070