Changelog

15.0.0 (2022-12-01)

Full Changelog

Breaking changes:

Expose remaining parquet config options into ConfigOptions (try 2) #4427 (alamb)
Config Cleanup: Remove TaskProperties and KV structure, keep key=value serialization #4382 (alamb)
add {TDigest,ScalarValue,Accumulator}::size #4342 (crepererum)
API-break: Support SubqueryAlias and remove Alias in Projection #4333 [sql] (jackwener)
split try_new_with_schema_alias from original code #4284 (jackwener)
Collapse statistics in normal explain plan #4157 (alamb)
Linearize binary expressions to reduce proto tree complexity #4115 (isidentical)
support SET Timezone #4107 [sql] (waitingkuo)

Implemented enhancements:

Refactor Built-in, Aggregate window functions to increase code reuse. #4440
Helper to get "root" error #4435
Do NOT convert intermediate/source errors to strings. #4434
Estimate the total_byte_size of the filter expression's result when selectivity is available #4374
refactor the code of the HashJoin #4356
CoalesceBatchesExec reports no ordering #4331
Introduce tournament tree to achieve better k-way sort-merging #4300
Add a checker to confirm physical optimizer rules will keep the physical plan schema immutable #4299
Remove the macro rule unary_scalar_expr from expr_fn.rs #4298
Remove Alias-in-Projection, replace it with SubqueryAlias #4291
reimplement reduce_outer_join #4270
Reimplement filter_push_down #4266
Reimplement eliminate_limit #4264
Reimplement limit_push_down #4263
Make a data driven SQL testing tool (so we can reuse duckdb test suite, example) #4248
upgrade chrono to 0.4.23 #4224
support scan non-string columns partitioned parquet files #4218
Allow optimizer rules to skip optimizing plans #4209
Support specifying schema when creating tables #4183
Improve ergonomics of creating ListingOptions #4178
Add ability to specify external sort information for ParquetExec #4169
Add another method to collect referenced columns from an expression #4152
Improve EXPLAIN ANALYZE output for parquet exec #4144
TableProviderFactory::create should have Optional<DFSchemaRef> parameter #4142
Support more expressions in equality join #4140
JoinSelection Rule to choose physical join implementation: HashJoin(Partitioned or CollectLeft) or SortMergeJoin based on Stats #4139
Allow TPCH tooling to create a combined result for easier processing by outside tools #4127
Allow additional options when creating an external table #4125
reuse code utils::optimize_children instead of redundant implementation #4120
Add test field to PR template #4113
Allow for automatic registration of ListingTables #4111
Add CI check that configs.md is up-to-date #4108
Support SET timezone to non-UTC time zone #4106
Parquet predicates contains and true expressions #4091
Replace RwLock<HashMap> and Mutex<HashMap> by using DashMap #4077
add support for .xz compressed files #4074
add a feature gate to make support for compressed files optional #4073
Support serializing more deeply nested AND / OR expressions #4066
Use f64::total_cmp instead of OrderedFloat #4051
Add documentation to make it clear that decimal support is still experimental #4036
Simplify Pushed Down Predicates #4020
Improve HashJoinExec metrics #4009
Move physical plan serde from Ballista to DataFusion #3949
Support SubqueryAlias better in planner #3927
A framework for expression boundary analysis (and statistics) #3898
Replace Filter: Boolean(false) with EmptyRelation #3864
Implement statistics estimation for FilterExec #3845
Support parquet page filtering for more types: String, Binary(Decimal), Int96 #3833
Allow configuring parquet filter pushdown dynamically #3821
Unable to register tables in non-cloud S3 servers #3640
support more data type in prune for cast/try_cast #3442
Disable spill to disk globally #3264
Consider to categorize Operator #3216
Replace Projection.alias with SubqueryAlias #2212
[Optimizer] Eliminate the distinct #2045
beautify datafusion's site: https://arrow.apache.org/datafusion/ #1819
split datafusion-logical-plan sub-module #1755
convert outer join to inner join to improve performance #1585
Add sqllogictest for datafusion #1453
Add additional simplification rules #1406
support more subqueries #1209
Add baseline metrics for remaining execution plan nodes #1019
Make ExecutionPlan implementations immutable #987
Architecture overview may be insufficient in README #980
Add a separate configuration setting for parallelism of scanning parquet files #924
Support hash repartition elimination #41

Fixed bugs:

pyarrow CI failed #4448
UnwrapCastInComparison has a bug #4430
The CLI panics when passing an invalid explain query #4378
HashJoin should return Err when the right side input stream produce Err #4362
Optimizer check errors if resulting schema has different metadata #4346
Panic with function to_hex #4339
LimitPushDown pushdown into limit, result is wrong #4308
DESCRIBE statement issue with qualified table references #4303
Panic with window function LAST_VALUE #4297
CI failed in Compare to postgres #4294
Field alias can't work in where clause #4288
Some valid filters are not pushed down to parquet scan #4282
The type renaming pub type NullColumnarValue = ColumnarValue makes no sense #4271
Current limit_push_down can't support cross_join #4256
Cargo test fail #4253
RightSemi/RightAnti HashJoin has bug, the left_indices is never populated, causing failure to apply join filters. #4247
Clippy failures #4245
Cannot query s3 data from datafusion-cli #4239
Bug parsing interval with negative values #4237
cargo test reports errors on the master branch. #4236
Doc of the expression function log2 is incorrect #4231
HashJoin with mode PartitionMode:CollectLeft has bug and can produce wrong result #4230
Add ambiguous check when generate projection plan #4210
What happened for NDJSON support on CLI? #4198
Add ambiguous check when generate join plan #4197
Clippy failing on master : error: use of deprecated associated function chrono::NaiveDate::from_ymd: use from_ymd_opt() instead #4187
Reimplement the eliminate_cross_join #4176
Incorrect handling of column names #4166
Update release scripts to support datafusion-benchmarks #4134
Bug in interpreting correctly parsed SQL with aliases #4123
The percentile argument for ApproxPercentileCont must be Float64, not Decimal128(2, 1) #4103
Panic when using array_agg #4080
Wrong result for FIRST_VALUE AND LAST_VALUE window functions #4076
Round error when casting float to decimal #4071
Predicate still has cast when comparing Timestamp(Nano, None) to a timestamp literal, so can't be pushed down or used for pruning #3938
Revisit required_child_distribution(), output_partitioning(), output_ordering() implementations in ExecutionPlan's implementations #3653
Can't push down projection after do type coercion #3583
In some circumstances cast expression is not working #3499
output_partitioning() and output_ordering() implementations are wrong in some physical plan implementations with alias #3400
Interval Literal doesn't work for timeunit less than millisecond #3204
INTERVAL literal with duplicated interval types should raise error #3183
Error occurs when only using partition columns in query #1999
regex_match does not compile using the g flag #1429
between with NULL literals does not work: can't be evaluated because there isn't a common type to coerce the types to #1193
[Datafusion] Error with CAST: Unsupported SQL type Time #193

Closed issues:

SQL level coverage for when memory limit is exceeded #4404
Throw error (not panic) if a listing table specifies a missing partition column #4350
Page index pruning fail on complex_expr #4317
optimize limit-full join in the limit push down rule #4275
infer_schema function is not working with s3 Urls or http endpoints #4269
Add support binary boolean operators with nulls #4241
Add additional testing to parquet predicate pushdown integration tests #4087
Add metrics for parquet page level skipping #4086
Add parquet page index pushdown metrics #4058
Throw a runtime error if the memory allocated to GroupByHash exceeds a limit #3940
support unsigned numeric data type in UnwrapCastInBinaryComparison rule #3702
Support type cast in union #2125
[EPIC] Memory Limited Sort (Externalized / Spill) #1568
Maintain partition information in Union #189
Add coercion support for NULL literals #185

Merged pull requests:

Make datafusion-sql depend on arrow-schema instead of arrow #4456 [sql] (mbrobbel)
replace the comparator for decimal array op scalar using arrow kernel #4453 (liukun4515)
Fix pyarrow test #4450 (mvanschellebeeck)
Replace &Option<T> with Option<&T> #4446 [sql] (askoa)
Improve error handling for array downcasting #4445 (retikulum)
Refactor Builtin Window Function Implementation #4441 (mustafasrepo)
feat: DataFusionError::find_root #4437 (crepererum)
fix: do NOT convert errors to strings but keep the type #4436 (crepererum)
The CLI panics when passing an invalid explain query #4429 (comphead)
[minor] use arrow kernel concat_batches instead of combine_batches #4423 (Ted-Jiang)
fix panic on to_hex function for negative numbers #4422 (retikulum)
Optimize filter executor in pull-based executor #4421 (xudong963)
optimize limit push for join case #4411 (liukun4515)
Add integration test for erroring when memory limits are hit #4406 (alamb)
feat: ResourceExhausted for memory limit in AggregateStream #4405 (crepererum)
Update to arrow 28 #4400 [sql] (tustvold)
Update rstest requirement from 0.15.0 to 0.16.0 #4399 (dependabot[bot])
Add sqllogictests (v0) #4395 (mvanschellebeeck)
improve hashjoin execution metrics #4394 (AssHero)
Add with_new_inputs for LogicalPlan #4393 (jackwener)
Clean the code in limit.rs. #4391 (HaoYang670)
Move physical plan serde from Ballista to DataFusion #4390 (Kikkon)
Fix page index pruning fail on complex_expr #4387 (Ted-Jiang)
Add check for nested types in equivalent names and types #4380 (alamb)
refine the code of build schema for ambiguous check, factor this out into a function #4379 [sql] (AssHero)
Refactor the Hash Join #4377 (liukun4515)
Minor: Fix typos in the documentation #4376 (martin-g)
Include byte size estimates in the filter statistics #4375 (isidentical)
HashJoin should return Err when the right side input stream produce Err, add more join UTs to cover different join types #4373 [sql] (mingmwang)
feat: ResourceExhausted for memory limit in GroupedHashAggregateStream #4371 (crepererum)
Use limit() function instead of show_limit() in the first example #4369 (martin-g)
Update env_logger requirement from 0.9 to 0.10 #4367 (dependabot[bot])
reimplement push_down_filter to remove global-state #4365 (jackwener)
Support using Scheduler in tpch benchmark #4361 (xudong963)
Adding more dataframe example to read csv files #4360 (DataPsycho)
minor: correct name and typo #4359 (jackwener)
Do not log error if page index can not be evaluated #4358 (alamb)
Clean the expr_fn - use scalar_expr to create unary scalar expr functions, remove macro unary_scalar_functions #4357 (HaoYang670)
Throw error (not panic) if a listing table specifies a missing partition column #4354 (doki23)
Improve error handling and add some more types for proper downcasting #4352 (retikulum)
Add check to avoid underflow in memory manager #4351 (askoa)
Improve error messages when memory is exhausted while sorting #4348 (alamb)
Do not error in optimizer if resulting schema has different metadata #4347 (alamb)
minor: improve optimizer logging and do not repeat rule name #4345 (alamb)
minor: fix typos in test names #4344 [sql] (alamb)
Minor: Add docstrings to EliminateOuterJoins optimizer pass #4343 (alamb)
Minor: refactor: isolate common memory accounting utils #4341 (crepererum)
minor: make plan_from_tables return one plan instead of Vec #4336 [sql] (jackwener)
enhancement: when fetch == 0, pushdown limit 0 instead of skip+fetch. #4334 (jackwener)
Teach optimizer that CoalesceBatchesExec does not destroy output order #4332 (alamb)
Add ability to disable DiskManager #4330 (tustvold)
Update cli.md #4329 (psvri)
fix bug: right semi join can't support the filter #4327 (liukun4515)
reimplement eliminate_limit to remove global-state. #4324 (jackwener)
Refine Err propagation and avoid unwrap in transform closures #4318 (mingmwang)
Add a checker to confirm physical optimizer rules will keep the physical plan schema immutable #4316 (mingmwang)
Refactor downcasting functions with downcastvalue macro and improve error handling of ListArray downcasting #4313 (retikulum)
minor: add another test case to cover join ambiguous check #4305 [sql] (ygf11)
Fix DESCRIBE statement qualified table issue #4304 [sql] (gruuya)
Use tournament loser tree for k-way sort-merging, increase merge speed by 50% #4301 (richox)
Pin Python setuptools in the CI to fix integration tests #4296 (isidentical)
Support SubqueryAlias in optimizer, physical planner. #4293 (jackwener)
minor: avoid a clone into string when checking ambiguous #4292 [sql] (ygf11)
replace the comparison op for decimal array op using the arrow-rs kernel #4290 (liukun4515)
MINOR: replace {..} with (_), typo, remove outdated TODO #4286 (jackwener)
Reduce Expr copies in ParquetExec #4283 (alamb)
Fix issue in filter pushdown with overloaded projection index #4281 (thinkharderdev)
Skip useless pruning predicates in ParquetExec #4280 (alamb)
Push down more predicates into ParquetExec #4279 (alamb)
Fix EXPLAIN plan for ParquetExec to show pruning_predicate #4278 (alamb)
reimplement limit_push_down to remove global-state, enhance optimize and simplify code. #4276 (jackwener)
Bump actions/labeler from 4.0.2 to 4.1.0 #4274 (dependabot[bot])
Remove the type alias NullColumnarValue #4273 (HaoYang670)
reimplement eliminate_outer_join #4272 (jackwener)
Fix bugs in parsing with header row and partitioned by #4268 [sql] (HaoYang670)
improve error messages while downcasting UInt32Array, UInt64Array and BooleanArray #4261 (retikulum)
add ambiguous check for projection #4260 [sql] (AssHero)
Add ambiguous check for join #4258 [sql] (ygf11)
support cross_join in limit_push_down #4257 (jackwener)
Support parquet page filtering on min_max for decimal128 and string columns #4255 (Ted-Jiang)
fix conflict and UT, cleanup redundant legacy code #4252 (jackwener)
Minor: remove unnecessary clone() in planner #4249 [sql] (alamb)
Fix nightly clippy failures #4246 (mvanschellebeeck)
Improve Error Handling and Readability for downcasting Float32Array, Float64Array, StringArray #4244 (retikulum)
Use defaults for ListingOptions builder #4243 (mvanschellebeeck)
Support binary boolean operators with nulls #4242 (Ted-Jiang)
Fixing doc of the expression #4240 (Creampanda)
Fix negative interval parsing bug #4238 (Jefffrey)
remove duplicate or redundant code #4235 (jackwener)
add a checker to confirm optimizer can keep plan schema immutable. #4233 (jackwener)
Fix the percentile argument for ApproxPercentileCont must be Float64, not Decimal128(2, 1) #4228 (comphead)
refactor how we create listing tables #4227 (timvw)
Update sqlparser requirement from 0.26 to 0.27 #4226 [sql] (alamb)
upgrade required chrono version to 0.4.23 #4225 (waitingkuo)
Support types other than String for partition columns on ListingTables #4221 (doki23)
[CBO] JoinSelection Rule, select HashJoin Partition Mode based on the Join Type and available statistics, option for SortMergeJoin #4219 (mingmwang)
Remove alias in Union #4212 (jackwener)
Add try_optimize method #4208 (andygrove)
Provide a builder for ListingOptions with fixups #4207 (alamb)
Avoid error with empty iterators used for ScalarValue::iter_to_array #4206 (GrandChaman)
Improve error message for regexp_match 'g' flag #4203 (Jefffrey)
Return ResourceExhausted errors when memory limit is exceeded in GroupedHashAggregateStreamV2 (Row Hash) #4202 (crepererum)
Add additional expr boolean simplification rules #4200 (Jefffrey)
Update to arrow and parquet 27.0.0 #4199 [sql] (tustvold)
Support create table with explicit column definitions #4194 [sql] (doki23)
Support all equality predicates in equality join #4193 [sql] (ygf11)
add propagate_empty_relation optimizer rule #4192 (jackwener)
fix clippy #4190 [sql] (jackwener)
Fix clippy by avoiding deprecated functions in chrono #4189 (alamb)
Disallow duplicate interval types during parsing #4188 (Jefffrey)
Parse nanoseconds for intervals #4186 (Jefffrey)
Add rule to reimplement Eliminate cross join and remove it in planner #4185 [sql] (jackwener)
[FOLLOWUP] Enforcement Rule: resolve review comments, refactor adjust_input_keys_ordering() #4184 (mingmwang)
Simplify boolean parquet pushdown predicate #4182 (Jefffrey)
Minor: consolidate parquet custom_reader integration test into parquet_exec #4175 (alamb)
minor: remove redundant println and cleanup #4173 (jackwener)
Add ability to specify external sort information for ListingTables #4170 (alamb)
Improve Error Handling and Readability for downcasting Decimal128Array #4168 (retikulum)
Minor: Remove completed comment on parquet row group pruning #4167 (alamb)
Update hashbrown requirement from 0.12 to 0.13 #4164 (dependabot[bot])
MINOR: enable dyn_cmp_dict feature on arrow for physical expr crate #4163 (isidentical)
Derive filter statistic estimates from the predicate expression #4162 (isidentical)
Minor: pass ParquetFileMetrics to build_row_filter in parquet #4161 (alamb)
Minor: Extract parquet row group pruning code into its own module #4160 (alamb)
Full support for time32 and time64 literal values (ScalarValue) #4156 (andre-cc-natzka)
Window frame GROUPS mode support #4155 (zembunia)
Improve error messages while downcasting Int64Array #4154 (retikulum)
Add another method to collect referenced columns from an expression #4153 [sql] (ygf11)
Remove BoxedAsyncFileReader #4150 (tustvold)
Support unsigned integers in unwrap_cast_in_comparison Optimizer rule #4149 (alamb)
Add support for DataType::Timestamp casts in unwrap_cast_in_comparison optimizer pass #4148 (alamb)
Add additional testing for unwrap_cast_in_comparison #4147 (alamb)
improve error messages while downcasting Int32Array #4146 (retikulum)
Minor: Update docstring on unwrap_cast_in_comparison #4145 (alamb)
add schema parameter to table provider factory create method #4143 (milenkovicm)
fix: shouldn't pass alias through into subquery. #4141 [sql] (jackwener)
Preserve the Cast expression in columnize_expr #4137 [sql] (HaoYang670)
Set versions to dependencies with path in benchmarks Cargo.toml file #4136 (ArkashaJavelin)
Fix links #4135 (mvanschellebeeck)
Use f64::total_cmp instead of OrderedFloat #4133 (comphead)
Add parquet integration tests for explicitly smaller page sizes, page pruning #4131 (alamb)
Consolidate ParquetExec tests in parquet_exec integration test #4130 (alamb)
Minor: Use upstream BooleanArray::true_count #4129 (alamb)
Combined TPCH runs & uniformed summaries for benchmarks #4128 (isidentical)
Enable TableProviderFactories to receive additional options when creating an external table #4126 [sql] (timvw)
Add CI check that configs.md is up-to-date #4124 (mvanschellebeeck)
[Part3] Partition and Sort Enforcement, Enforcement rule implementation #4122 (mingmwang)
reuse code utils::optimize_children but affect inline. #4121 (jackwener)
reuse code utils::optimize_children instead of redundant implementation #4119 (jackwener)
Allow listing tables to be created via TableFactories #4112 (avantgardnerio)
Update SQL reference to state that decimal support is currently experimental #4109 (andygrove)
Add metrics for parquet page level skipping #4105 (Ted-Jiang)
Add parser option for parsing SQL numeric literals as decimal #4102 [sql] (andygrove)
Replace RwLock<HashMap> and Mutex<HashMap> by using DashMap #4079 (yahoNanJing)
Custom window frame support extended to built-in window functions #4078 (mustafasrepo)
Enable tests for page index filtering in parquet filter pushdown test #4062 (alamb)
[Part2] Partition and Sort Enforcement, ExecutionPlan enhancement #4043 (mingmwang)
add support for xz file compression and compression feature #3993 [sql] (Jimexist)
Expression boundary analysis framework #3912 (isidentical)

14.0.0-rc1 (2022-11-04)

Full Changelog

14.0.0 (2022-11-04)

Full Changelog

Breaking changes:

Improve FieldNotFound errors #4084 [sql] (andygrove)
Refactor: move simplify_expression.rs and expr_simplifier.rs to a new mod simplify_expressions #3951 (HaoYang670)
Support for non-u64 types for Window Bound #3916 [sql] (mustafasrepo)
Expose parquet reader settings using normal DataFusion ConfigOptions #3822 (alamb)
Add Filter::try_new with validation #3796 [sql] (andygrove)
Change public simplify API and add a public coerce API #3758 (alamb)

Implemented enhancements:

Automatically register tables if ObjectStore root is configured #4094
Simplify small InList expressions #4089
Support SET command #4067
add uuid() function to generate unique uuid per row #4045
Publish benchmark crate so that it can be used as a library in Ballista #4016
Add statistics methods to TableProvider trait for use in cost-based optimizations in the logical plan #3983
Implement current_time Function #3982
Implement current_date Function #3981
Put common code used for testing code into datafusion/test_utils.rs #3960
Print the configurations of ConfigOptions in an ordered way so that we can directly compare the equality of two ConfigOptions by their debug strings #3952
Don't make dependants install protoc #3947
Implement right anti join and support it in HashBuildProbeOrder #3946
Implement right semi join and support it in HashBuildProbeOrder #3945
Refactor simplify_expressions and expr_simplifier #3934
Implement serialization for ScalarValue::FixedSizeBinary #3928
Support inlining view / dataframes logical plan #3913
Plans with tables from TableProviderFactorys can't be serialized #3906
Simplify a AND a and a OR a. #3895
Allow configuring statistics on TPC-H benchmarks #3888
CI checks stuck in queued mode #3883
Multiple optimizer passes #3879
datafusion-proto does not support view table scan #3874
TableProviderFactories need to be async and return a Result to be useful #3866
Factorize common AND factors out of OR predicates to support filterPushDown as possible #3858
Replace concat_ws with concat when the delimiter is empty string #3857
Concatenate contiguous literal arguments of concat_ws when doing the expression simplification #3856
Partition and Sort Enforcement #3854
Enable mimalloc by default in benchmarks #3851
Add collect statistics configuration #3847
[SQL] - Support cache/uncache table syntax #3842
Filter pushdown doesn't seem to apply for filter on TPC-H Q17 #3839
Support pushdown multi-columns in PageIndex pruning. #3834
Consolidate Expr manipulation code so it is more discoverable and make it easier to use #3808
Leverage input array's null buffer for regex replace to optimize sparse arrays #3803
Improve join cardinality estimation when there is no overlap in the min/max values #3802
datafusion-cli up to date check is failing on master #3798
Optimize benchmark q2 subquery filter #3789
Benchmark should infer schema when running against Parquet #3776
Allow specialized physical functions to provide hints for the array adapter #3762
[User Guide] Add EXPLAIN to SQL reference #3755
move type coercion for agg/agg udf #3752
Prevent Cargo.lock for datafusion-cli being out-of-date #3744
Add example of expr apis including simplification and coercion #3740
support type coercion for ScalarFunction expr in the logical phase #3731
Add support for DISTINCT projections in decorrelate_where_exists #3724
Add type coercion rule for CONCAT and CONCAT_WS #3720
Expose and document a simpler public API for simplify expressions #3709
Expose + document the type coercion API publicly #3708
Concatenate contiguous literal arguments of CONCAT during the expression simplification. #3683
DataFusion 13.0.0 Release #3671
Add division by 0 rules in the expression simplification #3663
Compressed CSV/JSON Read #3641
remove type coercion for agg #3623
extract or clause as predicate for join rels #3577
Improve performance of regex_replace #3518
Add benchmarks for parquet queries with filter pushdown enabled #3457
Make type coercion rule more robust #3390
ViewTable::scan ignores filters and limits #3249
Add CREATE VIEW documentation to user guide #3211
Push additional parquet filtering into the parquet scan [EPIC] #3147
Remove core/logical_plan module #2683
Datafusion Optimizer Enhancement #2255
[Optimizer] Eliminate self compare self #2252
Break datafusion crate into smaller crates #1750
Benchmark constellation-rs/amadeus's parquet implementation #1341
Use parquet2 async reader in physical_plan/parquet #1058
Table Scan Enhancement Plan #944
Implement parquet page-level skipping with column index, using min/max stats #847
Support min/max statistics in ParquetTable and ParquetExec #537

Fixed bugs:

Clippy failing on master #4100
Panic when the number of partitions of the pipeline that throws the exception is inconsistent with the number of partitions output by the query #4096
FieldNotFound when field is available #4083
SingleDistinctToGroupBy being applied too broadly #4082
single_distinct_to_groupby strips qualifiers from group-by expressions #4049
Another Internal error when parquet predicate pushdown is enabled "Error evaluating filter predicate: #4046
Decimal multiplied by Float produces incorrect results #4035
Cannot query external table - TableScan replaced with EmptyExec #4027
benchmark q17 produces incorrect result #4026
benchmark q14 produces incorrect result #4025
benchmark q11 producing incorrect results #4023
Internal error when parquet predicate pushdown is enabled "Error evaluating filter predicate:" #4006
Incorrect results with parquet filtering pushdown enabled #4005
Wrong results when parquet page index filtering is enabled #4002
Output schema of semi join has invalid projection added after HashBuildProbeOrder #4001
async deserialization functions are unintuitive and possibly insecure #3977
Expr::to_bytes can produce output that hits Expr::from_bytes recursion limit #3968
Bug on propagating arrow field metadata #3964
Predicate still has cast when comparing Timestamp(Nano, None) to a timestamp literal, so can't be pushed down or used for pruning #3938
Error using IN list on dictionary encoded data: InList does not support datatype Dictionary(Int32, Utf8). #3936
Internal error in CAST from Timestamp[us] #3922
ScalarValue not implemented for FixedSizeBinary types #3910
[DOC] - There are unsupported DDL in the official documentation #3904
datafusion-proto deserialize with Substring(str [from int] [for int]) fails #3901
count(Literal) gives wrong column name #3891
projection_push_down adds duplicate projections with multiple passes #3881
Default physical planner generates empty relation for DROP TABLE, CREATE MEMORY TABLE, etc #3873
Binary expression canonical names are incorrect in some cases #3865
Using the window function lag causes panic. #3830
chrono crate : specify 0.4.22 as the minimum version due to spurious build failures #3827
datafusion-proto deserialize with q16 sql fails #3820
Filter predicates should not be aliased #3795
Write CSV does not save all lines of the DataFrame #3783
Regression in simplifying expressions in subqueries #3760
DataFusionError(Internal("The size of the sorted batch is larger than the size of the input batch: 2120 > 2312")) #3747
"labeler" PR check is broken #3743
DataFrame::select_columns doesn't work with names containing "." #3733
TPC-H Query 1 has regressed #3729
[RUST][Datafusion] What causes "Error: Execution("file size of 4 is less than footer")" error? #3800
Field names containing periods such as f.c cannot work #3682
TableProvider implementation for DataFrame does not support filter pushdown #3681
Using Decimal(0) makes the system panic #3665
Cannot query some parquet files in S3, but they work locally #3633
col / col returns 1 when col = 0 #3615
register_csv allow space in table_path #3589
Hardcoded u64 for WindowFrameBound fields #3571
docs.rs cannot build datafusion-proto crate #3538
Row Hash loads whole aggregation state to memory before sending #3460
approx_percentile_cont return wrong result when scan multi parquet files. #3140
User guide is incorrect regarding using CLI to register CSV files using schema inference #3001
Exception: Internal error, Exception: Schema error #2938
Version 0.6.0 Panic error during SQL execution #2738
wrong result when operation parquet #2044
Local object store accepts file:/// as base path, but LocalStore returns meta without the prefix. #1923
Reading nested parquet files results in index out of bounds #1383
- (negation) with NULL literals does not work: can't be evaluated because the expression's type is Utf8, not signed #1192
Inconsistent cast behavior #957
single_distinct_to_groupby no longer drops qualifiers #4050 [sql] (andygrove)

Documentation updates:

Clarify in docs that Identifiers are made lower-case in SQL query #2374
Fix broken links in contributor guide #3956 (Jefffrey)
add create view explanation #3925 (retikulum)
Update datafusion-examples README #3814 (alamb)
Add Seafowl to list of projects using DataFusion #3792 (mildbyte)

Closed issues:

[QUESTION] How many times should be the function create_name called when executing a query? #3900
Improve the Expr string format #3878
Simplify division by zero (division by one / multiplication by zero / multiplication by one) for Decimal types as well #3643
InList: merge check branch #2833
Optimization InList: compare the float data type using OrderedFloat<T> #2831
Outdated section of the add function of the contribution guide #2560
Optimize InList implementation with native types rather than ScalarValue #2165
Improve testing of optimizers using EXPLAIN #1118
Crash on parsing sql query with Cyrillic letters #184
[EPIC] Support all TPC-H queries in benchmark #158
Implement optional second argument to ltrim and rtrim functions #144
Benchmark crate does not have a SIMD feature #124
ColumnarValue::into_array should not require batch #113
[Rust] Parquet data source does not support complex types #83

Merged pull requests:

Appease new clippy #4101 (alamb)
minor: Split parquet reader up into smaller modules #4099 (alamb)
[MINOR] Update SET in cli.md #4098 (waitingkuo)
fix: Scheduler panic routing errors #4097 (yukkit)
Automatically register tables if ObjectStore root is configured #4095 (avantgardnerio)
minor: Use Operator::swap #4092 (alamb)
Simplify small InListExpr #4090 (Dandandan)
Minor: Add arrow-rs ticket reference and turn some comments into docstrings #4088 (alamb)
Support Dictionary in InListExpr #4070 (tustvold)
support SET variable #4069 [sql] (waitingkuo)
Add in list bench #4068 (tustvold)
Improve Error Handling and Readability for downcasting StructArray #4061 (retikulum)
Build tests separately from running #4060 (alamb)
Simplify InListExpr ~20-70% Faster #4057 (tustvold)
MINOR: Print unoptimized logical plan in execute_query of tpch benchmark #4056 (viirya)
Minor: clean the code in eliminate_filter #4055 (HaoYang670)
Implement current_time scalar function #4054 (naosense)
Cleanup hash_utils adding support for decimal256 and f16 #4053 (tustvold)
Fix multicolumn parquet predicate pushdown (#4046) #4048 (tustvold)
Add CI checks that we can serde all benchmark queries #4047 (andygrove)
Enable more benchmark verification tests #4044 (andygrove)
Extract common parquet testing code to parquet-test-util crate #4042 (alamb)
add uuid() function #4041 (Jimexist)
Update to arrow 26, change timezones #4039 [sql] (tustvold)
Fix Decimal and Floating type coerce rule #4038 (viirya)
Reserve the literal expression of Count function #4031 [sql] (HaoYang670)
Implement current_date scalar function #4022 (comphead)
Fix predicate pushdown bugs: project columns within DatafusionArrowPredicate (#4005) (#4006) #4021 (tustvold)
minor: remove redundant code/TODO #4019 (jackwener)
Add CI check to verify that benchmark queries return the expected results #4015 (andygrove)
Minor: Add TODO and tracking ticket reference #4012 (alamb)
Add right anti join support and support it in HashBuildProbeOrder #4011 (Dandandan)
MINOR: Generate expected benchmark query results #4010 (andygrove)
Minor: remove unnecessary clippy allow #4008 (alamb)
Minor: Do what clippy says and clean up some code #4007 (alamb)
Improve Error Handling and Readability for downcasting Date32Array #4004 (retikulum)
Don't add projection for semi joins in HashBuildProbeOrder #4000 (Dandandan)
Minor: use DataType::is_nested #3995 (alamb)
[minor] bump prettier version #3992 (Jimexist)
Add parquet predicate pushdown metrics #3989 (alamb)
Pin datafusion-proto build dependencies #3987 (tustvold)
Add TableProvider.statistics method #3986 (andygrove)
Add Pull Request guidelines to contributor guide #3985 (alamb)
Update protos #3979 (tustvold)
Revert async changes but keep deltalake working #3978 (avantgardnerio)
Correctness integration test for parquet filter pushdown #3976 (alamb)
MINOR: Stop pretty printing batches in benchmark when there are no results #3974 (andygrove)
MINOR: Re-export Cast struct #3971 (andygrove)
fix: check recursion limit in Expr::to_bytes #3970 (crepererum)
[Part1] Partition and Sort Enforcement, PhysicalExpr enhancement #3969 (mingmwang)
Support pushdown multi-columns in PageIndex pruning. #3967 (Ted-Jiang)
Fix benchmarks README formatting #3966 (Jefffrey)
Bug fix on DFField to Field conversion: preserve metadata #3965 (metesynnada)
Informative Error Message for LAG and LEAD functions #3963 (mustafasrepo)
Minor: Add some docstrings to FileScanConfig and RuntimeEnv #3962 (alamb)
Move common code used for testing code into datafusion/test_utils #3961 (alamb)
Update minimum chrono dependency to 0.4.22 #3959 (alamb)
Implement right semi join and support in HashBuildProbeOrder #3958 (Dandandan)
Print the configurations of ConfigOptions in an ordered way so that we can directly compare the equality of two ConfigOptions by their debug strings #3953 (yahoNanJing)
Vendor Generated Protobuf Code (#3947) #3950 (tustvold)
Implement serialization for ScalarValue::FixedSizeBinary #3943 (retikulum)
Consolidate physical join code into datafusion/core/src/physical_plan/joins #3942 (alamb)
Add optimizer test for simplifying predicates on timestamps #3939 (alamb)
Add test for querying predicate on dictionary #3937 (alamb)
fix: return error for unsupported SQL #3933 (Kikkon)
doc: fix doc about CREATE TABLE IF NOT EXISTS #3932 (jackwener)
Refactor Expr::Cast to use a struct. #3931 [sql] (jackwener)
minor: fix some typo. #3930 (jackwener)
chore: update cranelift-related dependencies #3926 (xudong963)
Change cast error from Internal to NotImplemented #3924 (alamb)
Support inlining view / dataframes logical plan #3923 (Dandandan)
Add test for Simplify redundant predicates #3915 (src255)
Implement ScalarValue for FixedSizeBinary #3911 (maxburke)
Add serde for plans with tables from TableProviderFactorys #3907 (avantgardnerio)
Support filter/limit pushdown for views/dataframes #3905 (Dandandan)
Factorize common AND factors out of OR predicates to support filterPu… #3903 (Ted-Jiang)
Add Substring(str [from int] [for int]) support in datafusion-proto #3902 (r4ntix)
Revert "Factorize common AND factors out of OR predicates to support filterPu… (#3859)" #3897 (alamb)
MINOR: Add notes on Apache Reporter #3893 (andygrove)
Allow configuring collection of statistics during TPC-H benchmarks #3889 (isidentical)
Improve formatting of binary expressions #3884 [sql] (andygrove)
Multiple optimizer passes #3880 (andygrove)
[MINOR] Update docs with newly added configuration values #3877 (alamb)
[MINOR] Add a hint about how to resolve the Cargo.lock CI check #3876 (alamb)
Add LogicalPlan::ViewTable support in datafusion-proto #3875 (r4ntix)
Optimize the concat_ws function #3869 (HaoYang670)
Implement foundational filter selectivity analysis #3868 (isidentical)
Update TableProviderFactory trait to support real-world use-cases #3867 (avantgardnerio)
put subquery's equal clause into join on clauses instead of filter cl… #3862 (AssHero)
Factorize common AND factors out of OR predicates to support filterPu… #3859 (Ted-Jiang)
Enable mimalloc by default in benchmark #3853 (Dandandan)
Refactor Expr::Between to use a struct #3850 [sql] (b41sh)
Handle cardinality estimation for disjoint inner and outer joins #3848 (isidentical)
Add setting for statistics collection #3846 (Dandandan)
Update to arrow 25.0.0 #3844 [sql] (tustvold)
Tweak list of optimization rules #3841 (Dandandan)
Refactor Expr::GetIndexedField to use a struct #3838 [sql] (ygf11)
Infer the count of maximum distinct values from min/max #3837 (isidentical)
Refactor Expr::Like, Expr::ILike, Expr::SimilarTo to use a struct #3836 [sql] (b41sh)
Refactor Expr::BinaryExpr to use a struct #3835 [sql] (zhoudongyan)
update postgres version to 15 in integration test #3831 (Jimexist)
Fix the panic when lpad/rpad parameter is negative #3829 (ZuoTiJia)
MINOR: Document SHOW ALL in the users guide #3826 (alamb)
MINOR: Add datafusion-cli documentation on showing configuration #3825 (alamb)
Add/Remove Division Rules #3824 (retikulum)
Minor: Sort the output of SHOW ALL by config name #3823 [sql] (alamb)
Add precision != 0 check when making decimal type #3818 [sql] (HaoYang670)
Infer schema when running benchmarks against parquet #3817 (andygrove)
Finish removing deprecated datafusion::logical_plan module #3816 (andygrove)
Clarify initial example with respect to capitalization #3815 (alamb)
Improve expression simplification by running it twice #3811 (alamb)
Make expression manipulation consistent and easier to use: combine/split filter conjunction, etc #3810 (alamb)
Consolidate expression manipulation functions into datafusion_optimizer #3809 (alamb)
Optimize regexp_replace when the input is a sparse array #3804 (isidentical)
Stop ignoring errors when writing DataFrame to csv, parquet, json #3801 (andygrove)
Update datafusion-cli Cargo.lock to fix CI check on master #3799 (alamb)
MINOR: Benchmark regression tests #3790 (andygrove)
MINOR: Optimizer example and docs, deprecate Expr::name #3788 (andygrove)
Join cardinality computation for cost-based nested join optimizations #3787 (isidentical)
Optimizer now simplifies multiplication, division, and modulo when an arg is a literal Decimal zero or one #3782 (drrtuy)
Implement parquet page-level skipping with column index, using min/ma… #3780 (Ted-Jiang)
Bump actions/labeler from 4.0.1 to 4.0.2 #3779 (dependabot[bot])
MINOR: correct ListingOptions.try_new docs to include the enabled stat collection #3775 (isidentical)
Teach a negative NULL expression to return NULL instead of an error #3771 (drrtuy)
Add benchmarks for testing row filtering #3769 (thinkharderdev)
move type coercion of agg and agg_udaf to logical phase #3768 (liukun4515)
User Guide: Add EXPLAIN to SQL reference #3767 (unvalley)
Allow specialized implementations to produce hints for the array adapter #3765 (isidentical)
Fix optimizer regression with simplifying expressions in subquery filters #3764 (andygrove)
Run all datafusion-examples in CI tests #3761 (alamb)
MINOR: Remove deprecated module datafusion::logical_plan::plan #3759 (andygrove)
Refactor Expr::Case to use a struct #3757 [sql] (andygrove)
Do not run labeler CI check if it would fail due to permissions #3756 (alamb)
MINOR: Improvements to scalar_subquery_to_join error handling #3754 (andygrove)
Always track the final size of the in-mem sorted arrays #3753 (isidentical)
Fix DataFrame::select_columns to handle column names with a period #3751 (zhoudongyan)
Fix ListingTableUrl to decode percent #3750 (unvalley)
remove type coercion for physical ScalarFunction #3749 (liukun4515)
CI: Add a new run to check whether datafusion-cli lock file is up-to-date #3745 (isidentical)
Add datafusion example of expression apis #3741 (alamb)
fix subquery where exists distinct #3732 (b41sh)
Remove some unneeded code in CommonSubexprEliminate #3730 (alamb)
Consolidate and better tests for expression re-rewriting / aliasing #3727 (alamb)
Fix output schema generated by CommonSubExprEliminate #3726 (alex-natzka)
Add type coercion rule for concat and concat_ws #3721 (HaoYang670)
Expose and document a simpler public API for simplify expressions #3719 (ygf11)
Remove dead code in UnwrapCastExprRewriter that may mask errors #3703 (alamb)
Fix DataFrame::with_column to handle creating column names with a period #3700 (alamb)
Add simplification rules for the CONCAT function #3684 (HaoYang670)
Compressed CSV/JSON support #3642 [sql] (Licht-T)
Simplify serialization by removing redundant PrimitiveScalarValue #3612 (alamb)
Pushdown single column predicates from ON join clauses #3578 (AssHero)
Simplify the serialization of ScalarValue::List #3547 (alamb)
Generate hash aggregation output in smaller record batches #3461 (milenkovicm)
Improve doc on lowercase treatment of columns on SQL #3385 (nanicpc)

13.0.0-rc1 (2022-10-07)

Full Changelog

13.0.0 (2022-10-06)

Full Changelog

Breaking changes:

Make ObjectStoreProvider fallible (return Result rather than Option) #3584 (tustvold)
Make OptimizerConfig a builder style API #3525 (alamb)

Implemented enhancements:

remove type coercion for ScalarUDF in the physical phase #3734
Allow with statements to specify their columns alongside their expression names #3716
Support SQLDataType::Timestamp(TimezoneInfo) #3693
support type coercion for case when expr #3673
Add simplification rules for the Modulo operator #3664
Add TIMESTAMPTZ #3659
Simplify A * 0 and A * null. #3626
change rule of PreCastLitInComparisonExpressions to unwrap cast rule after #3582 #3622
Optimize regex_replace with a known pattern / replacement #3613
Simplify CONCAT_WS(NULL, ..) to NULL #3607
Add OctoSQL to list of systems powered by DataFusion #3605
Prevent over-allocation (and spills) on TopK queries #3596
Allow ObjectStoreProvider to return None (return Result<Option> rather than Result) #3594
simplify between expr should consider the data type #3587
make type coercion simple and remove the evaluate logic #3585
ReduceOuterJoin optimizer support cast or try_cast expr. #3565
Support type coercion for subquery #3557
Make ParquetScanOptions public and expose a reference to the scan options from ParquetExec #3550
Use fetch limit in get_sorted_iter #3544
Push limit to sort #3528
Execute sorts in parallel when limit is used after sort #3526
Consolidate optimizer passes in optimizer module for better testing #3524
Support Top-K query optimization for `ORDER BY <EXPR> [ASC #3515
support the type coercion for like unlike istrue isfalse isunknown #3509
Automate the pushing of releases to Homebrew #3506
Add extra DATE_PART units that are already supported in arrow-rs #3502
Release datafusion-cli 12.0.0 on Homebrew #3501
Make from_proto_binary_op public #3489
coercion between decimal and other types lacking, compared to other numeric types #3479
move type coercion for inlist from physical phase to logical phase #3468
Make datafusion::physical_plan::file_format::file_stream::FileStream public #3466
Support using offset index in ParquetRecordBatchStream when pushing down RowFilter #3456
Support timestamp data type in In_list node #3449
Evaluate expressions after type coercion #3431
Make a convenience function to register a single RecordBatch as a table from SessionContext #3426
add datafusion-cli support of external table locations that object_store supports #3424
pruning support cast/try_cast expr #3414
Add documentation on querying against files in object store such as S3 #3399
Remove type-coercion from physical planner #3388
support Statement::ShowVariable to show session configs #3364
Support RowFilter in ParquetExec #3360
Apply TypeCoercion rule before FilterPushDown #3289
Add support for get / show timezone #3255
Consider adding DataFusion to ClickBench benchmarks #2902
filter_push_down panics on semi/anti join with join filters #2888
Migrate the cross join -> inner join optimization from the planner to the optimizer #2859
ObjectStore write support #2185
DataFusion should scan Parquet statistics once per query #871
Extend & generalize constant folding / evaluation in logical optimizer #237

Fixed bugs:

projection_push_down produces invalid aggregate plans in some cases #3738
Time With Time Zone should raise error until DataType::Time64 support tz #3715
SQL Planner doesn't distinguish normal CTEs from the recursive ones. #3713
Fix inconsistency between column name formats #3711
Optimizer rule 'projection_push_down' failed due to unexpected error: Error during planning: Aggregate schema has wrong number of fields. Expected 3 got 8 #3704
Optimizer regressions in unwrap_cast_in_comparison #3690
Internal error when evaluating a predicate = "The type of Dictionary(Int16, Utf8) = Int64 of binary physical should be same" #3685
Specialized regexp_replace should early-abort when the input arrays are empty #3647
Internal error: Failed to coerce types Decimal128(10, 2) and Boolean in BETWEEN expression #3646
Internal error: Failed to coerce types Decimal128(10, 2) and Boolean in BETWEEN expression #3645
Type coercion error: The type of Boolean AND Decimal128(10, 2) of binary physical should be same #3644
LEFT JOIN not working as expected, error message is confusing #3639
INTERSECT and EXCEPT don't return an error when 2 sets have the different number of columns #3632
The datafusion-cli panics when union 2 table with different number of columns. #3630
The expression col(a) / null is not optimized. #3624
s3_build_error test may fail in some environments #3601
New clippy errors appear to break the CI on master #3597
StringConcat gives inconsistent result with concat when containing null #3569
simplify_expressions don't support different data type for binary #3556
Broken logical plan serialization for aggregation queries #3555
Aggregate filters do not get pushed down to table scan #3546
docs.rs cannot build datafusion-proto crate #3538
DataFusion serialization doesn't handle ScalarValue::Dictionary, Binary, LargeBinary, Time64, IntervalMonthDayNano, Struct #3531
What should be returned when trying to get a config in invalid format? #3505
Dividing decimal type gives wrong error: "170141183460469231731687303715884105727 is too large to store in a Decimal128 #3498
Add BitwiseXor in function from_proto_binary_op #3495
comparison operations with a scalar null and decimal array panics #3487
Union columns with different types #3467
Can't get the right logical plan after optimizer #3421
Fix conflict between simplify_expression rule and CAST expressions #3409
Empty array giving error #2439
Internal error: Unsupported data type in hasher: FixedSizeBinary(16) #1516
Predicates on to_timestamp do not work as expected with "naive" timestamp strings #765
Address performance/execution plan of TPCH query 19 #78
Bug fix: expr_visitor was not visiting aggregate filter expressions #3548 (andygrove)

Documentation updates:

Publish 8.0.0 user guide #2558
MINOR: Add Dask SQL to list of projects powered by DataFusion #3581 (andygrove)
Add Parseable as Datafusion user #3471 (nitisht)

Closed issues:

Upgrade to Arrow 24.0.0 #3689
what's the best practice to get a single value from arrow array? #3497
The data type of predicate in the row filter should be same in the binary expr #3469
Extend constant folding and parquet filtering support #188
Add FORMAT to explain plan and an easy to visualize format #96

Merged pull requests:

Build aggregate schema in Aggregate::try_new #3739 (andygrove)
delete type coercion for scalar udf in the physical phase #3735 (liukun4515)
Consolidate coercion code in datafusion_expr::type_coercion and submodules #3728 (alamb)
Skip filter push down on semi/anti joins #3723 (andygrove)
Raise Unsupported SQL type for Time(WithTimeZone) and Time(Tz) #3718 [sql] (waitingkuo)
Support column aliases specified by WITH statements #3717 [sql] (isidentical)
Reject recursive CTEs before processing the sub-expressions #3714 [sql] (isidentical)
Make column name consistent between Expr::name and Display/Debug #3712 [sql] (andygrove)
Fix aggregate type coercion bug #3710 (alamb)
MINOR: Add Expr::canonical_name and improve docs on Expr::name #3706 (andygrove)
Remove type coercions from ScalarValue and aggregation function code #3705 (ozankabak)
unwrap_cast_in_comparison: fix bug which can find the field for the schema #3699 (liukun4515)
bump sql-parser 0.25 #3698 [sql] (xudong963)
Move optimizer init to optimizer crate #3692 (andygrove)
Upgrade arrow parquet and arrow-flight to 24.0.0 #3691 [sql] (alamb)
Fix bug in dictionary coercion and allow better coercion #3688 (alamb)
[MINOR] Improve docstrings in binary_rule.rs #3687 (alamb)
[MINOR] Add ScalarValue::new_utf8, clean up creation of literals in casting tests #3680 (alamb)
Disable code coverage until we figure out why it is broken #3679 (alamb)
move type coercion for case when expr #3676 (liukun4515)
Update sqlparser to 0.24.0 #3675 [sql] (alamb)
Fail if field lengths are not same in INTERSECT and EXCEPT #3674 (askoa)
Simplification Rules for Modulo Operator #3669 (askoa)
change pre_cast_lit_in_comparison to unwrap_cast_in_comparison #3662 (liukun4515)
restore optimization for between in simplify expression rule #3661 (liukun4515)
add timestamptz #3660 [sql] (waitingkuo)
remove the type coercion in the simplify_expressions rule #3657 (liukun4515)
Cache collected file statistics #3649 (mateuszkj)
make regexp_replace early abort with empty input #3648 (isidentical)
Check each query has same number of columns when building the UNION plan #3638 (HaoYang670)
move the type coercion to the beginning of the optimizer rule and support type coercion for subquery #3636 (liukun4515)
Add documentation for querying S3 data with CLI #3631 (andygrove)
Simplify multiplication by 0 and by null #3627 (HaoYang670)
Simplify null division. #3625 (HaoYang670)
support cast/try_cast expr in reduceOuterJoin #3621 (AssHero)
MINOR: fix TPC-H conversion function to not miss a row of data #3620 (kmitchener)
Document ObjectStoreProvider #3619 (tustvold)
[feat] Support using offset index in ParquetRecordBatchStream when pu… #3616 (Ted-Jiang)
Optimize regex_replace for scalar patterns #3614 (isidentical)
Simplify concat_ws(null, ..) to null #3608 (HaoYang670)
MINOR: improve docstrings on SessionContext #3603 (alamb)
Merge s3_success and s3_build_error tests into one test #3602 (Licht-T)
add register_batch and read_batch to SessionContext to register a single RecordBatch as a table #3600 (BaymaxHWY)
[CI] Fix the newly added linting errors to make clippy happy #3598 (isidentical)
Prevent over-allocations (and spills) on sorts with a fixed limit #3593 (isidentical)
update datafusion cli deps #3588 (Jimexist)
Update cranelift* dependencies 0.87 --> 0.88 #3586 (alamb)
Fix docs.rs #3580 (avantgardnerio)
Fix build #3576 (alamb)
Use consistent name for TimeUnit::Millisecond #3575 (alamb)
Fix logical plan serialization #3574 (thinkharderdev)
Custom window frame logic (support ROWS, RANGE, PRECEDING and FOLLOWING for window functions) #3570 [sql] (metesynnada)
fix comparison of decimal array with null scalar #3567 (kmitchener)
Reduce dependencies of datafusion-sql crate #3566 [sql] (mbrobbel)
Update pbjson-types requirement from 0.3 to 0.5 #3560 (dependabot[bot])
Update pbjson requirement from 0.3 to 0.5 #3559 (dependabot[bot])
Update pbjson-build requirement from 0.3 to 0.5 #3558 (dependabot[bot])
MINOR: enable q19 in TPCH #3553 (kmitchener)
MINOR: remove out-of-date is_dictionary checks from binary_rule.rs #3552 (kmitchener)
Make ParquetScanOptions public and add method to get a reference from… #3551 (thinkharderdev)
fix coercion of null for decimal math in binary_rules #3549 (kmitchener)
Use fetch limit in get_sorted_iter #3545 (Dandandan)
feat: allow object store registration from datafusion-cli #3540 (turbo1912)
Actually test that ScalarValues are the same after round trip serialization #3537 (alamb)
Add serialization of ScalarValue::Struct #3536 (alamb)
Add serialization of ScalarValue::IntervalMonthDayNano #3535 (alamb)
Add serialization of ScalarValue::Binary and ScalarValue::LargeBinary, ScalarValue::Time64 #3534 (alamb)
MINOR: Impl Debug for TableReference and ResolvedTableReference #3533 [sql] (andygrove)
Add support for serializing ScalarValue::Dictionary to datafusion-proto #3532 (alamb)
Push down limit to sort #3530 (Dandandan)
Execute sort in parallel when a limit is used after sort #3527 (Dandandan)
Config support type conversion #3522 (comphead)
MINOR: Add more execs to list of supported execs #3519 (andygrove)
fix divide by zero not throwing proper error for decimal #3517 (kmitchener)
Make FileStream and FileOpener public #3514 (thinkharderdev)
feat: Union types coercion #3513 [sql] (gandronchik)
[DataFrame] - Add cache function for DataFrame #3512 (francis-du)
type coercion: support is/is_not_bool/like/unknown expr #3510 (liukun4515)
MINOR: remove unused dependencies #3508 (waynexia)
Automate postrelease publishing to Homebrew #3507 (iajoiner)
Add additional DATE_PART units #3503 (jonmmease)
Add BitwiseXor in function from_proto_binary_op #3496 (askoa)
Make the function from_proto_binary_op public #3490 (askoa)
minor: fix bug in downcast_value! macro (T --> $T) #3486 (alamb)
add time_zone into ConfigOptions #3485 [sql] (waitingkuo)
[MINOR] Change downcast_value! macro so it does not need to use use std::any::type_name; #3484 (alamb)
Convert more cross joins to inner joins (Address performance/execution plan of TPCH query 19) #3482 (DhamoPS)
[minor] Remove unused arg in macro in Inlist #3474 (Ted-Jiang)
inlist: move type coercion to logical phase #3472 (liukun4515)
Use the column data type as the NULL data type in the row filter #3470 (liukun4515)
apply type coercion before filter pushdown #3459 (liukun4515)
add FixedSizeBinary support to create_hashes #3458 (mcassels)
Support ShowVariable Statement #3455 [sql] (waitingkuo)
Add additional pruning tests with casts, handle unsupported predicates better #3454 (alamb)
Add InList support for timestamp type. (#3449) #3450 (Ted-Jiang)
Evaluate expressions after type coercion #3444 (Dandandan)
remove type coercion in the binary physical expr #3396 (liukun4515)
Use arrow row format in SortPreservingMerge ~50-70% faster #3386 (tustvold)
Pushdown RowFilter in ParquetExec #3380 (thinkharderdev)

12.0.0 (2022-09-12)

Full Changelog

Breaking changes:

Pass return_type to AccumulatorFunctionImplementation for user defined aggregates #3428 (alamb)
Use usize rather than Option<usize> to represent Limit::skip and Limit::offset #3374 [sql] (HaoYang670)
Deprecate legacy datafusion::logical_plan module #3338 (andygrove)
Update signature for Expr.name so that schema is no longer required #3336 (andygrove)
MINOR: rename optimizer rule to ScalarSubqueryToJoin #3306 (kmitchener)
Add top-level Like, ILike, SimilarTo expressions in logical plan #3298 [sql] (andygrove)
Upgrade to sqlparser 0.22 #3278 [sql] (andygrove)
Expr variants for boolean operations #3275 [sql] (sarahyurick)
Upgrade to sqlparser 0.21 #3200 [sql] (andygrove)
Add SQL planner support for Like, ILike and SimilarTo, with optional escape character #3101 [sql] (andygrove)

Implemented enhancements:

support cast inside values #3446
update TPCH test schemas to use Decimal128 from Float #3435
Include Bitwise operators in the documentation #3434
How to read excel file with datafusion? #3433
Pass return type to the accumulator state factory in aggregates #3427
Support bitwise XOR operator (#) #3420
support InList with datatype Date32 #3412
add simplification for between expression during logical plan optimization #3402
Replace From trait with TryFrom trait for datafusion-proto crate #3401
update TPC-H benchmark to Decimal types from Float #3392
Use usize to represent Limit::skip #3369
Avoid copying in LogicalPlan::expressions #3368
Upgrade to Arrow 22 #3362
Eliminate OFFSET 0 in the logical plan optimization #3355
Add ability to get unoptimized logical plan from DataFrame #3340
Allow IDEs to recognize generated code #3332
CAST should not change the name of an expression #3326
add SQL support for unsigned integers #3325
Review use of panic in datafusion-proto crate #3318
Review use of panic in datafusion-sql crate #3315
Review use of panic in datafusion-optimizer crate #3314
Review use of panic in datafusion-expr crate #3312
Support registration of custom TableProviders through SQL #3310
Support binary data in sha hash functions #3308
add SQL support for tinyint and unsigned versions of all INTs #3307
Support binary types in InList expression #3300
Physical planner should map IsTrue and similar expressions to IsDistinctFrom #3288
Introduce physical plan version of Operator enum #3269
Introduce Expr variants for IS [NOT] TRUE / FALSE / UNKNOWN #3268
Add support for non-correlated subqueries #3266 [sql]
(Re-)add support for glob patterns in ListingTableUrl #3261
PreCastLitInComparisonExpressions should use ExprRewriter and supported nested expressions #3259
implement DROP VIEW #3251
Upgrade to Arrow 21 #3224
Add TypeCoercion optimizer rule #3221
Create bench for approx_percentile_cont aggregate #3217
Add SQL query planner support for DISTRIBUTED BY #3207
Support "IS [NOT] UNKNOWN" syntax #3195
sqlparser 0.21 upgrade #3192
Re-implement parsing/planning for SHOW TABLES due to sqlparser changes #3188
Support SUM AVG, MIN, MAX on Time columns. #3166
Support "IS TRUE/FALSE" syntax #3159
Support number of histogram bins in approx_percentile_cont #3145
Support create ApproxPercentileAccumulator with TDigest max_size #3142
Remove support for array function and only support array[] style postgres syntax #3115
Allow inline column aliases for create view #3108 [sql]
Add support for Postgres SIMILAR TO and ILIKE syntax #3099 [sql]
Update SQL reference in user guide to cover all supported syntax #3091
DataFusion prelude should import all logical expression functions #3068
Proposal: Add similar to operator #3016 [sql]
Release DataFusion 11.0.0 #3012
Implement "SHOW CREATE TABLE" for external tables #2848
Change java package names in protobuf files #2513
When creating DFField from Expr we should provide input plan not input schema #2456
Support "IS NOT TRUE/FALSE" syntax #2265
RFC: Spill-To-Disk Object Storage Download #2205
Support for BitwiseAnd &, BitOr | binary operators #1619
[Question] Usage of async object store APIs in consuming code #1313
Allow User Defined Aggregates to return multiple values / structs #600
Implement vectorized hashing for dictionary types #331

Fixed bugs:

Intermittent build error when changing selected features #3366
sql::timestamp::timestamp_add_interval_months failing since September 1st #3327
sql::timestamp::timestamp_add_interval_months test fails #3322
test case timestamp_add_interval_months failed on master branch #3321
datafusion-proto does not support untyped null scalar values #3302
ConfigOptions creation is slow #3295
FilterPushDown optimization through UNION ALL results in SchemaError #3281
Execute LogicalPlans after building for TPCH Benchmarks #3273
CREATE TABLE should return empty DataFrame #3265 [sql]
CREATE EXTERNAL TABLE from CSV creates a table with no columns if there is just a header row #3263
View TableProvider ignores projections, resulting in invalid plans #3240
CREATE VIEW should return an empty dataframe on success #3236
DISTRIBUTE BY expressions get removed during optimization #3234
datafusion cannot recognize Chinese characters. #3203
Panicked at 'byte index 1 is out of bounds on invalid query #3190
like_nlike_with_null_lt fails with latest sqlparser code #3187
Interval Literal output inconsistent date_type #3180
array function allows different data types #3123
eq operator doesn't work on binary data #3117
incorrect where clause comparison while using table alias #3073
Some functions are incorrectly declared as unary #3069
once now() is called in a statement, it forever returns the same value #3057
single_distinct_to_groupby panic when group by expr is a binaryExpr #2994
Cannot have order by expression that references complex group by expression #2360
Fix some bugs in TypeCoercion rule #3407 (andygrove)
MINOR: Stop ignoring AggregateFunction::distinct in protobuf serde code #3250 (andygrove)
Add assertion for invariant in create_physical_expression and fix ViewTable projection #3242 (andygrove)
Fix bug where optimizer was removing Partitioning::DistributeBy expressions #3229 (andygrove)

Documentation updates:

[minor] add Coverage Status in readme #3220 (Ted-Jiang)

Closed issues:

Add \i command to datafusion-cli #1906
TPC-H Query 15 #166

Merged pull requests:

minor: fix some typo. #3453 (jackwener)
Update criterion requirement from 0.3 to 0.4 #3452 (dependabot[bot])
Update object_store requirement from 0.4.0 to 0.5.0 #3451 (dependabot[bot])
add cast support inside values #3447 [sql] (kmitchener)
Use hash repartitioning for aggregates on dictionaries #3445 (isidentical)
Review unwrap and panic from the aggregate directory of datafusion-physical-expr #3443 (iajoiner)
MINOR: Implement protobuf serde for all binary operators #3441 (andygrove)
MINOR: Add accessor methods to DateTimeIntervalExpr #3440 (andygrove)
update TPCH-mimicking tests to Decimal data type from Float, matching the benchmark #3438 (kmitchener)
Include Bitwise operators in the documentation #3436 (askoa)
minor: make sql number parsing slightly more efficient + functional #3432 [sql] (alamb)
Implement bitwise XOR operator (#) #3430 [sql] (askoa)
Replace From trait with TryFrom trait for datafusion-proto crate #3401 #3429 (comphead)
Tests showing user defined aggregate returning a struct #3425 (alamb)
MINOR: update optimizer rule names to be consistent style as the rest #3415 (kmitchener)
Support date32 and date64 in inlist node #3413 (Ted-Jiang)
Update sqlparser requirement from 0.22 to 0.23 #3411 [sql] (dependabot[bot])
simplify the between expr during logical plan optimization #3404 (kmitchener)
MINOR: Improve optimizer error #3403 (andygrove)
Review panics in the sql crate #3397 [sql] (HaoYang670)
changed TPC-H benchmark to use Decimal types #3393 (kmitchener)
minor: remove redundant code. #3389 (jackwener)
Add dictionary cases to merge bench #3384 (tustvold)
Implement Eq trait for Expr and nested types #3381 (jdye64)
Minor: Improvements to type coercion rule #3379 (alamb)
MINOR: Note that most communication happens on github #3375 (alamb)
minor fix: clean data type for negative operation #3370 (liukun4515)
Fix code generation for json feature #3367 (avantgardnerio)
Review use of panic in datafusion-proto crate #3365 (comphead)
Upgrade to arrow 22 #3363 [sql] (avantgardnerio)
return empty dataframe on create table, remove a duplicate optimize call #3361 (kmitchener)
Add SQL support for tinyint , smallint, and unsigned int variants #3359 [sql] (kmitchener)
Minor: add hint in README of example #3358 (jackwener)
Collect to HashSet directly in in_list #3356 (HaoYang670)
MINOR: Add comments about rewrite_disjunctive_predicate #3351 (alamb)
[MINOR] Add debug logging to plan teardown #3350 (alamb)
MINOR: add df.to_unoptimized_plan() to docs, remove erroneous comment #3348 (kmitchener)
Replace unwrap in convert_to_ordered_float and add downcast_value #3347 (iajoiner)
Remove panics from common_subexpr_eliminate #3346 (andygrove)
Remove Result.unwrap from single_distinct_to_groupby #3345 (andygrove)
Add to_unoptimized_plan #3344 (iajoiner)
Remove panics from simplify_expressions optimizer rule #3343 (andygrove)
Remove unreachable! from filter push down rule #3342 (andygrove)
Replace panic in datafusion-expr crate #3341 (iajoiner)
Re-implement ExprIdentifierVisitor::desc_expr to use Expr::Display #3339 (andygrove)
Fix the test timestamp_add_interval_months #3337 (HaoYang670)
Bump lz4-sys from 1.9.3 to 1.9.4 in /datafusion-cli #3335 (dependabot[bot])
Make binary operator formatting consistent between logical and physical plans #3331 (andygrove)
Fix build: Ignore failing test #3329 (andygrove)
Add InList support for binary type. #3324 (HaoYang670)
MINOR: add github action trigger #3323 (waynexia)
add explain sql test for optimizer rule PreCastLitInComparisonExpressions #3320 (liukun4515)
Custom / Dynamic table provider factories #3311 [sql] (avantgardnerio)
fix: alias group_by exprs in single_distinct_to_groupby optimizer #3305 (waynexia)
Add support for serializing null scalar values #3303 (andygrove)
Finish integrating Expr::Is[Not]True and similar expressions #3301 [sql] (andygrove)
MINOR: Remove unwrap calls from single_distinct_to_groupby optimizer rule #3299 (andygrove)
docs: update the Python library repository #3297 (haoxins)
fix: speed up ConfigOptions creation #3296 (crepererum)
Execute LogicalPlans after building for TPCH Benchmarks #3290 (DaltonModlin)
support for non-correlated subqueries #3287 (kmitchener)
Add Aggregate::try_new with validation checks #3286 (andygrove)
Fix SchemaError in FilterPushDown optimization with UNION ALL #3282 (jonmmease)
Allow sorting by aggregated groups #3280 (isidentical)
Add show external tables #3279 [sql] (psvri)
Return from task execution if send fails as there is nothing more to do (faster cancel / limit) #3276 (nvartolomei)
Let prelude import all expression functions #3274 (sadilet)
Fix no schema when CSV is only header #3272 (comphead)
support inlist for pre cast literal expression #3270 (liukun4515)
implement drop view #3267 [sql] (kmitchener)
Use ExprRewriter in pre_cast_lit_in_comparison #3260 (andygrove)
Add type coercion for UDFs in logical plan #3254 (andygrove)
Support "IS NOT TRUE/FALSE" syntax #3252 [sql] (sarahyurick)
Implement IS UNKNOWN/IS NOT UNKNOWN operators #3246 [sql] (isidentical)
support decimal data type for the optimizer rule of PreCastLitInComparisonExpressions #3245 (liukun4515)
chore: update cranelifts to 0.87.0 #3243 (yjshen)
Moved nullif out of unary functions #3241 (comphead)
MINOR: documentation updates #3239 (kmitchener)
MINOR: Add bounds check to Column physical expression #3238 (andygrove)
CREATE VIEW should return empty dataframe #3237 (kmitchener)
Support "IS TRUE/FALSE" syntax (redo) #3235 [sql] (sarahyurick)
Fix propagation of optimized predicates on nested projections #3228 (isidentical)
Add more trim test cases #3226 (ayushdg)
Upgrade to arrow 21 #3225 [sql] (avantgardnerio)
Add optimizer rule for type coercion (binary operations only) #3222 (andygrove)
[Improve] Use arrow::compute::sort in approx_percentile_cont #3219 (Ted-Jiang)
[minor] fix bench aggregate_query_sql meta #3218 (Ted-Jiang)
minor: refactor simplify negate #3213 (jackwener)
MINOR: update cargo.lock and rust-version for datafusion-cli #3212 (kmitchener)
fix issue with now() returning same value across statements #3210 (kmitchener)
Add support for inline column alias in CREATE VIEW #3209 [sql] (DaltonModlin)
Add SQL query planner support for DISTRIBUTE BY #3208 [sql] (andygrove)
minor: remove test code that's in the arrow library now #3206 (kmitchener)
Use .get() to avoid panic #3201 [sql] (jklamer)
[Minor] Reduce code duplication creating ScalarValue::List #3197 [sql] (alamb)
Clean up CI workflows by removing "matrix" strategy, simplifying names #3196 (alamb)
optimizer: add framework for the rule of pre-add cast to the literal in comparison binary #3185 (liukun4515)
Fix clippy #3182 (alamb)
MINOR: Add notes on writing release blog posts #3179 (andygrove)
add min/max for time #3178 (waitingkuo)
Recursively apply remove filter rule if filter is a true scalar value #3175 (byteink)
Update ahash requirement from 0.7 to 0.8 #3161 [sql] (alamb)
Support number of centroids in approx_percentile_cont #3146 (Ted-Jiang)
Introduce \i command to execute from a file #3136 (turbo1912)
impl binary ops between binary arrays and scalars #3124 (ozgrakkurt)

11.0.0 (2022-08-16)

Full Changelog

Breaking changes:

Implement exact median, add AggregateState #3009 [sql] (andygrove)

Implemented enhancements:

Make RowAccumulator public #3138
docs: proposal for consolidating docs into a Contributor Guide #3127
feat: support Timestamp +/- Interval #3103
An arrow_typeof like PostgreSQL's pg_typeof #3095
Add DataFrame section to user guide #3066
Document all scalar SQL functions in user guide #3065
Simplify implementation of approx_median so that it can be exposed in Python #3063
Support double quoted literal strings for dialects(such as mysql,bigquery) #3055
Simplify / speed up implementation of character_length to unicode points #3049
Follow-up on Clickbench benchmark #3048
Why is the PhysicalPlanner an async trait? #3032
Optimize file stream metrics. #3024
Proposal: Enable typed strings expressions for VALUES clause #3017
Proposal: Add date_bin function #3015
The upcoming release of Arrow (20?) breaks datafusion #3006
Can I select some files for query based on the filtering rules in the directory? #2993
Rename FormatReader to FileOpener #2990
Derive Hash trait for JoinType #2971
CAST from Utf8 to Boolean #2967
Add baseline_metrics for FileStream to record metrics like elapsed time, record output, etc #2961
Example to show how to convert query result into rust struct #2959
simplify not clause #2957
Implement Debug for ColumnarValue #2950
Parallel fetching of column chunks when reading parquet files #2949
Extension mechanism for SessionConfig #2939
Streaming CSV/JSON Object Store Read #2935
Support CSV Limit Pushdown to Object Storage #2930
Add support for pow scalar function #2926
Add support for exact median aggregate function #2925
Support mean as synonym for avg #2922
Rename a column name #2919
Move ScalarValue tests alongside implementation, move from_slice to core #2913
Fail gracefully if optimization rule fails #2908
Make ObjectStoreRegistry as a trait which can allow Ballista to introduce a self registry ObjectStoreRegistry #2905
Remove datafusion-data-access crate #2903
Improve formatting of logical plans containing subquery expressions #2898
Atan2 added to built-in functions #2897
The explain statements only print logical plans for debug/other purpose. #2894
JSON version of display_indent() #2889
It would be nice to have a way to generate unique IDs in optimizer rules #2886
Add support for TIME literal values #2883
Add h2o benchmark #2879
Implement from_unixtime function #2871
Add cast function for creating logical cast expression #2870
Release DataFusion 10.0.0 #2862
Implement information_schema.views #2857
Migrate from avro_rs to apache_avro #2783
Add optimizer rule to remove OFFSET 0 #2584
Preserve Element Name in ScalarValue::List #2450
Add EXISTS subquery support to Ballista #2338
Add documentation on supported functions to datafusion website #1487
documentations for datafusion-cli can be consolidated a bit more #1352
Optimizer: Predicate Rewrite pass for TPCH Q19 #217
feat: add optimize rule rewrite_disjunctive_predicate #2858 (xudong963)

Fixed bugs:

Regression in SQL support for ORDER BY and aliased expressions #3160
panic when deal with @ operator #3137
Incorrect type coercion rule for date + interval #3093
Cast string to timestamp crash while we input time before 1970 with floating number second #3082
INTEGER type doesn't work while importing csv #3059
Cannot GROUP BY Binary #3050
incorrect i32 coercion for to_timestamp #3046
Error pruning IsNull expressions: Column 'instance_null_count' is declared as non-nullable but contains null values #3042
I want to query some files in a directory. Is there any way? #3013
The expression to get an indexed field is only valid for List types (common_sub_expression_eliminate) #3002
Double to_timestamp_seconds produces abnormal result #2998
External parquet table fails when schema contains differing key / value metadata #2982
SELECT on column with uppercase column name fails with FieldNotFound error #2978
panic reading AWS-generated parquet file #2963
Can't filter rowgroup for parquet prune for some data type #2962
CI test is failing with final link failed: No space left on device #2947
bug: new ObjectStore breaks backward compatibility with contrib plugins #2931
bug: file types handled wrong #2929
bug: changing the number of partitions does not increase concurrency #2928
csv_explain fails on RC verifier #2916
index out of range error from datafusion_row::write::write_field #2910
Optimization rule CommonSubexprEliminate creates invalid projections #2907
serde_json requires that either std (default) or alloc feature is enabled #2896
Inconsistent type coercion rules with comparison expressions #2890
Doc Error: the test directory link 404 which is in CONTRIBUTING.md #2880
Round trips through ScalarValue's sometimes don't preserve types (e.g. change types from DictionaryArray) #2874
Error with CASE and DictionaryArrays: ArrowError(InvalidArgumentError("arguments need to have the same data type")) #2873
window functions not supported in expressions #2869
Unable to work with month intervals #2796
Discord invite link in communication page has expired #2743
Test (path normalization) failures while verifying release candidate 9.0.0 RC1 #2719
Reading parquet with (pre-release) arrow fails with "out of order projection is not supported" #2543
Fix SQL planner bug when resolving columns with same name as a relation #3003 [sql] (andygrove)
fix RowWriter index out of bounds error #2968 (comphead)
fix: support decimal statistic for row group prune #2966 (liukun4515)
Fix invalid projection in CommonSubexprEliminate #2915 (andygrove)

Documentation updates:

MINOR: Fix broken links in contrib guide #3135 (andygrove)
MINOR: User Guide: Move expressions to top-level page #3134 (andygrove)
User Guide: Combine CLI pages #3133 (andygrove)
User Guide: Add documentation for JOIN syntax #3130 (andygrove)
separate contributors guide #3128 (kmitchener)
minor: remove python docs, now they're in another project #3119 (kmitchener)
minor: doc fixes: fix link to datafusion-python project and add link to slides for rece… #3118 (kmitchener)
Add all scalar SQL functions to user guide #3090 (andygrove)
Add DataFrame reference to the user guide #3067 (andygrove)
MINOR: Add CeresDB to list of products using DataFusion #3060 (andygrove)
Minor: improve some docstrings about pruning #3041 (alamb)
doc: add a new video link about datafusion #3025 (xudong963)
Update README.md to add CnosDB into the Known Uses #2933 (cnoshb)

Performance improvements:

Use code points instead of grapheme clusters for string functions #3054 (Dandandan)

Closed issues:

Rename do_data_time_math() to do_date_time_math() #3172
Automatic version updates for github actions with dependabot #3106
[EPIC] Proposal for Date/Time enhancement #3100
Upgrade prost/tonic everywhere #3028
[Question] interested in helping with documentation #2866
Introducing a new optimizer framework for datafusion. #2633
Enable discussion tab? #2350
Add support for AVG(Timestamp) types #200
TPC-H Query 22 #175
TPC-H Query 21 #172
TPC-H Query 20 #171
TPC-H Query 17 #168
TPC-H Query 11 #163
TPC-H Query 4 #160
TPC-H Query 2 #159
[Datafusion] Optimize literal expression evaluation #106

Merged pull requests:

Rename do_data_time_math() to do_date_time_math() #3173 (JasonLi-cn)
[Minor] Remove some redundant code #3169 (alamb)
Support INTEGER again in addition to INT in CREATE TABLE and CAST statements #3167 [sql] (alamb)
Fix regression in SQL parser related to resolution of aliased expressions #3165 [sql] (andygrove)
update cargo lock #3164 (waitingkuo)
add test case for cast_timestamp_before_1970 #3163 (waitingkuo)
Return proper error message for ill formed variable reference #3162 (alamb)
Remove outdated license text left over from arrow repo #3154 (alamb)
Expose RowAccumulator in physical_plan #3151 (iajoiner)
Rename DateIntervalExpr to DateTimeIntervalExpr #3150 (alamb)
Bump actions/labeler from 4.0.0 to 4.0.1 #3144 (dependabot[bot])
User Guide: Add documentation for subquery syntax #3132 (andygrove)
MINOR: User Guide: Move Data Types and Information Schema to their own pages #3131 (andygrove)
Minor: Clean up array test #3121 (alamb)
add arrow_typeof #3120 (waitingkuo)
Bump actions/labeler from 2.2.0 to 4.0.0 #3114 (dependabot[bot])
Bump actions/checkout from 2 to 3 #3113 (dependabot[bot])
Bump actions/setup-node from 2 to 3 #3112 (dependabot[bot])
Bump actions/setup-python from 3 to 4 #3111 (dependabot[bot])
Feature/support timestamp plus minus interval #3110 (JasonLi-cn)
docs: fix typo #3109 (dzvon)
Remove offset if its zero #3102 (turbo1912)
Hash binary values #3098 [sql] (Dandandan)
Update to object_store 0.4 #3089 (tustvold)
Add cast function for creating cast expression #3084 (turbo1912)
Upgrade to arrow 20.0.0 (but no change to object_store), including prost, and tonic #3083 [sql] (avantgardnerio)
impl Debug for ColumnarValue, add some docs #3076 (alamb)
[Minor] run cargo update in datafusion-cli directory #3075 (alamb)
update cargo.lock in datafusion-cli #3074 (waitingkuo)
Update sql parser to v0.20.0 #3072 [sql] (waitingkuo)
Add opening, scanning, processing metrics in file stream #3070 (Ted-Jiang)
Simplify approx_median implementation, expose via DataFrame API #3064 [sql] (andygrove)
docs: fix PruningStatistics example and some typos #3062 (roeap)
feat: support double quoted literal strings for dialects(such as mysql,bigquery,spark) #3056 [sql] (Rachelint)
Allow Overriding AsyncFileReader used by ParquetExec #3051 (Cheappie)
to_timestamp i32 coerced to i64 #3047 (waitingkuo)
Fix IsNull pruning expression generation without null_count statistics #3044 (alamb)
feat: Support week, decade, century for Interval literal #3038 [sql] (ovr)
feat: Support Binary bitwise shift operators (<< and >>) #3037 [sql] (ovr)
Use concat_elements_utf8 from arrow rather than custom kernel #3036 (alamb)
minor: update minimal rust version to 1.62, matching arrow-rs #3035 [sql] (kmitchener)
feat: Add date_bin built-in function #3034 (stuartcarnie)
Split binary_expr.rs into smaller modules #3026 (alamb)
feat: Enable typed strings expressions for VALUES clause #3018 [sql] (stuartcarnie)
fix typo for PR3003 #3011 (waitingkuo)
feat: Add support for TIME literal values #3010 [sql] (stuartcarnie)
add TimeUnit::Second as signature for ToTimestampSeconds #3004 (waitingkuo)
Rename FileReader to FileOpener (#2990) #2991 (tustvold)
minor: collation the prune test #2986 (liukun4515)
Optionally skip metadata from schema when merging parquet files #2985 (alamb)
[Minor] Extract interval parsing logic, add unit tests #2984 [sql] (alamb)
Update sqlparser to 0.19 #2981 [sql] (alamb)
test: add file/SQL level test for pruning parquet row group with decimal data type. #2977 (liukun4515)
Derive Hash for JoinType #2972 (liurenjie1024)
Example that shows how to convert query result into rust struct #2959 #2969 (thomas-k-cameron)
Add baseline_metrics for FileStream to record metrics like elapsed ti… #2965 (Ted-Jiang)
test: add test for decimal and pruning for decimal column #2960 (liukun4515)
Simplify expressions with NOT clause #2958 (AssHero)
chore: update jit-related dependencies #2956 (xudong963)
Update to arrow 19.0.0 #2955 [sql] (alamb)
Remove CI Caching to preserve diskspace #2948 (alamb)
Add metadata_size_hint for optimistic fetching of parquet metadata #2946 (thinkharderdev)
Minor: Remove left over debugging statement #2944 (alamb)
add Atan2 #2942 (waitingkuo)
Use Arc<ObjectStoreRegistry> and remove ObjectStoreRegistry::clone #2941 (tustvold)
add extension system to SessionConfig #2940 (crepererum)
Update prost-build requirement from 0.7 to 0.10 #2937 (dependabot[bot])
Add streaming JSON and CSV reading, NewlineDelimitedStream (#2935) #2936 (tustvold)
feat(catalog): Implement information_schema.views #2934 [sql] (BaymaxHWY)
Support window functions in expressions by re-write projection after building window plan #2932 [sql] (AssHero)
Add pow as synonym for power #2927 (andygrove)
Add from_unixtime function #2924 (waitingkuo)
fix(aggregate): support mean as synonym avg #2923 (BaymaxHWY)
Add DataFrame::with_column_renamed #2920 (andygrove)
Run clippy with optional features #2918 (tustvold)
Fix release verification script by not overriding ARROW_TEST_DATA or PARQUET_TEST_DATA #2917 (alamb)
Move ScalarValue tests alongside implementation, move from_slice to datafusion_core #2914 (alamb)
Optimizer should have option to skip failing rules #2909 (andygrove)
Introduce ObjectStoreProvider to create an object store based on the url #2906 (yahoNanJing)
Remove datafusion-data-access crate #2904 (yahoNanJing)
Combine all comparison coercion rules #2901 (andygrove)
Add Projection::try_new and Projection::try_new_with_schema #2900 (andygrove)
Improve formatting of logical plans containing subqueries #2899 [sql] (andygrove)
add session option 'datafusion.explain.logical_plan'. when set to true, the explain statement will only print logical plans. #2895 (AssHero)
Preserve field name in ScalarValue::List #2893 [sql] (comphead)
Adds optional serde support to datafusion-proto #2892 (tustvold)
Implement ScalarValue::Dictionary and preserve type through conversion back/forth to Array #2891 (alamb)
Add an ID generator in preparation for PR 2885 #2887 (avantgardnerio)
Add support for correlated subqueries & fix all related TPC-H benchmark issues #2885 (avantgardnerio)
fix(doc): update test directory link in CONTRIBUTING.md #2882 (BaymaxHWY)
Add h2o bench groupby queries #2881 (andygrove)
Add support for month & year intervals #2797 (avantgardnerio)
Migrate from avro_rs (0.13) to apache_avro (0.14) #2784 (martin-g)

10.0.0-rc1 (2022-07-12)

Full Changelog

10.0.0 (2022-07-12)

Full Changelog

Breaking changes:

Convert batch_size to config option #2771 (andygrove)
MINOR: Remove Offset struct #2734 (andygrove)
feat: async extension planner #2713 (waynexia)
Switch to object_store crate (#2489) #2677 (tustvold)

Implemented enhancements:

update documentation, fix styling to match main Arrow project #2864
Update top-level README #2850
[Question] How to call an async function in ExecutionPlan::exec method? #2847
Add DataFrame::with_column #2844
Improve ergonomics of physical expr lit #2827
Add Python examples for reading CSV and query by SQL in Doc #2824
eliminate multi limit-offset nodes to EmptyRelation if possible #2822
Make LogicalPlan::Union be consistent with other plans #2816
Use coerced data type from value and list expressions during planning inlist expression #2793
Add configuration option to enable/disable CoalesceBatchesExec #2790
Simplify FilterNullJoinKeys rule #2780
Allow configuration settings to be specified with environment variables #2776
Automatically update configs.md in user guide #2770
Support multiple paths for ListingTableScanNode #2768
Reduce outer joins #2757
support data type coerced and decimal in INLIST expr #2755
Change ExtensionPlanner::plan_extension() to an async function #2749
Add IsNotNull filter to join inputs if one side of join condition does not allow null #2739
Sort preserving MergeJoin #2698
Improve readability of table scan projections in query plans #2697
DataFusion 9.0.0 Release #2676
Improve UX for UNION vs UNION ALL (introduce a LogicalPlan::Distinct) #2573 [sql]
Implement some way to show the sql used to create a view #2529
Consider adopting IOx ObjectStore abstraction #2489
Support sum0 as a built-in agg function #2067
implement grouping sets, cubes, and rollups #1327
Ruby bindings #1114
Support dates in hash join #2746 (andygrove)

Fixed bugs:

Docker Error #2851
Anti join ignores join filters #2842
Can't test or compile sub-model code after upgrade to arrow-rs 17.0.0 #2835
Not evaluate the set expr in the InList for the optimization #2820
CASE When: result type should be coercible to a common type #2818
IN/NOT IN List: NULL is not equal to NULL #2817
panic when case statement returns null #2798
InList: Can't cast the list expr data type to value expr data type directly #2774
InList Expr: expr and list values must be convertible to the same data type #2759
tpchgen docker syntax change prevents volume from binding #2751
Cannot join on date columns (Unsupported data type in hasher: Date32) #2744
rewrite_expression does not properly handle Exists and ScalarSubquery #2736
LocalFileSystem not sorted by file name; as a result, the data lines queried in multiple files are out of order. #2730
Filter push down need consider alias columns #2725
Recent API change in GlobalLimitExec breaks compatibility with Ballista #2720
Common Subexpression Elimination pass errors if run twice on some plans: Schema contains duplicate unqualified field name 'IsNull-Column-sys.host' #2712
The data type is not compatible with other system, for example spark or PG database #1379

Documentation updates:

Fix docs styling #2865 (kmitchener)
Various updates to top-level README #2854 (andygrove)
MINOR: Add documentation for running integration tests #2839 (andygrove)
add csv registration and sql query to examples #2825 (waitingkuo)
[minor] refine doc #2753 (Ted-Jiang)

Closed issues:

Consider adding a prominent note in the readme about ballista #2853
support decimal in (NULL) #2800
InList: Don't treat Null as UTF8(None) #2782
InList: don't need to treat Null as UTF8 data type #2773
Implement extensible configuration mechanism #138

Merged pull requests:

Update CONTRIBUTING.md #2876 (waitingkuo)
Make LogicalPlan::Union be consistent with other plans #2868 (comphead)
minor: remove unneeded files from project root #2863 (kmitchener)
chore: make cargo clippy happy in nightly #2860 [sql] (xudong963)
Update to arrow 18.0.0 #2856 [sql] (alamb)
chore: remove ballista-related docker-compose file #2852 (xudong963)
Adding dataframe with_column function #2849 (comphead)
anti joins now respect join filters #2843 (andygrove)
MINOR: make name meaningful and clean up code #2841 (liukun4515)
Make lit implementation more concise #2838 (alamb)
InList: set/list value must be evaluated to get the values #2834 (liukun4515)
Add SHOW CREATE TABLE with initial support for views #2830 [sql] (mrob95)
Improve ergonomics of physical expr lit #2828 (alamb)
Eliminate multi limit-offset nodes to emptyRelation #2823 (AssHero)
Fix the ci #2821 (liukun4515)
CaseWhen: coerce the all then and else data type to a common data type #2819 (liukun4515)
Fix ScalarValue::isNull calculation #2815 (alamb)
Fix nullability calculation for CASE expressions #2814 (alamb)
Bump numpy from 1.21.3 to 1.22.0 in /integration-tests #2811 (xudong963)
Fix data type calculation for CaseExprs with NULLs #2810 (AssHero)
InList: fix bug for comparing with Null in the list using the set optimization #2809 (liukun4515)
Use specialized dictionary kernels (#1178) #2808 (tustvold)
fix schema nullability for information_schema schema #2804 (alamb)
fix: correctly calculate join output schema nullability #2803 (alamb)
Correct schema nullability declaration in tests #2802 (alamb)
Don't treat Null as UTF8(None) and change error info. #2801 (liukun4515)
MINOR: Remove reference to docker image that is no longer available #2795 (andygrove)
Use coerced type in inlist expr planning #2794 (viirya)
Add LogicalPlan::Distinct #2792 [sql] (mrob95)
Add config option for coalesce_batches physical optimization rule, make optional #2791 (andygrove)
Improve readability of table scan projections in query plans (remove Some and None) #2789 [sql] (comphead)
Simplify FilterNullJoinKeys rule #2781 (andygrove)
MINOR: re-export sqlparser from datafusion-sql crate #2779 [sql] (andygrove)
Update to arrow 17.0.0 #2778 [sql] (alamb)
Support multiple paths for ListingTableScanNode #2775 (Ted-Jiang)
Remove expr_sub_expressions and rewrite_expression functions #2772 (mrob95)
minor: update cranelift related dependencies #2769 (xudong963)
minor: panic rather than fail silently on bad dictionary in hash join #2767 (alamb)
MINOR: make prettier use consistent between CI and contributing guide #2766 (andygrove)
Rewrite subexpressions of InSubquery in rewrite_expression #2765 (mrob95)
Support DataType::Decimal for IN and NOT IN expressions #2764 (liukun4515)
Implement extensible configuration mechanism #2754 (andygrove)
Remove redundant docker argument #2752 (avantgardnerio)
Add optimizer pass to reduce left/right/full joins to inner join if possible #2750 [sql] (AssHero)
MINOR: Remove legacy CLI context enum #2748 (andygrove)
CSE unit test for duplicate fields #2747 (waynexia)
MINOR: Improve unsupported data type error message #2745 (andygrove)
Add optimizer rule to filter out null keys before a join #2740 (andygrove)
Sort file names in a directory #2730 #2735 (yourenawo)
fix: filter push down with InList expressions #2729 (Ted-Jiang)
[Minor] add debug info in optimizer.rs #2726 (Ted-Jiang)
Add public API for GlobalLimitExec and LocalLimitExec #2722 (andygrove)
Add support for additional data types in hash join #2721 (AssHero)
Upgrade to arrow 16.0.0 #2718 [sql] (alamb)
Fix clippy warnings with toolchain 1.63 #2717 [sql] (waynexia)
Support for GROUPING SETS/CUBE/ROLLUP #2716 (thinkharderdev)
fix: check redundant fields while building projection plan #2715 (waynexia)
Sort preserving SortMergeJoin #2699 (korowa)
fix: union schema fix #2688 [sql] (gandronchik)
Support default precision and scale to CAST <EXPR> AS DECIMAL #2680 [sql] (gandronchik)

9.0.0 (2022-06-10)

Full Changelog

Breaking changes:

MINOR: Move simplify_expression rule to datafusion-optimizer crate #2686 (andygrove)
Move physical expression planning to datafusion-physical-expr crate #2682 (andygrove)
Create new datafusion-optimizer crate for logical optimizer rules #2675 (andygrove)
Remove ExecutionProps dependency from OptimizerRule #2666 (andygrove)
Remove ObjectStoreSchemaProvider (#2656) #2665 (tustvold)
Move LogicalPlanBuilder to datafusion-expr crate #2576 (andygrove)
LogicalPlanBuilder now uses TableSource instead of TableProvider #2569 (andygrove)
Remove scan_empty method from LogicalPlanBuilder #2568 (andygrove)
MINOR: Move expression utils from sql module to expr crate #2553 (andygrove)
Remove scan_json methods from LogicalPlanBuilder #2541 (andygrove)
Remove scan_avro methods from LogicalPlanBuilder #2540 (andygrove)
Remove scan_parquet methods from LogicalPlanBuilder #2539 (andygrove)
MINOR: Move ExprVisitable and exprlist_to_columns to datafusion-expr crate #2538 (andygrove)
Remove scan_csv methods from LogicalPlanBuilder #2537 (andygrove)
Fix Redundant ScalarValue Boxed Collection #2523 (comphead)
Support for OFFSET in LogicalPlan #2521 (jdye64)

Implemented enhancements:

[EPIC] JIT support for DataFusion #2703
Show column names instead of column indices in query plans #2689
Proposal: remove automated ballista CI checks from DataFusion #2679
Pass SessionState to TableProvider #2658
Is ObjectStoreSchemaProvider Still Needed? #2656
Add logical plan support to datafusion-proto #2630
Like, NotLike expressions work with literal NULL #2626
Move JOIN ON predicates push down logic from planner to optimizer #2619
Remove ExecutionProps from OptimizerRule trait #2614
Add, Minus, Multiply, divide, Modulo operator work with literal NULL #2609
Support DESCRIBE <table> to show table schemas #2606
Support CREATE OR REPLACE TABLE #2605
filter_push_down tests should not rely on TableProvider and ExecutionPlan #2600
Move logical optimizer rules out of the core datafusion crate #2599
Push Limit through outer Join #2579
datafusion_proto crate should have exhaustive match statements for handling Expr #2565
String representation of Expr variant #2563
File URI Scheme Interpretation #2562
Implement physical plan for OFFSET #2551
Update limit pushdown rule to support offsets #2550
Move LogicalPlanBuilder to datafusion-expr crate #2536
Logical optimizer rule "simplify expressions" should not depend on the core datafusion crate #2535
Support optional filter in Join #2509
Improve SQL planner & logical plan support for JOIN conditions #2496
Numeric, String, Boolean comparisons with literal NULL #2482
Redundant ScalarValue Boxed Collection #2449
ObjectStore Directory Semantics #2445
Add support for OFFSET in SQL query planner + logical plan #2377
SQL planner should use TableSource not TableProvider #2346
Move SQL query planning to new crate #2345
Update LogicalPlan rustdoc code to not use LogicalPlanBuilder #2308
[Optimizer] Refactor convert join #2256
[Optimizer] Infer is not null predicate from where clause #2254
Support ArrayIndex for ScalarValue(List) #2207
[Ballista] Fill functional gaps between datafusion and ballista #2062
[Ballista] support datafusion built_in UDAF work in ballista cluster #1985
Export C API #1113

Fixed bugs:

Fix Typos in Docs #2695
Unable to build a docker image #2691
Optimization pass AggregateStatistics changes type of output from Int64 to UInt64 #2673
ViewTable Circular Reference #2657
ScalarValue::to_array_of_size panics computing statistics for nested parquet file #2653
The result type of count/count_distinct #2635
limit_push_down is not working properly with OFFSET #2624
Avro Tests Fail To Compile #2570
Unused window function expression is wrongly removed from LogicalPlan during optimization #2542
Bug: ObjectStoreRegistry get_by_uri does not return correct path when "scheme" is provided #2525
There are duplicate and inconsistent copies of datafusion.proto #2514
Projection pushdown produces incorrect results when column names are reused #2462
Incorrect Parquet Projection For Nested Types #2453
LogicalPlanBuilder::scan_csv creates scans with invalid table names #2278
Inner join incorrectly pushdown predicate with OR operation #2271
Ignored alias for columns with aggregate function and incorrect results when collecting statistics is enabled #2176
Join on path partitioned columns fails with error #2145

Documentation updates:

Fix Ballista link #2654 (dsaxton)
MINOR: Add Blaze as a project using DataFusion #2618 (yjshen)
[MINOR] remove datafusion-cli's ballista feature from docs #2612 (Ted-Jiang)
chore(doc) remove ballista from datafusion-cli readme #2604 (ming535)

Closed issues:

[Question] Converting TableSource to custom TableProvider #2644
[Question] Why DataFusion is shipped with arrow version 9.1.0 on crates.io ? #2474

Merged pull requests:

Test optional features in CI #2708 (tustvold)
support indexed fields proto #2707 (nl5887)
Update sqlparser-rs to 0.18.0 #2705 (alamb)
[MINOR]: Add documentation to datafusion-row modules #2704 (alamb)
Make sure that the data types are supported in hashjoin before genera… #2702 (AssHero)
Move remaining code out of legacy core/logical_plan module #2701 (andygrove)
Move some tests from core to expr #2700 (andygrove)
MINOR: Improve Docs Readability #2696 (ryanrussell)
Combine limit and offset to fetch and skip and implement physical plan support #2694 (ming535)
MINOR: Add datafusion-sql example #2693 (andygrove)
Remove Ballista related lines from Dockerfile #2692 (mocknen)
Show column names instead of indices in query plans #2690 (andygrove)
MINOR: Remove uses of TryClone for Parquet #2681 (tustvold)
Fix AggregateStatistics optimization so it doesn't change output type #2674 (alamb)
If statistics for a column's Max/Min value do not exist in the parquet file, set Min/Max to None #2671 (AssHero)
MINOR: Move more expression code to datafusion-expr crate #2669 (andygrove)
MINOR: Rewrite imports in optimizer module #2667 (andygrove)
Update snmalloc-rs requirement from 0.2 to 0.3 #2663 (dependabot[bot])
Add module doc for RuntimeEnv, SessionContext, TaskContext, etc... #2655 (tustvold)
Prune unused dependencies from datafusion-proto #2651 (tustvold)
MINOR: Implement serde for join filter #2649 (andygrove)
pushdown support for predicates in ON clause of joins #2647 (korowa)
Move SortKeyCursor and RowIndex into modules, add sort_key_cursor test #2645 (alamb)
Implement DESCRIBE <table> #2642 (LiuYuHui)
Implement LogicalPlan serde in datafusion-proto #2639 (andygrove)
Fix limit + offset pushdown #2638 (ming535)
change result type of count/count_distinct from uint64 to int64 #2636 (liukun4515)
If no columns in window expr are needed, remove the window exprs #2634 (AssHero)
Like, NotLike expressions work with literal NULL #2627 (WinkerDu)
MINOR: Refactor datafusion-proto dependencies and imports #2623 (andygrove)
MINOR: add optimizer struct #2616 (jackwener)
Remove FilterPushDown dependency on physical plan #2615 (andygrove)
Support CREATE OR REPLACE TABLE #2613 (AssHero)
Support binary mathematical operators with NULL literals #2610 (WinkerDu)
chore: try fix CI coverage #2608 (Ted-Jiang)
MINOR: Rename benchmark crate #2607 (andygrove)
chore(dep): bump cranelift to 0.84.0 #2598 (waynexia)
fix some typos #2597 (ming535)
Support limit pushdown through left right outer join #2596 (Ted-Jiang)
Unignore rustdoc code examples in datafusion-expr crate #2590 (andygrove)
Evaluate JIT'd expression over arrays #2587 (waynexia)
[minor]Fix ci clippy for unused import #2586 (Ted-Jiang)
[Doc]add doc for enable SIMD need cargo nightly #2577 (Ted-Jiang)
Add DataFrame union_distinct and fix documentation for distinct #2574 (andygrove)
Fix avro tests (#2570) #2571 (tustvold)
Make datafusion-proto match exhaustive #2567 (andygrove)
Support limit push down for offset_plan #2566 (Ted-Jiang)
Introduce Expr.variant_name() function #2564 (jdye64)
Fix some 404 links in the contribution guide #2561 (hi-rustin)
Update datafusion-cli readme cli version #2559 (hi-rustin)
MINOR: Move expr_rewriter.rs to datafusion-expr crate #2552 (andygrove)
Fix JOINs with complex predicates in ON (split ON expressions only by AND operator) #2534 (korowa)
Reduce duplication in file scan tests #2533 (tustvold)
Fix size_of_scalar test #2531 (alamb)
Update to arrow-rs 14.0.0 #2528 (alamb)
ObjectStoreRegistry get_by_uri now returns correct path when "scheme" is provided #2526 (timvw)
MINOR: Add ORDER BY clause to test #2524 (andygrove)
Remove unused binary_array_op_scalar! in binary.rs #2512 (alamb)
fix NULL <op> column evaluation, tests for same #2510 (alamb)
Fix projection pushdown produces incorrect results when column names are reused #2463 (jonmmease)
Benchmark for sort preserving merge #2431 (alamb)
Support GetIndexedFieldExpr for ScalarValue #2196 (ovr)

8.0.0 (2022-05-12)

Full Changelog

Breaking changes:

Add SQL planner support for ROLLUP and CUBE grouping set expressions #2446 (andygrove)
Make ExecutionPlan::execute Sync #2434 (tustvold)
Introduce new DataFusionError::SchemaError type #2371 (andygrove)
Add Expr::InSubquery and Expr::ScalarSubquery #2342 (andygrove)
Add Expr::Exists to represent EXISTS subquery expression #2339 (andygrove)
Move LogicalPlan enum to datafusion-expr crate #2294 (andygrove)
Remove dependency from LogicalPlan::TableScan to ExecutionPlan #2284 (andygrove)
Move logical expression type-coercion code from physical-expr crate to expr crate #2257 (andygrove)
feat: 2061 create external table ddl table partition cols #2099 [sql] (jychen7)
Reorganize the project folders #2081 (yahoNanJing)
Support more ScalarFunction in Ballista #2008 (Ted-Jiang)
Merge dataframe and dataframe imp #1998 (vchag)
Rename ExecutionContext to SessionContext, ExecutionContextState to SessionState, add TaskContext to support multi-tenancy configurations - Part 1 #1987 (mingmwang)
Add Coalesce function #1969 (msathis)
Add Create Schema functionality in SQL #1959 [sql] (matthewmturner)
omit some clone when converting sql to logical plan #1945 [sql] (doki23)
[split/16] move physical plan expressions folder to datafusion-physical-expr crate #1889 (Jimexist)
remove sync constraint of SendableRecordBatchStream #1884 (doki23)
[split/15] move built in window expr and partition evaluator #1865 (Jimexist)

Implemented enhancements:

Include Expr to datafusion::prelude #2347
Implement Serialization API for DataFusion #2340
Implement power function #1493
allow lit python function to support boolean and other types #1136
Automate dependency updates #37
Add CREATE VIEW #2279 (matthewmturner)
[Ballista] Support Union in ballista. #2098 (Ted-Jiang)
Change the DataFusion explain plans to make it clearer in the predicate/filter #2063 (Ted-Jiang)
Add write_json, read_json, register_json, and JsonFormat to CREATE EXTERNAL TABLE functionality #2023 (matthewmturner)
Qualified wildcard #2012 [sql] (doki23)
support bitwise or/'|' operation #1876 [sql] (liukun4515)
Introduce JIT code generation #1849 (yjshen)

Fixed bugs:

CASE expr with NULL literals panics 'WHEN expression did not return a BooleanArray' #1189
Function calls with NULL literals do not work #1188
Add SQL planner support for calling round function with two arguments #2503 (andygrove)
nested query fix #2402 (comphead)
fix issue#2058 file_format/json.rs attempt to subtract with overflow #2066 (silence-coding)
fix bug the optimizer rule filter push down #2039 (jackwener)
fix: replace ExecutionContext and ExecutionConfig with SessionContext and SessionConfig #2030 (xudong963)
Fixed parquet path partitioning when only selecting partitioned columns #2000 (pjmore)
Fix ambiguous reference error in filter plan #1925 (jonmmease)
platform aware partition parsing #1867 (korowa)
Fix incorrect aggregation in case that GROUP BY contains duplicate column names #1855 (alex-natzka)

Documentation updates:

MINOR: Make crate READMEs consistent #2437 (andygrove)
minor: Improve documentation for DFSchema join and merge functions #2367 (andygrove)
Change the code location and add annotation #2037 [sql] (jackwener)
Fix typos (Datafusion -> DataFusion) #1993 (andygrove)
Add examples to use MemTable and TableProvider (#1864) #1946 (PierreZ)
Add doc for building datafusion-cli when connect the ballista #1866 (liukun4515)
Add benchmarks section to DEVELOPERS.md #1838 (tustvold)

Performance improvements:

Avoid an Arc::clone per row in benchmark #1975 (jhorstmann)
Update datafusion-cli allocator #1878 (matthewmturner)

Closed issues:

Make expected result string in unit tests more readable #2412
remove duplicated fn aggregate() in aggregate expression tests #2399
split distinct_expression.rs into count_distinct.rs and array_agg_distinct.rs #2385
move sql tests in context.rs to corresponding test files in datafusion/core/tests/sql #2328
Date32/Date64 as join keys for merge join #2314
Error precision and scale for decimal coercion in logic comparison #2232
Support Multiple row layout #2188
TPC-H Query 18 #169
TPC-H Query 16 #167
Implement Sort-Merge Join #141
Split logical expressions out into separate source files #114

Merged pull requests:

Minor: remove code that is now included in arrow-rs #2511 (alamb)
MINOR: Enable multi-statement benchmark queries #2507 (andygrove)
MINOR: Add ignored tests for all remaining benchmark queries #2506 (andygrove)
Update to sqlparser 0.17.0 #2500 (alamb)
Add metrics for ParquetExec #2499 (Ted-Jiang)
Limit cpu cores used when generating changelog #2494 (andygrove)
Optimize MergeJoin by storing joined indices instead of creating small record batches for each match #2492 (richox)
Add SQL planner support for grouping() aggregate expressions #2486 (andygrove)
MINOR: Parameterize changelog script #2484 (jychen7)
Numeric, String, Boolean comparisons with literal NULL #2481 (WinkerDu)
Adds unit test cases of mathematical expressions working with null literal #2478 (WinkerDu)
Minor: Move test code from context.rs into sql_integration #2473 (alamb)
Minor: Use ExprVisitor to find columns referenced by expr #2471 (alamb)
minor: remove expr dependency from the row crate, update crate-deps.dot/svg #2470 (yjshen)
Fix read_from_registered_table_with_glob_path fails if path contains // #2465 #2468 (timvw)
Add support for list_dir() on local fs #2467 (wjones127)
MINOR: Partial fix for SQL aggregate queries with aliases #2464 (andygrove)
minor: move struct definition out of aggregate/mod.rs, etc #2458 (WinkerDu)
Fix bugs in SQL planner with GROUP BY scalar function and alias #2457 (andygrove)
feat: Support CompoundIdentifier as GetIndexedField access #2454 (ovr)
Table provider error propagation #2438 (jdye64)
MINOR: Improve error messages for GROUP BY / HAVING queries #2435 (andygrove)
minor: remove redundant code #2432 (jackwener)
minor: update versions and paths in changelog scripts #2429 (andygrove)
Fix Ballista executing during plan #2428 (tustvold)
minor: format table result vec & remove some unnecessary semicolons #2425 (WinkerDu)
Basic support for IN and NOT IN Subqueries by rewriting them to SEMI / ANTI Join #2421 (korowa)
Allow subqueries without aliases #2418 (andygrove)
Fix bug in subquery join filters referencing outer query #2416 (andygrove)
MINOR: remove duplicated function format_state_name() #2414 (WinkerDu)
Make expected result string in unit tests more readable #2413 (WinkerDu)
sum(distinct) support #2405 (WinkerDu)
Update ordered-float requirement from 2.10 to 3.0 #2403 (dependabot[bot])
remove duplicated fn aggregate() in aggregate expression tests #2400 (WinkerDu)
Support type-coercion from Decimal to Float64 #2396 (comphead)
minor: SchemaError code cleanup and improvements #2391 (andygrove)
Support struct_expr generate struct in sql #2389 (Ted-Jiang)
Re-organize and rename aggregates physical plan #2388 (yjshen)
refactor distinct_expressions.rs and split into count_distinct.rs and array_agg_distinct.rs #2386 (WinkerDu)
Allow CTEs to be referenced from subquery expressions #2384 (andygrove)
Upgrade to arrow 13 #2382 (alamb)
Grouped Aggregate in row format #2375 (yjshen)
Fix bugs with CTE aliasing and normalize all identifiers in the SQL planner #2373 (andygrove)
Stop optimizing queries twice #2369 (andygrove)
feat: Support casting to arrays to primitive type #2366 (ovr)
Add proper support for null literal by introducing ScalarValue::Null #2364 (WinkerDu)
minor: fix duplicate column bug in subquery support #2362 (andygrove)
Normalize subquery aliases #2359 (andygrove)
Implement physical planner support for DATE +/- INTERVAL #2357 (andygrove)
Add SQL query planner support for Scalar Subqueries #2354 (andygrove)
Add SQL query planner support for IN subqueries #2352 (andygrove)
Add Expr to prelude #2348 (alamb)
Add SQL planner support for EXISTS subqueries #2344 (andygrove)
Add public Serialization/Deserialization API for Expr to/from bytes #2341 (alamb)
Support for date32 and date64 in sort merge join #2336 (hntd187)
[physical-expr] move aggregate exprs and window exprs to their own modules #2335 (yjshen)
fix: union schema #2334 (gandronchik)
Improve sql integration test organization #2333 (alamb)
Support scalar values for func Array #2332 (Ted-Jiang)
move sql tests from context.rs to corresponding test files in tests/sql #2329 (WinkerDu)
deprecate index_of and make index_of_column_by_name public #2320 (jdye64)
Fix HashJoin evaluating during plan #2317 (tustvold)
minor: remove two source files that only had re-exports #2313 (andygrove)
Don't sort batches during plan #2312 (tustvold)
Move case/when expressions to datafusion-expr crate #2311 (andygrove)
Fix CrossJoinExec evaluating during plan #2310 (tustvold)
Make SortPreservingMerge Usable Outside Tokio (#2201) #2305 (tustvold)
chore: update cranelift to 0.83.0 #2304 (yjshen)
Always increment timer on record #2298 (tustvold)
Remove unnecessary env var for parquet_sql example #2297 (sergey-melnychuk)
Simplify sort streams #2296 (tustvold)
MINOR: beautify code with neat idents #2295 (WinkerDu)
Move FileType enum from sql module to logical_plan module #2290 (andygrove)
Remove Parquet Empty Projection Workaround #2289 (tustvold)
Add BatchPartitioner (#2285) #2287 (tustvold)
Make row its crate to make it accessible from physical-expr #2283 (yjshen)
Enable filter pushdown when using In_list on parquet #2282 (Ted-Jiang)
Update uuid requirement from 0.8 to 1.0 #2280 (dependabot[bot])
Add bytes scanned metric to ParquetExec #2273 (thinkharderdev)
Fix outer join output with all-null indices on empty batch #2272 (yjshen)
Re-export DataFusion crates #2264 (andygrove)
rewrite approx_median to approx_percentile_cont while planning phase #2262 (korowa)
Introduce RowLayout to represent rows for different purposes #2261 (yjshen)
fix string coercion missing in Eq/NotEq operator #2258 (WinkerDu)
Update to Arrow 12.0.0, update tonic and prost #2253 (alamb)
minor: move field_util from physical-expr crate to expr crate #2250 (andygrove)
Move identifier case tests to sql_integ, add negative cases, Debug for DataFrame #2243 (alamb)
Implement sort-merge join #2242 (richox)
fix: find the right wider decimal datatype for comparison operation #2241 (liukun4515)
Fix join without constraints #2240 (Dandandan)
Add type coercion rule for date + interval #2235 (andygrove)
support array with scalar arithmetic operation for decimal data type #2233 (liukun4515)
chore: add debug! log in some execution operators #2231 (NGA-TRAN)
Introduce new optional scheduler, using Morsel-driven Parallelism + rayon (#2199) #2226 (tustvold)
minor: add editor config file #2224 (jackwener)
minor: Refactor to avoid repeated code in replace_qualifier #2222 (andygrove)
update cli readme #2220 (liukun4515)
Use filter (filter_record_batch) instead of take to avoid using indices #2218 (Dandandan)
Add single line description of ExecutionPlan (#2216) #2217 (tustvold)
Remove tokio::spawn from HashAggregateExec (#2201) #2215 (tustvold)
Remove tokio::spawn from WindowAggExec (#2201) #2203 (tustvold)
Make ParquetExec usable outside of a tokio runtime (#2201) #2202 (tustvold)
add sql level test for decimal data type #2200 (liukun4515)
case when supports NULL constant #2197 (WinkerDu)
feat: Support simple Arrays with Literals #2194 (ovr)
[Ballista] Enable ApproxPercentileWithWeight in Ballista and fill UT #2192 (Ted-Jiang)
refactor: simplify prepare_select_exprs #2190 (jackwener)
Multiple row-layout support, part-1: Restructure code for clearness #2189 (yjshen)
make nightly clippy happy #2186 (xudong963)
[Ballista]Make PhysicalAggregateExprNode has repeated PhysicalExprNode #2184 (Ted-Jiang)
MINOR: handle NULL in advance to avoid value copy in string_concat #2183 (WinkerDu)
fix: Sort with a lot of repetition values #2182 (yjshen)
cli: update lockfile #2178 (happysalada)
Add LogicalPlan::SubqueryAlias #2172 (andygrove)
minor: Avoid per cell evaluation in Coalesce, use zip in CaseWhen #2171 (yjshen)
Handle merged schemas in parquet pruning #2170 (thinkharderdev)
Implement fast path of with_new_children() in ExecutionPlan #2168 (mingmwang)
enable explain for ballista #2163 (doki23)
Add delimiter for create external table #2162 (matthewmturner)
[MINOR] enable EXTRACT week and add test (after sqlparser update to 0.16) #2157 (Ted-Jiang)
Optimize the evaluation of IN for large lists using InSet #2156 (Ted-Jiang)
Update sqlparser requirement from 0.15 to 0.16 #2152 (dependabot[bot])
fix not(null) with constant null #2144 (WinkerDu)
Add IF NOT EXISTS to CREATE TABLE and CREATE EXTERNAL TABLE #2143 (matthewmturner)
implement 'StringConcat' operator to support sql like "select 'aa' || 'b' " #2142 (WinkerDu)
#2109 By default, use only 1000 rows to infer the schema #2139 (jychen7)
[CLI] Add show tables in ballista for datafusion-cli #2137 (gaojun2048)
fix: incorrect memory usage track for sort #2135 (yjshen)
Update quarterly roadmap for Q2 #2133 (matthewmturner)
Reduce SortExec memory usage by avoiding constructing single huge batch #2132 (yjshen)
MINOR: fix concat_ws corner bug #2128 (WinkerDu)
Minor add clarifying comment in parquet #2127 (alamb)
Minor: make disk_manager public #2126 (yjshen)
JIT-compile DataFusion expression with column name #2124 (Dandandan)
minor: replace array_equals in case evaluation with eq_dyn from arrow-rs #2121 (alamb)
Serialize timezone in timestamp scalar values #2120 (thinkharderdev)
minor: fix some clippy warnings from nightly rust #2119 (alamb)
Fix case evaluation with NULLs #2118 (alamb)
issue#1967 ignore channel close #2113 (silence-coding)
cli: add cargo.lock #2112 (happysalada)
doc: update release schedule #2110 (jychen7)
fix df union all bug #2108 [sql] (WinkerDu)
Reduce repetition in Decimal binary kernels, upgrade to arrow 11.1 #2107 (alamb)
update zlib version to 1.2.12 #2106 (waitingkuo)
Create jit-expression from datafusion expression #2103 (Dandandan)
Add CREATE DATABASE command to SQL #2094 [sql] (matthewmturner)
Refactor SessionContext, BallistaContext to support multi-tenancy configurations - Part 3 #2091 (mingmwang)
minor: remove duplicate test #2089 (jackwener)
minor: remove repeated test #2085 (jackwener)
Fix lost filters and projections in ParquetExec, CSVExec etc #2077 (Ted-Jiang)
Remove dependency of common for the storage crate #2076 (yahoNanJing)
[MINOR] fix doc in `EXTRACT(field FROM source)` #2074 (Ted-Jiang)
[Bug][Datafusion] fix TaskContext session_config bug #2070 (gaojun2048)
Short-circuit evaluation for CaseWhen #2068 (yjshen)
split datafusion-object-store module #2065 (yahoNanJing)
Allow CatalogProvider::register_catalog to return an error #2052 (alamb)
Add test in register_catalog and change to use named symbolic constants #2050 (alamb)
Update to arrow/parquet 11.0 #2048 (alamb)
minor: format comments (// to // ) #2047 (jackwener)
use cargo-tomlfmt to check Cargo.toml formatting in CI #2033 (WinkerDu)
feat: #2004 approx percentile with weight #2031 (jychen7)
Refactor SessionContext, SessionState and SessionConfig to support multi-tenancy configurations - Part 2 #2029 (mingmwang)
Simplify prerequisites for running examples #2028 (doki23)
Replace usage of println! with logger macros #2020 (silence-coding)
Automatically test examples in user guide #2018 (vchag)
return VecDeque for DFParser::parse_sql #2017 [sql] (doki23)
Eliminate the scalar value filter #2002 (jackwener)
Fixing a typo in documentation #1997 (psvri)
Correct documentation of ExprVisitor #1996 (alamb)
Make it possible to only scan part of a parquet file in a partition #1990 (yjshen)
Update Dockerfile to fix integration tests #1982 (andygrove)
Remove some more unnecessary cloning in sql_expr_to_logical_expr #1981 [sql] (alamb)
Add ticket reference to clippy allow #1978 [sql] (alamb)
Implement EXTRACT expression with week, month, day, hour #1974 (Ted-Jiang)
Address typo in ExprVisitable trait documentation #1970 (jdye64)
Update sqlparser requirement from 0.14 to 0.15 #1966 (dependabot[bot])
PruningPredicate should take owned Expr #1960 (thinkharderdev)
Update to arrow 10.0.0, pyo3 0.16 #1957 (alamb)
update jit-related dependencies #1953 (xudong963)
minor code refinement: if_exists name change, wildcard field for logical plan, etc. #1951 [sql] (xudong963)
Allow different types of query variables (@@var) rather than just string #1943 [sql] (maxburke)
Pruning serialization #1941 (thinkharderdev)
Add write_parquet to DataFrame #1940 (matthewmturner)
Fix select from EmptyExec always return 0 row after optimizer passes #1938 (Ted-Jiang)
Add debug log when waiting for spilling on other consumers #1933 (viirya)
Add db benchmark script #1928 (matthewmturner)
Add write_csv to DataFrame #1922 (matthewmturner)
[MINOR] Update copyright year in Docs #1918 (alamb)
add metadata to DFSchema, close #1806. #1914 [sql] (jiacai2050)
Clippy fix on nightly #1907 (yjshen)
Updated Rust version to 1.59 in all the files #1903 (NaincyKumariKnoldus)
support extract second and minute in expr. #1901 (Ted-Jiang)
Update crate descriptions #1899 (alamb)
Remove unneeded Mutex in Ballista Client #1898 (alamb)
[split/17] move the rest of physical expr to datafusion-physical-expr crate #1892 (Jimexist)
Avoid unnecessary branching in row read/write if schema is null-free #1891 (yjshen)
Make parquet support optional for datafusion-common crate #1886 (jonmmease)
Fix clippy lints #1885 (HaoYang670)
Add support for ~/.datafusionrc and cli option for overriding it to datafusion-cli #1875 (matthewmturner)
[Minor] Clean up DecimalArray API Usage #1869 [sql] (alamb)
Changes after went through "Datafusion as a library section" #1868 (nonontb)
Enhance MemorySchemaProvider to support register_listing_table #1863 (matthewmturner)
Increase default partition column type from Dict(UInt8) to Dict(UInt16) #1860 (Igosuki)
Update to arrow 9.1.0 #1851 (alamb)
move some tests out of context and into sql #1846 (alamb)
[split/14] create datafusion-physical-expr module #1843 (Jimexist)
Return Error when parquet reader fails rather than no data with println! #1837 (alamb)
determine build side in hash join by total_byte_size instead of num_rows #1831 (xudong963)
Make ballista support an optional feature to datafusion-cli #1816 (alamb)
Update documentation example for change in API #1812 (alamb)
rename references of expr in physical plan module after datafusion-expr split #1798 (Jimexist)
DataFusion + Conbench Integration #1791 (dianaclarke)
The returned path value of get_by_uri should be self-described with entire path #1779 (yahoNanJing)
Use eq_dyn, neq_dyn, lt_dyn, lt_eq_dyn, gt_dyn, gt_eq_dyn kernels from arrow #1475 (alamb)

7.1.0 (2022-04-10)

Full Changelog

Fixed bugs:

By default, use only 1000 rows to infer the schema #2159

7.0.0 (2022-02-14)

Full Changelog

Breaking changes:

Consolidate various configurations options, remove unrelated batch_size #1565
Extract logical plans in LogicalPlan as independent struct #1228
Update ExecutionPlan to know about sortedness and repartitioning optimizer pass respect the invariants #1776 (alamb)
Update to arrow 8.0.0 #1673 (alamb)
Remove non idiomatic DataFusionError::into_arrow_external_error in favor of From conversion #1645 (alamb)
Remove Accumulator::update and Accumulator::merge #1582 (Jimexist)
implement Hash for various types and replace PartialOrd #1580 (Jimexist)
Replace DatafusionError with GenericError in ObjectStore interface #1541 (matthewmturner)
Make FLOAT SQL type map to Float32 rather than Float64 #1423 [sql] (liukun4515)
Map REAL SQL type to Float32 rather than Float64 to be consistent with pg #1390 [sql] (hntd187)

Implemented enhancements:

Create new datafusion_expr crate #1753
Create new datafusion_common crate #1752
API to get Expr's type and nullability without a DFSchema #1725
Cleaner API to create Expr::ScalarFunction programmatically #1718
Introduce a Vec<u8> based row-wise representation for DataFusion #1708
Simplify creating new ListingTable #1705
Implement TableProvider for DataFrameImpl to allow registration of logical plans #1698
Public Expr simplification API #1694
Query Optimizer: Add OUTER --> INNER join conversion #1670
Support reading from CSV, Avro and Json files that have mergeable/compatible, but not identical schemas #1669
Remove DataFusionError::into_arrow_external_error in favor of From conversion #1644
Include join type in display implementation for logical plan #1620
Switch datafusion to using eq_dyn_scalar, etc kernels #1610
Proposal: Remove Accumulator::update and Accumulator::merge #1549
Replace DataFusionError/Result with impl Error for ObjectStore and Reader #1540
Add approx_quantile support #1538
support sorting decimal data type #1522
Keep all datafusion's packages up to date with Dependabot #1472
ExecutionContext support init ExecutionContextState with new(state: Arc<Mutex<ExecutionContextState>>) method #1439
support the decimal scalar value #1393
Documentation for using scalar functions with the DataFrame API #1364
Support boolean == boolean and boolean != boolean operators #1159
Support DataType::Decimal(15, 2) in TPC-H benchmark #174
Make MemoryStream public #150
Add support for Parquet schema merging #132
Add SQL support for IN expression #118
Add logging to datafusion-cli #1789 (alamb)
Add approx_median() aggregate function #1729 (realno)
Add join type for logical plan display #1674 [sql] (xudong963)
Fix null comparison for Parquet pruning predicate #1595 (viirya)
Add corr aggregate function #1561 (realno)
Add covar, covar_pop and covar_samp aggregate functions #1551 (realno)
Add approx_quantile() aggregation function #1539 (domodwyer)
Initial MemoryManager and DiskManager APIs for query execution + External Sort implementation #1526 (yjshen)
Add stddev and variance #1525 (realno)
Add rem operation for Expr #1467 (liukun4515)
support decimal data type in create table #1431 [sql] (liukun4515)
Ordering by index in select expression #1419 [sql] (hntd187)
Add support for ORDER BY on unprojected columns #1415 (viirya)
Support decimal for min and max aggregate #1407 (liukun4515)
Consolidate ConstantFolding and SimplifyExpression #1375 (alamb)
Datafusion cli quiet mode command to contain option bool #1345 (Jimexist)
Implement array_agg aggregate function #1300 (viirya)
Add a command to switch output format in cli #1284 (capkurmagati)
Support =, <, <=, >, >=, !=, is distinct from, is not distinct from for BooleanArray #1163 (alamb)

Fixed bugs:

Unsupported data type in hasher: Timestamp(Second, None) #1768
SQL column identifiers should be converted to lowercase when unquoted #1746
Data type Dictionary(Int32, Utf8) not supported for binary operation 'eq' on dyn arrays #1605
datafusion doesn't process predicate pushdown correctly when there is outer join #1586
casting Int64 to Float64 unsuccessfully caused tpch8 to fail #1576
CTE/WITH .. UNION ALL confuses name resolution in WHERE #1509
ORDER BY min(x) results in error Plan("No field named 'foo.x'. Valid fields are 'MIN(foo.x)'.") #1479
Sort discards field metadata on the output schema #1476
Datafusion should not strip out timezone information from existing types #1454
Error on some queries: "column types must match schema types, expected XXX but found YYY" #1447
Query failing to return any results when filter is an equality check on strings (bad statistics in parquet) #1433
Field names containing period such as f.c1 cannot be named in SQL query #1432
Select * returns an unexpected result #1412
Turn off unused default features of chrono and ahash #1398
real data type is float32 in PG database, but in the datafusion it is as float64 #1380
TPC-H q10 performance regression (expression for filter with added alias is not pushed down) #1367
ProjectionExec Loses Field Metadata #1361
Support Filter on unprojected columns #1351
NULLS ORDER is inconsistent with postgres #1343
Fix bug while merging RecordBatch, add SortPreservingMerge fuzz tester #1678 (alamb)
fix a cte block with same name for many times #1639 [sql] (xudong963)
fix: casting Int64 to Float64 unsuccessfully caused tpch8 to fail #1601 (xudong963)
Fix single_distinct_to_groupby for arbitrary expressions #1519 (james727)
Fix SortExec discards field metadata on the output schema #1477 (alamb)
fix calculate in many_to_many_hash_partition test. #1463 (Ted-Jiang)
Add Timezone to Scalar::Time* types, and better timezone awareness to Datafusion's time types #1455 (maxburke)
Support identifiers with . in them #1449 [sql] (alamb)
Fixes for working with functions in dataframes, additional documentation #1430 (tobyhede)
[Minor] Fix send_time metric for hash-repartition #1421 (Dandandan)
fix: Select * returns an unexpected result #1413 [sql] (xudong963)
Make cli handle multiple whitespaces #1388 (capkurmagati)
Metadata is kept in projections for non-derived columns #1378 (hntd187)
Fix Predicate Pushdown: split_members should be able to split aliased predicate #1368 (viirya)
Change the arg names and make parameters more meaningful #1357 (liukun4515)
collect table stats by default for listing table #1347 (houqp)
fix: make nulls-order consistent with postgres #1344 [sql] (xudong963)
Avoid changing expression names during constant folding #1319 (viirya)
improve error message for invalid create table statement #1294 [sql] (houqp)
Forbid creating the table with the same name #1288 (liukun4515)

Documentation updates:

Clarify docs about Accumulator::update and Accumulator::update_batch #1542 (alamb)
Fix duplicated cargo run --example parquet_sql #1482 (sergey-melnychuk)
add documentation to Datafusion cli's new commands #1348 (liukun4515)
fix some clippy warnings from nightly channel #1277 [sql] (Jimexist)

Performance improvements:

Parquet pruning predicate for IS NULL #1591
Fix predicate pushdown for outer joins #1618 (james727)
fix: sql planner creates cross join instead of inner join from select predicates #1566 [sql] (xudong963)
Split fetch_metadata into fetch_statistics and fetch_schema #1365 (Dandandan)
Optimize the performance queries with a single distinct aggregate #1315 (ic4y)
Left join could use bitmap for left join instead of Vec<bool> #1291 (boazberman)

Closed issues:

Add release compile to CI #1728
DiskManager and TempFiles getting created several times per query #1690
Add a test for the pyarrow feature in CI #1635
SQL tests for when sorting exceeded available memory and had to spill to disk #1573
Consolidate the N-way merging code and SortPreservingMergeStream (which has quite good tests of what is often quite tricky code, and it will be performance critical) #1572
Consolidate the SortExec code (so there is only a single sort operator that does in memory sorting if it has enough memory budget but then spills to disk if needed). #1571
Track memory usage in Non Limited Operators #1569
[Question] Why does ballista store tables in the client instead of in the SchedulerServer #1473
Consolidate Projection for Schema and RecordBatch #1425
Support Sort on unprojected columns #1372
Unused code in hash_aggregate #1362
Why use the expr types before coercion to get the result type? #1358
A problem about the projection_push_down optimizer gathers valid columns #1312
apply constant folding to LogicalPlan::Values #1170
reduce usage of IntoIterator<Item = Expr> in logical plan builder window fn #372
Why does DataFusion throw a Tokio 0.2 runtime error? #176
TPC-H Query 14 #165
Length kernel returns bytes not character length #156
Split the logical operators out into separate source files #115

Merged pull requests:

Fixup some doc warnings #1811 (alamb)
Ensure most of links in docs are correct #1808 [sql] (HaoYang670)
Update CHANGELOG.md, update release scripts #1807 (alamb)
Update versions for split crates #1803 (matthewmturner)
Improve the error message and UX of tpch benchmark program #1800 (alamb)
rename references of expr in logical plan module after datafusion-expr split #1797 (Jimexist)
Update to sqlparser 0.14 #1796 [sql] (alamb)
[split/13] move rest of expr to expr_fn in datafusion-expr module #1794 (Jimexist)
Update datafusion versions #1793 (matthewmturner)
Less verbose plans in debug logging #1787 (alamb)
[split/11] split expr type and null info to be expr-schemable #1784 (Jimexist)
Introduce Row format backed by raw bytes #1782 (yjshen)
rewrite predicates before pushing to union inputs #1781 (korowa)
Update datafusion to use arrow 9.0.0 #1775 (alamb)
[split/10] split up expr for rewriting, visiting, and simplification traits #1774 [sql] (Jimexist)
#1768 Support TimeUnit::Second in hasher #1769 (jychen7)
TPC-H benchmark can optionally write JSON output file with benchmark summary #1766 (andygrove)
[split/8] move Accumulator and ColumnarValue to datafusion-expr #1765 (Jimexist)
[split/7] move built-in scalar function to datafusion-expr #1764 (Jimexist)
[split/6] move signature, type signature, volatility to datafusion-expr #1763 (Jimexist)
[split/9+12] move udf, udaf, Expr to datafusion-expr module #1762 [sql] (Jimexist)
[split/5] move window frame and operator to datafusion-expr module #1761 (Jimexist)
[split/4] move scalar value to datafusion-common #1760 (Jimexist)
[split/3] split datafusion expr module and move aggregate and window function expr #1759 (Jimexist)
[split/2] move column and dfschema to datafusion-common module #1758 (Jimexist)
Use ordered-float 2.10 #1756 (andygrove)
[split/1] split datafusion-common module #1751 (Jimexist)
use clap 3 style args parsing for datafusion cli #1749 (Jimexist)
fix: Case insensitive unquoted identifiers in SQL #1747 [sql] (mkmik)
Move more tests out of context.rs #1743 (alamb)
Move optimize test out of context.rs #1742 (alamb)
Fix typos in crate documentation #1739 (r4ntix)
add cargo check --release to ci #1737 (xudong963)
Update parking_lot requirement from 0.11 to 0.12 #1735 (dependabot[bot])
Create built-in scalar functions programmatically #1734 (HaoYang670)
Prevent repartitioning of certain operator's direct children (#1731) #1732 (tustvold)
API to get Expr's type and nullability without a DFSchema #1726 (alamb)
minor: fix cargo run --release error #1723 (xudong963)
substitute parking_lot::Mutex for std::sync::Mutex #1720 (xudong963)
Convert boolean case expressions to boolean logic #1719 (tustvold)
Add Expression Simplification API #1717 (alamb)
Create ListingTableConfig which includes file format and schema inference #1715 (matthewmturner)
make select_to_plan clearer #1714 [sql] (xudong963)
Add upper bound for public function signature #1713 (HaoYang670)
Add tests and CI for optional pyarrow module #1711 (wjones127)
Create SchemaAdapter trait to map table schema to file schemas #1709 (thinkharderdev)
refine test in repartition.rs & coalesce_batches.rs #1707 (xudong963)
Fuzz test for spillable sort #1706 (yjshen)
Support create_physical_expr and ExecutionContextState or DefaultPhysicalPlanner for faster speed #1700 (alamb)
Implement TableProvider for DataFrameImpl #1699 (cpcloud)
Move timestamp related tests out of context.rs and into sql integration test #1696 (alamb)
Lazy TempDir creation in DiskManager #1695 (alamb)
Add MemTrackingMetrics to ease memory tracking for non-limited memory consumers #1691 (yjshen)
(minor) Reduce memory manager and disk manager logs from info! to debug! #1689 (alamb)
Make SortPreservingMergeStream stable on input stream order #1687 (alamb)
Incorporate dyn scalar kernels #1685 (matthewmturner)
Move information_schema tests out of execution/context.rs to sql_integration tests #1684 (alamb)
Add a new metric type: Gauge + CurrentMemoryUsage to metrics #1682 (yjshen)
refactor array_agg to not have update and merge #1681 (Jimexist)
Use NamedTempFile rather than String in DiskManager #1680 (alamb)
upgrade clap to version 3 #1672 (Jimexist)
Improve configuration and resource use of MemoryManager and DiskManager #1668 (alamb)
feat: Support quarter granularity in date_trunc function #1667 (ovr)
Fix can not load parquet table from spark in datafusion-cli. #1665 (Ted-Jiang)
Make MemoryManager and MemoryStream public #1664 (yjshen)
[Cleanup] Move AggregatedMetricsSet to metrics for further reuse #1663 (yjshen)
fix: substr - correct behaviour with negative start pos #1660 (ovr)
support bitwise and as an example #1653 [sql] (liukun4515)
refine match pattern related code #1650 (xudong963)
update md-5, sha2, blake2 #1647 (xudong963)
Add DataFusionError -> ArrowError conversion #1643 (alamb)
Add spill_count and spilled_bytes to BaselineMetrics, test sort with spill #1641 (yjshen)
support hash decimal array and group by #1640 (liukun4515)
Consolidate Schema and RecordBatch projection #1638 (alamb)
Update hashbrown requirement from 0.11 to 0.12 #1631 (dependabot[bot])
Update pyo3 requirement from 0.14 to 0.15 #1627 (dependabot[bot])
Optimize SortPreservingMergeStream to avoid SortKeyCursor sharing #1624 (yjshen)
Handle merging of evolved schemas in ParquetExec #1622 (thinkharderdev)
feat: Support Substring(str [from int] [for int]) #1621 [sql] (ovr)
feat: Support complex interval via IntervalMonthDayNano #1615 [sql] (ovr)
consolidate binary_expr coercion rule code into binary_rule.rs module #1607 (alamb)
Fix comparison of dictionary arrays #1606 (alamb)
add test for decimal to decimal #1603 (liukun4515)
update nightly version #1597 (Jimexist)
Consolidate sort and external_sort #1596 (yjshen)
support from_slice for binary, string, and boolean array types #1589 (Jimexist)
add from_slice trait to ease arrow2 migration #1588 (Jimexist)
Implement ARRAY_AGG(DISTINCT ...) #1579 (james727)
Rename sql integration tests from mod to sql_integration #1575 (alamb)
minor: improve the benchmark readme #1567 (xudong963)
Consolidate batch_size configuration in ExecutionConfig, RuntimeConfig and PhysicalPlanConfig #1562 (yjshen)
Update to rust 1.58 #1557 (xudong963)
support mathematics operation for decimal data type #1554 (liukun4515)
Address clippy warnings #1553 (sergey-melnychuk)
enhance arithmetic operation for array with scalar #1552 (liukun4515)
Remove unused update and merge implementations from Aggregates and supporting ScalarValue arithmetic #1550 (alamb)
Add batch operations to stddev #1547 (realno)
Mark ARRAY_AGG(DISTINCT ...) not implemented #1534 (james727)
Update to arrow-7.0.0 #1523 (alamb)
Fix ORDER BY on aggregate #1506 (viirya)
Add example on how to query multiple parquet files #1497 (nitisht)
Refactor testing modules #1491 (hntd187)
add rfcs for datafusion #1490 (xudong963)
support comparison for decimal data type and refactor the binary coercion rule #1483 (liukun4515)
Minor: Rename predicate_builder --> pruning_predicate for consistency #1481 (alamb)
Tests for support try_cast/cast decimal to numeric #1465 (liukun4515)
Avoid sending empty batches for Hash partitioning. #1459 (Ted-Jiang)
Planner code cleanup #1450 [sql] (alamb)
Fix bug in projection: "column types must match schema types, expected XXX but found YYY" #1448 (alamb)
Update arrow-rs to 6.4.0 and replace boolean comparison in datafusion with arrow compute kernel #1446 (xudong963)
support cast/try_cast for decimal: signed numeric to decimal #1442 (liukun4515)
Consolidate decimal error checking and improve error messages #1438 [sql] (alamb)
use 0.13 sql parser #1435 (Jimexist)
Minor Code cleanups #1428 (alamb)
Clarify communication on bi-weekly sync #1427 (alamb)
support sum/avg agg for decimal, change sum(float32) --> float64 #1408 [sql] (liukun4515)
Fix bugs with nullability during rewrites: Combine simplify and Simplifier #1401 (alamb)
Minimize features #1399 (carols10cents)
Update rust version to 1.57 #1395 [sql] (xudong963)
support decimal scalar value #1394 (liukun4515)
Add coercion rules for AggregateFunctions #1387 (liukun4515)
upgrade the arrow-rs version #1385 (liukun4515)
add array agg name #1382 (liukun4515)
Make tests for simplify and Simplifier consistent #1376 (alamb)
Refactor: Consolidate expression simplification code in simplify_expression.rs #1374 (alamb)
remove unused code in hash_aggregate #1370 (ic4y)
Use BufReader for LocalFileReader to revert performance regression in parquet reading #1366 (Dandandan)
Add unit test for constant folding on values #1355 (viirya)
Extract logical plan: rename the plan name (follow up) #1354 [sql] (liukun4515)
Moved aggr_test_schema to test_utils #1338 (rdettai)
upgrade arrow-rs to 6.2.0 #1334 (liukun4515)
Update release instructions #1331 (alamb)
#1268: allow datafusion-cli to toggle quiet flag within CLI #1330 (jgoday)
Extract Aggregate, Sort, and Join to struct from AggregatePlan #1326 (matthewmturner)
Extract EmptyRelation, Limit, Values from LogicalPlan #1325 (liukun4515)
Extract CrossJoin, Repartition, Union in LogicalPlan #1322 (liukun4515)
Fifth batch of updating sql tests to use assert_batches_eq #1318 (matthewmturner)
Extract Explain, Analyze, Extension in LogicalPlan as independent struct #1317 [sql] (xudong963)
Extract CreateMemoryTable, DropTable, CreateExternalTable in LogicalPlan as independent struct #1311 [sql] (liukun4515)
Extract Projection, Filter, Window in LogicalPlan as independent struct #1309 (ic4y)
Add PSQL comparison tests for except, intersect #1292 (mrob95)
Extract logical plans in LogicalPlan as independent struct: TableScan #1290 (xudong963)
Add statement helper command to cli #1285 (matthewmturner)
Python bindings for window functions #819 [sql] (jgoday)

6.0.0 (2021-11-13)

Full Changelog

Breaking changes:

Removed deprecated with_concurrency #1200 (rdettai)
File partitioning for ListingTable #1141 (rdettai)
Add function volatility to Signature #1071 [sql] (pjmore)
fix: allow duplicate field names in table join, fix output with duplicated names #1023 (houqp)
Make TableProvider.scan() and PhysicalPlanner::create_physical_plan() async #1013 (rdettai)
Reorganize table providers by table format #1010 (rdettai)
Make Metrics::labels() public #999 (alamb)
Rename NthValue::{first_value,last_value,nth_value} to satisfy clippy in Rust 1.55 #986 (alamb)
Move CBOs and Statistics to physical plan #965 (rdettai)
Update to sqlparser v 0.10.0 #934 [sql] (alamb)
FilePartition and PartitionedFile for scanning flexibility #932 [sql] (yjshen)
Improve SQLMetric APIs, port existing metrics #908 (alamb)
Add support for EXPLAIN ANALYZE #858 [sql] (alamb)
Rename concurrency to target_partitions #706 (andygrove)

Implemented enhancements:

Add booleans support to the CASE statement #1156
Implement General Purpose Constant Folding with the Expression Evaluator #1070
Mark volatility categories of functions #1069
Add "show" support to DataFrame API #937
Add support for TRIM BOTH/LEADING/TRAILING #935
Add "baseline" metrics to all built in operators #866
Add SQL support for referencing fields in structs #119
add filename completer for create table statement #1278 (Jimexist)
Add drop table support #1266 [sql] (viirya)
Dataframe supports except and update readme #1261 (xudong963)
Implement EXCEPT & EXCEPT DISTINCT #1259 [sql] (xudong963)
Add DataFrame support for INTERSECT and update readme #1258 (xudong963)
use arrow 6.1.0 #1255 (Jimexist)
fix 1250, add editor support for datafusion cli with validation #1251 (Jimexist)
Add support for create table as via MemTable #1243 [sql] (Dandandan)
Add cli show columns command to describe tables #1231 (Jimexist)
datafusion-cli to add list table command #1229 (Jimexist)
datafusion cli to handle EoF and interrupt signal #1225 (Jimexist)
add \q as quit command and add ? for help #1224 (Jimexist)
Add algebraic simplifications to constant_folding #1208 (matthewmturner)
Improve GetIndexedFieldExpr adding utf8 key based access for struct v… #1204 [sql] (Igosuki)
Fix between in select query #1202 [sql] (capkurmagati)
Move code to fold Stable functions like now() from Simplifier to ConstEvaluator #1176 (alamb)
DataFrame supports window function #1167 [sql] (xudong963)
add values list expression #1165 [sql] (Jimexist)
Add booleans support to the CASE statement #1161 (xudong963)
Improve error messages when operations are not supported #1158 (alamb)
Generic constant expression evaluation #1153 (alamb)
python lit function to support bool and byte vec #1152 (Jimexist)
[nit] simplify datafusion optimizer module codes #1146 (panarch)
Add ScalarValue support for arbitrary list elements #1142 (jonmmease)
Multiple files per partitions for CSV Avro Json #1138 (rdettai)
Implement INTERSECT & INTERSECT DISTINCT #1135 [sql] (xudong963)
Simplify file struct abstractions #1120 (rdettai)
Implement is [not] distinct from #1117 [sql] (Dandandan)
Clean up spawned task on drop for RepartitionExec, SortPreservingMergeExec, WindowAggExec #1112 (crepererum)
add hyperloglog implementation (add and count) #1095 (Jimexist)
Add ScalarValue::Struct variant #1091 (jonmmease)
add digest(utf8, method) function and refactor all current hash digest functions #1090 (Jimexist)
[crypto] add blake3 algorithm to digest function #1086 (Jimexist)
[crypto] add blake2b and blake2s functions #1081 (Jimexist)
[nit] make schema qualifier error message in field lookup more readable #1079 (Jimexist)
[window function] add percent_rank window function #1077 (Jimexist)
[window function] add cume_dist implementation #1076 (Jimexist)
Add a LogicalPlanBuilder::schema() function #1075 (alamb)
Add support for UNION [DISTINCT] sql #1068 [sql] (xudong963)
fix: fix joins on Float32/Float64 columns bug #1054 (francis-du)
Update sqlparser-rs to 0.11 #1052 [sql] (alamb)
Support querying CSV files without providing the schema #1050 [sql] (xudong963)
remove hard coded partition count in ballista logicalplan deserialization #1044 (xudong963)
feat: add lit_timestamp_nanosecond #1030 (NGA-TRAN)
Ignore metadata on schema merge #1024 (Smurphy000)
add ExecutionConfig.with_optimizer_rules #1022 (seddonm1)
Add baseline execution stats to WindowAggExec and UnionExec, and fixup CoalescePartitionsExec #1018 (alamb)
Derive PartialOrd for Expr #1015 (alamb)
Indexed field access for List #1006 [sql] (Igosuki)
Add metrics for Limit and Projection, and CoalesceBatches #1004 (alamb)
Update DataFusion to arrow 6.0 #984 (alamb)
Implement Display for Expr, improve operator display #971 [sql] (matthewmturner)
Add metrics for FilterExec #960 (alamb)
Change compound column field name rules #952 (waynexia)
ObjectStore API to read from remote storage systems #950 (yjshen)
Add baseline metrics to SortPreservingMergeExec #948 (alamb)
Add support for TRIM LEADING/TRAILING/BOTH syntax #947 [sql] (adsharma)
fixes #933 replace placeholder fmt_as for ExecutionPlan impls #939 (tiphaineruy)
Add metrics for SortExec + HashAggregateExec #938 (alamb)
Add some additional asserts in utils::from_plan #930 (alamb)
Avro Table Provider #910 [sql] (Igosuki)
Add BaselineMetrics, Timestamp metrics, add for CoalescePartitionsExec, rename output_time -> elapsed_compute #909 (alamb)
add cross join support to ballista #891 (houqp)
Add Ballista support to DataFusion CLI #889 (andygrove)
support like on DictionaryArray #876 (b41sh)
Register table based on known schema without file IO #872 (Dandandan)
Add support for PostgreSQL regex match #870 [sql] (b41sh)
Include planning time in datafusion-cli printing #860 (Dandandan)
Implement basic common subexpression eliminate optimization #792 (waynexia)
Impl ops::Not for expr #763 (Jimexist)

Fixed bugs:

Can not use between in the select list: #1196
ORDER BY does not work with literals: Sort operation is not applicable to scalar value 'foo' #1195
window functions with NULL literals in partition by and order by do not work: Internal("Sort operation is not applicable to scalar value NULL") #1194
Operation name not included in internal errors -- Internal("Data type Boolean not supported for binary operation on dyn arrays") #1157
Physical plan explain UNION query says "ExecutionPlan(PlaceHolder)" #933
Can not use LIKE on DictionaryArray encoded strings #815
physical_plan::repartition::tests::repartition_with_dropping_output_stream failing locally #614
Fix some BuiltinScalarFunction panics with zero arguments #1249 (capkurmagati)
fix: not do boolean folding on NULL and/or expr #1245 (NGA-TRAN)
ignore case of with header row in sql when creating external table #1237 [sql] (lichuan6)
fix: Min/Max aggregation data type should not be dictionary #1235 (NGA-TRAN)
Fix build with --no-default-features #1219 (alamb)
Prevent "future cannot be sent between threads safely" compilation error #1155 (jonmmease)
Clean up spawned task on drop for AnalyzeExec, CoalescePartitionsExec, HashAggregateExec #1121 (crepererum)
Clean up spawned task on SortStream drop #1105 (crepererum)
fix UNION ALL bug: thread 'main' panicked at 'index out of bounds: the len is 1 but the index is 1', ./src/datatypes/schema.rs:165:10 #1088 (xudong963)
python: fix generated table name in dataframe creation #1078 (houqp)
fix subquery alias #1067 [sql] (xudong963)
fix pattern handling in regexp_match function #1065 (houqp)
fix: joins on Timestamp columns #1055 (francis-du)
Fix metric name typo #943 (alamb)
EXPLAIN ANALYZE should run all Optimizer passes #929 (alamb)

Documentation updates:

update docs to fix DataFusion User Guide link #1238 (jiangzhx)
[docs] datafusion cli run via homebrew #1198 (Jimexist)
add support for unary and binary values in values list, update docs #1172 [sql] (Jimexist)
Add additional docstring comments to from_plan #1168 (alamb)
[nit] fix document issue for approx_distinct #1110 (Jimexist)
implement approx_distinct function using HyperLogLog #1087 (Jimexist)
Remove unused use statements from examples #1032 (alamb)
consolidate datafusion docs with sphinx #993 (houqp)
Updated user-guide library docs with optimized config #976 (matthewmturner)
Improve User Guide #954 (andygrove)
[MINOR] Fix typos in doc comments #945 (alamb)
[DataFusion] - Add show and show_limit function for DataFrame #923 (francis-du)
Typo fix in DataFusion crate documentation #914 (antoinewdg)

Performance improvements:

Improve avro reader performance by avoiding some cloning on avro_rs::Value #1206 (Igosuki)
optimize build profile for datafusion python binding, cli and ballista #1137 (houqp)
Avoid stack overflow by reducing stack usage of BinaryExpr::evaluate in debug builds #1047 (alamb)
Add ScalarValue::eq_array optimized comparison function #844 (alamb)
Rework GroupByHash for faster performance and support grouping by nulls #808 (alamb)

Closed issues:

InList expr with NULL literals do not work #1190
update the homepage README to include values, approx_distinct, etc. #1171
[Python]: Inconsistencies with Python package name #1011
Wanting to contribute to project where to start? #983
delete redundant code #973
How to build DataFusion python wheel #853
Add support for partition pruning #204
[Datafusion] Support joins on TimestampMillisecond columns #187
TPC-H Query 21 #173
TPC-H Query 13 #164
TPC-H Query 8 #162
implement split_part(string, delimiter, position) #157
Join Statement: Schema contains duplicate unqualified field name #155
ParquetTable should avoid scanning all files twice #136
Add support for reading partitioned Parquet files #133
Add support for Parquet schema merging #132
Catalog abstraction #126
Optimizer rules should work with qualified column names #125
Add optional qualifier to Expr::Column #121
Implement modulus expression #99
[Rust] Add constant folding to expressions during logically planning #98
[Rust] Implement pretty print for physical query plan #93
Can not group by boolean columns (add boolean to valid keys of groupBy) #91
improve performance of building literal arrays #90
[rust][datafusion] optimize count(*) queries on parquet sources #89
Produce a design for a metrics framework #21

Merged pull requests:

Add timezone string to stabilize test #1265 (viirya)
numerical_coercion pattern match optimize #1256 (Jimexist)
fix and update window function sql tests #1059 (Jimexist)
reduce ScalarValue from trait boilerplate with macro #989 (houqp)

For older versions, see apache/arrow/CHANGELOG.md

5.0.0 (2021-08-10)

Full Changelog

Breaking changes:

Box ScalarValue:Lists, reduce size by half #788 (alamb)
JOIN conditions are order dependent #778 (seddonm1)
Show the result of all optimizer passes in EXPLAIN VERBOSE #759 (alamb)
#723 Datafusion add option in ExecutionConfig to enable/disable parquet pruning #749 (lvheyang)
Update API for extension planning to include logical plan #643 (alamb)
Rename MergeExec to CoalescePartitionsExec #635 (andygrove)
fix 593, reduce cloning by taking ownership in logical planner's from fn #610 (Jimexist)
fix join column handling logic for On and Using constraints #605 (houqp)
Rewrite pruning logic in terms of PruningStatistics using Array trait (option 2) #426 (alamb)
Support reading from NdJson formatted data sources #404 (heymind)
Add metrics to RepartitionExec #398 (andygrove)
Use 4.x arrow-rs from crates.io rather than git sha #395 (alamb)
Return Vec<bool> from PredicateBuilder rather than an Fn #370 (alamb)
Refactor: move RowGroupPredicateBuilder into its own module, rename to PruningPredicateBuilder #365 (alamb)
[Datafusion] NOW() function support #288 (msathis)
Implement select distinct #262 (Dandandan)
Refactor datafusion/src/physical_plan/common.rs build_file_list to take less param and reuse code #253 (Jimexist)
Support qualified columns in queries #55 (houqp)
Read CSV format text from stdin or memory #54 (heymind)
Use atomics for SQLMetric implementation, remove unused name field #25 (returnString)

Implemented enhancements:

Allow extension nodes to correctly plan physical expressions with relations #642
Filters aren't passed down to table scans in a union #557
Support pruning for boolean columns #490
Implement SQLMetrics for RepartitionExec #397
DataFusion benchmarks should show executed plan with metrics after query completes #396
Use published versions of arrow rather than github shas #393
Add Compare to GroupByScalar #364
Reusable "row group pruning" logic #363
Add an Order Preserving merge operator #362
Implement Postgres compatible now() function #251
COUNT DISTINCT does not support dictionary types #249
Use standard make_null_array for CASE #222
Implement date_trunc() function #203
COUNT DISTINCT does not support for Float64 #199
Update SQLMetric to use atomics rather than a Mutex #30
Implement PartialOrd for ScalarValue #838 (viirya)
Support date datatypes in max/min #820 (viirya)
Implement vectorized hashing for DictionaryArray types #812 (alamb)
Convert unsupported conditions in left right join to filters #796 [sql] (Dandandan)
Implement streaming versions of Dataframe.collect methods #789 (andygrove)
impl from str for column and scalar #762 (Jimexist)
impl fmt::Display for PlanType #752 (Jimexist)
Remove unnecessary projection in logical plan optimization phase #747 (waynexia)
Support table columns alias #735 (Dandandan)
Derive PartialEq for datasource enums #734 (alamb)
Allow filetype to be lowercase, Implement FromStr for FileType #728 (Jimexist)
Update to use arrow 5.0 #721 (alamb)
#554: Lead/lag window function with offset and default value arguments #687 (jgoday)
dedup using join column in wildcard expansion #678 (houqp)
Implement metrics for HashJoinExec #664 (andygrove)
Show physical plan with metrics in benchmark #662 (andygrove)
Allow non-equijoin filters in join condition #660 (Dandandan)
Add End-to-end test for parquet pruning + metrics for ParquetExec #657 (alamb)
Add support for leading field in interval #647 (Dandandan)
Remove hard-coded PartitionMode from Ballista serde #637 (andygrove)
Ballista: Implement scalable distributed joins #634 (andygrove)
implement rank and dense_rank function and refactor built-in window function evaluation #631 (Jimexist)
Improve "field not found" error messages #625 (andygrove)
Support modulus op #577 (gangliao)
implement std::default::Default for execution config #570 (Jimexist)
to_timestamp_millis(), to_timestamp_micros(), to_timestamp_seconds() #567 (velvia)
Filter push down for Union #559 (Dandandan)
Implement window functions with partition_by clause #558 (Jimexist)
support table alias in join clause #547 (houqp)
Not equal predicate in physical_planning pruning #544 (jgoday)
add error handling and boundary checking for window frames #530 (Jimexist)
Implement window functions with order_by clause #520 (Jimexist)
support group by column positions #519 [sql] (jychen7)
Implement constant folding for CAST #513 (msathis)
Add window frame constructs - alternative #506 (Jimexist)
Add partition by constructs in window functions and modify logical planning #501 (Jimexist)
Add support for boolean columns in pruning logic #500 (alamb)
#215 resolve aliases for group by exprs #485 (jychen7)
Support anti join #482 (Dandandan)
Support semi join #470 (Dandandan)
add order by construct in window function and logical plans #463 (Jimexist)
Remove redundant filters (e.g. c> 5 AND c>5 --> c>5) #436 (jgoday)
fix: display the content of debug explain #434 (NGA-TRAN)
implement lead and lag built-in window function #429 (Jimexist)
add support for ndjson for datafusion-cli #427 (Jimexist)
add first_value, last_value, and nth_value built-in window functions #403 (Jimexist)
export both now and random functions #389 (Jimexist)
Function to create ArrayRef from an iterator of ScalarValues #381 (alamb)
Sort preserving merge (#362) #379 (tustvold)
Add support for multiple partitions with SortExec (#362) #378 (tustvold)
add window expression stream, delegated window aggregation to aggregate functions, and implement row_number #375 (Jimexist)
Add PartialOrd and Ord to GroupByScalar (#364) #368 (tustvold)
Implement readable explain plans for physical plans #337 (alamb)
Add window expression part 1 - logical and physical planning, structure, to/from proto, and explain, for empty over clause only #334 (Jimexist)
Use NullArray to Pass row count to ScalarFunctions that take 0 arguments #328 (Jimexist)
add --quiet/-q flag and allow timing info to be turned on/off #323 (Jimexist)
Implement hash partitioned aggregation #320 (Dandandan)
Support COUNT(DISTINCT timestamps) #319 (charlibot)
add random SQL function #303 (Jimexist)
allow datafusion cli to take -- comments #296 (Jimexist)
Add json print format mode to datafusion cli #295 (Jimexist)
Add print format param with support for tsv print format to datafusion cli #292 (Jimexist)
Add print format param and support for csv print format to datafusion cli #289 (Jimexist)
allow datafusion-cli to take a file param #285 (Jimexist)
add param validation for datafusion-cli #284 (Jimexist)
[breaking change] fix 265, log should be log10, and add ln #271 (Jimexist)
Implement count distinct for dictionary arrays #256 (alamb)
Count distinct floats #252 (pjmore)
Add rule to eliminate LIMIT 0 and replace it with an EmptyRelation #213 (Dandandan)
Allow table providers to indicate their type for catalog metadata #205 (returnString)
Use arrow eq kernels in CaseWhen expression evaluation #52 (Dandandan)
Re-export Arrow and Parquet crates from DataFusion #39 (returnString)
[DataFusion] Optimize hash join inner workings, null handling fix #24 (Dandandan)
[ARROW-12441] [DataFusion] Cross join implementation #11 (Dandandan)

Fixed bugs:

Projection pushdown removes unqualified column names even when they are used #617
Panic while running join datatypes/schema.rs:165:10 #601
Indentation is incorrect for joins in formatted physical plans #345
Error while running COUNT DISTINCT (timestamp): 'Unexpected DataType for list #314
When joining two tables, get Error: Plan("Schema contains duplicate unqualified field name 'xxx'") #311
Incorrect answers with SELECT DISTINCT queries #250
Intermittent failure in CI join_with_hash_collision #227
Concat from Dataframe API no longer accepts multiple expressions #226
Fix right, full join handling when having multiple non-matching rows at the left side #845 (Dandandan)
Qualified field resolution too strict #810 [sql] (seddonm1)
Better join order resolution logic #797 [sql] (seddonm1)
Produce correct answers for Group BY NULL (Option 1) #793 (alamb)
Use consistent version of string_to_timestamp_nanos in DataFusion #767 (alamb)
#723 limit pruning rule to simple expression #764 (lvheyang)
#699 fix return type conflict when calling builtin math functions #716 (lvheyang)
Fix Date32 and Date64 parquet row group pruning #690 (alamb)
Remove qualifiers on pushed down predicates / Fix parquet pruning #689 (alamb)
use Weak ptr to break catalog list <> info schema cyclic reference #681 (crepererum)
honor table name for csv/parquet scan in ballista plan serde #629 (houqp)
fix 621, where unnamed window functions shall be differentiated by partition and order by clause #622 (Jimexist)
RFC: Do not prune out unnecessary columns with unqualified references #619 (alamb)
[fix] select * on empty table #613 (rdettai)
fix 592, support alias in window functions #607 (Jimexist)
RepartitionExec should not error if output has hung up #576 (alamb)
Fix pruning on not equal predicate #561 (alamb)
hash float arrays using primitive unsigned integer type #556 (houqp)
Return errors properly from RepartitionExec #521 (alamb)
refactor sort exec stream and combine batches #515 (Jimexist)
Fix display of execution time in datafusion-cli #514 (Dandandan)
Wrong aggregation arguments error. #505 (jgoday)
fix window aggregation with alias and add integration test case #454 (Jimexist)
fix: don't duplicate existing filters #409 (e-dard)
Fixed incorrect logical type in GroupByScalar. #391 (jorgecarleitao)
Fix indented display for multi-child nodes #358 (alamb)
Fix SQL planner to support multibyte column names #357 (agatan)
Fix wrong projection 'optimization' #268 (Dandandan)
Fix Left join implementation is incorrect for 0 or multiple batches on the right side #238 (Dandandan)
Count distinct boolean #230 (pjmore)
Fix Filter / where clause without column names is removed in optimization pass #225 (Dandandan)

Documentation updates:

No way to get to the examples from docs.rs #186
Update docs to use vendored version of arrow #772 (alamb)
Fix typo in DEVELOPERS.md #692 (lvheyang)
update stale documentations related to window functions #598 (Jimexist)
update readme to reflect work on window functions #471 (Jimexist)
Add examples section to datafusion crate doc #457 (mluts)
add invariants spec #443 (houqp)
add output field name rfc #422 (houqp)
Update more docs and also the developer.md doc #414 (Jimexist)
use prettier to format md files #367 (Jimexist)
Add new logo svg with white background #313 (parthsarthy)
Add projects (Squirtle and Tensorbase) to list in readme #312 (parthsarthy)
docs - fix the ballista link #274 (haoxins)
misc(README): Replace Cube.js with Cube Store #248 (ovr)
Initial docs for SQL syntax #242 (Dandandan)
Deduplicate README.md #79 (msathis)

Performance improvements:

Speed up inlist for strings and primitives #813 (Dandandan)
perf: improve performance of SortPreservingMergeExec operator #722 (e-dard)
Optimize min/max queries with table statistics #719 (b41sh)
perf: Improve materialisation performance of SortPreservingMergeExec #691 (e-dard)
Optimize count(*) with table statistics #620 (Dandandan)
optimize window function's find_ranges_in_range #595 (Jimexist)
Collapse sort into window expr and do sort within logical phase #571 (Jimexist)
Use repartition in window functions to speed up #569 (Jimexist)
Constant fold / optimize to_timestamp function during planning #387 (msathis)
Speed up create_batch_from_map #339 (Dandandan)
Simplify math expression code (use unary kernel) #309 (Dandandan)

Closed issues:

Confirm git tagging strategy for releases #770
arrow::util::pretty::pretty_format_batches missing #769
move the assert_batches_eq! macros to a non part of datafusion #745
fix an issue where aliases are not respected in generating downstream schemas in window expr #592
make the planner to print more succinct and useful information in window function explain clause #526
move window frame module to be in logical_plan #517
use a more rust idiomatic way of handling nth_value #448
create a test with more than one partition for window functions #435
COUNT DISTINCT does not support for Boolean #202
Read CSV format text from stdin or memory #198
Fix null handling hash join #195
Allow TableProviders to indicate their type for the information schema #191
Make DataFrame extensible #190
TPC-H Query 19 #170
TPC-H Query 7 #161
Upgrade hashbrown to 0.10 #151
Implement vectorized hashing for hash aggregate #149
More efficient LEFT join implementation #143
Implement vectorized hashing #142
RFC Roadmap for 2021 (DataFusion) #140
Implement hash partitioning #131
Grouping by column position #110
[Datafusion] GROUP BY with a high cardinality doesn't seem to finish #107
[Rust] Add support for JSON data sources #103
[Rust] Implement metrics framework #95
Publicly export Arrow crate from datafusion #36
Implement hash-partitioned hash aggregate #27
Consider using GitHub pages for DataFusion/Ballista documentation #18
Update "repository" in Cargo.toml #16

Merged pull requests:

Use RawTable API in hash join #827 (Dandandan)
Add test for window functions on dictionary #823 (alamb)
Update dependencies: prost to 0.8 and tonic to 0.5 #818 (alamb)
Move hash_array into hash_utils.rs #807 (alamb)
Remove GroupByScalar and use ScalarValue in preparation for supporting null values in GroupBy #786 (alamb)
fix 226, make concat, concat_ws, and random work with Python crate #761 (Jimexist)
Test for parquet pruning disabling #754 (alamb)
Add explain verbose with limit push down #751 (Jimexist)
Move assert_batches_eq! macros to test_utils.rs #746 (alamb)
Show optimized physical and logical plans in EXPLAIN #744 (alamb)
update python crate to support latest pyo3 syntax and gil semantics #741 (Jimexist)
update python crate dependencies #740 (Jimexist)
provide more details on required .parquet file extension error message #729 (Jimexist)
split up window functions into a dedicated module with separate files #724 (Jimexist)
Use pytest in integration test #715 (Jimexist)
replace once iter chain with array::IntoIter #704 (houqp)
avoid iterator materialization in column index lookup #703 (houqp)
Fix build with 1.52.1 #696 (alamb)
Fix test output due to logical merge conflict #694 (alamb)
add more integration tests #668 (Jimexist)
Bump arrow and parquet versions to 4.4 #654 (toddtreece)
Add query 15 to TPC-H queries #645 (Dandandan)
Improve error message and comments #641 (alamb)
add integration tests for rank, dense_rank, fix last_value evaluation with rank #638 (Jimexist)
round trip TPCH queries in tests #630 (houqp)
use Into<String> as argument type wherever applicable #615 (houqp)
reuse alias map in aggregate logical planning and refactor position resolution #606 (Jimexist)
fix clippy warnings #581 (Jimexist)
Add benchmarks to window function queries #564 (Jimexist)
reuse code for now function expr creation #548 (houqp)
turn on clippy rule for needless borrow #545 (Jimexist)
Refactor hash aggregate's planner building code #539 (Jimexist)
Cleanup Repartition Exec code #538 (alamb)
reuse datafusion physical planner in ballista building from protobuf #532 (Jimexist)
remove redundant into_iter() calls #527 (Jimexist)
Fix 517 - move window_frames module to logical_plan #518 (Jimexist)
Refactor window aggregation, simplify batch processing logic #516 (Jimexist)
Add datafusion::test_util, resolve test data paths without env vars #498 (mluts)
Avoid warnings in tests when compiling without default features #489 (alamb)
update cargo.toml in python crate and fix unit test due to hash joins #483 (Jimexist)
use prettier check in CI #453 (Jimexist)
Optimize nth_value, remove first_value, last_value structs and use idiomatic rust style #452 (Jimexist)
Fixed typo / logical merge conflict #433 (jorgecarleitao)
include test data and add aggregation tests in integration test #425 (Jimexist)
Add some padding around the logo #411 (parthsarthy)
Benchmark subcommand to distinguish between DataFusion and Ballista #402 (jgoday)
refactor datafusion/scalar_value to use more macro and avoid dup code #392 (Jimexist)
Update TPC-H benchmark to show physical plan when debug mode is enabled #386 (andygrove)
Update arrow dependencies again #341 (alamb)
Update arrow-rs deps #317 (alamb)
Update PR template by commenting out instructions #315 (alamb)
fix clippy warning #286 (Jimexist)
add integration test to compare datafusion-cli against psql #281 (Jimexist)
Update arrow deps #269 (alamb)
Use multi-stage build dockerfile in datafusion-cli and reduce image size from 2.16GB to 89.9MB #266 (Jimexist)
Enable redundant_field_names clippy lint #261 (Dandandan)
fix clippy lint #259 (alamb)
Move datafusion-cli to new crate #231 (Dandandan)
Make test join_with_hash_collision deterministic #229 (Dandandan)
Update arrow-rs deps (to fix build due to flatbuffers update) #224 (alamb)
Use standard make_null_array for CASE #223 (alamb)
update arrow-rs deps to latest master #216 (alamb)
MINOR: Remove empty rust dir #61 (andygrove)

* This Changelog was automatically generated by github_changelog_generator


