Changelog

15.0.0 (2022-12-01)

Full Changelog

Breaking changes:

  • Expose remaining parquet config options into ConfigOptions (try 2) #4427 (alamb)
  • Config Cleanup: Remove TaskProperties and KV structure, keep key=value serialization #4382 (alamb)
  • add {TDigest,ScalarValue,Accumulator}::size #4342 (crepererum)
  • API-break: Support SubqueryAlias and remove Alias in Projection #4333 [sql] (jackwener)
  • split try_new_with_schema_alias from original code #4284 (jackwener)
  • Collapse statistics in normal explain plan #4157 (alamb)
  • Linearize binary expressions to reduce proto tree complexity #4115 (isidentical)
  • support SET Timezone #4107 [sql] (waitingkuo)

Implemented enhancements:

  • Refactor Built-in, Aggregate window functions to increase code reuse. #4440
  • Helper to get "root" error #4435
  • Do NOT convert intermediate/source errors to strings. #4434
  • Estimate the total_byte_size of the filter expression's result when selectivity is available #4374
  • refactor the code of the HashJoin #4356
  • CoalesceBatchesExec reports no ordering #4331
  • Introduce tournament tree to achieve better k-way sort-merging #4300
  • Add a checker to confirm physical optimizer rules will keep the physical plan schema immutable #4299
  • Remove the macro rule unary_scalar_expr from expr_fn.rs #4298
  • Remove Alias-in-Projection, replace it with SubqueryAlias #4291
  • reimplement reduce_outer_join #4270
  • Reimplement filter_push_down #4266
  • Reimplement eliminate_limit #4264
  • Reimplement limit_push_down #4263
  • Make a data driven SQL testing tool (so we can reuse duckdb test suite, example) #4248
  • upgrade chrono to 0.4.23 #4224
  • support scan non-string columns partitioned parquet files #4218
  • Allow optimizer rules to skip optimizing plans #4209
  • Supporting specifying schema when create tables #4183
  • Improve ergonomics of creating ListingOptions #4178
  • Add ability to specify external sort information for ParquetExec #4169
  • Add another method to collect referenced columns from an expression #4152
  • Improve EXPLAIN ANALYZE output for parquet exec #4144
  • TableProviderFactory::create should have Optional<DFSchemaRef> parameter #4142
  • Support more expressions in equality join #4140
  • JoinSelection Rule to choose physical join implementation: HashJoin(Partitioned or CollectLeft) or SortMergeJoin based on Stats #4139
  • Allow TPCH tooling to create a combined result for easier processing by outside tools #4127
  • Allow additional options when creating an external table #4125
  • reuse code utils::optimize_children instead of redundant implementation #4120
  • Add test field to PR template #4113
  • Allow for automatic registration of ListingTables #4111
  • Add CI check that configs.md is up-to-date #4108
  • Support SET timezone to non-UTC time zone #4106
  • Parquet predicates contains and true expressions #4091
  • Replace RwLock<HashMap> and Mutex<HashMap> by using DashMap #4077
  • add support for .xz compressed files #4074
  • add a feature gate to make support for compressed files optional #4073
  • Support serializing more deeply nested AND / OR expressions #4066
  • Use f64::total_cmp instead of OrderedFloat #4051
  • Add documentation to make it clear that decimal support is still experimental #4036
  • Simplify Pushed Down Predicates #4020
  • Improve HashJoinExec metrics #4009
  • Move physical plan serde from Ballista to DataFusion #3949
  • Support SubqueryAlias better in planner #3927
  • A framework for expression boundary analysis (and statistics) #3898
  • Replace Filter: Boolean(false) with EmptyRelation #3864
  • Implement statistics estimation for FilterExec #3845
  • Support parquet page filtering for more types: String, Binary(Decimal), Int96 #3833
  • Allow configuring parquet filter pushdown dynamically #3821
  • Unable to register tables in non-cloud S3 servers #3640
  • support more data type in prune for cast/try_cast #3442
  • Disable spill to disk globally #3264
  • Consider to categorize Operator #3216
  • Replace Projection.alias with SubqueryAlias #2212
  • [Optimizer] Eliminate the distinct #2045
  • beautify datafusion's site: https://arrow.apache.org/datafusion/ #1819
  • split datafusion-logical-plan sub-module #1755
  • convert outer join to inner join to improve performance #1585
  • Add sqllogictest for datafusion #1453
  • Add additional simplification rules #1406
  • support more subqueries #1209
  • Add baseline metrics for remaining execution plan nodes #1019
  • Make ExecutionPlan implementations immutable #987
  • Architecture overview may be insufficient in README #980
  • Add a separate configuration setting for parallelism of scanning parquet files #924
  • Support hash repartition elimination #41

Fixed bugs:

  • pyarrow CI failed #4448
  • UnwrapCastInComparison has a bug #4430
  • The CLI panics when passing an invalid explain query #4378
  • HashJoin should return Err when the right side input stream produce Err #4362
  • Optimizer check errors if resulting schema has different metadata #4346
  • Panic with function to_hex #4339
  • LimitPushDown pushdown into limit, result is wrong #4308
  • DESCRIBE statement issue with qualified table references #4303
  • Panic with window function LAST_VALUE #4297
  • CI failed in Compare to postgres #4294
  • Field alias can't work in where clause #4288
  • Some valid filters are not pushed down to parquet scan #4282
  • The type renaming pub type NullColumnarValue = ColumnarValue makes no sense #4271
  • Current limit_push_down can't support cross_join #4256
  • Cargo test fail #4253
  • RightSemi/RightAnti HashJoin has bug, the left_indices is never populated, causing failure to apply join filters. #4247
  • Clippy failures #4245
  • Cannot query s3 data from datafusion-cli #4239
  • Bug parsing interval with negative values #4237
  • cargo test reports errors on the master branch. #4236
  • Doc of the expression function log2 is incorrect #4231
  • HashJoin with mode PartitionMode:CollectLeft has bug and can produce wrong result #4230
  • Add ambiguous check when generate projection plan #4210
  • What happened for NDJSON support on CLI? #4198
  • Add ambiguous check when generate join plan #4197
  • Clippy failing on master : error: use of deprecated associated function chrono::NaiveDate::from_ymd: use from_ymd_opt() instead #4187
  • Reimplement the eliminate_cross_join #4176
  • Incorrect handling of column names #4166
  • Update release scripts to support datafusion-benchmarks #4134
  • Bug in interpreting correctly parsed SQL with aliases #4123
  • The percentile argument for ApproxPercentileCont must be Float64, not Decimal128(2, 1) #4103
  • Panic when using array_agg #4080
  • Wrong result for FIRST_VALUE AND LAST_VALUE window functions #4076
  • Round error when casting float to decimal #4071
  • Predicate still has cast when comparing Timestamp(Nano, None) to a timestamp literal, so can't be pushed down or used for pruning #3938
  • Revisit required_child_distribution(), output_partitioning(), output_ordering() implementations in ExecutionPlan's implementations #3653
  • Can't push down projection after do type coercion #3583
  • In some circumstances cast expression is not working #3499
  • output_partitioning() and output_ordering() implementations are wrong in some physical plan implementations with alias #3400
  • Interval Literal doesn't work for timeunit less than millisecond #3204
  • INTERVAL literal with duplicated interval types should raise error #3183
  • Error occurs when only using partition columns in query #1999
  • regex_match does not compile using the g flag #1429
  • between with NULL literals does not work: can't be evaluated because there isn't a common type to coerce the types to #1193
  • [Datafusion] Error with CAST: Unsupported SQL type Time #193

Closed issues:

  • SQL level coverage for when memory limit is exceeded #4404
  • Throw error (not panic) if a listing table specifies a missing partition column #4350
  • Page index pruning fail on complex_expr #4317
  • optimize limit-full join in the limit push down rule #4275
  • infer_schema function is not working with s3 Urls or http endpoints #4269
  • Add support binary boolean operators with nulls #4241
  • Add additional testing to parquet predicate pushdown integration tests #4087
  • Add metrics for parquet page level skipping #4086
  • Add parquet page index pushdown metrics #4058
  • Throw a runtime error if the memory allocated to GroupByHash exceeds a limit #3940
  • support unsigned numeric data type in UnwrapCastInBinaryComparison rule #3702
  • Support type cast in union #2125
  • [EPIC] Memory Limited Sort (Externalized / Spill) #1568
  • Maintain partition information in Union #189
  • Add coercion support for NULL literals #185

Merged pull requests:

  • Make datafusion-sql depend on arrow-schema instead of arrow #4456 [sql] (mbrobbel)
  • replace the comparator for decimal array op scalar using arrow kernel #4453 (liukun4515)
  • Fix pyarrow test #4450 (mvanschellebeeck)
  • Replace &Option<T> with Option<&T> #4446 [sql] (askoa)
  • Improve error handling for array downcasting #4445 (retikulum)
  • Refactor Builtin Window Function Implementation #4441 (mustafasrepo)
  • feat: DataFusionError::find_root #4437 (crepererum)
  • fix: do NOT convert errors to strings but keep the type #4436 (crepererum)
  • The CLI panics when passing an invalid explain query #4429 (comphead)
  • [minor] use arrow kernel concat_batches instead of combine_batches #4423 (Ted-Jiang)
  • fix panic on to_hex function for negative numbers #4422 (retikulum)
  • Optimize filter executor in pull-based executor #4421 (xudong963)
  • optimize limit push for join case #4411 (liukun4515)
  • Add integration test for erroring when memory limits are hit #4406 (alamb)
  • feat: ResourceExhausted for memory limit in AggregateStream #4405 (crepererum)
  • Update to arrow 28 #4400 [sql] (tustvold)
  • Update rstest requirement from 0.15.0 to 0.16.0 #4399 (dependabot[bot])
  • Add sqllogictests (v0) #4395 (mvanschellebeeck)
  • improve hashjoin execution metrics #4394 (AssHero)
  • Add with_new_inputs for LogicalPlan #4393 (jackwener)
  • Clean the code in limit.rs. #4391 (HaoYang670)
  • Move physical plan serde from Ballista to DataFusion #4390 (Kikkon)
  • Fix page index pruning fail on complex_expr #4387 (Ted-Jiang)
  • Add check for nested types in equivalent names and types #4380 (alamb)
  • refine the code of build schema for ambiguous check, factor this out into a function #4379 [sql] (AssHero)
  • Refactor the Hash Join #4377 (liukun4515)
  • Minor: Fix typos in the documentation #4376 (martin-g)
  • Include byte size estimates in the filter statistics #4375 (isidentical)
  • HashJoin should return Err when the right side input stream produce Err, add more join UTs to cover different join types #4373 [sql] (mingmwang)
  • feat: ResourceExhausted for memory limit in GroupedHashAggregateStream #4371 (crepererum)
  • Use limit() function instead of show_limit() in the first example #4369 (martin-g)
  • Update env_logger requirement from 0.9 to 0.10 #4367 (dependabot[bot])
  • reimplement push_down_filter to remove global-state #4365 (jackwener)
  • Support to use Scheduler in tpch benchmark #4361 (xudong963)
  • Adding more dataframe example to read csv files #4360 (DataPsycho)
  • minor: correct name and typo #4359 (jackwener)
  • Do not log error if page index can not be evaluated #4358 (alamb)
  • Clean the expr_fn - use scalar_expr to create unary scalar expr functions, remove macro unary_scalar_functions #4357 (HaoYang670)
  • Throw error (not panic) if a listing table specifies a missing partition column #4354 (doki23)
  • Improve error handling and add some more types for proper downcasting #4352 (retikulum)
  • Add check to avoid underflow in memory manager #4351 (askoa)
  • Improve error messages when memory is exhausted while sorting #4348 (alamb)
  • Do not error in optimizer if resulting schema has different metadata #4347 (alamb)
  • minor: improve optimizer logging and do not repeat rule name #4345 (alamb)
  • minor: fix typos in test names #4344 [sql] (alamb)
  • Minor: Add docstrings to EliminateOuterJoins optimizer pass #4343 (alamb)
  • Minor: refactor: isolate common memory accounting utils #4341 (crepererum)
  • minor: make plan_from_tables return one plan instead of Vec #4336 [sql] (jackwener)
  • enhancement: when fetch == 0, pushdown limit 0 instead of skip+fetch. #4334 (jackwener)
  • Teach optimizer that CoalesceBatchesExec does not destroy output order #4332 (alamb)
  • Add ability to disable DiskManager #4330 (tustvold)
  • Update cli.md #4329 (psvri)
  • fix bug: right semi join can't support the filter #4327 (liukun4515)
  • reimplement eliminate_limit to remove global-state. #4324 (jackwener)
  • Refine Err propagation and avoid unwrap in transform closures #4318 (mingmwang)
  • Add a checker to confirm physical optimizer rules will keep the physical plan schema immutable #4316 (mingmwang)
  • Refactor downcasting functions with downcastvalue macro and improve error handling of ListArray downcasting #4313 (retikulum)
  • minor: add another test case to cover join ambiguous check #4305 [sql] (ygf11)
  • Fix DESCRIBE statement qualified table issue #4304 [sql] (gruuya)
  • Use tournament loser tree for k-way sort-merging, increase merge speed by 50% #4301 (richox)
  • Pin Python setuptools in the CI to fix integration tests #4296 (isidentical)
  • Support SubqueryAlias in optimizer, physical planner. #4293 (jackwener)
  • minor: avoid a clone into string when checking ambiguous #4292 [sql] (ygf11)
  • replace the comparison op for decimal array op using the arrow-rs kernel #4290 (liukun4515)
  • MINOR: replace {..} with (_), typo, remove outdated TODO #4286 (jackwener)
  • Reduce Expr copies in ParquetExec #4283 (alamb)
  • Fix issue in filter pushdown with overloaded projection index #4281 (thinkharderdev)
  • Skip useless pruning predicates in ParquetExec #4280 (alamb)
  • Push down more predicates into ParquetExec #4279 (alamb)
  • Fix EXPLAIN plan for ParquetExec to show pruning_predicate #4278 (alamb)
  • reimplement limit_push_down to remove global-state, enhance optimize and simplify code. #4276 (jackwener)
  • Bump actions/labeler from 4.0.2 to 4.1.0 #4274 (dependabot[bot])
  • Remove the type alias NullColumnarValue #4273 (HaoYang670)
  • reimplement eliminate_outer_join #4272 (jackwener)
  • Fix bugs in parsing with header row and partitioned by #4268 [sql] (HaoYang670)
  • improve error messages while downcasting UInt32Array, UInt64Array and BooleanArray #4261 (retikulum)
  • add ambiguous check for projection #4260 [sql] (AssHero)
  • Add ambiguous check for join #4258 [sql] (ygf11)
  • support cross_join in limit_push_down #4257 (jackwener)
  • Support parquet page filtering on min_max for decimal128 and string columns #4255 (Ted-Jiang)
  • fix conflict and UT, cleanup redundant legacy code #4252 (jackwener)
  • Minor: remove unnecessary clone() in planner #4249 [sql] (alamb)
  • Fix nightly clippy failures #4246 (mvanschellebeeck)
  • Improve Error Handling and Readability for downcasting Float32Array, Float64Array, StringArray #4244 (retikulum)
  • Use defaults for ListingOptions builder #4243 (mvanschellebeeck)
  • Support binary boolean operators with nulls #4242 (Ted-Jiang)
  • Fixing doc of the expression #4240 (Creampanda)
  • Fix negative interval parsing bug #4238 (Jefffrey)
  • remove duplicate or redundant code #4235 (jackwener)
  • add a checker to confirm optimizer can keep plan schema immutable. #4233 (jackwener)
  • Fix the percentile argument for ApproxPercentileCont must be Float64, not Decimal128(2, 1) #4228 (comphead)
  • refactor how we create listing tables #4227 (timvw)
  • Update sqlparser requirement from 0.26 to 0.27 #4226 [sql] (alamb)
  • upgrade required chrono version to 0.4.23 #4225 (waitingkuo)
  • Support types other than String for partition columns on ListingTables #4221 (doki23)
  • [CBO] JoinSelection Rule, select HashJoin Partition Mode based on the Join Type and available statistics, option for SortMergeJoin #4219 (mingmwang)
  • Remove alias in Union #4212 (jackwener)
  • Add try_optimize method #4208 (andygrove)
  • Provide a builder for ListingOptions with fixups #4207 (alamb)
  • Avoid error with empty iterators used for ScalarValue::iter_to_array #4206 (GrandChaman)
  • Improve error message for regexp_match 'g' flag #4203 (Jefffrey)
  • Return ResourceExhausted errors when memory limit is exceeded in GroupedHashAggregateStreamV2 (Row Hash) #4202 (crepererum)
  • Add additional expr boolean simplification rules #4200 (Jefffrey)
  • Update to arrow and parquet 27.0.0 #4199 [sql] (tustvold)
  • Support create table with explicit column definitions #4194 [sql] (doki23)
  • Support all equality predicates in equality join #4193 [sql] (ygf11)
  • add propagate_empty_relation optimizer rule #4192 (jackwener)
  • fix clippy #4190 [sql] (jackwener)
  • Fix clippy by avoiding deprecated functions in chrono #4189 (alamb)
  • Disallow duplicate interval types during parsing #4188 (Jefffrey)
  • Parse nanoseconds for intervals #4186 (Jefffrey)
  • Add rule to reimplement Eliminate cross join and remove it in planner #4185 [sql] (jackwener)
  • [FOLLOWUP] Enforcement Rule: resolve review comments, refactor adjust_input_keys_ordering() #4184 (mingmwang)
  • Simplify boolean parquet pushdown predicate #4182 (Jefffrey)
  • Minor: consolidate parquet custom_reader integration test into parquet_exec #4175 (alamb)
  • minor: remove redundant println and cleanup #4173 (jackwener)
  • Add ability to specify external sort information for ListingTables #4170 (alamb)
  • Improve Error Handling and Readability for downcasting Decimal128Array #4168 (retikulum)
  • Minor: Remove completed comment on parquet row group pruning #4167 (alamb)
  • Update hashbrown requirement from 0.12 to 0.13 #4164 (dependabot[bot])
  • MINOR: enable dyn_cmp_dict feature on arrow for physical expr crate #4163 (isidentical)
  • Derive filter statistic estimates from the predicate expression #4162 (isidentical)
  • Minor: pass ParquetFileMetrics to build_row_filter in parquet #4161 (alamb)
  • Minor: Extract parquet row group pruning code into its own module #4160 (alamb)
  • Full support for time32 and time64 literal values (ScalarValue) #4156 (andre-cc-natzka)
  • Window frame GROUPS mode support #4155 (zembunia)
  • Improve error messages while downcasting Int64Array #4154 (retikulum)
  • Add another method to collect referenced columns from an expression #4153 [sql] (ygf11)
  • Remove BoxedAsyncFileReader #4150 (tustvold)
  • Support unsigned integers in unwrap_cast_in_comparison Optimizer rule #4149 (alamb)
  • Add support for DataType::Timestamp casts in unwrap_cast_in_comparison optimizer pass #4148 (alamb)
  • Add additional testing for unwrap_cast_in_comparison #4147 (alamb)
  • improve error messages while downcasting Int32Array #4146 (retikulum)
  • Minor: Update docstring on unwrap_cast_in_comparison #4145 (alamb)
  • add schema parameter to table provider factory create method #4143 (milenkovicm)
  • fix: shouldn't pass alias through into subquery. #4141 [sql] (jackwener)
  • Preserve the Cast expression in columnize_expr #4137 [sql] (HaoYang670)
  • Set versions to dependencies with path in benchmarks Cargo.toml file #4136 (ArkashaJavelin)
  • Fix links #4135 (mvanschellebeeck)
  • Use f64::total_cmp instead of OrderedFloat #4133 (comphead)
  • Add parquet integration tests for explicitly smaller page sizes, page pruning #4131 (alamb)
  • Consolidate ParquetExec tests in parquet_exec integration test #4130 (alamb)
  • Minor: Use upstream BooleanArray::true_count #4129 (alamb)
  • Combined TPCH runs & uniformed summaries for benchmarks #4128 (isidentical)
  • Enable TableProviderFactories to receive additional options when creating an external table #4126 [sql] (timvw)
  • Add CI check that configs.md is up-to-date #4124 (mvanschellebeeck)
  • [Part3] Partition and Sort Enforcement, Enforcement rule implementation #4122 (mingmwang)
  • reuse code utils::optimize_children but affect inline. #4121 (jackwener)
  • reuse code utils::optimize_children instead of redundant implementation #4119 (jackwener)
  • Allow listing tables to be created via TableFactories #4112 (avantgardnerio)
  • Update SQL reference to state that decimal support is currently experimental #4109 (andygrove)
  • Add metrics for parquet page level skipping #4105 (Ted-Jiang)
  • Add parser option for parsing SQL numeric literals as decimal #4102 [sql] (andygrove)
  • Replace RwLock<HashMap> and Mutex<HashMap> by using DashMap #4079 (yahoNanJing)
  • Custom window frame support extended to built-in window functions #4078 (mustafasrepo)
  • Enable tests for page index filtering in parquet filter pushdown test #4062 (alamb)
  • [Part2] Partition and Sort Enforcement, ExecutionPlan enhancement #4043 (mingmwang)
  • add support for xz file compression and compression feature #3993 [sql] (Jimexist)
  • Expression boundary analysis framework #3912 (isidentical)

14.0.0-rc1 (2022-11-04)

Full Changelog

14.0.0 (2022-11-04)

Full Changelog

Breaking changes:

  • Improve FieldNotFound errors #4084 [sql] (andygrove)
  • Refactor: move simplify_expression.rs and expr_simplifier.rs to a new mod simplify_expressions #3951 (HaoYang670)
  • Support for non-u64 types for Window Bound #3916 [sql] (mustafasrepo)
  • Expose parquet reader settings using normal DataFusion ConfigOptions #3822 (alamb)
  • Add Filter::try_new with validation #3796 [sql] (andygrove)
  • Change public simplify API and add a public coerce API #3758 (alamb)

Implemented enhancements:

  • Automatically register tables if ObjectStore root is configured #4094
  • Simplify small InList expressions #4089
  • Support SET command #4067
  • add uuid() function to generate unique uuid per row #4045
  • Publish benchmark crate so that it can be used as a library in Ballista #4016
  • Add statistics methods to TableProvider trait for use in cost-based optimizations in the logical plan #3983
  • Implement current_time Function #3982
  • Implement current_date Function #3981
  • Put common code used for testing code into datafusion/test_utils.rs #3960
  • Print the configurations of ConfigOptions in an ordered way so that we can directly compare the equality of two ConfigOptions by their debug strings #3952
  • Don't make dependants install protoc #3947
  • Implement right anti join and support it in HashBuildProbeOrder #3946
  • Implement right semi join and support it in HashBuildProbeOrder #3945
  • Refactor simplify_expressions and expr_simplifier #3934
  • Implement serialization for ScalarValue::FixedSizeBinary #3928
  • Support inlining view / dataframes logical plan #3913
  • Plans with tables from TableProviderFactorys can't be serialized #3906
  • Simplify a AND a and a OR a. #3895
  • Allow configuring statistics on TPC-H benchmarks #3888
  • CI checks stuck in queued mode #3883
  • Multiple optimizer passes #3879
  • datafusion-proto does not support view table scan #3874
  • TableProviderFactories need to be async and return a Result to be useful #3866
  • Factorize common AND factors out of OR predicates to support filterPushDown as possible #3858
  • Replace concat_ws with concat when the delimiter is empty string #3857
  • Concatenate contiguous literal arguments of concat_ws when doing the expression simplification #3856
  • Partition and Sort Enforcement #3854
  • Enable mimalloc by default in benchmarks #3851
  • Add collect statistics configuration #3847
  • [SQL] - Support cache/uncache table syntax #3842
  • Filter pushdown doesn't seem to apply for filter on TPC-H Q17 #3839
  • Support pushdown multi-columns in PageIndex pruning. #3834
  • Consolidate Expr manipulation code so it is more discoverable and make it easier to use #3808
  • Leverage input array's null buffer for regex replace to optimize sparse arrays #3803
  • Improve join cardinality estimation when there is no overlap in the min/max values #3802
  • datafusion-cli up to date check is failing on master #3798
  • Optimize benchmark q2 subquery filter #3789
  • Benchmark should infer schema when running against Parquet #3776
  • Allow specialized physical functions to provide hints for the array adapter #3762
  • [User Guide] Add EXPLAIN to SQL reference #3755
  • move type coercion for agg/agg udf #3752
  • Prevent Cargo.lock for datafusion-cli being out-of-date #3744
  • Add example of expr apis including simplification and coercion #3740
  • support type coercion for ScalarFunction expr in the logical phase #3731
  • Add support for DISTINCT projections in decorrelate_where_exists #3724
  • Add type coercion rule for CONCAT and CONCAT_WS #3720
  • Expose and document a simpler public API for simplify expressions #3709
  • Expose + document the type coercion API publicly #3708
  • Concatenate contiguous literal arguments of CONCAT during the expression simplification. #3683
  • DataFusion 13.0.0 Release #3671
  • Add division by 0 rules in the expression simplification #3663
  • Compressed CSV/JSON Read #3641
  • remove type coercion for agg #3623
  • extract or clause as predicate for join rels #3577
  • Improve performance of regex_replace #3518
  • Add benchmarks for parquet queries with filter pushdown enabled #3457
  • Make type coercion rule more robust #3390
  • ViewTable::scan ignores filters and limits #3249
  • Add CREATE VIEW documentation to user guide #3211
  • Push additional parquet filtering into the parquet scan [EPIC] #3147
  • Remove core/logical_plan module #2683
  • Datafusion Optimizer Enhancement #2255
  • [Optimizer] Eliminate self compare self #2252
  • Break datafusion crate into smaller crates #1750
  • Benchmark constellation-rs/amadeus's parquet implementation #1341
  • Use parquet2 async reader in physical_plan/parquet #1058
  • Table Scan Enhancement Plan #944
  • Implement parquet page-level skipping with column index, using min/max stats #847
  • Support min/max statistics in ParquetTable and ParquetExec #537

Fixed bugs:

  • Clippy failing on master #4100
  • Panic when the number of partitions of the pipeline that throws the exception is inconsistent with the number of partitions output by the query #4096
  • FieldNotFound when field is available #4083
  • SingleDistinctToGroupBy being applied too broadly #4082
  • single_distinct_to_groupby strips qualifiers from group-by expressions #4049
  • Another Internal error when parquet predicate pushdown is enabled "Error evaluating filter predicate:" #4046
  • Decimal multiplied by Float produces incorrect results #4035
  • Cannot query external table - TableScan replaced with EmptyExec #4027
  • benchmark q17 produces incorrect result #4026
  • benchmark q14 produces incorrect result #4025
  • benchmark q11 producing incorrect results #4023
  • Internal error when parquet predicate pushdown is enabled "Error evaluating filter predicate:" #4006
  • Incorrect results with parquet filtering pushdown enabled #4005
  • Wrong results when parquet page index filtering is enabled #4002
  • Output schema of semi join has invalid projection added after HashBuildProbeOrder #4001
  • async deserialization functions are unintuitive and possibly insecure #3977
  • Expr::to_bytes can produce output that hits Expr::from_bytes recursion limit #3968
  • Bug on propagating arrow field metadata #3964
  • Predicate still has cast when comparing Timestamp(Nano, None) to a timestamp literal, so can't be pushed down or used for pruning #3938
  • Error using IN list on dictionary encoded data: InList does not support datatype Dictionary(Int32, Utf8). #3936
  • Internal error in CAST from Timestamp[us] #3922
  • ScalarValue not implemented for FixedSizeBinary types #3910
  • [DOC] - There are unsupported DDL in the official documentation #3904
  • datafusion-proto deserialize with Substring(str [from int] [for int]) fails #3901
  • count(Literal) gives wrong column name #3891
  • projection_push_down adds duplicate projections with multiple passes #3881
  • Default physical planner generates empty relation for DROP TABLE, CREATE MEMORY TABLE, etc #3873
  • Binary expression canonical names are incorrect in some cases #3865
  • Using the window function lag causes panic. #3830
  • chrono crate : specify 0.4.22 as the minimum version due to spurious build failures #3827
  • datafusion-proto deserialize with q16 sql fails #3820
  • Filter predicates should not be aliased #3795
  • Write csv not save all lines of dataframe #3783
  • Regression in simplifying expressions in subqueries #3760
  • DataFusionError(Internal("The size of the sorted batch is larger than the size of the input batch: 2120 > 2312")) #3747
  • "labeler" PR check is broken #3743
  • DataFrame::select_columns doesn't work with names containing "." #3733
  • TPC-H Query 1 has regressed #3729
  • [RUST][Datafusion] What causes "Error: Execution("file size of 4 is less than footer")" error? #3800
  • Field names containing periods such as f.c cannot work #3682
  • TableProvider implementation for DataFrame does not support filter pushdown #3681
  • using Decimal(0) make system panicked #3665
  • Cannot query some parquet files in S3, but they work locally #3633
  • col / col returns 1 when col = 0 #3615
  • register_csv allow space in table_path #3589
  • Hardcoded u64 for WindowFrameBound fields #3571
  • docs.rs cannot build datafusion-proto crate #3538
  • Row Hash loads whole aggregation state to memory before sending #3460
  • approx_percentile_cont return wrong result when scan multi parquet files. #3140
  • User guide is incorrect regarding using CLI to register CSV files using schema inference #3001
  • Exception: Internal error, Exception: Schema error #2938
  • Version 0.6.0 Panic error during SQL execution #2738
  • wrong result when operation parquet #2044
  • Local object store accepts file:/// as base path, but LocalStore returns meta without the prefix. #1923
  • Reading nested parquet files results in index out of bounds #1383
  • - (negation) with NULL literals does not work: can't be evaluated because the expression's type is Utf8, not signed #1192
  • Inconsistent cast behavior #957
  • single_distinct_to_groupby no longer drops qualifiers #4050 [sql] (andygrove)

Documentation updates:

  • Clarify in docs that Identifiers are made lower-case in SQL query #2374
  • Fix broken links in contributor guide #3956 (Jefffrey)
  • add create view explanation #3925 (retikulum)
  • Update datafusion-examples README #3814 (alamb)
  • Add Seafowl to list of projects using DataFusion #3792 (mildbyte)

Closed issues:

  • [QUESTION] How many times should be the function create_name called when executing a query? #3900
  • Improve the Expr string format #3878
  • Simplify division by zero (division by one / multiplication by zero / multiplication by one) for Decimal types as well #3643
  • InList: merge check branch #2833
  • Optimization InList: compare the float data type using OrderedFloat<T> #2831
  • Outdated section of the add function of the contribution guide #2560
  • Optimize InList implementation with native types rather than ScalarValue #2165
  • Improve testing of optimizers using EXPLAIN #1118
  • Crash on parsing sql query with Cyrillic letters #184
  • [EPIC] Support all TPC-H queries in benchmark #158
  • Implement optional second argument to ltrim and rtrim functions #144
  • Benchmark crate does not have a SIMD feature #124
  • ColumnarValue::into_array should not require batch #113
  • [Rust] Parquet data source does not support complex types #83

Merged pull requests:

  • Appease new clippy #4101 (alamb)
  • minor: Split parquet reader up into smaller modules #4099 (alamb)
  • [MINOR] Update SET in cli.md #4098 (waitingkuo)
  • fix: Scheduler panic routing errors #4097 (yukkit)
  • Automatically register tables if ObjectStore root is configured #4095 (avantgardnerio)
  • minor: Use Operator::swap #4092 (alamb)
  • Simplify small InListExpr #4090 (Dandandan)
  • Minor: Add arrow-rs ticket reference and turn some comments into docstrings #4088 (alamb)
  • Support Dictionary in InListExpr #4070 (tustvold)
  • support SET variable #4069 [sql] (waitingkuo)
  • Add in list bench #4068 (tustvold)
  • Improve Error Handling and Readability for downcasting StructArray #4061 (retikulum)
  • Build tests separately from running #4060 (alamb)
  • Simplify InListExpr ~20-70% Faster #4057 (tustvold)
  • MINOR: Print unoptimized logical plan in execute_query of tpch benchmark #4056 (viirya)
  • Minor: clean the code in eliminate_filter #4055 (HaoYang670)
  • Implement current_time scalar function #4054 (naosense)
  • Cleanup hash_utils adding support for decimal256 and f16 #4053 (tustvold)
  • Fix multicolumn parquet predicate pushdown (#4046) #4048 (tustvold)
  • Add CI checks that we can serde all benchmark queries #4047 (andygrove)
  • Enable more benchmark verification tests #4044 (andygrove)
  • Extract common parquet testing code to parquet-test-util crate #4042 (alamb)
  • add uuid() function #4041 (Jimexist)
  • Update to arrow 26, change timezones #4039 [sql] (tustvold)
  • Fix Decimal and Floating type coerce rule #4038 (viirya)
  • Reserve the literal expression of Count function #4031 [sql] (HaoYang670)
  • Implement current_date scalar function #4022 (comphead)
  • Fix predicate pushdown bugs: project columns within DatafusionArrowPredicate (#4005) (#4006) #4021 (tustvold)
  • minor: remove redundant code/TODO #4019 (jackwener)
  • Add CI check to verify that benchmark queries return the expected results #4015 (andygrove)
  • Minor: Add TODO and tracking ticket reference #4012 (alamb)
  • Add right anti join support and support it in HashBuildProbeOrder #4011 (Dandandan)
  • MINOR: Generate expected benchmark query results #4010 (andygrove)
  • Minor: remove unnecessary clippy allow #4008 (alamb)
  • Minor: Do what clippy says and clean up some code #4007 (alamb)
  • Improve Error Handling and Readability for downcasting Date32Array #4004 (retikulum)
  • Don't add projection for semi joins in HashBuildProbeOrder #4000 (Dandandan)
  • Minor: use DataType::is_nested #3995 (alamb)
  • [minor] bump prettier version #3992 (Jimexist)
  • Add parquet predicate pushdown metrics #3989 (alamb)
  • Pin datafusion-proto build dependencies #3987 (tustvold)
  • Add TableProvider.statistics method #3986 (andygrove)
  • Add Pull Request guidelines to contributor guide #3985 (alamb)
  • Update protos #3979 (tustvold)
  • Revert async changes but keep deltalake working #3978 (avantgardnerio)
  • Correctness integration test for parquet filter pushdown #3976 (alamb)
  • MINOR: Stop pretty printing batches in benchmark when there are no results #3974 (andygrove)
  • MINOR: Re-export Cast struct #3971 (andygrove)
  • fix: check recursion limit in Expr::to_bytes #3970 (crepererum)
  • [Part1] Partition and Sort Enforcement, PhysicalExpr enhancement #3969 (mingmwang)
  • Support pushdown multi-columns in PageIndex pruning. #3967 (Ted-Jiang)
  • Fix benchmarks README formatting #3966 (Jefffrey)
  • Bug fix on DFField to Field conversion: preserve metadata #3965 (metesynnada)
  • Informative Error Message for LAG and LEAD functions #3963 (mustafasrepo)
  • Minor: Add some docstrings to FileScanConfig and RuntimeEnv #3962 (alamb)
  • Move common code used for testing code into datafusion/test_utils #3961 (alamb)
  • Update minimum chrono dependency to 0.4.22 #3959 (alamb)
  • Implement right semi join and support it in HashBuildProbeOrder #3958 (Dandandan)
  • Print the configurations of ConfigOptions in an ordered way so that we can directly compare the equality of two ConfigOptions by their debug strings #3953 (yahoNanJing)
  • Vendor Generated Protobuf Code (#3947) #3950 (tustvold)
  • Implement serialization for ScalarValue::FixedSizeBinary #3943 (retikulum)
  • Consolidate physical join code into datafusion/core/src/physical_plan/joins #3942 (alamb)
  • Add optimizer test for simplifying predicates on timestamps #3939 (alamb)
  • Add test for querying predicate on dictionary #3937 (alamb)
  • fix: return error for unsupported SQL #3933 (Kikkon)
  • doc: fix doc about CREATE TABLE IF NOT EXISTS #3932 (jackwener)
  • Refactor Expr::Cast to use a struct. #3931 [sql] (jackwener)
  • minor: fix some typo. #3930 (jackwener)
  • chore: update cranelift-related dependencies #3926 (xudong963)
  • Change cast error from Internal to NotImplemented #3924 (alamb)
  • Support inlining view / dataframes logical plan #3923 (Dandandan)
  • Add test for Simplify redundant predicates #3915 (src255)
  • Implement ScalarValue for FixedSizeBinary #3911 (maxburke)
  • Add serde for plans with tables from TableProviderFactorys #3907 (avantgardnerio)
  • Support filter/limit pushdown for views/dataframes #3905 (Dandandan)
  • Factorize common AND factors out of OR predicates to support filterPu… #3903 (Ted-Jiang)
  • Add Substring(str [from int] [for int]) support in datafusion-proto #3902 (r4ntix)
  • Revert "Factorize common AND factors out of OR predicates to support filterPu… (#3859)" #3897 (alamb)
  • MINOR: Add notes on Apache Reporter #3893 (andygrove)
  • Allow configuring collection of statistics during TPC-H benchmarks #3889 (isidentical)
  • Improve formatting of binary expressions #3884 [sql] (andygrove)
  • Multiple optimizer passes #3880 (andygrove)
  • [MINOR] Update docs with newly added configuration values #3877 (alamb)
  • [MINOR] Add a hint about how to resolve the Cargo.lock CI check #3876 (alamb)
  • Add LogicalPlan::ViewTable support in datafusion-proto #3875 (r4ntix)
  • Optimize the concat_ws function #3869 (HaoYang670)
  • Implement foundational filter selectivity analysis #3868 (isidentical)
  • Update TableProviderFactory trait to support real-world use-cases #3867 (avantgardnerio)
  • put subquery's equal clause into join on clauses instead of filter cl… #3862 (AssHero)
  • Factorize common AND factors out of OR predicates to support filterPu… #3859 (Ted-Jiang)
  • Enable mimalloc by default in benchmark #3853 (Dandandan)
  • Refactor Expr::Between to use a struct #3850 [sql] (b41sh)
  • Handle cardinality estimation for disjoint inner and outer joins #3848 (isidentical)
  • Add setting for statistics collection #3846 (Dandandan)
  • Update to arrow 25.0.0 #3844 [sql] (tustvold)
  • Tweak list of optimization rules #3841 (Dandandan)
  • Refactor Expr::GetIndexedField to use a struct #3838 [sql] (ygf11)
  • Infer the count of maximum distinct values from min/max #3837 (isidentical)
  • Refactor Expr::Like, Expr::ILike, Expr::SimilarTo to use a struct #3836 [sql] (b41sh)
  • Refactor Expr::BinaryExpr to use a struct #3835 [sql] (zhoudongyan)
  • update postgres version to 15 in integration test #3831 (Jimexist)
  • Fix the panic when lpad/rpad parameter is negative #3829 (ZuoTiJia)
  • MINOR: Document SHOW ALL in the users guide #3826 (alamb)
  • MINOR: Add datafusion-cli documentation on showing configuration #3825 (alamb)
  • Add/Remove Division Rules #3824 (retikulum)
  • Minor: Sort the output of SHOW ALL by config name #3823 [sql] (alamb)
  • Add precision != 0 check when making decimal type #3818 [sql] (HaoYang670)
  • Infer schema when running benchmarks against parquet #3817 (andygrove)
  • Finish removing deprecated datafusion::logical_plan module #3816 (andygrove)
  • Clarify initial example with respect to capitalization #3815 (alamb)
  • Improve expression simplification by running it twice #3811 (alamb)
  • Make expression manipulation consistent and easier to use: combine/split filter conjunction, etc #3810 (alamb)
  • Consolidate expression manipulation functions into datafusion_optimizer #3809 (alamb)
  • Optimize regexp_replace when the input is a sparse array #3804 (isidentical)
  • Stop ignoring errors when writing DataFrame to csv, parquet, json #3801 (andygrove)
  • Update datafusion-cli Cargo.lock to fix CI check on master #3799 (alamb)
  • MINOR: Benchmark regression tests #3790 (andygrove)
  • MINOR: Optimizer example and docs, deprecate Expr::name #3788 (andygrove)
  • Join cardinality computation for cost-based nested join optimizations #3787 (isidentical)
  • Optimizer now simplifies multiplication, division, and modulo when an argument is a literal Decimal zero or one #3782 (drrtuy)
  • Implement parquet page-level skipping with column index, using min/ma… #3780 (Ted-Jiang)
  • Bump actions/labeler from 4.0.1 to 4.0.2 #3779 (dependabot[bot])
  • MINOR: correct ListingOptions.try_new docs to include the enabled stat collection #3775 (isidentical)
  • Teach a negative NULL expression to return NULL instead of an error #3771 (drrtuy)
  • Add benchmarks for testing row filtering #3769 (thinkharderdev)
  • move type coercion of agg and agg_udaf to logical phase #3768 (liukun4515)
  • User Guide: Add EXPLAIN to SQL reference #3767 (unvalley)
  • Allow specialized implementations to produce hints for the array adapter #3765 (isidentical)
  • Fix optimizer regression with simplifying expressions in subquery filters #3764 (andygrove)
  • Run all datafusion-examples in CI tests #3761 (alamb)
  • MINOR: Remove deprecated module datafusion::logical_plan::plan #3759 (andygrove)
  • Refactor Expr::Case to use a struct #3757 [sql] (andygrove)
  • Do not run labeler CI check if it would fail due to permissions #3756 (alamb)
  • MINOR: Improvements to scalar_subquery_to_join error handling #3754 (andygrove)
  • Always track the final size of the in-mem sorted arrays #3753 (isidentical)
  • Fix DataFrame::select_columns to handle column names with a period #3751 (zhoudongyan)
  • Fix ListingTableUrl to decode percent #3750 (unvalley)
  • remove type coercion for physical ScalarFunction #3749 (liukun4515)
  • CI: Add a new run to check whether datafusion-cli lock file is up-to-date #3745 (isidentical)
  • Add datafusion example of expression apis #3741 (alamb)
  • fix subquery where exists distinct #3732 (b41sh)
  • Remove some unneeded code in CommonSubexprEliminate #3730 (alamb)
  • Consolidate and better tests for expression re-rewriting / aliasing #3727 (alamb)
  • Fix output schema generated by CommonSubExprEliminate #3726 (alex-natzka)
  • Add type coercion rule for concat and concat_ws #3721 (HaoYang670)
  • Expose and document a simpler public API for simplify expressions #3719 (ygf11)
  • Remove dead code in UnwrapCastExprRewriter that may mask errors #3703 (alamb)
  • Fix DataFrame::with_column to handle creating column names with a period #3700 (alamb)
  • Add simplification rules for the CONCAT function #3684 (HaoYang670)
  • Compressed CSV/JSON support #3642 [sql] (Licht-T)
  • Simplify serialization by removing redundant PrimitiveScalarValue #3612 (alamb)
  • Pushdown single column predicates from ON join clauses #3578 (AssHero)
  • Simplify the serialization of ScalarValue::List #3547 (alamb)
  • Generate hash aggregation output in smaller record batches #3461 (milenkovicm)
  • Improve doc on lowercase treatment of columns on SQL #3385 (nanicpc)

13.0.0-rc1 (2022-10-07)

Full Changelog

13.0.0 (2022-10-06)

Full Changelog

Breaking changes:

  • Make ObjectStoreProvider fallible (return Result rather than Option) #3584 (tustvold)
  • Make OptimizerConfig a builder style API #3525 (alamb)

Implemented enhancements:

  • remove type coercion for ScalarUDF in the physical phase #3734
  • Allow with statements to specify their columns alongside their expression names #3716
  • Support SQLDataType::Timestamp(TimezoneInfo) #3693
  • support type coercion for case when expr #3673
  • Add simplification rules for the Modulo operator #3664
  • Add TIMESTAMPTZ #3659
  • Simplify A * 0 and A * null. #3626
  • change rule of PreCastLitInComparisonExpressions to unwrap cast rule after #3582 #3622
  • Optimize regex_replace with a known pattern / replacement #3613
  • Simplify CONCAT_WS(NULL, ..) to NULL #3607
  • Add OctoSQL to list of systems powered by DataFusion #3605
  • Prevent over-allocation (and spills) on TopK queries #3596
  • Allow ObjectStoreProvider to return None (return Result<Option> rather than Result) #3594
  • simplify between expr should consider the data type #3587
  • make type coercion simple and remove the evaluate logic #3585
  • ReduceOuterJoin optimizer support cast or try_cast expr. #3565
  • Support type coercion for subquery #3557
  • Make ParquetScanOptions public and expose a reference to the scan options from ParquetExec #3550
  • Use fetch limit in get_sorted_iter #3544
  • Push limit to sort #3528
  • Execute sorts in parallel when limit is used after sort #3526
  • Consolidate optimizer passes in optimizer module for better testing #3524
  • Support Top-K query optimization for `ORDER BY <EXPR> [ASC #3515
  • support the type coercion for like unlike istrue isfalse isunknown #3509
  • Automate the pushing of releases to Homebrew #3506
  • Add extra DATE_PART units that are already supported in arrow-rs #3502
  • Release datafusion-cli 12.0.0 on Homebrew #3501
  • Make from_proto_binary_op public #3489
  • coercion between decimal and other types lacking, compared to other numeric types #3479
  • move type coercion for inlist from physical phase to logical phase #3468
  • Make datafusion::physical_plan::file_format::file_stream::FileStream public #3466
  • Support using offset index in ParquetRecordBatchStream when pushing down RowFilter #3456
  • Support timestamp data type in In_list node #3449
  • Evaluate expressions after type coercion #3431
  • Make a convenience function to register a single RecordBatch as a table from SessionContext #3426
  • add datafusion-cli support of external table locations that object_store supports #3424
  • pruning support cast/try_cast expr #3414
  • Add documentation on querying against files in object store such as S3 #3399
  • Remove type-coercion from physical planner #3388
  • support Statement::ShowVariable to show session configs #3364
  • Support RowFilter in ParquetExec #3360
  • Apply TypeCoercion rule before FilterPushDown #3289
  • Add support for get / show timezone #3255
  • Consider adding DataFusion to ClickBench benchmarks #2902
  • filter_push_down panics on semi/anti join with join filters #2888
  • Migrate the cross join -> inner join optimization from the planner to the optimizer #2859
  • ObjectStore write support #2185
  • DataFusion should scan Parquet statistics once per query #871
  • Extend & generalize constant folding / evaluation in logical optimizer #237

Fixed bugs:

  • projection_push_down produces invalid aggregate plans in some cases #3738
  • Time With Time Zone should raise error until DataType::Time64 support tz #3715
  • SQL Planner doesn't distinguish normal CTEs from the recursive ones. #3713
  • Fix inconsistency between column name formats #3711
  • Optimizer rule 'projection_push_down' failed due to unexpected error: Error during planning: Aggregate schema has wrong number of fields. Expected 3 got 8 #3704
  • Optimizer regressions in unwrap_cast_in_comparison #3690
  • Internal error when evaluating a predicate = "The type of Dictionary(Int16, Utf8) = Int64 of binary physical should be same" #3685
  • Specialized regexp_replace should early-abort when the input arrays are empty #3647
  • Internal error: Failed to coerce types Decimal128(10, 2) and Boolean in BETWEEN expression #3646
  • Internal error: Failed to coerce types Decimal128(10, 2) and Boolean in BETWEEN expression #3645
  • Type coercion error: The type of Boolean AND Decimal128(10, 2) of binary physical should be same #3644
  • LEFT JOIN not working as expected, error message is confusing #3639
  • INTERSECT and EXCEPT don't return an error when 2 sets have the different number of columns #3632
  • The datafusion-cli panics when union 2 table with different number of columns. #3630
  • The expression col(a) / null is not optimized. #3624
  • s3_build_error test may fail in some environments #3601
  • New clippy errors appear to break the CI on master #3597
  • StringConcat gives inconsistent result with concat when containing null #3569
  • simplify_expressions don't support different data type for binary #3556
  • Broken logical plan serialization for aggregation queries #3555
  • Aggregate filters do not get pushed down to table scan #3546
  • docs.rs cannot build datafusion-proto crate #3538
  • DataFusion serialization doesn't handle ScalarValue::Dictionary, Binary, LargeBinary, Time64, IntervalMonthDayNano, Struct #3531
  • What should be returned when trying to get a config in invalid format? #3505
  • Dividing decimal type gives wrong error: "170141183460469231731687303715884105727 is too large to store in a Decimal128 #3498
  • Add BitwiseXor in function from_proto_binary_op #3495
  • comparison operations with a scalar null and decimal array panics #3487
  • Union columns with different types #3467
  • Can't get the right logical plan after optimizer #3421
  • Fix conflict between simplify_expression rule and CAST expressions #3409
  • Empty array giving error #2439
  • Internal error: Unsupported data type in hasher: FixedSizeBinary(16) #1516
  • Predicates on to_timestamp do not work as expected with "naive" timestamp strings #765
  • Address performance/execution plan of TPCH query 19 #78
  • Bug fix: expr_visitor was not visiting aggregate filter expressions #3548 (andygrove)

Documentation updates:

  • Publish 8.0.0 user guide #2558
  • MINOR: Add Dask SQL to list of projects powered by DataFusion #3581 (andygrove)
  • Add Parseable as Datafusion user #3471 (nitisht)

Closed issues:

  • Upgrade to Arrow 24.0.0 #3689
  • what's the best practice to get a single value from arrow array? #3497
  • The data type of predicate in the row filter should be same in the binary expr #3469
  • Extend constant folding and parquet filtering support #188
  • Add FORMAT to explain plan and an easy to visualize format #96

Merged pull requests:

12.0.0 (2022-09-12)

Full Changelog

Breaking changes:

Implemented enhancements:

  • support cast inside values #3446
  • update TPCH test schemas to use Decimal128 from Float #3435
  • Include Bitwise operators in the documentation #3434
  • How to read excel file with datafusion? #3433
  • Pass return type to the accumulator state factory in aggregates #3427
  • Support bitwise XOR operator (#) #3420
  • support InList with datatype Date32 #3412
  • add simplification for between expression during logical plan optimization #3402
  • Replace From trait with TryFrom trait for datafusion-proto crate #3401
  • update TPC-H benchmark to Decimal types from Float #3392
  • Use usize to represent Limit::skip #3369
  • Avoid copying in LogicalPlan::expressions #3368
  • Upgrade to Arrow 22 #3362
  • Eliminate OFFSET 0 in the logical plan optimization #3355
  • Add ability to get unoptimized logical plan from DataFrame #3340
  • Allow IDEs to recognize generated code #3332
  • CAST should not change the name of an expression #3326
  • add SQL support for unsigned integers #3325
  • Review use of panic in datafusion-proto crate #3318
  • Review use of panic in datafusion-sql crate #3315
  • Review use of panic in datafusion-optimizer crate #3314
  • Review use of panic in datafusion-expr crate #3312
  • Support registration of custom TableProviders through SQL #3310
  • Support binary data in sha hash functions #3308
  • add SQL support for tinyint and unsigned versions of all INTs #3307
  • Support binary types in InList expression #3300
  • Physical planner should map IsTrue and similar expressions to IsDistinctFrom #3288
  • Introduce physical plan version of Operator enum #3269
  • Introduce Expr variants for IS [NOT] TRUE / FALSE / UNKNOWN #3268
  • Add support for non-correlated subqueries #3266 [sql]
  • (Re-)add support for glob patterns in ListingTableUrl #3261
  • PreCastLitInComparisonExpressions should use ExprRewriter and supported nested expressions #3259
  • implement DROP VIEW #3251
  • Upgrade to Arrow 21 #3224
  • Add TypeCoercion optimizer rule #3221
  • Create bench for approx_percentile_cont aggregate #3217
  • Add SQL query planner support for DISTRIBUTED BY #3207
  • Support "IS [NOT] UNKNOWN" syntax #3195
  • sqlparser 0.21 upgrade #3192
  • Re-implement parsing/planning for SHOW TABLES due to sqlparser changes #3188
  • Support SUM AVG, MIN, MAX on Time columns. #3166
  • Support "IS TRUE/FALSE" syntax #3159
  • Support number of histogram bins in approx_percentile_cont #3145
  • Support create ApproxPercentileAccumulator with TDigest max_size #3142
  • Remove support for array function and only support array[] style postgres syntax #3115
  • Allow inline column aliases for create view #3108 [sql]
  • Add support for Postgres SIMILAR TO and ILIKE syntax #3099 [sql]
  • Update SQL reference in user guide to cover all supported syntax #3091
  • DataFusion prelude should import all logical expression functions #3068
  • Proposal: Add similar to operator #3016 [sql]
  • Release DataFusion 11.0.0 #3012
  • Implement "SHOW CREATE TABLE" for external tables #2848
  • Change java package names in protobuf files #2513
  • When creating DFField from Expr we should provide input plan not input schema #2456
  • Support "IS NOT TRUE/FALSE" syntax #2265
  • RFC: Spill-To-Disk Object Storage Download #2205
  • Support for BitwiseAnd &, BitOr | binary operators #1619
  • [Question] Usage of async object store APIs in consuming code #1313
  • Allow User Defined Aggregates to return multiple values / structs #600
  • Implement vectorized hashing for dictionary types #331

Fixed bugs:

  • Intermittent build error when changing selected features #3366
  • sql::timestamp::timestamp_add_interval_months failing since September 1st #3327
  • sql::timestamp::timestamp_add_interval_months test fails #3322
  • test case timestamp_add_interval_months failed on master branch #3321
  • datafusion-proto does not support untyped null scalar values #3302
  • ConfigOptions creation is slow #3295
  • FilterPushDown optimization through UNION ALL results in SchemaError #3281
  • Execute LogicalPlans after building for TPCH Benchmarks #3273
  • CREATE TABLE should return empty DataFrame #3265 [sql]
  • CREATE EXTERNAL TABLE from CSV creates a table with no columns if there is just a header row #3263
  • View TableProvider ignores projections, resulting in invalid plans #3240
  • CREATE VIEW should return an empty dataframe on success #3236
  • DISTRIBUTE BY expressions get removed during optimization #3234
  • DataFusion cannot recognize Chinese characters. #3203
  • Panicked at 'byte index 1 is out of bounds on invalid query #3190
  • like_nlike_with_null_lt fails with latest sqlparser code #3187
  • Interval Literal output inconsistent date_type #3180
  • array function allows different data types #3123
  • eq operator doesn't work on binary data #3117
  • incorrect where clause comparison while using table alias #3073
  • Some functions are incorrectly declared as unary #3069
  • once now() is called in a statement, it forever returns the same value #3057
  • single_distinct_to_groupby panic when group by expr is a binaryExpr #2994
  • Cannot have order by expression that references complex group by expression #2360
  • Fix some bugs in TypeCoercion rule #3407 (andygrove)
  • MINOR: Stop ignoring AggregateFunction::distinct in protobuf serde code #3250 (andygrove)
  • Add assertion for invariant in create_physical_expression and fix ViewTable projection #3242 (andygrove)
  • Fix bug where optimizer was removing Partitioning::DistributeBy expressions #3229 (andygrove)

Documentation updates:

Closed issues:

  • Add \i command to datafusion-cli #1906
  • TPC-H Query 15 #166

Merged pull requests:

11.0.0 (2022-08-16)

Full Changelog

Breaking changes:

Implemented enhancements:

  • Make RowAccumulator public #3138
  • docs: proposal for consolidating docs into a Contributor Guide #3127
  • feat: support Timestamp +/- Interval #3103
  • An arrow_typeof like PostgreSQL's pg_typeof #3095
  • Add DataFrame section to user guide #3066
  • Document all scalar SQL functions in user guide #3065
  • Simplify implementation of approx_median so that it can be exposed in Python #3063
  • Support double-quoted literal strings for dialects (such as MySQL, BigQuery) #3055
  • Simplify / speed up implementation of character_length to unicode points #3049
  • Follow-up on Clickbench benchmark #3048
  • Why is the PhysicalPlanner an async trait? #3032
  • Optimize file stream metrics. #3024
  • Proposal: Enable typed strings expressions for VALUES clause #3017
  • Proposal: Add date_bin function #3015
  • The upcoming release of Arrow (20?) breaks datafusion #3006
  • Can I select some files for query based on the filtering rules in the directory? #2993
  • Rename FormatReader to FileOpener #2990
  • Derive Hash trait for JoinType #2971
  • CAST from Utf8 to Boolean #2967
  • Add baseline_metrics for FileStream to record metrics like elapsed time, record output, etc #2961
  • Example to show how to convert query result into rust struct #2959
  • simplify not clause #2957
  • Implement Debug for ColumnarValue #2950
  • Parallel fetching of column chunks when reading parquet files #2949
  • Extension mechanism for SessionConfig #2939
  • Streaming CSV/JSON Object Store Read #2935
  • Support CSV Limit Pushdown to Object Storage #2930
  • Add support for pow scalar function #2926
  • Add support for exact median aggregate function #2925
  • Support mean as synonym for avg #2922
  • Rename a column name #2919
  • Move ScalarValue tests alongside implementation, move from_slice to core #2913
  • Fail gracefully if optimization rule fails #2908
  • Make ObjectStoreRegistry as a trait which can allow Ballista to introduce a self registry ObjectStoreRegistry #2905
  • Remove datafusion-data-access crate #2903
  • Improve formatting of logical plans containing subquery expressions #2898
  • Atan2 added to built-in functions #2897
  • The explain statements only print logical plans for debug/other purpose. #2894
  • JSON version of display_indent() #2889
  • It would be nice to have a way to generate unique IDs in optimizer rules #2886
  • Add support for TIME literal values #2883
  • Add h2o benchmark #2879
  • Implement from_unixtime function #2871
  • Add cast function for creating logical cast expression #2870
  • Release DataFusion 10.0.0 #2862
  • Implement information_schema.views #2857
  • Migrate from avro_rs to apache_avro #2783
  • Add optimizer rule to remove OFFSET 0 #2584
  • Preserve Element Name in ScalarValue::List #2450
  • Add EXISTS subquery support to Ballista #2338
  • Add documentation on supported functions to datafusion website #1487
  • documentations for datafusion-cli can be consolidated a bit more #1352
  • Optimizer: Predicate Rewrite pass for TPCH Q19 #217
  • feat: add optimize rule rewrite_disjunctive_predicate #2858 (xudong963)

Fixed bugs:

  • Regression in SQL support for ORDER BY and aliased expressions #3160
  • panic when deal with @ operator #3137
  • Incorrect type coercion rule for date + interval #3093
  • Casting a string to timestamp crashes when the input time is before 1970 and has fractional seconds #3082
  • INTEGER type doesn't work while importing CSV #3059
  • Cannot GROUP BY Binary #3050
  • incorrect i32 coercion for to_timestamp #3046
  • Error pruning IsNull expressions: Column 'instance_null_count' is declared as non-nullable but contains null values #3042
  • I want to query some files in a directory. Is there any way? #3013
  • The expression to get an indexed field is only valid for List types (common_sub_expression_eliminate) #3002
  • Double to_timestamp_seconds produces abnormal result #2998
  • External parquet table fails when schema contains differing key / value metadata #2982
  • SELECT on column with uppercase column name fails with FieldNotFound error #2978
  • panic reading AWS-generated parquet file #2963
  • Can't filter rowgroup for parquet prune for some data type #2962
  • CI test is failing with final link failed: No space left on device #2947
  • bug: new ObjectStore breaks backward compatibility with contrib plugins #2931
  • bug: file types handled wrong #2929
  • bug: changing the number of partitions does not increase concurrency #2928
  • csv_explain fails on RC verifier #2916
  • index out of range error from datafusion_row::write::write_field #2910
  • Optimization rule CommonSubexprEliminate creates invalid projections #2907
  • serde_json requires that either std (default) or alloc feature is enabled #2896
  • Inconsistent type coercion rules with comparison expressions #2890
  • Doc Error: the test directory link 404 which is in CONTRIBUTING.md #2880
  • Round trips through ScalarValue's sometimes don't preserve types (e.g. change types from DictionaryArray) #2874
  • Error with CASE and DictionaryArrays: ArrowError(InvalidArgumentError("arguments need to have the same data type")) #2873
  • window functions not supported in expressions #2869
  • Unable to work with month intervals #2796
  • Discord invite link in communication page has expired #2743
  • Test (path normalization) failures while verifying release candidate 9.0.0 RC1 #2719
  • Reading parquet with (pre-release) arrow fails with "out of order projection is not supported" #2543
  • Fix SQL planner bug when resolving columns with same name as a relation #3003 [sql] (andygrove)
  • fix RowWriter index out of bounds error #2968 (comphead)
  • fix: support decimal statistic for row group prune #2966 (liukun4515)
  • Fix invalid projection in CommonSubexprEliminate #2915 (andygrove)

Documentation updates:

Performance improvements:

  • Use code points instead of grapheme clusters for string functions #3054 (Dandandan)

Closed issues:

  • Rename do_data_time_math() to do_date_time_math() #3172
  • Automatic version updates for github actions with dependabot #3106
  • [EPIC] Proposal for Date/Time enhancement #3100
  • Upgrade prost/tonic everywhere #3028
  • [Question] interested in helping with documentation #2866
  • Introducing a new optimizer framework for datafusion. #2633
  • Enable discussion tab? #2350
  • Add support for AVG(Timestamp) types #200
  • TPC-H Query 22 #175
  • TPC-H Query 21 #172
  • TPC-H Query 20 #171
  • TPC-H Query 17 #168
  • TPC-H Query 11 #163
  • TPC-H Query 4 #160
  • TPC-H Query 2 #159
  • [Datafusion] Optimize literal expression evaluation #106

Merged pull requests:

10.0.0-rc1 (2022-07-12)

Full Changelog

10.0.0 (2022-07-12)

Full Changelog

Breaking changes:

Implemented enhancements:

  • update documentation, fix styling to match main Arrow project #2864
  • Update top-level README #2850
  • [Question] How to call an async function in ExecutionPlan::exec method? #2847
  • Add DataFrame::with_column #2844
  • Improve ergonomics of physical expr lit #2827
  • Add Python examples for reading CSV and query by SQL in Doc #2824
  • eliminate multi limit-offset nodes to EmptyRelation if possible #2822
  • Make LogicalPlan::Union be consistent with other plans #2816
  • Use coerced data type from value and list expressions during planning inlist expression #2793
  • Add configuration option to enable/disable CoalesceBatchesExec #2790
  • Simplify FilterNullJoinKeys rule #2780
  • Allow configuration settings to be specified with environment variables #2776
  • Automatically update configs.md in user guide #2770
  • Support multiple paths for ListingTableScanNode #2768
  • Reduce outer joins #2757
  • support data type coerced and decimal in INLIST expr #2755
  • Change ExtensionPlanner::plan_extension() to an async function #2749
  • Add IsNotNull filter to join inputs if one side of join condition does not allow null #2739
  • Sort preserving MergeJoin #2698
  • Improve readability of table scan projections in query plans #2697
  • DataFusion 9.0.0 Release #2676
  • Improve UX for UNION vs UNION ALL (introduce a LogicalPlan::Distinct) #2573 [sql]
  • Implement some way to show the sql used to create a view #2529
  • Consider adopting IOx ObjectStore abstraction #2489
  • Support sum0 as a built-in agg function #2067
  • implement grouping sets, cubes, and rollups #1327
  • Ruby bindings #1114
  • Support dates in hash join #2746 (andygrove)

Fixed bugs:

  • Docker Error #2851
  • Anti join ignores join filters #2842
  • Can't test or compile sub-model code after upgrade to arrow-rs 17.0.0 #2835
  • Not evaluate the set expr in the InList for the optimization #2820
  • CASE When: result type should be coercible to a common type #2818
  • IN/NOT IN List: NULL is not equal to NULL #2817
  • panic when case statement returns null #2798
  • InList: Can't cast the list expr data type to value expr data type directly #2774
  • InList Expr: expr and list values must be convertible to the same data type #2759
  • tpchgen docker syntax change prevents volume from binding #2751
  • Cannot join on date columns (Unsupported data type in hasher: Date32) #2744
  • rewrite_expression does not properly handle Exists and ScalarSubquery #2736
  • LocalFileSystem Not sorted by file name, As a result, the data lines queried in multiple files are out of order. #2730
  • Filter push down need consider alias columns #2725
  • Recent API change in GlobalLimitExec breaks compatibility with Ballista #2720
  • Common Subexpression Elimination pass errors if run twice on some plans: Schema contains duplicate unqualified field name 'IsNull-Column-sys.host' #2712
  • The data type is not compatible with other system, for example spark or PG database #1379

Documentation updates:

Closed issues:

  • Consider adding a prominent note in the readme about ballista #2853
  • support decimal in (NULL) #2800
  • InList: Don't treat Null as UTF8(None) #2782
  • InList: don't need to treat Null as UTF8 data type #2773
  • Implement extensible configuration mechanism #138

Merged pull requests:

9.0.0 (2022-06-10)

Full Changelog

Breaking changes:

  • MINOR: Move simplify_expression rule to datafusion-optimizer crate #2686 (andygrove)
  • Move physical expression planning to datafusion-physical-expr crate #2682 (andygrove)
  • Create new datafusion-optimizer crate for logical optimizer rules #2675 (andygrove)
  • Remove ExecutionProps dependency from OptimizerRule #2666 (andygrove)
  • Remove ObjectStoreSchemaProvider (#2656) #2665 (tustvold)
  • Move LogicalPlanBuilder to datafusion-expr crate #2576 (andygrove)
  • LogicalPlanBuilder now uses TableSource instead of TableProvider #2569 (andygrove)
  • Remove scan_empty method from LogicalPlanBuilder #2568 (andygrove)
  • MINOR: Move expression utils from sql module to expr crate #2553 (andygrove)
  • Remove scan_json methods from LogicalPlanBuilder #2541 (andygrove)
  • Remove scan_avro methods from LogicalPlanBuilder #2540 (andygrove)
  • Remove scan_parquet methods from LogicalPlanBuilder #2539 (andygrove)
  • MINOR: Move ExprVisitable and exprlist_to_columns to datafusion-expr crate #2538 (andygrove)
  • Remove scan_csv methods from LogicalPlanBuilder #2537 (andygrove)
  • Fix Redundant ScalarValue Boxed Collection #2523 (comphead)
  • Support for OFFSET in LogicalPlan #2521 (jdye64)

Implemented enhancements:

  • [EPIC] JIT support for DataFusion #2703
  • Show column names instead of column indices in query plans #2689
  • Proposal: remove automated ballista CI checks from DataFusion #2679
  • Pass SessionState to TableProvider #2658
  • Is ObjectStoreSchemaProvider Still Needed? #2656
  • Add logical plan support to datafusion-proto #2630
  • Like, NotLike expressions work with literal NULL #2626
  • Move JOIN ON predicates push down logic from planner to optimizer #2619
  • Remove ExecutionProps from OptimizerRule trait #2614
  • Add, Minus, Multiply, divide, Modulo operator work with literal NULL #2609
  • Support DESCRIBE <table> to show table schemas #2606
  • Support CREATE OR REPLACE TABLE #2605
  • filter_push_down tests should not rely on TableProvider and ExecutionPlan #2600
  • Move logical optimizer rules out of the core datafusion crate #2599
  • Push Limit through outer Join #2579
  • datafusion_proto crate should have exhaustive match statements for handling Expr #2565
  • String representation of Expr variant #2563
  • File URI Scheme Interpretation #2562
  • Implement physical plan for OFFSET #2551
  • Update limit pushdown rule to support offsets #2550
  • Move LogicalPlanBuilder to datafusion-expr crate #2536
  • Logical optimizer rule "simplify expressions" should not depend on the core datafusion crate #2535
  • Support optional filter in Join #2509
  • Improve SQL planner & logical plan support for JOIN conditions #2496
  • Numeric, String, Boolean comparisons with literal NULL #2482
  • Redundant ScalarValue Boxed Collection #2449
  • ObjectStore Directory Semantics #2445
  • Add support for OFFSET in SQL query planner + logical plan #2377
  • SQL planner should use TableSource not TableProvider #2346
  • Move SQL query planning to new crate #2345
  • Update LogicalPlan rustdoc code to not use LogicalPlanBuilder #2308
  • [Optimizer] Refactor convert join #2256
  • [Optimizer] Infer is not null predicate from where clause #2254
  • Support ArrayIndex for ScalarValue(List) #2207
  • [Ballista] Fill functional gaps between datafusion and ballista #2062
  • [Ballista] support datafusion built_in UDAF work in ballista cluster #1985
  • Export C API #1113

Fixed bugs:

  • Fix Typos in Docs #2695
  • Unable to build a docker image #2691
  • Optimization pass AggregateStatistics changes type of output from Int64 to UInt64 #2673
  • ViewTable Circular Reference #2657
  • ScalarValue::to_array_of_size panics computing statistics for nested parquet file #2653
  • The result type of count/count_distinct #2635
  • limit_push_down is not working properly with OFFSET #2624
  • Avro Tests Fail To Compile #2570
  • Unused Window functions expression is wrongly removed from LogicalPlan during optimization #2542
  • Bug: ObjectStoreRegistry get_by_uri does not return correct path when "scheme" is provided #2525
  • There are duplicate and inconsistent copies of datafusion.proto #2514
  • Projection pushdown produces incorrect results when column names are reused #2462
  • Incorrect Parquet Projection For Nested Types #2453
  • LogicalPlanBuilder::scan_csv creates scans with invalid table names #2278
  • Inner join incorrectly pushdown predicate with OR operation #2271
  • Ignored alias for columns with aggregate function and incorrect results when collecting statistics is enabled #2176
  • Join on path partitioned columns fails with error #2145

Documentation updates:

Closed issues:

  • [Question] Converting TableSource to custom TableProvider #2644
  • [Question] Why DataFusion is shipped with arrow version 9.1.0 on crates.io ? #2474

Merged pull requests:

  • Test optional features in CI #2708 (tustvold)
  • support indexed fields proto #2707 (nl5887)
  • Update sqlparser-rs to 0.18.0 #2705 (alamb)
  • [MINOR]: Add documentation to datafusion-row modules #2704 (alamb)
  • Make sure that the data types are supported in hashjoin before genera… #2702 (AssHero)
  • Move remaining code out of legacy core/logical_plan module #2701 (andygrove)
  • Move some tests from core to expr #2700 (andygrove)
  • MINOR: Improve Docs Readability #2696 (ryanrussell)
  • Combine limit and offset to fetch and skip and implement physical plan support #2694 (ming535)
  • MINOR: Add datafusion-sql example #2693 (andygrove)
  • Remove Ballista related lines from Dockerfile #2692 (mocknen)
  • Show column names instead of indices in query plans #2690 (andygrove)
  • MINOR: Remove uses of TryClone for Parquet #2681 (tustvold)
  • Fix AggregateStatistics optimization so it doesn't change output type #2674 (alamb)
  • If statistics of column Max/Min value do not exist in parquet file, set Min/Max to None #2671 (AssHero)
  • MINOR: Move more expression code to datafusion-expr crate #2669 (andygrove)
  • MINOR: Rewrite imports in optimizer module #2667 (andygrove)
  • Update snmalloc-rs requirement from 0.2 to 0.3 #2663 (dependabot[bot])
  • Add module doc for RuntimeEnv, SessionContext, TaskContext, etc... #2655 (tustvold)
  • Prune unused dependencies from datafusion-proto #2651 (tustvold)
  • MINOR: Implement serde for join filter #2649 (andygrove)
  • pushdown support for predicates in ON clause of joins #2647 (korowa)
  • Move SortKeyCursor and RowIndex into modules, add sort_key_cursor test #2645 (alamb)
  • Implement DESCRIBE <table> #2642 (LiuYuHui)
  • Implement LogicalPlan serde in datafusion-proto #2639 (andygrove)
  • Fix limit + offset pushdown #2638 (ming535)
  • change result type of count/count_distinct from uint64 to int64 #2636 (liukun4515)
  • if none columns in window expr are needed, remove the window exprs #2634 (AssHero)
  • Like, NotLike expressions work with literal NULL #2627 (WinkerDu)
  • MINOR: Refactor datafusion-proto dependencies and imports #2623 (andygrove)
  • MINOR: add optimizer struct #2616 (jackwener)
  • Remove FilterPushDown dependency on physical plan #2615 (andygrove)
  • Support CREATE OR REPLACE TABLE #2613 (AssHero)
  • Support binary mathematical operators work with NULL literals #2610 (WinkerDu)
  • chore: try fix CI coverage #2608 (Ted-Jiang)
  • MINOR: Rename benchmark crate #2607 (andygrove)
  • chore(dep): bump cranelift to 0.84.0 #2598 (waynexia)
  • fix some typos #2597 (ming535)
  • Support limit pushdown through left right outer join #2596 (Ted-Jiang)
  • Unignore rustdoc code examples in datafusion-expr crate #2590 (andygrove)
  • Evaluate JIT'd expression over arrays #2587 (waynexia)
  • [minor]Fix ci clippy for unused import #2586 (Ted-Jiang)
  • [Doc]add doc for enable SIMD need cargo nightly #2577 (Ted-Jiang)
  • Add DataFrame union_distinct and fix documentation for distinct #2574 (andygrove)
  • Fix avro tests (#2570) #2571 (tustvold)
  • Make datafusion-proto match exhaustive #2567 (andygrove)
  • Support limit push down for offset_plan #2566 (Ted-Jiang)
  • Introduce Expr.variant_name() function #2564 (jdye64)
  • Fix some 404 links in the contribution guide #2561 (hi-rustin)
  • Update datafusion-cli readme cli version #2559 (hi-rustin)
  • MINOR: Move expr_rewriter.rs to datafusion-expr crate #2552 (andygrove)
  • Fix JOINs with complex predicates in ON (split ON expressions only by AND operator) #2534 (korowa)
  • Reduce duplication in file scan tests #2533 (tustvold)
  • Fix size_of_scalar test #2531 (alamb)
  • Update to arrow-rs 14.0.0 #2528 (alamb)
  • ObjectStoreRegistry get_by_uri now returns correct path when "scheme" is provided #2526 (timvw)
  • MINOR: Add ORDER BY clause to test #2524 (andygrove)
  • Remove unused binary_array_op_scalar! in binary.rs #2512 (alamb)
  • fix NULL <op> column evaluation, tests for same #2510 (alamb)
  • Fix projection pushdown produces incorrect results when column names are reused #2463 (jonmmease)
  • Benchmark for sort preserving merge #2431 (alamb)
  • Support GetIndexedFieldExpr for ScalarValue #2196 (ovr)

8.0.0 (2022-05-12)

Full Changelog

Breaking changes:

  • Add SQL planner support for ROLLUP and CUBE grouping set expressions #2446 (andygrove)
  • Make ExecutionPlan::execute Sync #2434 (tustvold)
  • Introduce new DataFusionError::SchemaError type #2371 (andygrove)
  • Add Expr::InSubquery and Expr::ScalarSubquery #2342 (andygrove)
  • Add Expr::Exists to represent EXISTS subquery expression #2339 (andygrove)
  • Move LogicalPlan enum to datafusion-expr crate #2294 (andygrove)
  • Remove dependency from LogicalPlan::TableScan to ExecutionPlan #2284 (andygrove)
  • Move logical expression type-coercion code from physical-expr crate to expr crate #2257 (andygrove)
  • feat: 2061 create external table ddl table partition cols #2099 [sql] (jychen7)
  • Reorganize the project folders #2081 (yahoNanJing)
  • Support more ScalarFunction in Ballista #2008 (Ted-Jiang)
  • Merge dataframe and dataframe imp #1998 (vchag)
  • Rename ExecutionContext to SessionContext, ExecutionContextState to SessionState, add TaskContext to support multi-tenancy configurations - Part 1 #1987 (mingmwang)
  • Add Coalesce function #1969 (msathis)
  • Add Create Schema functionality in SQL #1959 [sql] (matthewmturner)
  • omit some clone when converting sql to logical plan #1945 [sql] (doki23)
  • [split/16] move physical plan expressions folder to datafusion-physical-expr crate #1889 (Jimexist)
  • remove sync constraint of SendableRecordBatchStream #1884 (doki23)
  • [split/15] move built in window expr and partition evaluator #1865 (Jimexist)

Implemented enhancements:

  • Include Expr to datafusion::prelude #2347
  • Implement Serialization API for DataFusion #2340
  • Implement power function #1493
  • allow lit python function to support boolean and other types #1136
  • Automate dependency updates #37
  • Add CREATE VIEW #2279 (matthewmturner)
  • [Ballista] Support Union in ballista. #2098 (Ted-Jiang)
  • Change the DataFusion explain plans to make it clearer in the predicate/filter #2063 (Ted-Jiang)
  • Add write_json, read_json, register_json, and JsonFormat to CREATE EXTERNAL TABLE functionality #2023 (matthewmturner)
  • Qualified wildcard #2012 [sql] (doki23)
  • support bitwise or/'|' operation #1876 [sql] (liukun4515)
  • Introduce JIT code generation #1849 (yjshen)

Fixed bugs:

  • CASE expr with NULL literals panics 'WHEN expression did not return a BooleanArray' #1189
  • Function calls with NULL literals do not work #1188
  • Add SQL planner support for calling round function with two arguments #2503 (andygrove)
  • nested query fix #2402 (comphead)
  • fix issue#2058 file_format/json.rs attempt to subtract with overflow #2066 (silence-coding)
  • fix bug the optimizer rule filter push down #2039 (jackwener)
  • fix: replace ExecutionContext and ExecutionConfig with SessionContext and SessionConfig #2030 (xudong963)
  • Fixed parquet path partitioning when only selecting partitioned columns #2000 (pjmore)
  • Fix ambiguous reference error in filter plan #1925 (jonmmease)
  • platform aware partition parsing #1867 (korowa)
  • Fix incorrect aggregation in case that GROUP BY contains duplicate column names #1855 (alex-natzka)

Documentation updates:

Performance improvements:

Closed issues:

  • Make expected result string in unit tests more readable #2412
  • remove duplicated fn aggregate() in aggregate expression tests #2399
  • split distinct_expression.rs into count_distinct.rs and array_agg_distinct.rs #2385
  • move sql tests in context.rs to corresponding test files in datafusion/core/tests/sql #2328
  • Date32/Date64 as join keys for merge join #2314
  • Error precision and scale for decimal coercion in logic comparison #2232
  • Support Multiple row layout #2188
  • TPC-H Query 18 #169
  • TPC-H Query 16 #167
  • Implement Sort-Merge Join #141
  • Split logical expressions out into separate source files #114

Merged pull requests:

7.1.0 (2022-04-10)

Full Changelog

Fixed bugs:

  • By default, use only 1000 rows to infer the schema #2159

7.0.0 (2022-02-14)

Full Changelog

Breaking changes:

  • Consolidate various configurations options, remove unrelated batch_size #1565
  • Extract logical plans in LogicalPlan as independent struct #1228
  • Update ExecutionPlan to know about sortedness and repartitioning optimizer pass respect the invariants #1776 (alamb)
  • Update to arrow 8.0.0 #1673 (alamb)
  • Remove non idiomatic DataFusionError::into_arrow_external_error in favor of From conversion #1645 (alamb)
  • Remove Accumulator::update and Accumulator::merge #1582 (Jimexist)
  • implement Hash for various types and replace PartialOrd #1580 (Jimexist)
  • Replace DatafusionError with GenericError in ObjectStore interface #1541 (matthewmturner)
  • Make FLOAT SQL type map to Float32 rather than Float64 #1423 [sql] (liukun4515)
  • Map REAL SQL type to Float32 rather than Float64 to be consistent with pg #1390 [sql] (hntd187)

Implemented enhancements:

  • Create new datafusion_expr crate #1753
  • Create new datafusion_common crate #1752
  • API to get Expr's type and nullability without a DFSchema #1725
  • Cleaner API to create Expr::ScalarFunction programmatically #1718
  • Introduce a Vec<u8> based row-wise representation for DataFusion #1708
  • Simplify creating new ListingTable #1705
  • Implement TableProvider for DataFrameImpl to allow registration of logical plans #1698
  • Public Expr simplification API #1694
  • Query Optimizer: Add OUTER --> INNER join conversion #1670
  • Support reading from CSV, Avro and Json files that have mergeable/compatible, but not identical schemas #1669
  • Remove DataFusionError::into_arrow_external_error in favor of From conversion #1644
  • Include join type in display implementation for logical plan #1620
  • Switch datafusion to using eq_dyn_scalar, etc kernels #1610
  • Proposal: Remove Accumulator::update and Accumulator::merge #1549
  • Replace DataFusionError/Result with impl Error for ObjectStore and Reader #1540
  • Add approx_quantile support #1538
  • support sorting decimal data type #1522
  • Keep all datafusion's packages up to date with Dependabot #1472
  • ExecutionContext support init ExecutionContextState with new(state: Arc<Mutex<ExecutionContextState>>) method #1439
  • support the decimal scalar value #1393
  • Documentation for using scalar functions with the DataFrame API #1364
  • Support boolean == boolean and boolean != boolean operators #1159
  • Support DataType::Decimal(15, 2) in TPC-H benchmark #174
  • Make MemoryStream public #150
  • Add support for Parquet schema merging #132
  • Add SQL support for IN expression #118
  • Add logging to datafusion-cli #1789 (alamb)
  • Add approx_median() aggregate function #1729 (realno)
  • Add join type for logical plan display #1674 [sql] (xudong963)
  • Fix null comparison for Parquet pruning predicate #1595 (viirya)
  • Add corr aggregate function #1561 (realno)
  • Add covar, covar_pop and covar_samp aggregate functions #1551 (realno)
  • Add approx_quantile() aggregation function #1539 (domodwyer)
  • Initial MemoryManager and DiskManager APIs for query execution + External Sort implementation #1526 (yjshen)
  • Add stddev and variance #1525 (realno)
  • Add rem operation for Expr #1467 (liukun4515)
  • support decimal data type in create table #1431 [sql] (liukun4515)
  • Ordering by index in select expression #1419 [sql] (hntd187)
  • Add support for ORDER BY on unprojected columns #1415 (viirya)
  • Support decimal for min and max aggregate #1407 (liukun4515)
  • Consolidate ConstantFolding and SimplifyExpression #1375 (alamb)
  • Datafusion cli quiet mode command to contain option bool #1345 (Jimexist)
  • Implement array_agg aggregate function #1300 (viirya)
  • Add a command to switch output format in cli #1284 (capkurmagati)
  • Support =, <, <=, >, >=, !=, is distinct from, is not distinct from for BooleanArray #1163 (alamb)

Fixed bugs:

  • Unsupported data type in hasher: Timestamp(Second, None) #1768
  • SQL column identifiers should be converted to lowercase when unquoted #1746
  • Data type Dictionary(Int32, Utf8) not supported for binary operation 'eq' on dyn arrays #1605
  • datafusion doesn't process predicate pushdown correctly when there is outer join #1586
  • casting Int64 to Float64 unsuccessfully caused tpch8 to fail #1576
  • CTE/WITH .. UNION ALL confuses name resolution in WHERE #1509
  • ORDER BY min(x) results in error Plan("No field named 'foo.x'. Valid fields are 'MIN(foo.x)'.") #1479
  • Sort discards field metadata on the output schema #1476
  • Datafusion should not strip out timezone information from existing types #1454
  • Error on some queries: "column types must match schema types, expected XXX but found YYY" #1447
  • Query failing to return any results when filter is an equality check on strings (bad statistics in parquet) #1433
  • Field names containing period such as f.c1 cannot be named in SQL query #1432
  • Select * returns an unexpected result #1412
  • Turn off unused default features of chrono and ahash #1398
  • real data type is float32 in PG database, but in the datafusion it is as float64 #1380
  • TPC-H q10 performance regression (expression for filter with added alias is not pushed down) #1367
  • ProjectionExec Loses Field Metadata #1361
  • Support Filter on unprojected columns #1351
  • NULLS ORDER is inconsistent with postgres #1343
  • Fix bug while merging RecordBatch, add SortPreservingMerge fuzz tester #1678 (alamb)
  • fix a cte block with same name for many times #1639 [sql] (xudong963)
  • fix: casting Int64 to Float64 unsuccessfully caused tpch8 to fail #1601 (xudong963)
  • Fix single_distinct_to_groupby for arbitrary expressions #1519 (james727)
  • Fix SortExec discards field metadata on the output schema #1477 (alamb)
  • fix calculate in many_to_many_hash_partition test. #1463 (Ted-Jiang)
  • Add Timezone to Scalar::Time* types, and better timezone awareness to Datafusion's time types #1455 (maxburke)
  • Support identifiers with . in them #1449 [sql] (alamb)
  • Fixes for working with functions in dataframes, additional documentation #1430 (tobyhede)
  • [Minor] Fix send_time metric for hash-repartition #1421 (Dandandan)
  • fix: Select * returns an unexpected result #1413 [sql] (xudong963)
  • Make cli handle multiple whitespaces #1388 (capkurmagati)
  • Metadata is kept in projections for non-derived columns #1378 (hntd187)
  • Fix Predicate Pushdown: split_members should be able to split aliased predicate #1368 (viirya)
  • Change the arg names and make parameters more meaningful #1357 (liukun4515)
  • collect table stats by default for listing table #1347 (houqp)
  • fix: make nulls-order consistent with postgres #1344 [sql] (xudong963)
  • Avoid changing expression names during constant folding #1319 (viirya)
  • improve error message for invalid create table statement #1294 [sql] (houqp)
  • Forbid creating the table with the same name #1288 (liukun4515)

Documentation updates:

Performance improvements:

  • Parquet pruning predicate for IS NULL #1591
  • Fix predicate pushdown for outer joins #1618 (james727)
  • fix: sql planner creates cross join instead of inner join from select predicates #1566 [sql] (xudong963)
  • Split fetch_metadata into fetch_statistics and fetch_schema #1365 (Dandandan)
  • Optimize the performance queries with a single distinct aggregate #1315 (ic4y)
  • Left join could use bitmap for left join instead of Vec<bool> #1291 (boazberman)

Closed issues:

  • Add release compile to CI #1728
  • DiskManager and TempFiles getting created several times per query #1690
  • Add a test for the pyarrow feature in CI #1635
  • SQL tests for when sorting exceeded available memory and had to spill to disk #1573
  • Consolidate the N-way merging code and SortPreservingMergeStream (which has quite good tests of what is often quite tricky code, and it will be performance critical) #1572
  • Consolidate the SortExec code (so there is only a single sort operator that does in memory sorting if it has enough memory budget but then spills to disk if needed). #1571
  • Track memory usage in Non Limited Operators #1569
  • [Question] Why does ballista store tables in the client instead of in the SchedulerServer #1473
  • Consolidate Projection for Schema and RecordBatch #1425
  • Support Sort on unprojected columns #1372
  • Unused code in hash_aggregate #1362
  • Why use the expr types before coercion to get the result type? #1358
  • A problem about the projection_push_down optimizer gathers valid columns #1312
  • apply constant folding to LogicalPlan::Values #1170
  • reduce usage of IntoIterator<Item = Expr> in logical plan builder window fn #372
  • Why does DataFusion throw a Tokio 0.2 runtime error? #176
  • TPC-H Query 14 #165
  • Length kernel returns bytes not character length #156
  • Split the logical operators out into separate source files #115

Merged pull requests:

  • Fixup some doc warnings #1811 (alamb)
  • Ensure most of links in docs are correct #1808 [sql] (HaoYang670)
  • Update CHANGELOG.md, update release scripts #1807 (alamb)
  • Update versions for split crates #1803 (matthewmturner)
  • Improve the error message and UX of tpch benchmark program #1800 (alamb)
  • rename references of expr in logical plan module after datafusion-expr split #1797 (Jimexist)
  • Update to sqlparser 0.14 #1796 [sql] (alamb)
  • [split/13] move rest of expr to expr_fn in datafusion-expr module #1794 (Jimexist)
  • Update datafusion versions #1793 (matthewmturner)
  • Less verbose plans in debug logging #1787 (alamb)
  • [split/11] split expr type and null info to be expr-schemable #1784 (Jimexist)
  • Introduce Row format backed by raw bytes #1782 (yjshen)
  • rewrite predicates before pushing to union inputs #1781 (korowa)
  • Update datafusion to use arrow 9.0.0 #1775 (alamb)
  • [split/10] split up expr for rewriting, visiting, and simplification traits #1774 [sql] (Jimexist)
  • #1768 Support TimeUnit::Second in hasher #1769 (jychen7)
  • TPC-H benchmark can optionally write JSON output file with benchmark summary #1766 (andygrove)
  • [split/8] move Accumulator and ColumnarValue to datafusion-expr #1765 (Jimexist)
  • [split/7] move built-in scalar function to datafusion-expr #1764 (Jimexist)
  • [split/6] move signature, type signature, volatility to datafusion-expr #1763 (Jimexist)
  • [split/9+12] move udf, udaf, Expr to datafusion-expr module #1762 [sql] (Jimexist)
  • [split/5] move window frame and operator to datafusion-expr module #1761 (Jimexist)
  • [split/4] move scalar value to datafusion-common #1760 (Jimexist)
  • [split/3] split datafusion expr module and move aggregate and window function expr #1759 (Jimexist)
  • [split/2] move column and dfschema to datafusion-common module #1758 (Jimexist)
  • Use ordered-float 2.10 #1756 (andygrove)
  • [split/1] split datafusion-common module #1751 (Jimexist)
  • use clap 3 style args parsing for datafusion cli #1749 (Jimexist)
  • fix: Case insensitive unquoted identifiers in SQL #1747 [sql] (mkmik)
  • Move more tests out of context.rs #1743 (alamb)
  • Move optimize test out of context.rs #1742 (alamb)
  • Fix typos in crate documentation #1739 (r4ntix)
  • add cargo check --release to ci #1737 (xudong963)
  • Update parking_lot requirement from 0.11 to 0.12 #1735 (dependabot[bot])
  • Create built-in scalar functions programmatically #1734 (HaoYang670)
  • Prevent repartitioning of certain operator's direct children (#1731) #1732 (tustvold)
  • API to get Expr's type and nullability without a DFSchema #1726 (alamb)
  • minor: fix cargo run --release error #1723 (xudong963)
  • substitute parking_lot::Mutex for std::sync::Mutex #1720 (xudong963)
  • Convert boolean case expressions to boolean logic #1719 (tustvold)
  • Add Expression Simplification API #1717 (alamb)
  • Create ListingTableConfig which includes file format and schema inference #1715 (matthewmturner)
  • make select_to_plan clearer #1714 [sql] (xudong963)
  • Add upper bound for public function signature #1713 (HaoYang670)
  • Add tests and CI for optional pyarrow module #1711 (wjones127)
  • Create SchemaAdapter trait to map table schema to file schemas #1709 (thinkharderdev)
  • refine test in repartition.rs & coalesce_batches.rs #1707 (xudong963)
  • Fuzz test for spillable sort #1706 (yjshen)
  • Support create_physical_expr and ExecutionContextState or DefaultPhysicalPlanner for faster speed #1700 (alamb)
  • Implement TableProvider for DataFrameImpl #1699 (cpcloud)
  • Move timestamp related tests out of context.rs and into sql integration test #1696 (alamb)
  • Lazy TempDir creation in DiskManager #1695 (alamb)
  • Add MemTrackingMetrics to ease memory tracking for non-limited memory consumers #1691 (yjshen)
  • (minor) Reduce memory manager and disk manager logs from info! to debug! #1689 (alamb)
  • Make SortPreservingMergeStream stable on input stream order #1687 (alamb)
  • Incorporate dyn scalar kernels #1685 (matthewmturner)
  • Move information_schema tests out of execution/context.rs to sql_integration tests #1684 (alamb)
  • Add a new metric type: Gauge + CurrentMemoryUsage to metrics #1682 (yjshen)
  • refactor array_agg to not have update and merge #1681 (Jimexist)
  • Use NamedTempFile rather than String in DiskManager #1680 (alamb)
  • upgrade clap to version 3 #1672 (Jimexist)
  • Improve configuration and resource use of MemoryManager and DiskManager #1668 (alamb)
  • feat: Support quarter granularity in date_trunc function #1667 (ovr)
  • Fix cannot load parquet table from spark in datafusion-cli. #1665 (Ted-Jiang)
  • Make MemoryManager and MemoryStream public #1664 (yjshen)
  • [Cleanup] Move AggregatedMetricsSet to metrics for further reuse #1663 (yjshen)
  • fix: substr - correct behaviour with negative start pos #1660 (ovr)
  • support bitwise and as an example #1653 [sql] (liukun4515)
  • refine match pattern related code #1650 (xudong963)
  • update md-5, sha2, blake2 #1647 (xudong963)
  • Add DataFusionError -> ArrowError conversion #1643 (alamb)
  • Add spill_count and spilled_bytes to BaselineMetrics, test sort with spill #1641 (yjshen)
  • support hash decimal array and group by #1640 (liukun4515)
  • Consolidate Schema and RecordBatch projection #1638 (alamb)
  • Update hashbrown requirement from 0.11 to 0.12 #1631 (dependabot[bot])
  • Update pyo3 requirement from 0.14 to 0.15 #1627 (dependabot[bot])
  • Optimize SortPreservingMergeStream to avoid SortKeyCursor sharing #1624 (yjshen)
  • Handle merging of evolved schemas in ParquetExec #1622 (thinkharderdev)
  • feat: Support Substring(str [from int] [for int]) #1621 [sql] (ovr)
  • feat: Support complex interval via IntervalMonthDayNano #1615 [sql] (ovr)
  • consolidate binary_expr coercion rule code into binary_rule.rs module #1607 (alamb)
  • Fix comparison of dictionary arrays #1606 (alamb)
  • add test for decimal to decimal #1603 (liukun4515)
  • update nightly version #1597 (Jimexist)
  • Consolidate sort and external_sort #1596 (yjshen)
  • support from_slice for binary, string, and boolean array types #1589 (Jimexist)
  • add from_slice trait to ease arrow2 migration #1588 (Jimexist)
  • Implement ARRAY_AGG(DISTINCT ...) #1579 (james727)
  • Rename sql integration tests from mod to sql_integration #1575 (alamb)
  • minor: improve the benchmark readme #1567 (xudong963)
  • Consolidate batch_size configuration in ExecutionConfig, RuntimeConfig and PhysicalPlanConfig #1562 (yjshen)
  • Update to rust 1.58 #1557 (xudong963)
  • support mathematics operation for decimal data type #1554 (liukun4515)
  • Address clippy warnings #1553 (sergey-melnychuk)
  • enhance arithmetic operation for array with scalar #1552 (liukun4515)
  • Remove unused update and merge implementations from Aggregates and supporting ScalarValue arithmetic #1550 (alamb)
  • Add batch operations to stddev #1547 (realno)
  • Mark ARRAY_AGG(DISTINCT ...) not implemented #1534 (james727)
  • Update to arrow-7.0.0 #1523 (alamb)
  • Fix ORDER BY on aggregate #1506 (viirya)
  • Add example on how to query multiple parquet files #1497 (nitisht)
  • Refactor testing modules #1491 (hntd187)
  • add rfcs for datafusion #1490 (xudong963)
  • support comparison for decimal data type and refactor the binary coercion rule #1483 (liukun4515)
  • Minor: Rename predicate_builder --> pruning_predicate for consistency #1481 (alamb)
  • Tests for support try_cast/cast decimal to numeric #1465 (liukun4515)
  • Avoid send empty batches for Hash partitioning. #1459 (Ted-Jiang)
  • Planner code cleanup #1450 [sql] (alamb)
  • Fix bug in projection: "column types must match schema types, expected XXX but found YYY" #1448 (alamb)
  • Update arrow-rs to 6.4.0 and replace boolean comparison in datafusion with arrow compute kernel #1446 (xudong963)
  • support cast/try_cast for decimal: signed numeric to decimal #1442 (liukun4515)
  • Consolidate decimal error checking and improve error messages #1438 [sql] (alamb)
  • use 0.13 sql parser #1435 (Jimexist)
  • Minor Code cleanups #1428 (alamb)
  • Clarify communication on bi-weekly sync #1427 (alamb)
  • support sum/avg agg for decimal, change sum(float32) --> float64 #1408 [sql] (liukun4515)
  • Fix bugs with nullability during rewrites: Combine simplify and Simplifier #1401 (alamb)
  • Minimize features #1399 (carols10cents)
  • Update rust version to 1.57 #1395 [sql] (xudong963)
  • support decimal scalar value #1394 (liukun4515)
  • Add coercion rules for AggregateFunctions #1387 (liukun4515)
  • upgrade the arrow-rs version #1385 (liukun4515)
  • add array agg name #1382 (liukun4515)
  • Make tests for simplify and Simplifier consistent #1376 (alamb)
  • Refactor: Consolidate expression simplification code in simplify_expression.rs #1374 (alamb)
  • remove unused code in hash_aggregate #1370 (ic4y)
  • Use BufReader for LocalFileReader to revert performance regression in parquet reading #1366 (Dandandan)
  • Add unit test for constant folding on values #1355 (viirya)
  • Extract logical plan: rename the plan name (follow up) #1354 [sql] (liukun4515)
  • Moved aggr_test_schema to test_utils #1338 (rdettai)
  • upgrade arrow-rs to 6.2.0 #1334 (liukun4515)
  • Update release instructions #1331 (alamb)
  • #1268: allow datafusion-cli to toggle quiet flag within CLI #1330 (jgoday)
  • Extract Aggregate, Sort, and Join to struct from AggregatePlan #1326 (matthewmturner)
  • Extract EmptyRelation, Limit, Values from LogicalPlan #1325 (liukun4515)
  • Extract CrossJoin, Repartition, Union in LogicalPlan #1322 (liukun4515)
  • Fifth batch of updating sql tests to use assert_batches_eq #1318 (matthewmturner)
  • Extract Explain, Analyze, Extension in LogicalPlan as independent struct #1317 [sql] (xudong963)
  • Extract CreateMemoryTable, DropTable, CreateExternalTable in LogicalPlan as independent struct #1311 [sql] (liukun4515)
  • Extract Projection, Filter, Window in LogicalPlan as independent struct #1309 (ic4y)
  • Add PSQL comparison tests for except, intersect #1292 (mrob95)
  • Extract logical plans in LogicalPlan as independent struct: TableScan #1290 (xudong963)
  • Add statement helper command to cli #1285 (matthewmturner)
  • Python bindings for window functions #819 [sql] (jgoday)

6.0.0 (2021-11-13)

Full Changelog

Breaking changes:

  • Removed deprecated with_concurrency #1200 (rdettai)
  • File partitioning for ListingTable #1141 (rdettai)
  • Add function volatility to Signature #1071 [sql] (pjmore)
  • fix: allow duplicate field names in table join, fix output with duplicated names #1023 (houqp)
  • Make TableProvider.scan() and PhysicalPlanner::create_physical_plan() async #1013 (rdettai)
  • Reorganize table providers by table format #1010 (rdettai)
  • Make Metrics::labels() public #999 (alamb)
  • Rename NthValue::{first_value,last_value,nth_value} to satisfy clippy in Rust 1.55 #986 (alamb)
  • Move CBOs and Statistics to physical plan #965 (rdettai)
  • Update to sqlparser v 0.10.0 #934 [sql] (alamb)
  • FilePartition and PartitionedFile for scanning flexibility #932 [sql] (yjshen)
  • Improve SQLMetric APIs, port existing metrics #908 (alamb)
  • Add support for EXPLAIN ANALYZE #858 [sql] (alamb)
  • Rename concurrency to target_partitions #706 (andygrove)

Implemented enhancements:

  • Add booleans support to the CASE statement #1156
  • Implement General Purpose Constant Folding with the Expression Evaluator #1070
  • Mark volatility categories of functions #1069
  • Add "show" support to DataFrame API #937
  • Add support for TRIM BOTH/LEADING/TRAILING #935
  • Add "baseline" metrics to all built in operators #866
  • Add SQL support for referencing fields in structs #119
  • add filename completer for create table statement #1278 (Jimexist)
  • Add drop table support #1266 [sql] (viirya)
  • Dataframe supports except and update readme #1261 (xudong963)
  • Implement EXCEPT & EXCEPT DISTINCT #1259 [sql] (xudong963)
  • Add DataFrame support for INTERSECT and update readme #1258 (xudong963)
  • use arrow 6.1.0 #1255 (Jimexist)
  • fix 1250, add editor support for datafusion cli with validation #1251 (Jimexist)
  • Add support for create table as via MemTable #1243 [sql] (Dandandan)
  • Add cli show columns command to describe tables #1231 (Jimexist)
  • datafusion-cli to add list table command #1229 (Jimexist)
  • datafusion cli to handle EoF and interrupt signal #1225 (Jimexist)
  • add \q as quit command and add ? for help #1224 (Jimexist)
  • Add algebraic simplifications to constant_folding #1208 (matthewmturner)
  • Improve GetIndexedFieldExpr adding utf8 key based access for struct v… #1204 [sql] (Igosuki)
  • Fix between in select query #1202 [sql] (capkurmagati)
  • Move code to fold Stable functions like now() from Simplifier to ConstEvaluator #1176 (alamb)
  • DataFrame supports window function #1167 [sql] (xudong963)
  • add values list expression #1165 [sql] (Jimexist)
  • Add booleans support to the CASE statement #1161 (xudong963)
  • Improve error messages when operations are not supported #1158 (alamb)
  • Generic constant expression evaluation #1153 (alamb)
  • python lit function to support bool and byte vec #1152 (Jimexist)
  • [nit] simplify datafusion optimizer module codes #1146 (panarch)
  • Add ScalarValue support for arbitrary list elements #1142 (jonmmease)
  • Multiple files per partitions for CSV Avro Json #1138 (rdettai)
  • Implement INTERSECT & INTERSECT DISTINCT #1135 [sql] (xudong963)
  • Simplify file struct abstractions #1120 (rdettai)
  • Implement is [not] distinct from #1117 [sql] (Dandandan)
  • Clean up spawned task on drop for RepartitionExec, SortPreservingMergeExec, WindowAggExec #1112 (crepererum)
  • add hyperloglog implementation (add and count) #1095 (Jimexist)
  • Add ScalarValue::Struct variant #1091 (jonmmease)
  • add digest(utf8, method) function and refactor all current hash digest functions #1090 (Jimexist)
  • [crypto] add blake3 algorithm to digest function #1086 (Jimexist)
  • [crypto] add blake2b and blake2s functions #1081 (Jimexist)
  • [nit] make schema qualifier error message in field lookup more readable #1079 (Jimexist)
  • [window function] add percent_rank window function #1077 (Jimexist)
  • [window function] add cume_dist implementation #1076 (Jimexist)
  • Add a LogicalPlanBuilder::schema() function #1075 (alamb)
  • Add support for UNION [DISTINCT] sql #1068 [sql] (xudong963)
  • fix: fix joins on Float32/Float64 columns bug #1054 (francis-du)
  • Update sqlparser-rs to 0.11 #1052 [sql] (alamb)
  • Support querying CSV files without providing the schema #1050 [sql] (xudong963)
  • remove hard coded partition count in ballista logicalplan deserialization #1044 (xudong963)
  • feat: add lit_timestamp_nanosecond #1030 (NGA-TRAN)
  • Ignore metadata on schema merge #1024 (Smurphy000)
  • add ExecutionConfig.with_optimizer_rules #1022 (seddonm1)
  • Add baseline execution stats to WindowAggExec and UnionExec, and fixup CoalescePartitionsExec #1018 (alamb)
  • Derive PartialOrd for Expr #1015 (alamb)
  • Indexed field access for List #1006 [sql] (Igosuki)
  • Add metrics for Limit and Projection, and CoalesceBatches #1004 (alamb)
  • Update DataFusion to arrow 6.0 #984 (alamb)
  • Implement Display for Expr, improve operator display #971 [sql] (matthewmturner)
  • Add metrics for FilterExec #960 (alamb)
  • Change compound column field name rules #952 (waynexia)
  • ObjectStore API to read from remote storage systems #950 (yjshen)
  • Add baseline metrics to SortPreservingMergeExec #948 (alamb)
  • Add support for TRIM LEADING/TRAILING/BOTH syntax #947 [sql] (adsharma)
  • fixes #933 replace placeholder fmt_as for ExecutionPlan impls #939 (tiphaineruy)
  • Add metrics for SortExec + HashAggregateExec #938 (alamb)
  • Add some additional asserts in utils::from_plan #930 (alamb)
  • Avro Table Provider #910 [sql] (Igosuki)
  • Add BaselineMetrics, Timestamp metrics, add for CoalescePartitionsExec, rename output_time -> elapsed_compute #909 (alamb)
  • add cross join support to ballista #891 (houqp)
  • Add Ballista support to DataFusion CLI #889 (andygrove)
  • support like on DictionaryArray #876 (b41sh)
  • Register table based on known schema without file IO #872 (Dandandan)
  • Add support for PostgreSQL regex match #870 [sql] (b41sh)
  • Include planning time in datafusion-cli printing #860 (Dandandan)
  • Implement basic common subexpression eliminate optimization #792 (waynexia)
  • Impl ops::Not for expr #763 (Jimexist)

Fixed bugs:

  • Can not use between in the select list #1196
  • ORDER BY does not work with literals: Sort operation is not applicable to scalar value 'foo' #1195
  • window functions with NULL literals in partition by and order by do not work: Internal("Sort operation is not applicable to scalar value NULL") #1194
  • Operation name not included in internal errors -- Internal("Data type Boolean not supported for binary operation on dyn arrays") #1157
  • Physical plan explain UNION query says "ExecutionPlan(PlaceHolder)" #933
  • Can not use LIKE on DictionaryArray encoded strings #815
  • physical_plan::repartition::tests::repartition_with_dropping_output_stream failing locally #614
  • Fix some BuiltinScalarFunction panics with zero arguments #1249 (capkurmagati)
  • fix: do not do boolean folding on NULL and/or expr #1245 (NGA-TRAN)
  • ignore case of with header row in sql when creating external table #1237 [sql] (lichuan6)
  • fix: Min/Max aggregation data type should not be dictionary #1235 (NGA-TRAN)
  • Fix build with --no-default-features #1219 (alamb)
  • Prevent "future cannot be sent between threads safely" compilation error #1155 (jonmmease)
  • Clean up spawned task on drop for AnalyzeExec, CoalescePartitionsExec, HashAggregateExec #1121 (crepererum)
  • Clean up spawned task on SortStream drop #1105 (crepererum)
  • fix UNION ALL bug: thread 'main' panicked at 'index out of bounds: the len is 1 but the index is 1', ./src/datatypes/schema.rs:165:10 #1088 (xudong963)
  • python: fix generated table name in dataframe creation #1078 (houqp)
  • fix subquery alias #1067 [sql] (xudong963)
  • fix pattern handling in regexp_match function #1065 (houqp)
  • fix: joins on Timestamp columns #1055 (francis-du)
  • Fix metric name typo #943 (alamb)
  • EXPLAIN ANALYZE should run all Optimizer passes #929 (alamb)

Performance improvements:

  • Improve avro reader performance by avoiding some cloning on avro_rs::Value #1206 (Igosuki)
  • optimize build profile for datafusion python binding, cli and ballista #1137 (houqp)
  • Avoid stack overflow by reducing stack usage of BinaryExpr::evaluate in debug builds #1047 (alamb)
  • Add ScalarValue::eq_array optimized comparison function #844 (alamb)
  • Rework GroupByHash for faster performance and support grouping by nulls #808 (alamb)

Closed issues:

  • InList expr with NULL literals does not work #1190
  • update the homepage README to include values, approx_distinct, etc. #1171
  • [Python]: Inconsistencies with Python package name #1011
  • Wanting to contribute to project where to start? #983
  • delete redundant code #973
  • How to build DataFusion python wheel #853
  • Add support for partition pruning #204
  • [Datafusion] Support joins on TimestampMillisecond columns #187
  • TPC-H Query 21 #173
  • TPC-H Query 13 #164
  • TPC-H Query 8 #162
  • implement split_part(string, delimiter, position) #157
  • Join Statement: Schema contains duplicate unqualified field name #155
  • ParquetTable should avoid scanning all files twice #136
  • Add support for reading partitioned Parquet files #133
  • Add support for Parquet schema merging #132
  • Catalog abstraction #126
  • Optimizer rules should work with qualified column names #125
  • Add optional qualifier to Expr::Column #121
  • Implement modulus expression #99
  • [Rust] Add constant folding to expressions during logically planning #98
  • [Rust] Implement pretty print for physical query plan #93
  • Can not group by boolean columns (add boolean to valid keys of groupBy) #91
  • improve performance of building literal arrays #90
  • [rust][datafusion] optimize count(*) queries on parquet sources #89
  • Produce a design for a metrics framework #21

Merged pull requests:

  • Add timezone string to stabilize test #1265 (viirya)
  • numerical_coercion pattern match optimize #1256 (Jimexist)
  • fix and update window function sql tests #1059 (Jimexist)
  • reduce ScalarValue from trait boilerplate with macro #989 (houqp)

For older versions, see apache/arrow/CHANGELOG.md

5.0.0 (2021-08-10)

Full Changelog

Breaking changes:

  • Box ScalarValue::Lists, reduce size by half #788 (alamb)
  • JOIN conditions are order dependent #778 (seddonm1)
  • Show the result of all optimizer passes in EXPLAIN VERBOSE #759 (alamb)
  • #723 Datafusion add option in ExecutionConfig to enable/disable parquet pruning #749 (lvheyang)
  • Update API for extension planning to include logical plan #643 (alamb)
  • Rename MergeExec to CoalescePartitionsExec #635 (andygrove)
  • fix 593, reduce cloning by taking ownership in logical planner's from fn #610 (Jimexist)
  • fix join column handling logic for On and Using constraints #605 (houqp)
  • Rewrite pruning logic in terms of PruningStatistics using Array trait (option 2) #426 (alamb)
  • Support reading from NdJson formatted data sources #404 (heymind)
  • Add metrics to RepartitionExec #398 (andygrove)
  • Use 4.x arrow-rs from crates.io rather than git sha #395 (alamb)
  • Return Vec<bool> from PredicateBuilder rather than an Fn #370 (alamb)
  • Refactor: move RowGroupPredicateBuilder into its own module, rename to PruningPredicateBuilder #365 (alamb)
  • [Datafusion] NOW() function support #288 (msathis)
  • Implement select distinct #262 (Dandandan)
  • Refactor datafusion/src/physical_plan/common.rs build_file_list to take fewer params and reuse code #253 (Jimexist)
  • Support qualified columns in queries #55 (houqp)
  • Read CSV format text from stdin or memory #54 (heymind)
  • Use atomics for SQLMetric implementation, remove unused name field #25 (returnString)

Implemented enhancements:

  • Allow extension nodes to correctly plan physical expressions with relations #642
  • Filters aren't passed down to table scans in a union #557
  • Support pruning for boolean columns #490
  • Implement SQLMetrics for RepartitionExec #397
  • DataFusion benchmarks should show executed plan with metrics after query completes #396
  • Use published versions of arrow rather than github shas #393
  • Add Compare to GroupByScalar #364
  • Reusable "row group pruning" logic #363
  • Add an Order Preserving merge operator #362
  • Implement Postgres compatible now() function #251
  • COUNT DISTINCT does not support dictionary types #249
  • Use standard make_null_array for CASE #222
  • Implement date_trunc() function #203
  • COUNT DISTINCT does not support Float64 #199
  • Update SQLMetric to use atomics rather than a Mutex #30
  • Implement PartialOrd for ScalarValue #838 (viirya)
  • Support date datatypes in max/min #820 (viirya)
  • Implement vectorized hashing for DictionaryArray types #812 (alamb)
  • Convert unsupported conditions in left right join to filters #796 [sql] (Dandandan)
  • Implement streaming versions of Dataframe.collect methods #789 (andygrove)
  • impl from str for column and scalar #762 (Jimexist)
  • impl fmt::Display for PlanType #752 (Jimexist)
  • Remove unnecessary projection in logical plan optimization phase #747 (waynexia)
  • Support table columns alias #735 (Dandandan)
  • Derive PartialEq for datasource enums #734 (alamb)
  • Allow filetype to be lowercase, Implement FromStr for FileType #728 (Jimexist)
  • Update to use arrow 5.0 #721 (alamb)
  • #554: Lead/lag window function with offset and default value arguments #687 (jgoday)
  • dedup using join column in wildcard expansion #678 (houqp)
  • Implement metrics for HashJoinExec #664 (andygrove)
  • Show physical plan with metrics in benchmark #662 (andygrove)
  • Allow non-equijoin filters in join condition #660 (Dandandan)
  • Add End-to-end test for parquet pruning + metrics for ParquetExec #657 (alamb)
  • Add support for leading field in interval #647 (Dandandan)
  • Remove hard-coded PartitionMode from Ballista serde #637 (andygrove)
  • Ballista: Implement scalable distributed joins #634 (andygrove)
  • implement rank and dense_rank function and refactor built-in window function evaluation #631 (Jimexist)
  • Improve "field not found" error messages #625 (andygrove)
  • Support modulus op #577 (gangliao)
  • implement std::default::Default for execution config #570 (Jimexist)
  • to_timestamp_millis(), to_timestamp_micros(), to_timestamp_seconds() #567 (velvia)
  • Filter push down for Union #559 (Dandandan)
  • Implement window functions with partition_by clause #558 (Jimexist)
  • support table alias in join clause #547 (houqp)
  • Not equal predicate in physical_planning pruning #544 (jgoday)
  • add error handling and boundary checking for window frames #530 (Jimexist)
  • Implement window functions with order_by clause #520 (Jimexist)
  • support group by column positions #519 [sql] (jychen7)
  • Implement constant folding for CAST #513 (msathis)
  • Add window frame constructs - alternative #506 (Jimexist)
  • Add partition by constructs in window functions and modify logical planning #501 (Jimexist)
  • Add support for boolean columns in pruning logic #500 (alamb)
  • #215 resolve aliases for group by exprs #485 (jychen7)
  • Support anti join #482 (Dandandan)
  • Support semi join #470 (Dandandan)
  • add order by construct in window function and logical plans #463 (Jimexist)
  • Remove redundant filters (e.g. c > 5 AND c > 5 --> c > 5) #436 (jgoday)
  • fix: display the content of debug explain #434 (NGA-TRAN)
  • implement lead and lag built-in window function #429 (Jimexist)
  • add support for ndjson for datafusion-cli #427 (Jimexist)
  • add first_value, last_value, and nth_value built-in window functions #403 (Jimexist)
  • export both now and random functions #389 (Jimexist)
  • Function to create ArrayRef from an iterator of ScalarValues #381 (alamb)
  • Sort preserving merge (#362) #379 (tustvold)
  • Add support for multiple partitions with SortExec (#362) #378 (tustvold)
  • add window expression stream, delegated window aggregation to aggregate functions, and implement row_number #375 (Jimexist)
  • Add PartialOrd and Ord to GroupByScalar (#364) #368 (tustvold)
  • Implement readable explain plans for physical plans #337 (alamb)
  • Add window expression part 1 - logical and physical planning, structure, to/from proto, and explain, for empty over clause only #334 (Jimexist)
  • Use NullArray to Pass row count to ScalarFunctions that take 0 arguments #328 (Jimexist)
  • add --quiet/-q flag and allow timing info to be turned on/off #323 (Jimexist)
  • Implement hash partitioned aggregation #320 (Dandandan)
  • Support COUNT(DISTINCT timestamps) #319 (charlibot)
  • add random SQL function #303 (Jimexist)
  • allow datafusion cli to take -- comments #296 (Jimexist)
  • Add json print format mode to datafusion cli #295 (Jimexist)
  • Add print format param with support for tsv print format to datafusion cli #292 (Jimexist)
  • Add print format param and support for csv print format to datafusion cli #289 (Jimexist)
  • allow datafusion-cli to take a file param #285 (Jimexist)
  • add param validation for datafusion-cli #284 (Jimexist)
  • [breaking change] fix 265, log should be log10, and add ln #271 (Jimexist)
  • Implement count distinct for dictionary arrays #256 (alamb)
  • Count distinct floats #252 (pjmore)
  • Add rule to eliminate LIMIT 0 and replace it with an EmptyRelation #213 (Dandandan)
  • Allow table providers to indicate their type for catalog metadata #205 (returnString)
  • Use arrow eq kernels in CaseWhen expression evaluation #52 (Dandandan)
  • Re-export Arrow and Parquet crates from DataFusion #39 (returnString)
  • [DataFusion] Optimize hash join inner workings, null handling fix #24 (Dandandan)
  • [ARROW-12441] [DataFusion] Cross join implementation #11 (Dandandan)

Fixed bugs:

  • Projection pushdown removes unqualified column names even when they are used #617
  • Panic while running join datatypes/schema.rs:165:10 #601
  • Indentation is incorrect for joins in formatted physical plans #345
  • Error while running COUNT DISTINCT (timestamp): 'Unexpected DataType for list' #314
  • When joining two tables, get Error: Plan("Schema contains duplicate unqualified field name 'xxx'") #311
  • Incorrect answers with SELECT DISTINCT queries #250
  • Intermittent failure in CI join_with_hash_collision #227
  • Concat from Dataframe API no longer accepts multiple expressions #226
  • Fix right, full join handling when having multiple non-matching rows at the left side #845 (Dandandan)
  • Qualified field resolution too strict #810 [sql] (seddonm1)
  • Better join order resolution logic #797 [sql] (seddonm1)
  • Produce correct answers for Group BY NULL (Option 1) #793 (alamb)
  • Use consistent version of string_to_timestamp_nanos in DataFusion #767 (alamb)
  • #723 limit pruning rule to simple expression #764 (lvheyang)
  • #699 fix return type conflict when calling builtin math functions #716 (lvheyang)
  • Fix Date32 and Date64 parquet row group pruning #690 (alamb)
  • Remove qualifiers on pushed down predicates / Fix parquet pruning #689 (alamb)
  • use Weak ptr to break catalog list <> info schema cyclic reference #681 (crepererum)
  • honor table name for csv/parquet scan in ballista plan serde #629 (houqp)
  • fix 621, where unnamed window functions shall be differentiated by partition and order by clause #622 (Jimexist)
  • RFC: Do not prune out unnecessary columns with unqualified references #619 (alamb)
  • [fix] select * on empty table #613 (rdettai)
  • fix 592, support alias in window functions #607 (Jimexist)
  • RepartitionExec should not error if output has hung up #576 (alamb)
  • Fix pruning on not equal predicate #561 (alamb)
  • hash float arrays using primitive unsigned integer type #556 (houqp)
  • Return errors properly from RepartitionExec #521 (alamb)
  • refactor sort exec stream and combine batches #515 (Jimexist)
  • Fix display of execution time in datafusion-cli #514 (Dandandan)
  • Wrong aggregation arguments error. #505 (jgoday)
  • fix window aggregation with alias and add integration test case #454 (Jimexist)
  • fix: don't duplicate existing filters #409 (e-dard)
  • Fixed incorrect logical type in GroupByScalar. #391 (jorgecarleitao)
  • Fix indented display for multi-child nodes #358 (alamb)
  • Fix SQL planner to support multibyte column names #357 (agatan)
  • Fix wrong projection 'optimization' #268 (Dandandan)
  • Fix incorrect Left join implementation for 0 or multiple batches on the right side #238 (Dandandan)
  • Count distinct boolean #230 (pjmore)
  • Fix Filter / where clause without column names being removed in optimization pass #225 (Dandandan)

Performance improvements:

  • Speed up inlist for strings and primitives #813 (Dandandan)
  • perf: improve performance of SortPreservingMergeExec operator #722 (e-dard)
  • Optimize min/max queries with table statistics #719 (b41sh)
  • perf: Improve materialisation performance of SortPreservingMergeExec #691 (e-dard)
  • Optimize count(*) with table statistics #620 (Dandandan)
  • optimize window function's find_ranges_in_range #595 (Jimexist)
  • Collapse sort into window expr and do sort within logical phase #571 (Jimexist)
  • Use repartition in window functions to speed up #569 (Jimexist)
  • Constant fold / optimize to_timestamp function during planning #387 (msathis)
  • Speed up create_batch_from_map #339 (Dandandan)
  • Simplify math expression code (use unary kernel) #309 (Dandandan)

Closed issues:

  • Confirm git tagging strategy for releases #770
  • arrow::util::pretty::pretty_format_batches missing #769
  • move the assert_batches_eq! macros to a non part of datafusion #745
  • fix an issue where aliases are not respected in generating downstream schemas in window expr #592
  • make the planner print more succinct and useful information in window function explain clause #526
  • move window frame module to be in logical_plan #517
  • use a more rust idiomatic way of handling nth_value #448
  • create a test with more than one partition for window functions #435
  • COUNT DISTINCT does not support Boolean #202
  • Read CSV format text from stdin or memory #198
  • Fix null handling hash join #195
  • Allow TableProviders to indicate their type for the information schema #191
  • Make DataFrame extensible #190
  • TPC-H Query 19 #170
  • TPC-H Query 7 #161
  • Upgrade hashbrown to 0.10 #151
  • Implement vectorized hashing for hash aggregate #149
  • More efficient LEFT join implementation #143
  • Implement vectorized hashing #142
  • RFC Roadmap for 2021 (DataFusion) #140
  • Implement hash partitioning #131
  • Grouping by column position #110
  • [Datafusion] GROUP BY with a high cardinality doesn't seem to finish #107
  • [Rust] Add support for JSON data sources #103
  • [Rust] Implement metrics framework #95
  • Publicly export Arrow crate from datafusion #36
  • Implement hash-partitioned hash aggregate #27
  • Consider using GitHub pages for DataFusion/Ballista documentation #18
  • Update "repository" in Cargo.toml #16

Merged pull requests:

  • Use RawTable API in hash join #827 (Dandandan)
  • Add test for window functions on dictionary #823 (alamb)
  • Update dependencies: prost to 0.8 and tonic to 0.5 #818 (alamb)
  • Move hash_array into hash_utils.rs #807 (alamb)
  • Remove GroupByScalar and use ScalarValue in preparation for supporting null values in GroupBy #786 (alamb)
  • fix 226, make concat, concat_ws, and random work with Python crate #761 (Jimexist)
  • Test for parquet pruning disabling #754 (alamb)
  • Add explain verbose with limit push down #751 (Jimexist)
  • Move assert_batches_eq! macros to test_utils.rs #746 (alamb)
  • Show optimized physical and logical plans in EXPLAIN #744 (alamb)
  • update python crate to support latest pyo3 syntax and GIL semantics #741 (Jimexist)
  • update python crate dependencies #740 (Jimexist)
  • provide more details on required .parquet file extension error message #729 (Jimexist)
  • split up window functions into a dedicated module with separate files #724 (Jimexist)
  • Use pytest in integration test #715 (Jimexist)
  • replace once iter chain with array::IntoIter #704 (houqp)
  • avoid iterator materialization in column index lookup #703 (houqp)
  • Fix build with 1.52.1 #696 (alamb)
  • Fix test output due to logical merge conflict #694 (alamb)
  • add more integration tests #668 (Jimexist)
  • Bump arrow and parquet versions to 4.4 #654 (toddtreece)
  • Add query 15 to TPC-H queries #645 (Dandandan)
  • Improve error message and comments #641 (alamb)
  • add integration tests for rank, dense_rank, fix last_value evaluation with rank #638 (Jimexist)
  • round trip TPCH queries in tests #630 (houqp)
  • use Into<String> as argument type wherever applicable #615 (houqp)
  • reuse alias map in aggregate logical planning and refactor position resolution #606 (Jimexist)
  • fix clippy warnings #581 (Jimexist)
  • Add benchmarks to window function queries #564 (Jimexist)
  • reuse code for now function expr creation #548 (houqp)
  • turn on clippy rule for needless borrow #545 (Jimexist)
  • Refactor hash aggregates's planner building code #539 (Jimexist)
  • Cleanup Repartition Exec code #538 (alamb)
  • reuse datafusion physical planner in ballista building from protobuf #532 (Jimexist)
  • remove redundant into_iter() calls #527 (Jimexist)
  • Fix 517 - move window_frames module to logical_plan #518 (Jimexist)
  • Refactor window aggregation, simplify batch processing logic #516 (Jimexist)
  • Add datafusion::test_util, resolve test data paths without env vars #498 (mluts)
  • Avoid warnings in tests when compiling without default features #489 (alamb)
  • update cargo.toml in python crate and fix unit test due to hash joins #483 (Jimexist)
  • use prettier check in CI #453 (Jimexist)
  • Optimize nth_value, remove first_value, last_value structs and use idiomatic rust style #452 (Jimexist)
  • Fixed typo / logical merge conflict #433 (jorgecarleitao)
  • include test data and add aggregation tests in integration test #425 (Jimexist)
  • Add some padding around the logo #411 (parthsarthy)
  • Benchmark subcommand to distinguish between DataFusion and Ballista #402 (jgoday)
  • refactor datafusion/scalar_value to use more macro and avoid dup code #392 (Jimexist)
  • Update TPC-H benchmark to show physical plan when debug mode is enabled #386 (andygrove)
  • Update arrow dependencies again #341 (alamb)
  • Update arrow-rs deps #317 (alamb)
  • Update PR template by commenting out instructions #315 (alamb)
  • fix clippy warning #286 (Jimexist)
  • add integration test to compare datafusion-cli against psql #281 (Jimexist)
  • Update arrow deps #269 (alamb)
  • Use multi-stage build dockerfile in datafusion-cli and reduce image size from 2.16GB to 89.9MB #266 (Jimexist)
  • Enable redundant_field_names clippy lint #261 (Dandandan)
  • fix clippy lint #259 (alamb)
  • Move datafusion-cli to new crate #231 (Dandandan)
  • Make test join_with_hash_collision deterministic #229 (Dandandan)
  • Update arrow-rs deps (to fix build due to flatbuffers update) #224 (alamb)
  • Use standard make_null_array for CASE #223 (alamb)
  • update arrow-rs deps to latest master #216 (alamb)
  • MINOR: Remove empty rust dir #61 (andygrove)

* This Changelog was automatically generated by github_changelog_generator