Utilise struct stats when available #656

Merged 42 commits on Jul 6, 2022

Commits (42)
e8be31b
Log stats
Tom-Newton Jun 19, 2022
89a25ae
Point unittest to problematic data
Tom-Newton Jun 19, 2022
1f2f023
Add minimal test data to reproduce
Tom-Newton Jun 20, 2022
c1c17b5
Update test data
Tom-Newton Jun 20, 2022
96b508b
Fix test data
Tom-Newton Jun 20, 2022
0c38cc8
Add test
Tom-Newton Jun 20, 2022
e37486f
Update rust logging
Tom-Newton Jun 20, 2022
a482b84
It actually compiles!
Tom-Newton Jun 22, 2022
7eb031c
I don't understand rust
Tom-Newton Jun 22, 2022
9af132a
Debug logging
Tom-Newton Jun 22, 2022
a5172e4
Fix sample data
Tom-Newton Jun 22, 2022
d5bf0da
Tidy
Tom-Newton Jun 22, 2022
514c7cd
Update rust/src/action.rs
Tom-Newton Jun 23, 2022
e3a3be1
Test data with more complex types
Tom-Newton Jun 23, 2022
1a78ccd
Still return json stats if there is an error parsing parquet stats
Tom-Newton Jun 23, 2022
5af9a3a
Better error message
Tom-Newton Jun 23, 2022
eba1a1d
Unittest covering more complex types
Tom-Newton Jun 23, 2022
94a007b
Support parsing structs
Tom-Newton Jun 24, 2022
46098e7
Compare struct comes out the same as json
Tom-Newton Jun 24, 2022
183d131
Correct timestamp formatting
Tom-Newton Jun 24, 2022
957c8b5
In progress better test and support for more columns
Tom-Newton Jun 26, 2022
c145585
All types except decimal work
Tom-Newton Jun 26, 2022
bee0d15
Test data with nested structs
Tom-Newton Jun 26, 2022
bc8128a
Update test
Tom-Newton Jun 26, 2022
c2887d9
All workng except decimal
Tom-Newton Jun 26, 2022
f82b6c9
Working decimal conversion
Tom-Newton Jun 26, 2022
2d3865c
Update test data again
Tom-Newton Jun 26, 2022
8e5efac
Passing test
Tom-Newton Jun 26, 2022
5b006f1
Tidy
Tom-Newton Jun 26, 2022
33a02c5
Tidy
Tom-Newton Jun 26, 2022
9bea24a
Tidy
Tom-Newton Jun 26, 2022
352e3d5
Remove .crc files
Tom-Newton Jun 26, 2022
0441169
Merge remote-tracking branch 'upstream2/main' into tomnewton/utilise_…
Tom-Newton Jun 26, 2022
8d9ca72
Remove unneeded return statements
Tom-Newton Jun 26, 2022
9ff8219
Remove python test
Tom-Newton Jun 26, 2022
a1571e6
Use from and reference instead of clone
Tom-Newton Jun 27, 2022
5e845cc
Use into
Tom-Newton Jun 27, 2022
a2c5733
dereferance timestamp
Tom-Newton Jun 27, 2022
70f8cee
use into
Tom-Newton Jun 27, 2022
da132ee
Use reference to field
Tom-Newton Jun 27, 2022
d0d8d5f
de-reference date
Tom-Newton Jun 27, 2022
270b027
Update rust/tests/read_delta_test.rs
Tom-Newton Jun 27, 2022
10 changes: 0 additions & 10 deletions python/tests/test_table_read.py
@@ -193,16 +193,6 @@ def test_read_table_with_stats():
# assert data.num_rows == 0


def test_read_table_with_only_struct_stats():
Contributor Author:
I'm removing this test because everything should now be covered by the lower-level Rust test I added. It did surface something interesting, though: DeltaTable.to_pyarrow_schema() seems unable to handle the struct-of-array-of-map type I put in my test data. That is probably an issue for another time.

    table_path = "../rust/tests/data/delta-1.2.1-only-struct-stats"
    dt = DeltaTable(table_path)

    dataset = dt.to_pyarrow_dataset()

    filter_expr = ds.field("a") == 5
    assert len(list(dataset.get_fragments(filter=filter_expr))) == 1


def test_vacuum_dry_run_simple_table():
    table_path = "../rust/tests/data/delta-0.2.0"
    dt = DeltaTable(table_path)
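For context on what the removed test exercised: the sketch below shows, in plain Python rather than the Rust code this PR touches, how per-file min/max stats can let a reader skip files for a filter such as a == 5. It assumes the Delta add-action field names stats (a JSON string) and stats_parsed (a struct), prefers the struct form when present, and falls back to the JSON form. The helpers file_stats and may_contain and the sample files are hypothetical, and this is not meant to mirror the actual implementation.

```python
import json
from typing import Any, Dict, Optional


def file_stats(add_action: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    """Return stats for one file, preferring struct stats over JSON stats."""
    parsed = add_action.get("stats_parsed")
    if parsed is not None:
        return parsed
    raw = add_action.get("stats")
    if raw is None:
        return None
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        # Unreadable stats are treated the same as missing stats.
        return None


def may_contain(add_action: Dict[str, Any], column: str, value: Any) -> bool:
    """Return False only when the stats prove the file cannot hold `value`."""
    stats = file_stats(add_action)
    if stats is None:
        return True  # no stats, so the file cannot be skipped safely
    lo = stats.get("minValues", {}).get(column)
    hi = stats.get("maxValues", {}).get(column)
    if lo is None or hi is None:
        return True
    return lo <= value <= hi


# Hypothetical add actions: two carry struct stats, one carries only JSON stats.
files = [
    {"path": "part-0.parquet",
     "stats_parsed": {"numRecords": 3, "minValues": {"a": 1}, "maxValues": {"a": 3}}},
    {"path": "part-1.parquet",
     "stats_parsed": {"numRecords": 3, "minValues": {"a": 4}, "maxValues": {"a": 6}}},
    {"path": "part-2.parquet",
     "stats": '{"numRecords": 2, "minValues": {"a": 7}, "maxValues": {"a": 8}}'},
]

matching = [f["path"] for f in files if may_contain(f, "a", 5)]
```

With these sample files only part-1.parquet survives the filter, which is the same kind of single-fragment result the removed test asserted against the real table.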