Patched 40.0.0 with Parquet memory limiting40 #37

Open
wants to merge 9 commits into
base: alamb/40.0.0_base
from


alamb
Owner

@alamb alamb commented May 30, 2023

This PR contains a patched version of parquet 40.0.0 that backports the fix for apache#3871, along with other related parquet changes, so that we can use it in IOx: https://github.com/influxdata/influxdb_iox/pull/7880

It starts with the parquet 40.0.0 release and cherry-picks the following commits. All git cherry-picks applied cleanly (I didn't need to resolve any conflicts).

3adca53 - metadata
58e2c1c - splice column
17ca4d5 - Debug impls
56437cc - Default for writer props
aa799f0 - Send
3e5b07a - more Send
6959b4b - metrics
741244d - fixed size support
ea00892 - memory accounting

tustvold and others added 9 commits May 30, 2023 10:57
* Add splice column API (apache#4155)

* Review feedback

* Re-encode offset index
…e#4278)

* Add `Debug` impls for writers

* Improve display
* feat(api make ArrowArrayStreamReader Send

* simplify ptr handling

* rename pyarrow traits to conform to guidelines

* pr feedback

* remove dangling Box::from_raw
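
For readers unfamiliar with what making a reader `Send` involves, here is a minimal sketch. It is not the arrow-rs code: `FfiStream`, `StreamReader`, and `has_data` are hypothetical names, and the `unsafe impl Send` is only sound under the assumption that the stream producer allows the struct to move between threads. The idea shown is that a reader owning its FFI struct by value becomes `Send` as soon as the FFI struct is.

```rust
use std::ffi::c_void;

/// Hypothetical stand-in for a C-ABI stream struct that holds a raw pointer.
#[repr(C)]
struct FfiStream {
    private_data: *mut c_void,
}

// Raw pointers are not `Send` by default. This impl encodes the assumption
// (made only for this sketch) that the stream may be moved across threads.
unsafe impl Send for FfiStream {}

/// Hypothetical reader that owns the stream by value rather than through a
/// raw pointer, so it is `Send` whenever `FfiStream` is.
struct StreamReader {
    stream: FfiStream,
}

impl StreamReader {
    /// Reports whether the producer attached any private data.
    fn has_data(&self) -> bool {
        !self.stream.private_data.is_null()
    }
}

/// Compile-time check: this only compiles if `T` is `Send`.
fn assert_send<T: Send>() {}

fn main() {
    assert_send::<StreamReader>();

    let reader = StreamReader {
        stream: FfiStream { private_data: std::ptr::null_mut() },
    };
    // Moving the reader into another thread requires `StreamReader: Send`.
    let handle = std::thread::spawn(move || reader.has_data());
    assert!(!handle.join().unwrap());
}
```

The method call inside the closure matters: it makes the closure capture the whole `reader` rather than just its raw-pointer field, which on its own is not `Send`.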
* Derive Default for WriterProperties

* Review feedback
* Initial implementation for writing fixed-size lists to Parquet.

The implementation still needs tests.
The implementation uses a new `write_fixed_size_list` method instead of `write_list`.
This is done to avoid the overhead of needlessly calculating list offsets.

* Initial implementation for reading fixed-size lists from Parquet.

The implementation still needs tests.

* Added tests for fixed-size list writer.

Fixed bugs in implementation found via tests.

* Added tests for fixed-size list reader.

Fixed bugs in implementation found via tests.

* Added correct behavior for writing empty fixed-length lists.

Writer now emits the correct definition levels for empty lists.
Added empty list unit test.

* Added correct behavior for reading empty fixed-length lists.

Reader now handles empty list definition levels correctly.
Added empty list unit test.

* Fixed linter warnings.

* Added license header to fixed_size_list_array.rs

* Added fixed-size list reader tests from PR review.

* Added fixed-size reader row length sanity checks.

* Simplified fixed-size list case in LevelInfoBuilder constructor.

* Removed dynamic dispatch inside fixed-length list writer.

* Expanded list of structs test for fixed-size list writer.

* Reverted expected levels in fixed-size list writer test.

* Fixed linter warnings.

* Updated list size check in fixed-size list reader.

Converted the check to return an error instead of panicking.

* Small tweak to row length check in fixed-size list reader.

* Fixed bug in fixed-size list level encoding.

Writer now correctly handles child arrays with variable row length.
Added new unit test to verify the new behavior is correct.

* Added fixed-size list reader test.

Test verifies that reader handles child arrays with variable length correctly.
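
As a rough illustration of two points from the fixed-size list commit messages above (no list offsets are needed, and the size check returns an error instead of panicking), here is a minimal sketch. It is not the arrow-rs implementation: `list_range`, `fixed_size_list_range`, and `check_child_len` are hypothetical helpers, and the sketch ignores nulls and definition levels.

```rust
use std::ops::Range;

/// Variable-size list: the child range of a row has to be looked up in an
/// offsets buffer (one more entry than there are rows).
fn list_range(offsets: &[usize], row: usize) -> Range<usize> {
    offsets[row]..offsets[row + 1]
}

/// Fixed-size list: the child range is plain arithmetic on the row index,
/// so no offsets buffer has to be built or read.
fn fixed_size_list_range(size: usize, row: usize) -> Range<usize> {
    row * size..(row + 1) * size
}

/// Hypothetical sanity check: the child array must hold exactly
/// `rows * size` values. A mismatch is reported as an `Err`, not a panic.
fn check_child_len(rows: usize, size: usize, child_len: usize) -> Result<(), String> {
    let expected = rows
        .checked_mul(size)
        .ok_or_else(|| String::from("rows * size overflows usize"))?;
    if child_len != expected {
        return Err(format!(
            "expected {} child values for {} rows of size {}, got {}",
            expected, rows, size, child_len
        ));
    }
    Ok(())
}

fn main() {
    // Three variable-size rows with lengths 2, 0, and 3.
    let offsets = [0, 2, 2, 5];
    println!("list row 2 -> {:?}", list_range(&offsets, 2));

    // Row 2 of a fixed-size list with 3 values per row.
    println!("fixed row 2 -> {:?}", fixed_size_list_range(3, 2));

    // A malformed child array yields an error the caller can handle.
    if let Err(e) = check_child_len(4, 3, 11) {
        println!("invalid input: {}", e);
    }
}
```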
…ad of RecordBatch (apache#3871) (apache#4280)

* Buffer Pages in ArrowWriter instead of RecordBatch (apache#3871)

* Review feedback

* Improved memory accounting

* Clippy
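
To make the idea behind "Buffer Pages in ArrowWriter instead of RecordBatch" more concrete, here is a minimal sketch of the general technique: encode each incoming batch immediately, keep only the encoded pages, and track their size so the caller can bound memory. Everything here is hypothetical (`Page`, `PageBufferingWriter`, `buffered_bytes`, the toy run-length encoding, and the caller-driven flush policy); it is not the parquet crate's API or its actual accounting.

```rust
/// Hypothetical stand-in for an encoded page.
struct Page {
    bytes: Vec<u8>,
}

/// Toy run-length encoding: emits (value, run length) byte pairs.
/// Real Parquet encodings differ; this only makes encoded size visible.
fn encode_page(values: &[u8]) -> Page {
    let mut bytes = Vec::new();
    let mut iter = values.iter().copied().peekable();
    while let Some(v) = iter.next() {
        let mut run: u8 = 1;
        while run < u8::MAX && iter.peek() == Some(&v) {
            iter.next();
            run += 1;
        }
        bytes.push(v);
        bytes.push(run);
    }
    Page { bytes }
}

/// Sketch of a writer that buffers encoded pages rather than raw batches.
struct PageBufferingWriter {
    pages: Vec<Page>,
    buffered_bytes: usize,
    flushed_row_groups: usize,
}

impl PageBufferingWriter {
    fn new() -> Self {
        Self { pages: Vec::new(), buffered_bytes: 0, flushed_row_groups: 0 }
    }

    /// Encode right away; the raw batch is not retained after this call.
    fn write(&mut self, batch: &[u8]) {
        let page = encode_page(batch);
        self.buffered_bytes += page.bytes.len();
        self.pages.push(page);
    }

    /// Memory currently held by buffered pages.
    fn buffered_bytes(&self) -> usize {
        self.buffered_bytes
    }

    /// Pretend to write the buffered pages out as one row group.
    fn flush(&mut self) {
        if self.pages.is_empty() {
            return;
        }
        self.pages.clear();
        self.buffered_bytes = 0;
        self.flushed_row_groups += 1;
    }
}

fn main() {
    let mut writer = PageBufferingWriter::new();
    let limit = 8; // assumed memory budget in bytes, for illustration only
    let batch = vec![7u8; 100]; // highly repetitive raw data

    for _ in 0..5 {
        writer.write(&batch);
        if writer.buffered_bytes() >= limit {
            writer.flush();
        }
    }
    println!(
        "row groups flushed: {}, bytes still buffered: {}",
        writer.flushed_row_groups,
        writer.buffered_bytes()
    );
}
```

In this sketch the caller decides when to flush based on the reported size; whether the real writer flushes on its own or leaves that to the caller is not stated in this PR description.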
4 participants