-
Notifications
You must be signed in to change notification settings - Fork 658
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We鈥檒l occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix disabling parquet statistics (#2185) #2191
Conversation
@@ -456,7 +456,8 @@ impl<'a, E: ColumnValueEncoder> GenericColumnWriter<'a, E> { | |||
for &level in levels { | |||
if level == self.descr.max_def_level() { | |||
values_to_write += 1; | |||
} else if self.statistics_enabled == EnabledStatistics::Page { | |||
} else { | |||
// We must always compute this as it is used to populate v2 pages |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Previously this would cause an incorrect null count in the data page v2 header
.build()?; | ||
.set_dictionary_page_offset(dict_page_offset); | ||
|
||
if self.statistics_enabled != EnabledStatistics::None { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Previously we would still put the column statistics, even if we hadn't actually computed them 馃槺
statistics, | ||
.. | ||
} => { | ||
assert_eq!(*num_values, 6); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Confusingly in parquet num_values includes nulls, despite null values not actually be encoded 馃槅
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also verified that the newly added test does indeed fail without the code changes in this PR
---- column::writer::tests::test_disabled_statistics stdout ----
thread 'column::writer::tests::test_disabled_statistics' panicked at 'assertion failed: statistics.is_none()', parquet/src/column/writer/mod.rs:1689:17
stack backtrace:
馃殺 馃嚠馃嚬
Benchmark runs are scheduled for baseline = 445283c and contender = 82e0512. 82e0512 is a master commit associated with this PR. Results will be available as each benchmark for each run completes. |
Which issue does this PR close?
Closes #2185
Rationale for this change
What changes are included in this PR?
Fixes the handling of this option, and adds some test coverage 馃槃
Are there any user-facing changes?
This feature should now behave correctly