ColumnWriterImpl::write_batch_with_statistics incorrect distinct count in statistics #2016

Closed
tustvold opened this issue Jul 6, 2022 · 0 comments · Fixed by #2022
Labels
bug parquet Changes to the parquet crate

tustvold commented Jul 6, 2022

Describe the bug

Calling write_batch_with_statistics twice with a non-zero distinct count records the sum of the two distinct counts. In most cases this is incorrect, because any value that appears in both batches is counted twice.

Similarly, calling write_batch after write_batch_with_statistics does not clear the distinct count.
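
To illustrate why summing is wrong, here is a minimal standalone sketch that does not use the parquet crate at all. The function names (`distinct_count`, `summed_distinct_count`, `true_distinct_count`) are hypothetical and only model the arithmetic: distinct counts are not additive when batches share values.

```rust
use std::collections::HashSet;

/// Number of distinct values in a single batch.
fn distinct_count(batch: &[i32]) -> u64 {
    batch.iter().collect::<HashSet<_>>().len() as u64
}

/// Sum of the per-batch distinct counts. This models the aggregation
/// described in this issue; it is not the crate's actual code.
fn summed_distinct_count(batches: &[&[i32]]) -> u64 {
    batches.iter().map(|b| distinct_count(b)).sum()
}

/// Distinct count over the union of all batches, i.e. the correct value.
fn true_distinct_count(batches: &[&[i32]]) -> u64 {
    batches
        .iter()
        .flat_map(|b| b.iter())
        .collect::<HashSet<_>>()
        .len() as u64
}
```

For the batches `[1, 2, 3]` and `[2, 3, 4]`, the summed count is 6 while the true distinct count is 4, because 2 and 3 occur in both batches.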

To Reproduce

Inspect code

Expected behavior

We should only set the distinct count when it is known.
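
One possible policy that satisfies this, sketched under assumptions: keep the distinct count as an `Option` and treat it as known only when exactly one batch has been written and that batch supplied a count. `DistinctTracker` and `record_batch` are hypothetical names for illustration; they are not part of the parquet crate and not necessarily how the fix is implemented.

```rust
/// Hypothetical tracker, not part of the parquet crate.
#[derive(Debug, Default)]
struct DistinctTracker {
    /// How many batches have been recorded so far.
    batches_written: usize,
    /// `Some` only while the distinct count is actually known.
    distinct_count: Option<u64>,
}

impl DistinctTracker {
    /// Record one written batch. `distinct` is the caller-supplied
    /// distinct count for that batch, or `None` if the caller has none.
    fn record_batch(&mut self, distinct: Option<u64>) {
        self.distinct_count = if self.batches_written == 0 {
            // First batch: the supplied count (if any) is exact.
            distinct
        } else {
            // Later batches may overlap with earlier ones, so the
            // combined distinct count is unknown.
            None
        };
        self.batches_written += 1;
    }

    /// The distinct count to store in statistics, if known.
    fn distinct_count(&self) -> Option<u64> {
        self.distinct_count
    }
}
```

With this policy a second batch, with or without a supplied count, resets the value to `None` instead of adding to it or leaving a stale value behind.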

Additional context

#2015

@tustvold tustvold added the bug label Jul 6, 2022
@tustvold tustvold changed the title Parquet Writer Distinct Count Incorrect Aggregation ColumnWriterImpl::write_batch_with_statistics incorrect distinct count Jul 6, 2022
@alamb alamb changed the title ColumnWriterImpl::write_batch_with_statistics incorrect distinct count ColumnWriterImpl::write_batch_with_statistics incorrect distinct count in statistics Jul 7, 2022
@alamb alamb added the parquet Changes to the parquet crate label Jul 7, 2022