Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Don't overwrite existing data on snappy decompress (#1806) #1807

Merged
merged 2 commits into from Jun 8, 2022

Conversation

tustvold
Copy link
Contributor

@tustvold tustvold commented Jun 7, 2022

Which issue does this PR close?

Closes #1806

Rationale for this change

Fixes a bug

What changes are included in this PR?

Makes it so SnappyCodec doesn't trample existing data in the passed output buffer

Are there any user-facing changes?

No

.decompress(compressed.as_slice(), &mut decompressed)
.expect("Error when decompressing");
assert_eq!(data.len(), decompressed_size);
decompressed.truncate(decompressed_size);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test was actually incorrect, it never cleared decompressed and so this was effectively just checking the same bytes it checked in the test above (except for Snappy which would truncate and trample)

@github-actions github-actions bot added the parquet Changes to the parquet crate label Jun 7, 2022
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I recommend updating the comments on the Codec trait to be explicit that the data is appended to the end of output and not the beginning

https://github.com/apache/arrow-rs/blob/91d12ec/parquet/src/compression.rs#L52-L61

@@ -111,9 +111,10 @@ mod snappy_codec {
output_buf: &mut Vec<u8>,
) -> Result<usize> {
let len = decompress_len(input_buf)?;
output_buf.resize(len, 0);
let offset = output_buf.len();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I reviewed the other impl Codec in this file and they appear to be appending to output_buf as well 👍

https://sourcegraph.com/github.com/apache/arrow-rs/-/blob/parquet/src/compression.rs?subtree=true

assert_eq!(data, decompressed.as_slice());

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I went through the changes in this test carefully and they look good to me 👍

@alamb
Copy link
Contributor

alamb commented Jun 7, 2022

Nice work @tustvold

@alamb alamb changed the title Don't trample existing data on snappy decompress (#1806) Don't overwrite existing data on snappy decompress (#1806) Jun 7, 2022
@tustvold tustvold merged commit 1d31d30 into apache:master Jun 8, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
parquet Changes to the parquet crate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Parquet Snappy Codec overwrites Existing Data in Decompression Buffer
3 participants