Don't overwrite existing data on snappy decompress (#1806) #1807
.decompress(compressed.as_slice(), &mut decompressed)
.expect("Error when decompressing");
assert_eq!(data.len(), decompressed_size);
decompressed.truncate(decompressed_size);
This test was actually incorrect: it never cleared decompressed, so it was effectively just checking the same bytes it checked in the test above (except for Snappy, which would truncate and trample).
I recommend updating the comments on the Codec trait to be explicit that the data is appended to the end of output and not the beginning.
https://github.com/apache/arrow-rs/blob/91d12ec/parquet/src/compression.rs#L52-L61
@@ -111,9 +111,10 @@ mod snappy_codec {
output_buf: &mut Vec<u8>,
) -> Result<usize> {
let len = decompress_len(input_buf)?;
output_buf.resize(len, 0);
let offset = output_buf.len();
I reviewed the other impl Codec in this file and they appear to be appending to output_buf as well 👍
https://sourcegraph.com/github.com/apache/arrow-rs/-/blob/parquet/src/compression.rs?subtree=true
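For readers following the thread, here is a minimal sketch of the append contract discussed above. It is not the arrow-rs code: `AppendingCodec`, `LengthPrefixedCodec`, and `encode` are hypothetical names, and the "compression" is just a 4-byte length header followed by raw bytes, so the example runs without the snap crate. The only part taken from the diff is the pattern of recording `output_buf.len()` as an offset before resizing.

```rust
/// Hypothetical stand-in for the `Codec` trait; only `decompress` is modeled.
trait AppendingCodec {
    /// Decompresses `input_buf` and appends the result to the END of
    /// `output_buf`, leaving any bytes already in it untouched.
    /// Returns the number of bytes appended.
    fn decompress(
        &mut self,
        input_buf: &[u8],
        output_buf: &mut Vec<u8>,
    ) -> Result<usize, String>;
}

/// Toy codec (an assumption for illustration, not Snappy): the input is a
/// 4-byte little-endian length followed by that many raw bytes.
struct LengthPrefixedCodec;

impl AppendingCodec for LengthPrefixedCodec {
    fn decompress(
        &mut self,
        input_buf: &[u8],
        output_buf: &mut Vec<u8>,
    ) -> Result<usize, String> {
        if input_buf.len() < 4 {
            return Err("input too short for length header".to_string());
        }
        let (header, payload) = input_buf.split_at(4);
        let len = u32::from_le_bytes([header[0], header[1], header[2], header[3]]) as usize;
        if payload.len() < len {
            return Err("payload shorter than declared length".to_string());
        }
        // Remember where the existing data ends, then grow the buffer by
        // `len` instead of resizing it to `len` (which would trample it).
        let offset = output_buf.len();
        output_buf.resize(offset + len, 0);
        output_buf[offset..].copy_from_slice(&payload[..len]);
        Ok(len)
    }
}

/// Hypothetical helper producing input for the toy codec.
fn encode(data: &[u8]) -> Vec<u8> {
    let mut out = (data.len() as u32).to_le_bytes().to_vec();
    out.extend_from_slice(data);
    out
}

fn main() {
    let mut codec = LengthPrefixedCodec;
    // The buffer already holds data that must survive the call.
    let mut output = b"existing".to_vec();
    let written = codec.decompress(&encode(b"new"), &mut output).unwrap();
    assert_eq!(written, 3);
    assert_eq!(output, b"existingnew".to_vec());
}
```

A test written against this contract should either start from an empty buffer or compare only the bytes after the pre-call length, otherwise it can end up re-checking bytes from an earlier call.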
assert_eq!(data, decompressed.as_slice());
I went through the changes in this test carefully and they look good to me 👍
Nice work @tustvold
Which issue does this PR close?
Closes #1806
Rationale for this change
Fixes a bug
What changes are included in this PR?
Makes SnappyCodec append to the passed output buffer instead of trampling the existing data in it
Are there any user-facing changes?
No