New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Automatically grow parquet BitWriter (#2226) (~10% faster) #2231
Conversation
@@ -484,15 +452,6 @@ mod tests { | |||
test_internal_roundtrip_underflow(Encoding::RLE, &levels, max_level, true); | |||
} | |||
|
|||
#[test] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test no longer makes sense, as the Vec can just grow
f124959
to
2383ede
Compare
self.buffered_values = 0; | ||
self.byte_offset = self.start; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This could perhaps be considered a behaviour change, previously a start
parameter was passed in the constructor and resetting the writer would reset to this offset. In practice this functionality was never being used, and I found it very confusing, so I just removed it
} | ||
|
||
impl BitWriter { | ||
pub fn new(max_bytes: usize) -> Self { | ||
Self { | ||
buffer: vec![0; max_bytes], | ||
max_bytes, | ||
buffer: Vec::with_capacity(max_bytes), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
An obvious benefit from this is we no longer zero-initialize this data, which may improve performance (will run benchmarks tomorrow)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, as I reviewed the rest of this PR, it may means that the hotpath has at least one fewer error checks
@@ -138,7 +137,7 @@ where | |||
/// This function should be removed after | |||
/// [`int_roundings`](https://github.com/rust-lang/rust/issues/88581) is stable. | |||
#[inline] | |||
pub fn ceil(value: i64, divisor: i64) -> i64 { | |||
pub fn ceil<T: num::Integer>(value: T, divisor: T) -> T { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Drive by cleanup to eliminate some unnecessary casting
@@ -658,20 +658,8 @@ pub(crate) mod private { | |||
_: &mut W, | |||
bit_writer: &mut BitWriter, | |||
) -> Result<()> { | |||
if bit_writer.bytes_written() + values.len() / 8 >= bit_writer.capacity() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a little bit funky, bool::encode would actually increase the BitWriter capacity instead of erroring 😅 This is now done automatically everywhere
Codecov Report
@@ Coverage Diff @@
## master #2231 +/- ##
==========================================
- Coverage 82.52% 82.29% -0.24%
==========================================
Files 240 241 +1
Lines 62267 62354 +87
==========================================
- Hits 51387 51312 -75
- Misses 10880 11042 +162
Help us with your feedback. Take ten seconds to tell us how you rate us. |
Benchmarks
So a roughly 10% performance improvement, likely from not needing to zero-allocate arrays |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@@ -127,12 +126,11 @@ impl<T: DataType> DictEncoder<T> { | |||
/// the result. | |||
pub fn write_indices(&mut self) -> Result<ByteBufferPtr> { | |||
let buffer_len = self.estimated_data_encoded_size(); | |||
let mut buffer = vec![0; buffer_len]; | |||
buffer[0] = self.bit_width() as u8; | |||
let mut buffer = Vec::with_capacity(buffer_len); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
self.encoder = Some(RleEncoder::new(1, DEFAULT_RLE_BUFFER_LEN)); | ||
} | ||
let rle_encoder = self.encoder.as_mut().unwrap(); | ||
let rle_encoder = self.encoder.get_or_insert_with(|| { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let len = (buf.len() as i32).to_le(); | ||
let len_bytes = len.as_bytes(); | ||
let mut encoded_data = vec![]; | ||
encoded_data.extend_from_slice(len_bytes); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this part of the speed increase -- avoid copying encoded bytes and instead reserve the space for the length up front and updated it at the end?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Potentially, I suspect it might have contributed
Edit: This is actually only used for encoding booleans, which I don't think the benchmarks actually cover
@@ -873,7 +869,7 @@ mod tests { | |||
let mut values = vec![]; | |||
values.extend_from_slice(&[true; 16]); | |||
values.extend_from_slice(&[false; 16]); | |||
run_test::<BoolType>(Encoding::RLE, -1, &values, 0, 2, 0); | |||
run_test::<BoolType>(Encoding::RLE, -1, &values, 0, 6, 0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why did this change from 2
to 6
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because we reserve space for the length up front now
parquet/src/encodings/levels.rs
Outdated
mem::size_of::<i32>(), | ||
)), | ||
Encoding::RLE => { | ||
buffer.extend_from_slice(&[0; 8]); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
buffer.extend_from_slice(&[0; 8]); | |
// reserve space for length | |
buffer.extend_from_slice(&[0; 8]); |
Is that what this initial 0 is for?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, although it occurs to me that this allocated 8 bytes instead of 4... Something isn't right here 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So this was working because the extra 4 zero bytes would be interpreted as empty RLE runs, I will fix this and add a test that would have caught it
} | ||
|
||
impl BitWriter { | ||
pub fn new(max_bytes: usize) -> Self { | ||
Self { | ||
buffer: vec![0; max_bytes], | ||
max_bytes, | ||
buffer: Vec::with_capacity(max_bytes), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, as I reviewed the rest of this PR, it may means that the hotpath has at least one fewer error checks
parquet/src/util/bit_util.rs
Outdated
&mut self.buffer[self.byte_offset..], | ||
); | ||
let num_bytes = ceil(self.bit_offset, 8); | ||
let slice = &self.buffered_values.to_ne_bytes()[..num_bytes as usize]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I double checked that to_ne_bytes()
does the same as memcpy_value
in terms of byte order. I think it does
self.buffered_values = 0; | ||
if let Some(remaining) = self.bit_offset.checked_sub(64) { | ||
self.buffer | ||
.extend_from_slice(&self.buffered_values.to_le_bytes()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why does this path use to_le_bytes
and the path above use to_ne_bytes
? Maybe they should all be to_le_bytes
for consistency (and portability?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree, they should all use to_le_bytes
@@ -846,16 +760,16 @@ mod tests { | |||
fn test_bit_reader_skip() { | |||
let buffer = vec![255, 0]; | |||
let mut bit_reader = BitReader::from(buffer); | |||
let skipped = bit_reader.skip(1,1); | |||
let skipped = bit_reader.skip(1, 1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it is strange that cargo fmt
seems to have started caring about this file 🤷
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I have no idea either. Perhaps my IDE runs it with some more aggressive setting 🤔
writer.put_aligned(42, 4); | ||
writer.put_aligned_offset(0x10, 1, old_offset); | ||
let result = writer.consume(); | ||
assert_eq!(result.as_ref(), [0x10, 42, 0, 0, 0]); | ||
|
||
writer = BitWriter::new(4); | ||
let result = writer.skip(5); | ||
assert!(result.is_err()); | ||
assert_eq!(result, 0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is ok now because the buffer automatically grows, right?
Benchmark runs are scheduled for baseline = 6c3f9a2 and contender = 99ad915. 99ad915 is a master commit associated with this PR. Results will be available as each benchmark for each run completes. |
Which issue does this PR close?
Closes #2226
Rationale for this change
This makes the BitWriter significantly easier to use, simplifies the implementation, and will allow removing a lot of fallibility from the encode path
What changes are included in this PR?
Alters BitWriter to use a growable Vec instead of a fixed size Vec
Are there any user-facing changes?
No, both the encoding module and the util module are marked experimental