Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Split out arrow-buffer crate (#2594) #2693

Merged
merged 6 commits into from Sep 15, 2022
Merged
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Jump to
Jump to file
Failed to load files.
Diff view
Diff view
17 changes: 9 additions & 8 deletions arrow-buffer/src/buffer/mutable.rs
Expand Up @@ -395,15 +395,15 @@ impl MutableBuffer {
/// as it eliminates the conditional `Iterator::next`
#[inline]
pub fn collect_bool<F: FnMut(usize) -> bool>(len: usize, mut f: F) -> Self {
let mut buffer = Self::new(bit_util::ceil(len, 8));
let mut buffer = Self::new(bit_util::ceil(len, 64) * 8);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This makes it faster than it was before

eq Float32              time:   [9.2538 µs 9.2627 µs 9.2720 µs]
                        change: [-50.278% -50.193% -50.080%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
  1 (1.00%) high mild
  1 (1.00%) high severe

eq scalar Float32       time:   [7.2924 µs 7.2962 µs 7.3004 µs]
                        change: [-7.5896% -7.5335% -7.4762%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
  4 (4.00%) high mild
  3 (3.00%) high severe

neq Float32             time:   [9.1950 µs 9.1992 µs 9.2043 µs]
                        change: [-50.651% -50.607% -50.563%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 13 outliers among 100 measurements (13.00%)
  4 (4.00%) high mild
  9 (9.00%) high severe

neq scalar Float32      time:   [7.2907 µs 7.2949 µs 7.2997 µs]
                        change: [-7.3374% -7.2597% -7.1685%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
  4 (4.00%) high mild
  3 (3.00%) high severe

lt Float32              time:   [9.2100 µs 9.2180 µs 9.2269 µs]
                        change: [-50.470% -50.359% -50.234%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
  2 (2.00%) high mild
  1 (1.00%) high severe

lt scalar Float32       time:   [7.2604 µs 7.2638 µs 7.2684 µs]
                        change: [-8.4085% -8.2278% -7.9735%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
  2 (2.00%) high mild
  3 (3.00%) high severe

lt_eq Float32           time:   [9.2350 µs 9.2410 µs 9.2472 µs]
                        change: [-50.679% -50.603% -50.530%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 12 outliers among 100 measurements (12.00%)
  8 (8.00%) high mild
  4 (4.00%) high severe

lt_eq scalar Float32    time:   [7.2869 µs 7.2906 µs 7.2946 µs]
                        change: [-7.8178% -7.7122% -7.6141%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild

gt Float32              time:   [9.1731 µs 9.1783 µs 9.1844 µs]
                        change: [-50.586% -50.501% -50.367%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 4 outliers among 100 measurements (4.00%)
  2 (2.00%) high mild
  2 (2.00%) high severe

gt scalar Float32       time:   [7.2970 µs 7.2996 µs 7.3026 µs]
                        change: [-7.6238% -7.4831% -7.2758%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 6 outliers among 100 measurements (6.00%)
  3 (3.00%) high mild
  3 (3.00%) high severe

gt_eq Float32           time:   [9.1925 µs 9.1962 µs 9.2004 µs]
                        change: [-50.274% -50.223% -50.176%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 12 outliers among 100 measurements (12.00%)
  8 (8.00%) high mild
  4 (4.00%) high severe

And also makes it less reliant on the inlining whims of LLVM. This PR does still represent an ~10% performance hit versus this change applied to master, but I can live with that. We are talking 1 microsecond 😅


let chunks = len / 8;
let remainder = len % 8;
let chunks = len / 64;
let remainder = len % 64;
for chunk in 0..chunks {
let mut packed = 0;
for bit_idx in 0..8 {
let i = bit_idx + chunk * 8;
packed |= (f(i) as u8) << bit_idx;
for bit_idx in 0..64 {
let i = bit_idx + chunk * 64;
packed |= (f(i) as u64) << bit_idx;
}

// SAFETY: Already allocated sufficient capacity
Expand All @@ -413,14 +413,15 @@ impl MutableBuffer {
if remainder != 0 {
let mut packed = 0;
for bit_idx in 0..remainder {
let i = bit_idx + chunks * 8;
packed |= (f(i) as u8) << bit_idx;
let i = bit_idx + chunks * 64;
packed |= (f(i) as u64) << bit_idx;
}

// SAFETY: Already allocated sufficient capacity
unsafe { buffer.push_unchecked(packed) }
}

buffer.truncate(bit_util::ceil(len, 8));
buffer
}
}
Expand Down