
Split out arrow-buffer crate (#2594) #2693

Merged
merged 6 commits into apache:master on Sep 15, 2022

Conversation


@tustvold tustvold commented Sep 9, 2022

Which issue does this PR close?

Part of #2594.

Rationale for this change

Begins the process of gradually breaking this crate apart, to help with compile times, code size, etc.

What changes are included in this PR?

Splits the arrow buffer types into their own crate. This is mostly mechanical, but there are some breaking changes.

Are there any user-facing changes?

Yes, hopefully none that are controversial
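To make the impact concrete, here is a hedged sketch of downstream imports after the split (the exact module paths, re-exports and Cargo dependency are assumptions for illustration, not confirmed by this PR):

// Before the split (and assumed to keep working via the `arrow` facade):
// use arrow::buffer::{Buffer, MutableBuffer};

// Crates that only need the buffer types could depend on the new crate directly
// (hypothetical Cargo dependency: arrow-buffer = "22"):
use arrow_buffer::buffer::{Buffer, MutableBuffer};

fn main() {
    // Build a small immutable buffer from raw bytes to show the types in use.
    let buffer = Buffer::from(&[0u8, 1, 2, 3][..]);
    assert_eq!(buffer.len(), 4);
    let _scratch = MutableBuffer::new(64);
}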

@tustvold tustvold added the api-change Changes to the arrow API label Sep 9, 2022
@github-actions github-actions bot added the arrow Changes to the arrow crate label Sep 9, 2022

 #[inline]
-unsafe fn null_pointer<T: NativeType>() -> NonNull<T> {
-    NonNull::new_unchecked(ALIGNMENT as *mut T)
+unsafe fn null_pointer() -> NonNull<u8> {
Contributor Author

This is technically a breaking change; it appears to relate to early experiments by @jorgecarleitao re: arrow2. It isn't used anywhere, so I think it can just go. The removal is so that arrow-buffer doesn't depend on DataType

@@ -38,15 +38,15 @@ fn create_buffer(size: usize) -> Buffer {
 }

 fn bench_buffer_and(left: &Buffer, right: &Buffer) {
-    criterion::black_box((left & right).unwrap());
+    criterion::black_box(buffer_bin_and(left, 0, right, 0, left.len() * 8));
Contributor Author

This is the other breaking change. I opted to remove the BitAnd, etc. implementations for Buffer for two reasons (see the sketch below):

  • It was very easy to accidentally forget to apply the offset when using them
  • They need an error enumeration, and defining an error type in arrow-buffer solely for this case seemed overkill
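A minimal illustration of the footgun (made-up buffers and offsets; the import path is an assumption): the operator form had no way to express a bit offset, whereas buffer_bin_and takes the offsets and length in bits explicitly.

use arrow::buffer::{buffer_bin_and, Buffer};

fn main() {
    // `left` starts at bit 0; imagine `right` is a slice whose interesting bits
    // start 8 bits (one byte) into its allocation.
    let left = Buffer::from(&[0b1111_0000u8][..]);
    let right = Buffer::from(&[0b1010_1010u8, 0b1100_1100u8][..]);

    // Old operator form: (left & right) could only combine both buffers from
    // bit 0, so applying the offset was easy to forget.
    // New explicit form: offsets and length are spelled out in bits.
    let anded = buffer_bin_and(&left, 0, &right, 8, 8);
    assert_eq!(anded.as_slice()[0], 0b1111_0000 & 0b1100_1100);
}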

 Some(right_bitmap) => {
-    // NOTE: right values and bitmaps are combined and stay at bit offset right.offset()
-    (right.values() & &right_bitmap.bits).ok().map(|b| b.not())
+    let and = buffer_bin_and(
Contributor Author

This logic is not only simpler now, but avoids performing work on values past the offset

@tustvold tustvold mentioned this pull request Sep 9, 2022
Comment on lines +21 to +27
mod immutable;
pub use immutable::*;
mod mutable;
pub use mutable::*;
mod ops;
mod scalar;
pub use scalar::*;
Member

I wonder if it makes sense to just collapse the /buffer/ folder to the top level, given the arrow_buffer crate name already gives that level of indirection

Contributor Author

I think that, as the crate still contains other public modules like alloc, it would be confusing to lift the contents of buffer to the top level (see the sketch below)
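For orientation, a rough sketch of the layout being kept here (hypothetical lib.rs; only the buffer and alloc modules are mentioned in this thread, the rest is assumption):

// arrow-buffer/src/lib.rs (sketch, not the merged file)
pub mod alloc;   // allocation helpers such as allocate_aligned
pub mod buffer;  // immutable, mutable and scalar buffer types (mod.rs shown above)

// Collapsing buffer/* into the crate root would give `arrow_buffer::Buffer`,
// but would also mix the buffer types in with modules like `alloc`.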

 unsafe {
     if size == 0 {
         null_pointer()
     } else {
-        let size = size * size_of::<T>();
+        let size = size;
Member

Suggested change: drop the redundant `let size = size;` binding.

 }

 /// Allocates a cache-aligned memory region of `size` bytes with uninitialized values.
 /// This is more performant than using [allocate_aligned_zeroed] when all bytes will have
 /// an unknown or non-zero value and is semantically similar to `malloc`.
-pub fn allocate_aligned<T: NativeType>(size: usize) -> NonNull<T> {
+pub fn allocate_aligned(size: usize) -> NonNull<u8> {
Member

So the caller needs to compute the allocation size based on the type T they want?

Contributor Author

Yes, although nothing was actually using this API to allocate anything other than bytes
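As a hedged sketch of what that looks like at a call site (the full path to allocate_aligned and the element type are illustrative assumptions):

use std::mem::size_of;
use std::ptr::NonNull;

fn alloc_f64s(len: usize) -> NonNull<u8> {
    // Before: allocate_aligned::<f64>(len) returned NonNull<f64> and multiplied
    // by size_of::<f64>() internally (the `let size = size * size_of::<T>()`
    // line removed above).
    // After: the caller computes the byte count and gets raw bytes back.
    let byte_len = len * size_of::<f64>();
    arrow_buffer::alloc::allocate_aligned(byte_len)
    // The allocation must later be released with the matching free routine
    // from the same module.
}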

Contributor

maybe it was also related to a time when there were fewer things that were ArrowNativeType (I can't remember but that may have changed over time)

-unsafe fn null_pointer<T: NativeType>() -> NonNull<T> {
-    NonNull::new_unchecked(ALIGNMENT as *mut T)
+unsafe fn null_pointer() -> NonNull<u8> {
+    NonNull::new_unchecked(ALIGNMENT as *mut u8)
 }

/// Allocates a cache-aligned memory region of `size` bytes with uninitialized values.
Member

Looks like the previous doc was not totally correct, as `size` is multiplied by the size of T.

Member

But actually I don't see any allocate_aligned usage with types other than u8 so far.

Contributor Author

Indeed, I believe Jorge opted to fork off arrow2 before getting further with integrating this

arrow-buffer/src/bytes.rs (outdated review thread, resolved)
@alamb alamb (Contributor) left a comment

Looks good to me. It is very exciting to see this happening

I suggest we run some benchmarks, if we haven't already done so, to ensure this doesn't cause performance regressions (due to different cross-crate inlining rules or something else silly)

I have also added a note to #2665 that we will have to update the release process as we add some new crates

third_offset_in_bits: usize,
fourth: &Buffer,
fourth_offset_in_bits: usize,
pub fn bitwise_quaternary_op_helper<F>(
Contributor

I wonder if we should profile with this change? It seems like it should not be any different given that buffers and offsets are sized (and thus the compiler can check and elide the bounds checks)

arrow-buffer/src/lib.rs (outdated review thread, resolved)
}
Ok(Bitmap::from(buffer_bin_and(
&self.bits,
0,
Contributor

I see this does the same thing as & (uses offset zero), but I wonder if that is correct?

Contributor

The same comment applies to the other operators as well

Contributor Author

Bitmap currently doesn't have an offset. This does mean that, in practice, this implementation is likely being used incorrectly in places, but I opted to just preserve the existing behaviour. I agree it is a bit of a footgun (#1802)

@alamb
Contributor

alamb commented Sep 14, 2022

BTW I plan to test this change out against DataFusion

@alamb
Contributor

alamb commented Sep 14, 2022

cd /Users/alamb/Software/arrow-datafusion && RUST_BACKTRACE=1 CARGO_TARGET_DIR=/Users/alamb/Software/df-target  nice cargo test --all
    Updating git repository `https://github.com/tustvold/arrow-rs.git`
    Updating git submodule `https://github.com/apache/parquet-testing.git`
    Updating git submodule `https://github.com/apache/arrow-testing`
    Updating crates.io index
   Compiling parquet v22.0.0 (https://github.com/tustvold/arrow-rs.git?rev=f47a878#f47a8784)
   Compiling arrow-buffer v22.0.0 (https://github.com/tustvold/arrow-rs.git?rev=f47a878#f47a8784)
   Compiling arrow-flight v22.0.0 (https://github.com/tustvold/arrow-rs.git?rev=f47a878#f47a8784)
   Compiling arrow v22.0.0 (https://github.com/tustvold/arrow-rs.git?rev=f47a878#f47a8784)
   Compiling fuzz-utils v0.1.0 (/Users/alamb/Software/arrow-datafusion/datafusion/core/fuzz-utils)
   Compiling datafusion-common v12.0.0 (/Users/alamb/Software/arrow-datafusion/datafusion/common)
   Compiling datafusion-expr v12.0.0 (/Users/alamb/Software/arrow-datafusion/datafusion/expr)
   Compiling datafusion-row v12.0.0 (/Users/alamb/Software/arrow-datafusion/datafusion/row)
   Compiling datafusion-physical-expr v12.0.0 (/Users/alamb/Software/arrow-datafusion/datafusion/physical-expr)
   Compiling datafusion-sql v12.0.0 (/Users/alamb/Software/arrow-datafusion/datafusion/sql)
   Compiling datafusion-jit v12.0.0 (/Users/alamb/Software/arrow-datafusion/datafusion/jit)
   Compiling datafusion-optimizer v12.0.0 (/Users/alamb/Software/arrow-datafusion/datafusion/optimizer)
   Compiling datafusion v12.0.0 (/Users/alamb/Software/arrow-datafusion/datafusion/core)
   Compiling datafusion-proto v12.0.0 (/Users/alamb/Software/arrow-datafusion/datafusion/proto)
   Compiling datafusion-benchmarks v12.0.0 (/Users/alamb/Software/arrow-datafusion/benchmarks)
   Compiling datafusion-examples v12.0.0 (/Users/alamb/Software/arrow-datafusion/datafusion-examples)

Seems to work well 👍

diff --git a/Cargo.toml b/Cargo.toml
index b5a7989d9..713c4844b 100644
--- a/Cargo.toml
+++ b/Cargo.toml
@@ -37,6 +37,6 @@ lto = true
 
 # TEMP
 [patch.crates-io]
-arrow = {  git = "https://github.com/apache/arrow-rs.git", rev="51466634f11b7d965ca3c912835c91e0f84a6c92"}
-parquet = {  git = "https://github.com/apache/arrow-rs.git", rev="51466634f11b7d965ca3c912835c91e0f84a6c92" }
-arrow-flight = {  git = "https://github.com/apache/arrow-rs.git", rev="51466634f11b7d965ca3c912835c91e0f84a6c92" }
+arrow = {  git = "https://github.com/tustvold/arrow-rs.git", rev="f47a878"}
+parquet = {  git = "https://github.com/tustvold/arrow-rs.git", rev="f47a878" }
+arrow-flight = {  git = "https://github.com/tustvold/arrow-rs.git", rev="f47a878" }

@tustvold
Contributor Author

eq Float32              time:   [44.005 µs 44.131 µs 44.276 µs]
                        change: [+137.97% +138.88% +139.85%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 6 outliers among 100 measurements (6.00%)
  6 (6.00%) high mild

eq scalar Float32       time:   [40.772 µs 40.865 µs 40.967 µs]
                        change: [+419.00% +420.49% +422.13%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 3 outliers among 100 measurements (3.00%)
  1 (1.00%) low mild
  2 (2.00%) high mild

Sigh... Time to play LLVM vectorisation bingo again...

@alamb
Contributor

alamb commented Sep 14, 2022

Maybe it is time to sprinkle some #[inline] to get cross-crate inlining

@tustvold
Contributor Author

tustvold commented Sep 14, 2022

If only it were that simple 😢

Edit: Plan to come back to this tomorrow morning, LLVM is not playing ball

@jhorstmann
Contributor

FWIW, I'm very much in favor of this splitting up into smaller crates. We use a small abstraction layer over arrow buffers or plain vectors, and this could reduce the compilation times for that use case significantly.

Regarding benchmarks, do we actually run these with LTO? My current understanding is that without LTO the inline annotation is required for cross-crate inlining, but with LTO enabled inlining would work in more cases. I think most applications that then use these crates would be compiled with LTO.

@tustvold
Contributor Author

At least in theory LTO is only needed for cross-crate inlining of non-inline annotated functions, and even then generic functions should be inlined anyway as a consequence of monomorphisation.

The simple fix is just to move collect_bool into the comparison kernels, but I'm curious to understand what is going on as it may have implications down the line

@@ -395,15 +395,15 @@ impl MutableBuffer {
     /// as it eliminates the conditional `Iterator::next`
     #[inline]
     pub fn collect_bool<F: FnMut(usize) -> bool>(len: usize, mut f: F) -> Self {
-        let mut buffer = Self::new(bit_util::ceil(len, 8));
+        let mut buffer = Self::new(bit_util::ceil(len, 64) * 8);
Contributor Author

This makes it faster than it was before

eq Float32              time:   [9.2538 µs 9.2627 µs 9.2720 µs]
                        change: [-50.278% -50.193% -50.080%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
  1 (1.00%) high mild
  1 (1.00%) high severe

eq scalar Float32       time:   [7.2924 µs 7.2962 µs 7.3004 µs]
                        change: [-7.5896% -7.5335% -7.4762%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
  4 (4.00%) high mild
  3 (3.00%) high severe

neq Float32             time:   [9.1950 µs 9.1992 µs 9.2043 µs]
                        change: [-50.651% -50.607% -50.563%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 13 outliers among 100 measurements (13.00%)
  4 (4.00%) high mild
  9 (9.00%) high severe

neq scalar Float32      time:   [7.2907 µs 7.2949 µs 7.2997 µs]
                        change: [-7.3374% -7.2597% -7.1685%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
  4 (4.00%) high mild
  3 (3.00%) high severe

lt Float32              time:   [9.2100 µs 9.2180 µs 9.2269 µs]
                        change: [-50.470% -50.359% -50.234%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
  2 (2.00%) high mild
  1 (1.00%) high severe

lt scalar Float32       time:   [7.2604 µs 7.2638 µs 7.2684 µs]
                        change: [-8.4085% -8.2278% -7.9735%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
  2 (2.00%) high mild
  3 (3.00%) high severe

lt_eq Float32           time:   [9.2350 µs 9.2410 µs 9.2472 µs]
                        change: [-50.679% -50.603% -50.530%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 12 outliers among 100 measurements (12.00%)
  8 (8.00%) high mild
  4 (4.00%) high severe

lt_eq scalar Float32    time:   [7.2869 µs 7.2906 µs 7.2946 µs]
                        change: [-7.8178% -7.7122% -7.6141%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild

gt Float32              time:   [9.1731 µs 9.1783 µs 9.1844 µs]
                        change: [-50.586% -50.501% -50.367%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 4 outliers among 100 measurements (4.00%)
  2 (2.00%) high mild
  2 (2.00%) high severe

gt scalar Float32       time:   [7.2970 µs 7.2996 µs 7.3026 µs]
                        change: [-7.6238% -7.4831% -7.2758%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 6 outliers among 100 measurements (6.00%)
  3 (3.00%) high mild
  3 (3.00%) high severe

gt_eq Float32           time:   [9.1925 µs 9.1962 µs 9.2004 µs]
                        change: [-50.274% -50.223% -50.176%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 12 outliers among 100 measurements (12.00%)
  8 (8.00%) high mild
  4 (4.00%) high severe

And also makes it less reliant on the inlining whims of LLVM. This PR does still represent an ~10% performance hit versus this change applied to master, but I can live with that. We are talking 1 microsecond 😅
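For reference, an illustrative sketch (not the arrow-rs implementation) of the 64-bit packing pattern collect_bool relies on, and why sizing the buffer in whole u64 words helps: the hot loop writes full words with a fixed trip count and no per-bit branching, which is much friendlier to LLVM's vectoriser.

// Illustrative only: pack a predicate over `len` indices into u64 words.
fn pack_bits<F: FnMut(usize) -> bool>(len: usize, mut f: F) -> Vec<u64> {
    // Reserve whole 64-bit words up front, the analogue of ceil(len, 64) * 8 bytes.
    let mut words = Vec::with_capacity((len + 63) / 64);
    let chunks = len / 64;
    let remainder = len % 64;

    for chunk in 0..chunks {
        let mut packed = 0u64;
        // Fixed-trip-count inner loop with no early exit.
        for bit in 0..64 {
            packed |= (f(chunk * 64 + bit) as u64) << bit;
        }
        words.push(packed);
    }

    if remainder != 0 {
        let mut packed = 0u64;
        for bit in 0..remainder {
            packed |= (f(chunks * 64 + bit) as u64) << bit;
        }
        words.push(packed);
    }

    words
}

fn main() {
    // Example: set every even bit.
    let words = pack_bits(70, |i| i % 2 == 0);
    assert_eq!(words.len(), 2);
    assert_eq!(words[0], 0x5555_5555_5555_5555);
}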

@tustvold
Contributor Author

Docs failure is related to rust-lang/rust#101844; it runs fine on an older nightly

@tustvold tustvold merged commit fb01656 into apache:master Sep 15, 2022
@ursabot

ursabot commented Sep 15, 2022

Benchmark runs are scheduled for baseline = 7594db6 and contender = fb01656. fb01656 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Skipped ⚠️ Benchmarking of arrow-rs-commits is not supported on ec2-t3-xlarge-us-east-2] ec2-t3-xlarge-us-east-2
[Skipped ⚠️ Benchmarking of arrow-rs-commits is not supported on test-mac-arm] test-mac-arm
[Skipped ⚠️ Benchmarking of arrow-rs-commits is not supported on ursa-i9-9960x] ursa-i9-9960x
[Skipped ⚠️ Benchmarking of arrow-rs-commits is not supported on ursa-thinkcentre-m75q] ursa-thinkcentre-m75q
Buildkite builds:
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java
