This repository has been archived by the owner on Feb 18, 2024. It is now read-only.

Added Offsets and OffsetsBuffer #1316

Merged
merged 5 commits into from Dec 10, 2022


@jorgecarleitao jorgecarleitao commented Dec 4, 2022

This PR is a backward-incompatible change that improves the performance of compute and IO by reducing the number of runtime checks performed on arrays with offsets (Binary, Utf8, List, Map).

It introduces two new structs, Offsets(Vec<O>) and OffsetsBuffer(Buffer<O>), that uphold the invariants of Arrow offsets (i.e. they always contain at least one element and are monotonically non-decreasing).
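To illustrate the idea, here is a minimal sketch of a wrapper type that checks the two invariants once, at construction. It is not the code from this PR: `CheckedOffsets`, `OffsetsError`, `try_new`, `last` and `slots` are hypothetical names, and the element type is fixed to `i64` instead of a generic `O` to keep the example short.

```rust
/// Hypothetical stand-in for an offsets wrapper (not the PR's actual type).
/// The inner vector is private, so a value of this type can only be built
/// through `try_new`, which checks both invariants.
#[derive(Debug, Clone, PartialEq)]
pub struct CheckedOffsets(Vec<i64>);

/// Hypothetical error type describing which invariant was violated.
#[derive(Debug, PartialEq)]
pub enum OffsetsError {
    /// The vector had no elements.
    Empty,
    /// Some offset was smaller than the one before it.
    Decreasing,
}

impl CheckedOffsets {
    /// Validates the offsets once: O(n) here, nothing to re-check later.
    pub fn try_new(offsets: Vec<i64>) -> Result<Self, OffsetsError> {
        if offsets.is_empty() {
            return Err(OffsetsError::Empty);
        }
        // Equal neighbours are allowed: they describe an empty slot.
        if offsets.windows(2).any(|pair| pair[0] > pair[1]) {
            return Err(OffsetsError::Decreasing);
        }
        Ok(CheckedOffsets(offsets))
    }

    /// The last offset. Never fails because the vector is never empty.
    pub fn last(&self) -> i64 {
        *self.0.last().expect("invariant: offsets are never empty")
    }

    /// Number of slots described; cannot underflow for the same reason.
    pub fn slots(&self) -> usize {
        self.0.len() - 1
    }

    /// Read-only view of the validated offsets.
    pub fn as_slice(&self) -> &[i64] {
        &self.0
    }
}
```

Under these assumptions, `CheckedOffsets::try_new(vec![0, 2, 2, 5])` succeeds and describes three slots, while an empty vector or `vec![0, 3, 1]` is rejected. Any function that accepts `&CheckedOffsets` can rely on the invariants without re-validating.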

This is expected to improve performance of:

  • Compute take
  • Compute substring
  • Compute cast
  • IO Avro read of lists
  • IO Parquet read of binary, utf8 and lists
  • IO JSON read of lists

by skipping checks of whether the offsets are well constructed.
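As a rough illustration of where such checks can be skipped, the sketch below shows a take-like kernel over a binary array. It is an assumption-laden toy, not the implementation in this PR: `ValidOffsets`, `from_lengths`, `start_end` and `take` are hypothetical names, and offsets are stored as `usize` rather than Arrow's `i32`/`i64`. The point is that offsets built by accumulating lengths are well constructed by construction, so the output needs no validation pass.

```rust
/// Hypothetical offsets wrapper (not the PR's actual type). The field is
/// private, so every instance comes from `from_lengths` below.
pub struct ValidOffsets(Vec<usize>);

impl ValidOffsets {
    /// Builds offsets from slot lengths. Starting at 0 and adding unsigned
    /// lengths can only produce a non-empty, non-decreasing sequence, so no
    /// separate validation is needed. (Overflow handling is omitted in this
    /// sketch.)
    pub fn from_lengths<I: IntoIterator<Item = usize>>(lengths: I) -> Self {
        let mut offsets = vec![0usize];
        let mut last = 0usize;
        for length in lengths {
            last += length;
            offsets.push(last);
        }
        ValidOffsets(offsets)
    }

    /// Read-only view of the offsets.
    pub fn as_slice(&self) -> &[usize] {
        &self.0
    }

    /// Start and end of slot `i`; `start <= end` holds by the invariant.
    pub fn start_end(&self, i: usize) -> (usize, usize) {
        (self.0[i], self.0[i + 1])
    }
}

/// Gathers the slots named by `indices` into a new (offsets, values) pair.
/// The input offsets are trusted because of their type, and the output
/// offsets are valid by construction, so neither side is re-checked.
pub fn take(
    offsets: &ValidOffsets,
    values: &[u8],
    indices: &[usize],
) -> (ValidOffsets, Vec<u8>) {
    let mut new_values = Vec::new();
    let lengths = indices.iter().map(|&i| {
        let (start, end) = offsets.start_end(i);
        new_values.extend_from_slice(&values[start..end]);
        end - start // cannot underflow: offsets are non-decreasing
    });
    let new_offsets = ValidOffsets::from_lengths(lengths);
    (new_offsets, new_values)
}
```

For example, with slots `"a"`, `""` and `"bc"` (offsets `[0, 1, 1, 3]` over the bytes `abc`), taking indices `[2, 0, 1, 2]` yields the bytes `bcabc` with offsets `[0, 2, 3, 3, 5]`. Out-of-range indices still panic in this sketch; only the offsets themselves are exempt from re-validation.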

It also removes some uses of unsafe (-42 LOC, +22 LOC).


codecov bot commented Dec 6, 2022

Codecov Report

Base: 83.13% // Head: 83.16% // Increases project coverage by +0.03% 🎉

Coverage data is based on head (c150c7c) compared to base (1fa497f).
Patch coverage: 87.63% of modified lines in pull request are covered.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1316      +/-   ##
==========================================
+ Coverage   83.13%   83.16%   +0.03%     
==========================================
  Files         369      370       +1     
  Lines       40245    40192      -53     
==========================================
- Hits        33458    33426      -32     
+ Misses       6787     6766      -21     
Impacted Files                   Coverage Δ
src/array/binary/fmt.rs          92.30% <ø> (ø)
src/array/binary/from.rs         100.00% <ø> (ø)
src/array/binary/iterator.rs     75.00% <ø> (ø)
src/array/equal/binary.rs        100.00% <ø> (ø)
src/array/equal/list.rs          100.00% <ø> (ø)
src/array/equal/mod.rs           82.14% <ø> (ø)
src/array/equal/utf8.rs          100.00% <ø> (ø)
src/array/growable/utils.rs      100.00% <ø> (ø)
src/array/list/fmt.rs            94.44% <ø> (ø)
src/array/list/iterator.rs       32.14% <ø> (ø)
... and 101 more


@jorgecarleitao jorgecarleitao marked this pull request as ready for review December 6, 2022 07:30
@jorgecarleitao jorgecarleitao merged commit 4ed9b91 into main Dec 10, 2022
@jorgecarleitao jorgecarleitao deleted the offsets branch December 10, 2022 05:08
ritchie46 pushed a commit to ritchie46/arrow2 that referenced this pull request Mar 29, 2023
ritchie46 pushed a commit to ritchie46/arrow2 that referenced this pull request Apr 5, 2023