This repository has been archived by the owner on Feb 18, 2024. It is now read-only.

Improved ZipValidity iterators #1284

Merged
merged 2 commits on Nov 13, 2022

Conversation

ritchie46
Collaborator

Because of slicing, it is fairly common for an array to carry a validity whose null count is 0.

Since the null count is cached, we can use it to decide whether ZipValidity needs to take the Optional branch. This PR makes sure the Required branch is taken whenever the null_count/unset_bits of a validity is 0.
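To illustrate the situation described above, here is a minimal sketch, not the arrow2 implementation. `Validity`, `Branch`, and `choose_branch` are hypothetical stand-ins introduced only for this example; a `Vec<bool>` replaces the real bitmap. It shows how slicing can leave a validity with no unset bits, and how a cached count lets the caller pick the cheaper branch without scanning.

```rust
/// Hypothetical stand-in for a validity bitmap; `true` means "valid".
/// The number of unset (null) bits is computed once and cached.
#[derive(Debug, Clone)]
struct Validity {
    bits: Vec<bool>,
    unset_bits: usize,
}

impl Validity {
    fn new(bits: Vec<bool>) -> Self {
        let unset_bits = bits.iter().filter(|&&bit| !bit).count();
        Self { bits, unset_bits }
    }

    /// Returns a validity covering `offset..offset + length`.
    /// (This sketch copies the bits; a real bitmap would share its buffer.)
    fn slice(&self, offset: usize, length: usize) -> Self {
        Self::new(self.bits[offset..offset + length].to_vec())
    }

    /// Cached null count; reading it does not scan the bits.
    fn unset_bits(&self) -> usize {
        self.unset_bits
    }
}

/// Which branch an iterator over values and validity would take.
#[derive(Debug, PartialEq)]
enum Branch {
    Required,
    Optional,
}

/// Take `Optional` only when a validity exists AND it contains a null.
fn choose_branch(validity: Option<&Validity>) -> Branch {
    match validity {
        Some(v) if v.unset_bits() > 0 => Branch::Optional,
        _ => Branch::Required,
    }
}
```

For example, a validity built from `[true, true, false, true]` selects `Optional`, but its slice `(0, 2)` contains no nulls and selects `Required`, just like an array with no validity at all.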

codecov bot commented Oct 29, 2022

Codecov Report

Base: 83.07% // Head: 83.03% // Decreases project coverage by -0.04% ⚠️

Coverage data is based on head (d02e3a9) compared to base (e106cff).
Patch coverage: 100.00% of modified lines in pull request are covered.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1284      +/-   ##
==========================================
- Coverage   83.07%   83.03%   -0.05%     
==========================================
  Files         363      364       +1     
  Lines       38499    39240     +741     
==========================================
+ Hits        31982    32581     +599     
- Misses       6517     6659     +142     
| Impacted Files | Coverage Δ |
|---|---|
| src/array/binary/mod.rs | 91.55% <ø> (+1.33%) ⬆️ |
| src/array/boolean/iterator.rs | 72.72% <100.00%> (+1.29%) ⬆️ |
| src/array/boolean/mod.rs | 81.30% <100.00%> (-0.18%) ⬇️ |
| src/array/dictionary/mod.rs | 89.88% <100.00%> (-0.18%) ⬇️ |
| src/array/fixed_size_binary/iterator.rs | 72.72% <100.00%> (-3.28%) ⬇️ |
| src/array/fixed_size_list/iterator.rs | 60.00% <100.00%> (-6.67%) ⬇️ |
| src/array/list/iterator.rs | 32.14% <100.00%> (-34.53%) ⬇️ |
| src/array/map/iterator.rs | 60.52% <100.00%> (-2.89%) ⬇️ |
| src/array/primitive/iterator.rs | 68.42% <100.00%> (+1.75%) ⬆️ |
| src/array/primitive/mod.rs | 81.56% <100.00%> (-0.17%) ⬇️ |

... and 37 more

Owner

@jorgecarleitao jorgecarleitao left a comment

Thanks!

This hints that we should check the unset bits when slicing, so that we do not keep a reference count on validities that contain no nulls.

Regardless, this is a good optimization. I left a comment with a small proposal.
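One possible reading of the slicing suggestion above, sketched under stated assumptions: `Bitmap` and `slice_validity` are hypothetical names for this example and not arrow2's API. A `Vec<bool>` behind an `Arc` stands in for the shared buffer. The idea is that a slice containing no nulls is dropped entirely, so it releases its reference to the buffer.

```rust
use std::sync::Arc;

/// Hypothetical bitmap: a shared buffer plus a window and a cached null count.
#[derive(Clone)]
struct Bitmap {
    bits: Arc<Vec<bool>>,
    offset: usize,
    length: usize,
    unset_bits: usize,
}

impl Bitmap {
    fn new(bits: Vec<bool>) -> Self {
        let length = bits.len();
        let unset_bits = bits.iter().filter(|&&bit| !bit).count();
        Self { bits: Arc::new(bits), offset: 0, length, unset_bits }
    }

    /// Zero-copy slice: shares the buffer and recomputes the null count
    /// for the new window.
    fn slice(&self, offset: usize, length: usize) -> Self {
        assert!(offset + length <= self.length);
        let start = self.offset + offset;
        let unset_bits = self.bits[start..start + length]
            .iter()
            .filter(|&&bit| !bit)
            .count();
        Self { bits: Arc::clone(&self.bits), offset: start, length, unset_bits }
    }

    fn unset_bits(&self) -> usize {
        self.unset_bits
    }
}

/// Slices an optional validity and discards the result when the window has
/// no nulls, so the slice holds no reference to the underlying buffer.
fn slice_validity(validity: Option<&Bitmap>, offset: usize, length: usize) -> Option<Bitmap> {
    validity
        .map(|bitmap| bitmap.slice(offset, length))
        .filter(|sliced| sliced.unset_bits() > 0)
}
```

With this shape, slicing `[true, true, false, true]` at `(0, 2)` yields `None` and leaves the buffer's strong count unchanged, while slicing at `(2, 2)` keeps the validity because it still contains a null.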

src/bitmap/utils/zip_validity.rs (outdated, resolved)
@ritchie46
Collaborator Author

I have added a ZipValidity::new_with_validity so that the

array.validity().and_then(|validity| (validity.unset_bits() > 0).then(|| validity.iter()))

branch can be done in a single place and is unlikely to be forgotten now that we have a designated constructor.
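A simplified sketch of what such a designated constructor could look like. This is not the code merged in the PR: `Validity` and `BitIter` are hypothetical stand-ins backed by `Vec<bool>`, and the enum omits details of the real `ZipValidity` type. Only the `unset_bits() > 0` check mirrors the expression quoted above.

```rust
/// Hypothetical validity bitmap with a cached null count; `true` means "valid".
struct Validity {
    bits: Vec<bool>,
    unset_bits: usize,
}

/// Iterator over the bits of the stand-in `Validity`.
type BitIter<'a> = std::iter::Copied<std::slice::Iter<'a, bool>>;

impl Validity {
    fn new(bits: Vec<bool>) -> Self {
        let unset_bits = bits.iter().filter(|&&bit| !bit).count();
        Self { bits, unset_bits }
    }

    fn unset_bits(&self) -> usize {
        self.unset_bits
    }

    fn iter(&self) -> BitIter<'_> {
        self.bits.iter().copied()
    }
}

/// Simplified two-branch iterator: `Required` skips validity checks entirely.
enum ZipValidity<I, V> {
    Required(I),
    Optional(I, V),
}

impl<I: Iterator, V: Iterator<Item = bool>> ZipValidity<I, V> {
    /// Plain constructor: any `Some` validity iterator selects `Optional`.
    fn new(values: I, validity: Option<V>) -> Self {
        match validity {
            Some(validity) => Self::Optional(values, validity),
            None => Self::Required(values),
        }
    }
}

impl<'a, I: Iterator> ZipValidity<I, BitIter<'a>> {
    /// Designated constructor: the null-count check lives in one place.
    fn new_with_validity(values: I, validity: Option<&'a Validity>) -> Self {
        let validity = validity.and_then(|v| (v.unset_bits() > 0).then(|| v.iter()));
        Self::new(values, validity)
    }
}

impl<I: Iterator, V: Iterator<Item = bool>> Iterator for ZipValidity<I, V> {
    type Item = Option<I::Item>;

    fn next(&mut self) -> Option<Self::Item> {
        match self {
            // No nulls possible: wrap every value in `Some`.
            Self::Required(values) => values.next().map(Some),
            // Pair each value with its validity bit.
            Self::Optional(values, validity) => {
                let value = values.next()?;
                let is_valid = validity.next()?;
                Some(if is_valid { Some(value) } else { None })
            }
        }
    }
}
```

Call sites then pass the array's optional validity directly and cannot forget the check: a validity with no unset bits produces the `Required` variant, exactly as a missing validity does.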

@ritchie46 ritchie46 force-pushed the zipvalidity_nulls branch 2 times, most recently from a1cc908 to e14cbe2 Compare November 2, 2022 08:45
@jorgecarleitao jorgecarleitao added the enhancement An improvement to an existing feature label Nov 13, 2022
@jorgecarleitao jorgecarleitao merged commit c23d813 into jorgecarleitao:main Nov 13, 2022
ritchie46 added a commit to ritchie46/arrow2 that referenced this pull request Mar 29, 2023
ritchie46 added a commit to ritchie46/arrow2 that referenced this pull request Apr 5, 2023

2 participants