Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support nullable indices in boolean take kernel and some optimizations #2064

Merged
merged 4 commits into from Jul 16, 2022

Conversation

jhorstmann
Copy link
Contributor

Which issue does this PR close?

Closes #2057.

Rationale for this change

What changes are included in this PR?

  • Use the PrimitiveIter to iterate over indices, which makes it easier to handle nullable indices.
  • Split up the kernel into separate logic for taking the values and the validity bitmap. This reduces branching in the inner loop and thus increases the performance.

Are there any user-facing changes?

no

Benchmark results compared to master

Gnuplot not found, using plotters backend
take bool 512           time:   [1.3921 us 1.3927 us 1.3933 us]                           
                        change: [+3.7963% +3.9074% +3.9940%] (p = 0.00 < 0.05)
                        Performance has regressed.

take bool 1024          time:   [3.6781 us 3.8022 us 3.9126 us]                            
                        change: [-33.173% -31.574% -30.047%] (p = 0.00 < 0.05)
                        Performance has improved.

take bool 4096          time:   [10.833 us 10.878 us 10.924 us]                            
                        change: [-67.171% -67.042% -66.917%] (p = 0.00 < 0.05)
                        Performance has improved.

take bool nulls 512     time:   [1.4058 us 1.4067 us 1.4077 us]                                 
                        change: [+4.1519% +4.3855% +4.6259%] (p = 0.00 < 0.05)
                        Performance has regressed.

take bool nulls 1024    time:   [2.5999 us 2.6011 us 2.6027 us]                                  
                        change: [-31.505% -31.418% -31.333%] (p = 0.00 < 0.05)
                        Performance has improved.

take bool nulls 4096    time:   [9.7339 us 9.7377 us 9.7420 us]                                  
                        change: [-54.586% -54.558% -54.531%] (p = 0.00 < 0.05)
                        Performance has improved.

There is a small regression for very small inputs, which will probably be alleviated by #1857. Larger inputs see a big improvement.

let index = ToPrimitive::to_usize(&indices.value(i)).ok_or_else(|| {
ArrowError::ComputeError("Cast to usize failed".to_string())
})?;
indices
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems to lose the special case of no null values, I would have naively thought this would make a non-trivial performance difference for the very common case where there aren't any nulls?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was also a bit surprised, we are probably trading the bounds check for indices.value(i), which is avoided by the iterator, vs. the added null check. I can try adding another variant without nulls and benchmark that.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I basically wonder if it isn't showing up as a regression as separating out the null mask handling makes up for it. However, perhaps we can make it even faster by adding the special case back 😄

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm planning to look at iteration in general in #1857 and ideally convince the compiler to do the specialization itself.

@tustvold tustvold merged commit a6379fe into apache:master Jul 16, 2022
@ursabot
Copy link

ursabot commented Jul 16, 2022

Benchmark runs are scheduled for baseline = 72dada6 and contender = a6379fe. a6379fe is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Skipped ⚠️ Benchmarking of arrow-rs-commits is not supported on ec2-t3-xlarge-us-east-2] ec2-t3-xlarge-us-east-2
[Skipped ⚠️ Benchmarking of arrow-rs-commits is not supported on test-mac-arm] test-mac-arm
[Skipped ⚠️ Benchmarking of arrow-rs-commits is not supported on ursa-i9-9960x] ursa-i9-9960x
[Skipped ⚠️ Benchmarking of arrow-rs-commits is not supported on ursa-thinkcentre-m75q] ursa-thinkcentre-m75q
Buildkite builds:
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
arrow Changes to the arrow crate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Boolean take kernel does not handle null indices correctly
3 participants