Optimize Writing of Bloom Filters #3320

Closed
tustvold opened this issue Dec 9, 2022 · 1 comment
Labels
enhancement (Any new improvement worthy of an entry in the changelog), help wanted

Comments

tustvold commented Dec 9, 2022

Is your feature request related to a problem or challenge? Please describe what you are trying to do.

#3318 highlighted that the current performance of writing bloom filters leaves a lot to be desired. This is a placeholder ticket for tracking improvements to it.

Describe the solution you'd like

Describe alternatives you've considered

Additional context

tustvold added the enhancement and help wanted labels Dec 9, 2022
tustvold added a commit to tustvold/arrow-rs that referenced this issue Dec 12, 2022
viirya commented Dec 15, 2022

Currently we achieve the following:

write_batch primitive/4096 values primitive with bloom filter
                        time:   [3.9911 ms 4.0027 ms 4.0151 ms]
                        thrpt:  [43.965 MiB/s 44.100 MiB/s 44.229 MiB/s]
                 change:
                        time:   [-18.041% -17.601% -17.194%] (p = 0.00 < 0.05)
                        thrpt:  [+20.764% +21.360% +22.012%]
                        Performance has improved.
write_batch primitive/4096 values primitive non-null with bloom filter
                        time:   [3.9366 ms 3.9445 ms 3.9530 ms]
                        thrpt:  [43.791 MiB/s 43.885 MiB/s 43.972 MiB/s]
                 change:
                        time:   [-16.064% -15.526% -15.016%] (p = 0.00 < 0.05)
                        thrpt:  [+17.669% +18.379% +19.139%]
                        Performance has improved.
write_batch primitive/4096 values string with bloom filter
                        time:   [778.11 µs 778.54 µs 779.02 µs]
                        thrpt:  [102.23 MiB/s 102.29 MiB/s 102.34 MiB/s]
                 change:
                        time:   [-41.408% -40.747% -40.069%] (p = 0.00 < 0.05)
                        thrpt:  [+66.857% +68.769% +70.670%]
                        Performance has improved.
write_batch primitive/4096 values string dictionary with bloom filter
                        time:   [392.89 µs 397.73 µs 403.93 µs]
                        thrpt:  [119.18 MiB/s 121.04 MiB/s 122.53 MiB/s]
                 change:
                        time:   [-35.315% -33.950% -32.450%] (p = 0.00 < 0.05)
                        thrpt:  [+48.040% +51.400% +54.596%]
                        Performance has improved.
write_batch primitive/4096 values string non-null with bloom filter
                        time:   [844.48 µs 846.53 µs 850.05 µs]
                        thrpt:  [92.534 MiB/s 92.919 MiB/s 93.145 MiB/s]
                 change:
                        time:   [-42.406% -41.573% -40.784%] (p = 0.00 < 0.05)
                        thrpt:  [+68.873% +71.153% +73.630%]
                        Performance has improved.

Maybe we can close this.
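
For anyone reading the numbers above: the `time` and `thrpt` change lines describe the same measurement from two directions. If each benchmark processes a fixed amount of data, a relative time change `t` implies a throughput change of `1 / (1 + t) - 1`. The sketch below checks that against the middle (point estimate) values reported above. The function name `throughput_change` is made up for illustration and is not part of arrow-rs or Criterion, and the fixed-data assumption is mine.

```rust
/// Convert a relative change in time (e.g. -0.17601 for -17.601%) into the
/// implied relative change in throughput, assuming the amount of data
/// processed per iteration stays the same. Hypothetical helper.
fn throughput_change(time_change: f64) -> f64 {
    1.0 / (1.0 + time_change) - 1.0
}

fn main() {
    // Middle values of the `time` change lines in the benchmark output above.
    let reported = [
        ("primitive", -0.17601),
        ("primitive non-null", -0.15526),
        ("string", -0.40747),
        ("string dictionary", -0.33950),
        ("string non-null", -0.41573),
    ];

    for (name, time_change) in reported {
        println!(
            "{name}: time {:+.3}% -> thrpt {:+.3}%",
            time_change * 100.0,
            throughput_change(time_change) * 100.0
        );
    }
}
```

This is why a roughly 41% drop in time for the string cases shows up as a roughly 69% to 71% gain in throughput.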

tustvold closed this as completed Jun 1, 2023