Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use hashbrown::RawTable API #2765

Closed
wants to merge 1 commit into from

Conversation

tustvold
Copy link
Contributor

Which issue does this PR close?

Closes #.

Rationale for this change

The use of HashMap without a hasher nor keys, is a little perverse. It turns out hashbrown has a RawTable API that we can use directly instead. As an added bonus this provides a very minor performance improvement

string_dictionary_builder/(dict_size:20, len:1000, key_len: 5)
                        time:   [18.563 µs 18.605 µs 18.647 µs]
                        change: [-4.5058% -3.6392% -2.7326%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 8 outliers among 100 measurements (8.00%)
  3 (3.00%) low mild
  2 (2.00%) high mild
  3 (3.00%) high severe
string_dictionary_builder/(dict_size:100, len:1000, key_len: 5)
                        time:   [21.649 µs 21.693 µs 21.738 µs]
                        change: [-2.2389% -1.8649% -1.5032%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high severe
string_dictionary_builder/(dict_size:100, len:1000, key_len: 10)
                        time:   [21.696 µs 21.715 µs 21.734 µs]
                        change: [-2.0519% -1.8956% -1.6960%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high severe
string_dictionary_builder/(dict_size:100, len:10000, key_len: 10)
                        time:   [176.06 µs 176.14 µs 176.22 µs]
                        change: [-0.9251% -0.6795% -0.4273%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 5 outliers among 100 measurements (5.00%)
  1 (1.00%) low mild
  2 (2.00%) high mild
  2 (2.00%) high severe
string_dictionary_builder/(dict_size:100, len:10000, key_len: 100)
                        time:   [463.94 µs 464.12 µs 464.30 µs]
                        change: [-0.1715% -0.0733% +0.0237%] (p = 0.15 > 0.05)
                        No change in performance detected.
Found 2 outliers among 100 measurements (2.00%)
  1 (1.00%) high mild
  1 (1.00%) high severe

What changes are included in this PR?

Switches to using hashbrown::RawTable in place of HashMap and the raw entry API

Are there any user-facing changes?

No

@github-actions github-actions bot added arrow Changes to the arrow crate parquet Changes to the parquet crate labels Sep 21, 2022
@tustvold tustvold marked this pull request as draft September 21, 2022 18:20
@tustvold
Copy link
Contributor Author

Just realised this API is marked experimental

@Dandandan
Copy link
Contributor

I think you need to explicitly enable this feature?

I also used it in the hash join implementation (and I think in hash aggregate as well):
https://github.com/apache/arrow-datafusion/blob/master/datafusion/core/Cargo.toml#L72

@tustvold
Copy link
Contributor Author

Yeah, I'm just a little bit unsure what the implications of this being an experimental API are... I'll revisit this after #2769

@tustvold tustvold closed this Oct 25, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
arrow Changes to the arrow crate parquet Changes to the parquet crate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants