
Add #[inline(always)] to find_inner #375

Merged · 1 commit merged into rust-lang:master on Nov 20, 2022
Conversation

@QuarticCat (Contributor) commented on Nov 20, 2022

Benchmark

test clone_from_large            ... bench:       5,562 ->   5,523 ns/iter (+/- 96)
test clone_from_small            ... bench:          52 ->      52 ns/iter (+/- 0)
test clone_large                 ... bench:       5,764 ->   5,760 ns/iter (+/- 36)
test clone_small                 ... bench:          71 ->      72 ns/iter (+/- 12)
test grow_insert_ahash_highbits  ... bench:      24,918 ->  25,229 ns/iter (+/- 710)
test grow_insert_ahash_random    ... bench:      24,590 ->  24,824 ns/iter (+/- 279)
test grow_insert_ahash_serial    ... bench:      24,896 ->  24,892 ns/iter (+/- 3,497)
test grow_insert_std_highbits    ... bench:      43,482 ->  43,342 ns/iter (+/- 706)
test grow_insert_std_random      ... bench:      43,074 ->  43,674 ns/iter (+/- 592)
test grow_insert_std_serial      ... bench:      43,305 ->  43,335 ns/iter (+/- 1,059)
test insert_ahash_highbits       ... bench:      25,025 ->  25,110 ns/iter (+/- 96)
test insert_ahash_random         ... bench:      25,017 ->  25,329 ns/iter (+/- 7,387)
test insert_ahash_serial         ... bench:      24,895 ->  25,041 ns/iter (+/- 692)
test insert_erase_ahash_highbits ... bench:      25,049 ->  25,284 ns/iter (+/- 1,191)
test insert_erase_ahash_random   ... bench:      24,836 ->  24,742 ns/iter (+/- 380)
test insert_erase_ahash_serial   ... bench:      23,832 ->  23,563 ns/iter (+/- 611)
test insert_erase_std_highbits   ... bench:      43,699 ->  43,265 ns/iter (+/- 684)
test insert_erase_std_random     ... bench:      44,499 ->  44,193 ns/iter (+/- 670)
test insert_erase_std_serial     ... bench:      45,250 ->  43,969 ns/iter (+/- 544)
test insert_std_highbits         ... bench:      35,552 ->  34,444 ns/iter (+/- 739)
test insert_std_random           ... bench:      34,484 ->  34,596 ns/iter (+/- 508)
test insert_std_serial           ... bench:      34,778 ->  34,658 ns/iter (+/- 1,003)
test iter_ahash_highbits         ... bench:       1,292 ->   1,292 ns/iter (+/- 89)
test iter_ahash_random           ... bench:       1,299 ->   1,295 ns/iter (+/- 72)
test iter_ahash_serial           ... bench:       1,306 ->   1,295 ns/iter (+/- 18)
test iter_std_highbits           ... bench:       1,277 ->   1,306 ns/iter (+/- 31)
test iter_std_random             ... bench:       1,278 ->   1,295 ns/iter (+/- 30)
test iter_std_serial             ... bench:       1,303 ->   1,310 ns/iter (+/- 54)
test lookup_ahash_highbits       ... bench:       4,766 ->   4,767 ns/iter (+/- 44)
test lookup_ahash_random         ... bench:       4,735 ->   4,761 ns/iter (+/- 45)
test lookup_ahash_serial         ... bench:       4,670 ->   4,696 ns/iter (+/- 98)
test lookup_fail_ahash_highbits  ... bench:       4,438 ->   4,232 ns/iter (+/- 55)
test lookup_fail_ahash_random    ... bench:       4,233 ->   4,243 ns/iter (+/- 85)
test lookup_fail_ahash_serial    ... bench:       3,928 ->   3,970 ns/iter (+/- 307)
test lookup_fail_std_highbits    ... bench:      14,177 ->  14,927 ns/iter (+/- 165)
test lookup_fail_std_random      ... bench:      14,217 ->  15,081 ns/iter (+/- 188)
test lookup_fail_std_serial      ... bench:      14,253 ->  14,949 ns/iter (+/- 1,439)
test lookup_std_highbits         ... bench:      14,700 ->  14,797 ns/iter (+/- 940)
test lookup_std_random           ... bench:      14,759 ->  14,996 ns/iter (+/- 609)
test lookup_std_serial           ... bench:      14,784 ->  14,928 ns/iter (+/- 293)
test rehash_in_place             ... bench:     213,678 -> 218,933 ns/iter (+/- 3,012)

ASM Difference

Here's the major difference in the assembly generated for HashMap::<u64, u64>::get:

[screenshot: assembly diff of HashMap::<u64, u64>::get before and after the change]

With #[inline], the final self.bucket(index) call in find doesn't get optimized away. According to @nbdd0121, #[inline(always)] is applied during MIR inlining, while #[inline] (most likely) is only acted on during LLVM inlining.
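For readers unfamiliar with the distinction, here is a minimal, self-contained sketch of the attribute placement this PR makes (toy `Table`, `find_inner`, and `find` names, not hashbrown's actual `RawTable` API): the inner search is forced inline into its wrapper at the MIR stage, which is what lets the wrapper's final index-to-reference step fold away.

```rust
struct Table {
    slots: Vec<u64>,
}

impl Table {
    // #[inline(always)] is honored during MIR inlining, so callers see this
    // body before the code ever reaches LLVM.
    #[inline(always)]
    fn find_inner(&self, key: u64) -> Option<usize> {
        self.slots.iter().position(|&slot| slot == key)
    }

    // A plain #[inline] is only a hint, (likely) acted on later by LLVM.
    #[inline]
    fn find(&self, key: u64) -> Option<&u64> {
        let index = self.find_inner(key)?;
        self.slots.get(index)
    }
}

fn main() {
    let table = Table { slots: vec![3, 7, 42] };
    assert_eq!(table.find(42), Some(&42));
    assert_eq!(table.find(5), None);
}
```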

@Amanieu (Member) commented on Nov 20, 2022

@bors r+

@bors (Collaborator) commented on Nov 20, 2022

📌 Commit 7d6c2a8 has been approved by Amanieu

It is now in the queue for this repository.

@bors (Collaborator) commented on Nov 20, 2022

⌛ Testing commit 7d6c2a8 with merge c93ef8f...

@bors (Collaborator) commented on Nov 20, 2022

☀️ Test successful - checks-actions
Approved by: Amanieu
Pushing c93ef8f to master...

bors merged commit c93ef8f into rust-lang:master on Nov 20, 2022
@QuarticCat (Contributor, Author) commented:
The code mentions "avoid Option::map because it bloats LLVM IR." In my tests, although option_value.map(...) does generate bad assembly, option_value.as_ref().map(...) has the same codegen as a match. I haven't checked the LLVM IR, though. You might be interested in using this fact to simplify the code.
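A minimal sketch of the three forms being compared (toy functions over Option<u64>, not hashbrown's actual code, where the Option holds a bucket reference); the observation above is that the as_ref-based version compiles down to the same assembly as the explicit match, while the consuming map reportedly does not:

```rust
// Explicit match over the Option.
fn via_match(opt: Option<u64>) -> Option<u64> {
    match opt {
        Some(value) => Some(value + 1),
        None => None,
    }
}

// Consuming map: the pattern the "bloats LLVM IR" comment warns about.
fn via_owned_map(opt: Option<u64>) -> Option<u64> {
    opt.map(|value| value + 1)
}

// Borrow first, then map: reportedly the same assembly as the match.
fn via_as_ref_map(opt: Option<u64>) -> Option<u64> {
    opt.as_ref().map(|value| value + 1)
}

fn main() {
    assert_eq!(via_match(Some(1)), Some(2));
    assert_eq!(via_owned_map(Some(1)), Some(2));
    assert_eq!(via_as_ref_map(Some(1)), Some(2));
    assert_eq!(via_match(None), None);
}
```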

@Amanieu (Member) commented on Nov 21, 2022

LLVM can optimize the code to be the same as a match; that's not the problem. The issue is that using closures generates many small functions, which slows down compilation. This is noticeable when building rustc itself, since it uses a lot of hash tables and each of these small functions needs to be instantiated for every K/V combination used in the code base.
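A hedged illustration of that compile-time cost (toy helpers over std's HashMap, not hashbrown's internals): every closure passed to a combinator such as Option::map is its own anonymous type, so each K/V instantiation of the map-based helper adds an extra tiny function to monomorphize and codegen, whereas the explicit match keeps the work inside the enclosing function.

```rust
use std::collections::HashMap;
use std::hash::Hash;

// The closure passed to `map` is a distinct anonymous type, so this helper
// (and the `Option::map` instantiation it drives) is monomorphized separately
// for every (K, V) pair used anywhere in the code base.
fn get_ptr_via_map<K: Hash + Eq, V>(map: &HashMap<K, V>, key: &K) -> Option<*const V> {
    map.get(key).map(|value| value as *const V)
}

// The explicit match does the same work inline, without creating an extra
// closure function per instantiation.
fn get_ptr_via_match<K: Hash + Eq, V>(map: &HashMap<K, V>, key: &K) -> Option<*const V> {
    match map.get(key) {
        Some(value) => Some(value as *const V),
        None => None,
    }
}

fn main() {
    let mut map = HashMap::new();
    map.insert(1u32, "one");
    assert!(get_ptr_via_map(&map, &1).is_some());
    assert!(get_ptr_via_match(&map, &1).is_some());
}
```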
