
Add #[inline(always)] to find_inner #375

Merged · 1 commit merged into rust-lang:master on Nov 20, 2022
Conversation

@QuarticCat (Contributor) commented on Nov 20, 2022

Benchmark

test clone_from_large            ... bench:       5,562 ->   5,523 ns/iter (+/- 96)
test clone_from_small            ... bench:          52 ->      52 ns/iter (+/- 0)
test clone_large                 ... bench:       5,764 ->   5,760 ns/iter (+/- 36)
test clone_small                 ... bench:          71 ->      72 ns/iter (+/- 12)
test grow_insert_ahash_highbits  ... bench:      24,918 ->  25,229 ns/iter (+/- 710)
test grow_insert_ahash_random    ... bench:      24,590 ->  24,824 ns/iter (+/- 279)
test grow_insert_ahash_serial    ... bench:      24,896 ->  24,892 ns/iter (+/- 3,497)
test grow_insert_std_highbits    ... bench:      43,482 ->  43,342 ns/iter (+/- 706)
test grow_insert_std_random      ... bench:      43,074 ->  43,674 ns/iter (+/- 592)
test grow_insert_std_serial      ... bench:      43,305 ->  43,335 ns/iter (+/- 1,059)
test insert_ahash_highbits       ... bench:      25,025 ->  25,110 ns/iter (+/- 96)
test insert_ahash_random         ... bench:      25,017 ->  25,329 ns/iter (+/- 7,387)
test insert_ahash_serial         ... bench:      24,895 ->  25,041 ns/iter (+/- 692)
test insert_erase_ahash_highbits ... bench:      25,049 ->  25,284 ns/iter (+/- 1,191)
test insert_erase_ahash_random   ... bench:      24,836 ->  24,742 ns/iter (+/- 380)
test insert_erase_ahash_serial   ... bench:      23,832 ->  23,563 ns/iter (+/- 611)
test insert_erase_std_highbits   ... bench:      43,699 ->  43,265 ns/iter (+/- 684)
test insert_erase_std_random     ... bench:      44,499 ->  44,193 ns/iter (+/- 670)
test insert_erase_std_serial     ... bench:      45,250 ->  43,969 ns/iter (+/- 544)
test insert_std_highbits         ... bench:      35,552 ->  34,444 ns/iter (+/- 739)
test insert_std_random           ... bench:      34,484 ->  34,596 ns/iter (+/- 508)
test insert_std_serial           ... bench:      34,778 ->  34,658 ns/iter (+/- 1,003)
test iter_ahash_highbits         ... bench:       1,292 ->   1,292 ns/iter (+/- 89)
test iter_ahash_random           ... bench:       1,299 ->   1,295 ns/iter (+/- 72)
test iter_ahash_serial           ... bench:       1,306 ->   1,295 ns/iter (+/- 18)
test iter_std_highbits           ... bench:       1,277 ->   1,306 ns/iter (+/- 31)
test iter_std_random             ... bench:       1,278 ->   1,295 ns/iter (+/- 30)
test iter_std_serial             ... bench:       1,303 ->   1,310 ns/iter (+/- 54)
test lookup_ahash_highbits       ... bench:       4,766 ->   4,767 ns/iter (+/- 44)
test lookup_ahash_random         ... bench:       4,735 ->   4,761 ns/iter (+/- 45)
test lookup_ahash_serial         ... bench:       4,670 ->   4,696 ns/iter (+/- 98)
test lookup_fail_ahash_highbits  ... bench:       4,438 ->   4,232 ns/iter (+/- 55)
test lookup_fail_ahash_random    ... bench:       4,233 ->   4,243 ns/iter (+/- 85)
test lookup_fail_ahash_serial    ... bench:       3,928 ->   3,970 ns/iter (+/- 307)
test lookup_fail_std_highbits    ... bench:      14,177 ->  14,927 ns/iter (+/- 165)
test lookup_fail_std_random      ... bench:      14,217 ->  15,081 ns/iter (+/- 188)
test lookup_fail_std_serial      ... bench:      14,253 ->  14,949 ns/iter (+/- 1,439)
test lookup_std_highbits         ... bench:      14,700 ->  14,797 ns/iter (+/- 940)
test lookup_std_random           ... bench:      14,759 ->  14,996 ns/iter (+/- 609)
test lookup_std_serial           ... bench:      14,784 ->  14,928 ns/iter (+/- 293)
test rehash_in_place             ... bench:     213,678 -> 218,933 ns/iter (+/- 3,012)

ASM Difference

Here's the major difference in the assembly generated for HashMap::<u64, u64>::get:

[screenshot: assembly diff of HashMap::<u64, u64>::get before and after the change]

With #[inline], the final self.bucket(index) call in find doesn't get optimized away. According to @nbdd0121, #[inline(always)] is applied during MIR inlining, while #[inline] (most likely) is only acted on during LLVM inlining.
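For readers unfamiliar with the distinction, here is a minimal, self-contained sketch of the attribute placement this PR makes (toy `Table`, `find_inner`, and `find` names, not hashbrown's actual `RawTable` API): the inner search is forced inline into its wrapper at the MIR stage, which is what lets the wrapper's final index-to-reference step fold away.

```rust
struct Table {
    slots: Vec<u64>,
}

impl Table {
    // #[inline(always)] is honored during MIR inlining, so callers see this
    // body before the code ever reaches LLVM.
    #[inline(always)]
    fn find_inner(&self, key: u64) -> Option<usize> {
        self.slots.iter().position(|&slot| slot == key)
    }

    // A plain #[inline] is only a hint, (likely) acted on later by LLVM.
    #[inline]
    fn find(&self, key: u64) -> Option<&u64> {
        let index = self.find_inner(key)?;
        self.slots.get(index)
    }
}

fn main() {
    let table = Table { slots: vec![3, 7, 42] };
    assert_eq!(table.find(42), Some(&42));
    assert_eq!(table.find(5), None);
}
```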

@Amanieu (Member) commented on Nov 20, 2022

@bors r+

@bors (Collaborator) commented on Nov 20, 2022

📌 Commit 7d6c2a8 has been approved by Amanieu

It is now in the queue for this repository.

@bors (Collaborator) commented on Nov 20, 2022

⌛ Testing commit 7d6c2a8 with merge c93ef8f...

@bors (Collaborator) commented on Nov 20, 2022

☀️ Test successful - checks-actions
Approved by: Amanieu
Pushing c93ef8f to master...

bors merged commit c93ef8f into rust-lang:master on Nov 20, 2022
@QuarticCat (Contributor, Author) commented:
The code mentions "avoid Option::map because it bloats LLVM IR." In my tests, although option_value.map(...) does generate bad assembly, option_value.as_ref().map(...) has the same codegen as a match. I haven't checked the LLVM IR, though. You might be interested in using this fact to simplify the code.
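A minimal sketch of the three forms being compared (toy functions over Option<u64>, not hashbrown's actual code, where the Option holds a bucket reference); the observation above is that the as_ref-based version compiles down to the same assembly as the explicit match, while the consuming map reportedly does not:

```rust
// Explicit match over the Option.
fn via_match(opt: Option<u64>) -> Option<u64> {
    match opt {
        Some(value) => Some(value + 1),
        None => None,
    }
}

// Consuming map: the pattern the "bloats LLVM IR" comment warns about.
fn via_owned_map(opt: Option<u64>) -> Option<u64> {
    opt.map(|value| value + 1)
}

// Borrow first, then map: reportedly the same assembly as the match.
fn via_as_ref_map(opt: Option<u64>) -> Option<u64> {
    opt.as_ref().map(|value| value + 1)
}

fn main() {
    assert_eq!(via_match(Some(1)), Some(2));
    assert_eq!(via_owned_map(Some(1)), Some(2));
    assert_eq!(via_as_ref_map(Some(1)), Some(2));
    assert_eq!(via_match(None), None);
}
```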

@Amanieu (Member) commented on Nov 21, 2022

LLVM can optimize the code to be the same as a match; that's not the problem. The issue is that using closures generates many small functions, which slows down compilation. This is noticeable when building rustc itself, since it uses a lot of hash tables and each of these small functions needs to be instantiated for every K/V combination used in the code base.
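A hedged illustration of that compile-time cost (toy helpers over std's HashMap, not hashbrown's internals): every closure passed to a combinator such as Option::map is its own anonymous type, so each K/V instantiation of the map-based helper adds an extra tiny function to monomorphize and codegen, whereas the explicit match keeps the work inside the enclosing function.

```rust
use std::collections::HashMap;
use std::hash::Hash;

// The closure passed to `map` is a distinct anonymous type, so this helper
// (and the `Option::map` instantiation it drives) is monomorphized separately
// for every (K, V) pair used anywhere in the code base.
fn get_ptr_via_map<K: Hash + Eq, V>(map: &HashMap<K, V>, key: &K) -> Option<*const V> {
    map.get(key).map(|value| value as *const V)
}

// The explicit match does the same work inline, without creating an extra
// closure function per instantiation.
fn get_ptr_via_match<K: Hash + Eq, V>(map: &HashMap<K, V>, key: &K) -> Option<*const V> {
    match map.get(key) {
        Some(value) => Some(value as *const V),
        None => None,
    }
}

fn main() {
    let mut map = HashMap::new();
    map.insert(1u32, "one");
    assert!(get_ptr_via_map(&map, &1).is_some());
    assert!(get_ptr_via_match(&map, &1).is_some());
}
```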
