bencher::black_box does not stop benchmark(s) from being optimized out #198

Open
saethlin opened this issue Sep 26, 2021 · 0 comments

Hi, it looks to me like the benchmarks in this crate need a bit of love. I'd like to help but I need some input.

Currently (e209a50) cargo bench slice tells me:

test extend_from_slice    ... bench:           6 ns/iter (+/- 1) = 85333 MB/s
test extend_with_slice    ... bench:           1 ns/iter (+/- 0) = 512000 MB/s

That doesn't seem right. Extending with an iterator over a slice shouldn't be faster than extending with a slice directly. Here is what perf tells me about extend_with_slice:

  1.86 │190:┌─→movups %xmm0,-0x134(%rbp)
       │    │extend::extend_with_slice::{{closure}}:
       │    │b.iter(|| {
       │    │v.clear();
       │    │let iter = data.iter().map(|&x| x);
       │    │v.extend(iter);
       │    │v[511]
       │    │  movzbl -0x125(%rbp),%eax
       │    │extend::extend_with_slice:
  8.26 │    │  mov    %al,-0x9(%rbp)
       │    │core::ptr::read_volatile:
       │    │  movzbl -0x9(%rbp),%eax
       │    │core::cmp::impls::<impl core::cmp::PartialOrd for u64>::lt:
 89.88 │    │  add    $0xffffffffffffffff,%rcx
       │    │<core::ops::range::Range<T> as core::iter::range::RangeIteratorImpl>::spec_next:
       │    └──jne    190

That looks to me like the extend is gone and this just updates some internal benchmark counter.

I cannot devise a way to prevent this optimization using bencher::black_box. If I apply the most aggressive black-boxing I can think of:

diff --git a/benches/extend.rs b/benches/extend.rs
index ba33a93..de24f57 100644
--- a/benches/extend.rs
+++ b/benches/extend.rs
@@ -37,9 +37,12 @@ fn extend_with_slice(b: &mut Bencher) {
     let mut v = ArrayVec::<u8, 512>::new();
     let data = [1; 512];
     b.iter(|| {
+        black_box(&v);
         v.clear();
+        black_box(&v);
         let iter = data.iter().map(|&x| x);
         v.extend(iter);
+        black_box(&v);
         v[511]
     });
     b.bytes = v.capacity() as u64;

If this were optimized well, we would get performance equal to extend_from_slice, but we don't. It looks to me like we end up with a loop that copies each item independently, for a ~36x regression. Ow. And as if that weren't bad enough, setting codegen-units = 1 in profile.bench again lets LLVM optimize away the benchmark. Unfortunately, the structure of these benchmarks forbids doing something like v = black_box(v);, and if I restructure them to accommodate that, the benchmarks are dominated by bencher::black_box.
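
For context, the perf listing above shows core::ptr::read_volatile inlined into the closure, which suggests a black box built on a volatile read. Here is a minimal sketch of that general technique. It is not copied from bencher's source, and volatile_black_box is a name made up for illustration:

```rust
use std::{mem, ptr};

/// Hypothetical volatile-read black box (illustrative name, not bencher's API).
/// The compiler has to perform the load of `dummy`, so the value itself
/// cannot be constant-folded away.
pub fn volatile_black_box<T>(dummy: T) -> T {
    unsafe {
        // Bitwise copy of the value through a volatile load.
        let ret = ptr::read_volatile(&dummy);
        // `ret` now owns the value; forget the original to avoid a double drop.
        mem::forget(dummy);
        ret
    }
}
```

With a helper like this, passing a reference only forces a load of the pointer, not of the memory behind it, which may be why black_box(&v) does not keep the extend alive. Passing by value moves the whole buffer through the volatile read, which would explain the overhead when the benchmark is restructured around v = black_box(v);.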

If I instead use the clobber-all-memory black box (core::hint::black_box) with this diff, no combination of codegen-units = 1, lto = true, and panic = "abort" will induce LLVM to optimize away the benchmark:

diff --git a/benches/extend.rs b/benches/extend.rs
index ba33a93..0decd87 100644
--- a/benches/extend.rs
+++ b/benches/extend.rs
@@ -1,4 +1,4 @@
-
+#![feature(bench_black_box)]
 extern crate arrayvec;
 #[macro_use] extern crate bencher;

@@ -7,7 +7,7 @@ use std::io::Write;
 use arrayvec::ArrayVec;

 use bencher::Bencher;
-use bencher::black_box;
+use core::hint::black_box;

 fn extend_with_constant(b: &mut Bencher) {
     let mut v = ArrayVec::<u8, 512>::new();
@@ -40,6 +40,7 @@ fn extend_with_slice(b: &mut Bencher) {
         v.clear();
         let iter = data.iter().map(|&x| x);
         v.extend(iter);
+        black_box(&v);
         v[511]
     });
     b.bytes = v.capacity() as u64;

So as far as I can tell, this crate needs nightly for its benchmarks to function. Does this all check out? And/or is this repo up for requiring nightly for benchmarking?
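
To make the shape of the second diff easier to see outside the bench harness, here is a rough sketch of one iteration of the benchmark body as a plain function. It assumes a toolchain where std::hint::black_box is available (on the nightly used above that means keeping the #![feature(bench_black_box)] line). Vec<u8> stands in for ArrayVec<u8, 512> so the snippet has no dependencies, and extend_with_slice_iteration is a made-up name:

```rust
use std::hint::black_box;

/// One iteration of the extend_with_slice benchmark body.
/// `Vec<u8>` is a stand-in for `ArrayVec<u8, 512>` (an assumption made
/// to keep this sketch dependency-free).
pub fn extend_with_slice_iteration(v: &mut Vec<u8>, data: &[u8; 512]) -> u8 {
    v.clear();
    let iter = data.iter().map(|&x| x);
    v.extend(iter);
    // Tell the optimizer that the contents of `v` may be observed here,
    // so the extend above should not be removed as dead code.
    let _ = black_box(&*v);
    v[511]
}
```

In the real benchmark this body would sit inside b.iter(|| { ... }) exactly as in the diff; the only point of the sketch is where the black_box call goes relative to the extend and the final read.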
