Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Performance of read_all and count #439

Open
duskmoon314 opened this issue May 18, 2024 · 2 comments · May be fixed by #441
Open

Performance of read_all and count #439

duskmoon314 opened this issue May 18, 2024 · 2 comments · May be fixed by #441

Comments

@duskmoon314
Copy link

I just found that read_all seems to perform worse than count. So, I'm opening this issue to see whether there is room for improvement.

The rust code I used to test is as follows. I defined two simple structs that only wrap Vec<u8> and use Deku to read [u8; 1500] into these structs.

use criterion::{black_box, criterion_group, criterion_main, Criterion};
use deku::prelude::*;

pub fn read_all_vs_count(c: &mut Criterion) {
    #[derive(DekuRead, DekuWrite)]
    pub struct AllWrapper {
        #[deku(read_all)]
        pub data: Vec<u8>,
    }

    #[derive(DekuRead, DekuWrite)]
    #[deku(ctx = "len: usize")]
    pub struct CountWrapper {
        #[deku(count = "len")]
        pub data: Vec<u8>,
    }

    c.bench_function("read_all_bytes", |b| {
        b.iter(|| AllWrapper::from_bytes(black_box((&[1; 1500], 0))))
    });

    c.bench_function("read_all", |b| {
        b.iter(|| {
            let mut cursor = [1u8; 1500].as_ref();
            let mut reader = Reader::new(&mut cursor);
            AllWrapper::from_reader_with_ctx(black_box(&mut reader), ())
        })
    });

    c.bench_function("count", |b| {
        b.iter(|| {
            let mut cursor = [1u8; 1500].as_ref();
            let mut reader = Reader::new(&mut cursor);
            CountWrapper::from_reader_with_ctx(black_box(&mut reader), 1500)
        })
    });
}

criterion_group!(benches, read_all_vs_count);
criterion_main!(benches);

The output of criterion:

read_all_bytes          time:   [55.840 µs 55.882 µs 55.924 µs]
                        change: [-0.3273% -0.2149% -0.1034%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 3 outliers among 100 measurements (3.00%)
  2 (2.00%) low mild
  1 (1.00%) high mild

read_all                time:   [57.742 µs 57.794 µs 57.856 µs]
                        change: [+0.6835% +0.8126% +0.9457%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) low mild

count                   time:   [4.2984 µs 4.2997 µs 4.3013 µs]
                        change: [+0.2270% +0.2768% +0.3244%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 3 outliers among 100 measurements (3.00%)
  2 (2.00%) high mild
  1 (1.00%) high severe

The result shows that read_all takes nearly 14 times longer than count on my machine (CPU: x86-64 2.10GHz 80threads).

Dig into the code:

// read_all
let mut res = capacity.map_or_else(Vec::new, Vec::with_capacity);
loop {
    if reader.end() {
        break;
    }
    let val = <T>::from_reader_with_ctx(reader, ctx)?;
    res.push(val);
}
Ok(res)

// count
let mut res = capacity.map_or_else(Vec::new, Vec::with_capacity);
let start_read = reader.bits_read;
loop {
    let val = <T>::from_reader_with_ctx(reader, ctx)?;
    res.push(val);
    // I replace the original code here
	count -= 1;
	if count == 0 {
		break;
	}
}
Ok(res)

The code is not very different; there is just one check, count == 0, and the other, reader.end(). I wonder whether reader.end() can be improved to perform better.

I also wonder whether it is possible to read an exact number of bytes instead of the loop for Vec<u8>. But I haven't figured out how to specialize the trait implementation for Vec<u8> over the generic Vec<T>.

@wcampbell0x2a wcampbell0x2a linked a pull request May 19, 2024 that will close this issue
@wcampbell0x2a
Copy link
Collaborator

wcampbell0x2a commented May 19, 2024

Quickly wrote up a possible solution here: #441

EDIT: Eh, I need to fix the test failures.

@wcampbell0x2a
Copy link
Collaborator

wcampbell0x2a commented May 19, 2024

I also wonder whether it is possible to read an exact number of bytes instead of the loop for Vec. But I haven't figured out how to specialize the trait implementation for Vec over the generic Vec.

I think the solution here might just be the use a BufReaader, which will read larger amounts at a time instead of one byte. Syscalls wise at least.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants