Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature request: .tuples() #971

Open
WhyNotHugo opened this issue Aug 20, 2022 · 4 comments
Open

Feature request: .tuples() #971

WhyNotHugo opened this issue Aug 20, 2022 · 4 comments

Comments

@WhyNotHugo
Copy link

I see tuples() listed in Probably applicable but did not yet open issues

I'd find this very useful. I'm re-encoding images from LE-XRBG to BE-RGB:

    let mmap: memmap2::Mmap = ...
    let mut image_data = vec![0u8; mmap.len() * 3 / 4];
    mmap.iter()
        .tuples()
        .zip(image_data.iter_mut().tuples())
        .for_each(|((b, g, r, _), (r2, g2, b2))| (*r2, *g2, *b2) = (*r, *g, *b));

This can clearly be parallelised quite a bit, but I don't see any obvious way of writing it without tuples().

@cuviper
Copy link
Member

cuviper commented Aug 22, 2022

The ugly part of tuples() is trying to be variadic in length, which doesn't have any language support. We deal with this a little in the IntoParallelIterator for tuples into a MultiZip, which we manually (macro) implement up to length 12. Itertools depends on a similar manual implementation of its own HomogeneousTuple trait.

For your particular case with slice-able data, you could also use par_chunks_exact(4) and par_chunks_exact_mut(3). You can still pattern-match that if you treat it as a refutable pattern within your for_each, like if let ([b, g, r, _], [r2, g2, b2]) {...}, even though that if let will never fail in practice. Eventually it can be a truly irrefutable argument pattern with arrays, #789, but I've been hoping that the standard library would stabilize the regular Iterator API for that first.

@cuviper
Copy link
Member

cuviper commented Aug 22, 2022

I'm re-encoding images from LE-XRBG to BE-RGB:

This can clearly be parallelised quite a bit,

FWIW, it's not necessarily true that this task will benefit from pixel-level parallelization, especially since there's little real CPU work, just data movement. If you have many images to convert, it may work better to parallelize that coarsely with each image working serially, but ultimately you're going to be limited by I/O and memory bandwidth.

@WhyNotHugo
Copy link
Author

I tend to agree with that last statement; it seems to be mostly memory IO limited (rather than CPU), so it might be a pointless optimisation. FWIW, the original code can also be re-written as:

    let mut image_data = vec![0u8; mmap.len() * 3 / 4];
    mmap.iter()
        .tuples()
        .zip(image_data.iter_mut().tuples())
        .for_each(|((b, g, r, _), (r2, g2, b2))| (*r2, *g2, *b2) = (*r, *g, *b));

Clearly it's just copying data around. I'd still be curious to benchmark with parallelisation, but performance has improved to the point where this is no longer the slowest thing.

I couldn't get chunks to work for this example, regrettably.

Shall I still leave this issue open for further discussion on .tuple()?

@cuviper
Copy link
Member

cuviper commented Aug 25, 2022

I couldn't get chunks to work for this example, regrettably.

Here's a playground with chunks_exact, par_chunks_exact, and also using unstable standard library features for array_chunks and as_chunks.

Shall I still leave this issue open for further discussion on .tuple()?

Yes, we can leave that request open.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants