rayon parallel iterators execute serially #383

Open
farnoy opened this issue Jan 8, 2023 · 4 comments

farnoy commented Jan 8, 2023

I tried to use a HashMap::par_iter() and was surprised to see that it does not execute in parallel. Is this something that can be fixed, or documented somewhere? As it stands, I'm not sure why I would ever use par_iter over par_bridge if I wanted parallel iteration.

My repro case:

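use hashbrown::HashSet;
use rayon::prelude::*;
use std::{thread, time::Duration};

// Baseline: a Vec with five elements.
let container: Vec<i32> = (0..5).collect();

container.par_iter().for_each(|_| {
    let _span = tracy_client::span!("vec");
    thread::sleep(Duration::from_millis(10));
});

// hashbrown's own parallel iterator (the case that runs serially).
let container: HashSet<i32> = (0..5).collect();

container.par_iter().for_each(|_| {
    let _span = tracy_client::span!("hashbrown");
    thread::sleep(Duration::from_millis(10));
});

// The same set, bridged from its sequential iterator.
container.iter().par_bridge().for_each(|_| {
    let _span = tracy_client::span!("hashbrown par_bridge");
    thread::sleep(Duration::from_millis(10));
});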

This is on hashbrown 0.13.1 with the rayon feature flag enabled.

Amanieu (Member) commented Jan 8, 2023

Due to the structure of the table, the parallel iterator won't split groups of 16 elements across threads. If you increase the number of elements in the table then you will see parallel execution.
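
To make the point above concrete, here is a rough model of the constraint, not hashbrown's actual splitting code. The function name `max_parallel_pieces` is hypothetical, and the model only assumes that a piece handed to a thread can never be smaller than one group of buckets. It gives an upper bound; rayon may still choose to split less.

```rust
/// Upper bound on how many independent pieces a table with `buckets` slots
/// can be divided into when no piece may be smaller than one group.
/// Hypothetical model for illustration; not taken from hashbrown.
fn max_parallel_pieces(buckets: usize, group_width: usize) -> usize {
    assert!(group_width > 0, "group width must be non-zero");
    // Round up: a trailing partial group still forms one piece.
    // Even an empty table is handled as a single piece.
    ((buckets + group_width - 1) / group_width).max(1)
}
```

If the five-element set from the report is backed by fewer than 16 buckets (an assumption about table sizing that this thread does not state), the model yields a single piece, which matches the serial execution observed. A table with 64 buckets would allow at most four pieces at a group width of 16.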

farnoy (Author) commented Jan 8, 2023

Interesting, is the granularity always 16 consecutive elements in iteration order?

Is this a hard limitation, or a consequence of the current implementation? It would be great if it could be made more granular, unless there are trade-offs I'm not aware of.

Amanieu (Member) commented Jan 8, 2023

The granularity comes from the group width, which is 16 elements on x86 because that is the number of bytes in a 128-bit SSE register. On other platforms it is 8 elements.
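
The arithmetic behind those two numbers can be written out as a small sketch. It assumes one byte of bookkeeping per element, which is what "the number of bytes in a register" implies, and it assumes the 8-element case corresponds to a 64-bit word, which the thread does not state. The name `group_width` is hypothetical.

```rust
/// Number of one-byte slots that fit in a register of `register_bits` bits.
/// Assumption for illustration: each element is tracked by exactly one byte.
const fn group_width(register_bits: usize) -> usize {
    register_bits / 8
}

/// Group width for a 128-bit SSE register, as described above.
const SSE_GROUP_WIDTH: usize = group_width(128);

/// Group width for a 64-bit word (assumed fallback, not confirmed here).
const WORD_GROUP_WIDTH: usize = group_width(64);
```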

JayXon commented Apr 4, 2023

I was affected by this issue and ended up working around it like this:

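use rayon::{iter::Either, prelude::*};

// THRESHOLD is a tuning constant chosen by the caller.
// Small maps go through par_bridge; larger ones use the native parallel iterator.
if map.len() < THRESHOLD {
    Either::Left(map.iter().par_bridge())
} else {
    Either::Right(map.par_iter())
}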

Please consider having something similar built into the IntoParallelIterator implementation.
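
For readers wondering why the par_bridge branch parallelizes even a tiny map, the following is a standard-library sketch of the general idea: worker threads pull one item at a time from a shared sequential iterator. It is a stand-in technique, not rayon's or hashbrown's implementation; it uses `std::collections::HashMap` and `std::thread::scope` instead of either crate, and `for_each_bridged` is a hypothetical name.

```rust
use std::collections::HashMap;
use std::sync::Mutex;
use std::thread;

/// Visit every entry of `map` from `workers` threads. Each worker repeatedly
/// takes the next item from one shared iterator, so work is distributed per
/// element rather than per group of buckets.
/// Illustrative sketch only; not rayon's par_bridge.
fn for_each_bridged<K, V, F>(map: &HashMap<K, V>, workers: usize, f: F)
where
    K: Sync,
    V: Sync,
    F: Fn(&K, &V) + Sync,
{
    // The sequential iterator is shared behind a mutex.
    let shared = Mutex::new(map.iter());
    thread::scope(|s| {
        for _ in 0..workers.max(1) {
            s.spawn(|| loop {
                // Hold the lock only while fetching the next item,
                // then release it before running the callback.
                let next = shared.lock().unwrap().next();
                match next {
                    Some((k, v)) => f(k, v),
                    None => break,
                }
            });
        }
    });
}
```

The trade-off is visible in the sketch: every item costs a lock acquisition, so this style suits small collections with expensive per-item work, which is the case the threshold in the workaround above selects for.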
