Parallelizing internal mutate of elements in vector at specific indices #1046

Open
jkbch opened this issue May 11, 2023 · 1 comment

jkbch commented May 11, 2023

Hello, I am trying to parallelize this code:

for i in indices {
    self.rankers[i].update(post_id, score);
}

self.rankers: Vec<Ranker> is a vector of Ranker structs. Each one is updated through the method
pub fn update(&mut self, id: Id, score: Score), which mutates the Ranker in place.

So far I have gotten this to work:

self.rankers
    .par_iter_mut()
    .enumerate()
    .filter(|(i, _)| indices.contains(i))
    .for_each(|(_, ranker)| ranker.update(post_id, score));

But self.rankers is a very large vector, so it feels wasteful to filter the whole vector when I only need to update a small number of indices. Is it possible to parallelize the code without filtering the whole vector?

Member

cuviper commented May 19, 2023

Maybe par_chunks_mut would be better for you, and then you can serially apply the relevant indices to each parallel chunk. That could be imbalanced, though, if the indices are clustered in a few chunks.
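
A minimal sketch of the chunked idea, not a definitive implementation. To stay dependency-free it uses std::thread::scope with chunks_mut instead of rayon; with rayon the loop would become par_chunks_mut(chunk_size).enumerate().for_each(...). The Ranker body, the Id and Score aliases, and the function name update_by_chunks are hypothetical stand-ins, and the indices are assumed to be sorted ascending, unique, and in bounds.

```rust
use std::thread;

// Hypothetical stand-ins for the issue's types; only `update` matters here.
pub type Id = u64;
pub type Score = f64;

#[derive(Debug, Default)]
pub struct Ranker {
    pub updates: Vec<(Id, Score)>,
}

impl Ranker {
    pub fn update(&mut self, id: Id, score: Score) {
        self.updates.push((id, score));
    }
}

/// Updates `rankers[i]` for every `i` in `sorted_indices`, one task per chunk.
/// Assumes `sorted_indices` is ascending, unique, and in bounds, and that
/// `chunk_size` is non-zero (`chunks_mut` panics on zero).
pub fn update_by_chunks(
    rankers: &mut [Ranker],
    sorted_indices: &[usize],
    chunk_size: usize,
    id: Id,
    score: Score,
) {
    thread::scope(|s| {
        // With rayon: rankers.par_chunks_mut(chunk_size).enumerate().for_each(...)
        for (c, chunk) in rankers.chunks_mut(chunk_size).enumerate() {
            let start = c * chunk_size;
            let end = start + chunk.len();
            // Binary-search the run of indices that fall inside this chunk.
            let lo = sorted_indices.partition_point(|&i| i < start);
            let hi = sorted_indices.partition_point(|&i| i < end);
            let local = &sorted_indices[lo..hi];
            if local.is_empty() {
                continue; // nothing to do here, so no task is spawned
            }
            s.spawn(move || {
                // Apply this chunk's indices serially, rebased to the chunk.
                for &i in local {
                    chunk[i - start].update(id, score);
                }
            });
        }
    });
}
```

Each chunk only looks at its own run of indices, so the total work is proportional to the number of chunks plus the number of indices rather than to the length of the vector. The imbalance mentioned above still applies: a chunk that holds most of the indices is processed by a single task.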

If the indices are sorted, another possibility is to write a custom split: call indices.split_at(midpoint), then pair it with rankers.split_at_mut at the smallest index in the right half of the indices. This has the advantage of parallelizing well for any distribution of indices.
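
The custom split could be sketched roughly as follows. This is one reading of the suggestion, under assumptions: the indices are sorted ascending, unique, and in bounds; the Ranker body, the Id and Score aliases, the function name update_split, and its offset and cutoff parameters are all hypothetical. It uses std::thread::scope so that it runs without dependencies; with rayon, the two recursive calls would instead be passed to rayon::join, which reuses the thread pool rather than spawning a thread per split.

```rust
use std::thread;

// Hypothetical stand-ins for the issue's types; only `update` matters here.
pub type Id = u64;
pub type Score = f64;

#[derive(Debug, Default)]
pub struct Ranker {
    pub updates: Vec<(Id, Score)>,
}

impl Ranker {
    pub fn update(&mut self, id: Id, score: Score) {
        self.updates.push((id, score));
    }
}

/// Recursively halves `indices` and splits `rankers` to match.
/// `offset` is the position of `rankers[0]` in the original vector (0 at the
/// top-level call); `cutoff` is the index count below which work is serial.
pub fn update_split(
    rankers: &mut [Ranker],
    indices: &[usize],
    offset: usize,
    cutoff: usize,
    id: Id,
    score: Score,
) {
    // `max(1)` guarantees that a single index is never split further.
    if indices.len() <= cutoff.max(1) {
        for &i in indices {
            rankers[i - offset].update(id, score);
        }
        return;
    }
    let (left_idx, right_idx) = indices.split_at(indices.len() / 2);
    // The smallest index of the right half is where the rankers are cut.
    // Because the indices are sorted and unique, every left index is below it.
    let pivot = right_idx[0];
    let (left, right) = rankers.split_at_mut(pivot - offset);
    thread::scope(|s| {
        // With rayon: rayon::join(|| left half, || right half).
        s.spawn(move || update_split(left, left_idx, offset, cutoff, id, score));
        update_split(right, right_idx, pivot, cutoff, id, score);
    });
}
```

Because the split point follows the indices rather than the vector, both halves always carry about the same number of updates, however the indices are distributed. A top-level call would look like update_split(&mut self.rankers, &indices, 0, 1024, post_id, score), where 1024 is an arbitrary cutoff that would need tuning.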
