How to implement ParallelIterator for a custom Range? #1086

hardskulls · 2023-09-03T15:51:15Z

Hello!

I want to iterate over a range of Ts that implementNum from num_traits crate.
rayon implements ParallelIterator for ranges using private Traits and macros, but I'm not very familiar with macros and they use integer types anyway.
I wasn't able examples or tutorials for my use case.

Thank you

The text was updated successfully, but these errors were encountered:

cuviper · 2023-09-18T22:23:53Z

The smallest entry point for writing your own kind of parallel iterator is to use split, so you only have to think about one question -- how can you divide your range into two subranges?

hardskulls · 2023-10-04T08:13:18Z

Ok, but what do i do with it next?
My case looks almost exactly like the one in the docs to split, but for_each method just gives me another range, whileI want to iterate over range and use each number.
Do I use a regular iterator after I used split and for_each?

cuviper · 2023-10-04T16:03:38Z

Sure, you can use a regular iterator within for_each. Or you can flatten_iter().for_each(...) to have it give you one item at a time, or flat_map_iter if you need a more general way to convert to an iterator.

hardskulls · 2023-10-06T10:42:29Z

Ok, I changed this:

use rayon::iter::{IntoParallelIterator, ParallelIterator};

#[derive(Debug, Clone)]
pub struct NumberHash<N, H> {
    pub number: N,
    pub hash: H,
}

pub type Number = u128;

pub fn gen_range_of_nums<H, F, A, R>(start: Number, end: Number, filter: F, apply: A)
where
    F: Fn(Number) -> Option<NumberHash<Number, H>> + Sync + Send,
    A: Fn(NumberHash<Number, H>) -> R + Sync + Send,
{
    (start..=end).into_par_iter().for_each(|number| {
        if let Some(num_hash) = filter(number) {
            apply(num_hash);
        }
    })
}

to this:

use rayon::iter::{IntoParallelIterator, ParallelIterator};
use std::ops::{Add, RangeInclusive};

#[derive(Debug, Clone)]
pub struct NumberHash<N, H> {
    pub number: N,
    pub hash: H,
}

pub type Number = u128;

fn splitter<N>(data: RangeInclusive<N>) -> (RangeInclusive<N>, Option<RangeInclusive<N>>)
where
    N: num_traits::Num + Copy,
{
    let (start, end) = (*data.start(), *data.end());
    let middle = end.div(N::one() + N::one());
    let next = middle + N::one();
    (start..=middle, Some(next..=end))
}

pub fn gen_range_of_nums<N, H, F, A, R>(start: N, end: N, filter: F, apply: A)
where
    N: num_traits::Num + num_traits::ToPrimitive + Add<N, Output = N> + PartialOrd + Copy + Send,
    F: Fn(N) -> Option<NumberHash<N, H>> + Sync + Send,
    A: Fn(NumberHash<N, H>) -> R + Sync + Send,
{
    rayon::iter::split(start..=end, splitter)
        .into_par_iter()
        .for_each(|sub_range| {
            let (start, end) = (*sub_range.start(), *sub_range.end());
            num_iter::range_inclusive(start, end).for_each(|number| {
                if let Some(num_hash) = filter(number) {
                    apply(num_hash);
                }
            })
        })
}

Works fine, haven't notice any performance degradation.
Wasn't able to measure it though, cause in any benches stack just blows up 🙃.

Is it OK that my splitter may return an emoty range?
Should I try to bench the new version somehow?

cuviper · 2023-10-06T18:45:24Z

Wasn't able to measure it though, cause in any benches stack just blows up 🙃.

Is it OK that my splitter may return an emoty range?

I expect these are related. If you keep splitting, then at the tail end of computation you'll have threads racing to steal each others' empty ranges, and stealing also triggers the heuristic to reset its split count and try to split even more. With a terminating case that returns (range, None), it shouldn't get so deep on the stack.

hardskulls · 2023-10-11T08:49:32Z

Yes, you are right.
I was able to run benchmarks, by using cheap dummy functions in gen_range_of_nums (plus your suggestion).
The second implementation is roughly 10 times slower, and the bigger the range is, the bigger the difference (when range is 10 times bigger the difference is roughly 10 times bigger).
The secont implementation blows up stack if I use non-dummy functions ingen_range_of_nums.

cuviper · 2023-10-11T17:45:24Z

Can you share your implementation, with enough for someone to run and reproduce that result?

hardskulls · 2023-10-13T12:59:48Z

Sure, the code is here.
I have to add that the size of ranges becomes problematic only in benching.
To reproduce the problem you need to use empty ranges in splitter in src/core/hashing/abstractions/gen_range/abstract_number.rs or use big enough range in range_generators bench.

cuviper · 2023-10-13T16:52:47Z

    let middle = end.div(N::one() + N::one());

Oh, this isn't a correct middle -- I think you want something like start + (end - start) / 2. Or you might like the Average trait, which handles cases like signed overflow where the raw distance is too large, e.g. i32::MIN..=i32::MAX.

hardskulls · 2023-10-14T14:08:09Z

Yeah, makes sense.
Tried it out.
Replaced end.div(N::one() + N::one()) with start + (end - start) / (N::one() + N::one()) in splitter.
The results are... strange.

this fix does improve performance greatly
performance is still bad though (10 times worse than non-generic version)
performance improvement does not come from making range shorter (!)
old version results in stack overflow, new version does not (it does, but much bigger ranges)
if I put middle calculations in two functions, old version generates some extra asm instructions

I wrote some tests to compare fixed and unfixed version.
When used with positive integers (exactly what I was using), the results are identical.
So the calculation of middle is correct in both versions (at least with a given constraint).

I was able to bench the two versions versus each other (old ver usually overflows the stack, but sometimes it works, just with terrible performance) and new version if far better.

However, benchmarks show that the two 'middle' lines above do not differ in performance (even though this one line changes so much in the end).
As a last resort, I compared assembly output of both versions, and figured out that old version generates some extra asm instructions.
So I came to conclusion that these extra asm instructions shit the stack.

Of course, this has nothing to do with rayon, but even with this fix, it the code is still 10 times slower.

hardskulls · 2023-10-15T09:07:48Z

Update: some of the info is incorrect, I'll check again.

hardskulls · 2023-10-15T10:20:31Z

OK, analyzing assebbly output turned out to be a bit too hard for me 😅.
I forgot that it can change.

So now I'm completely lost.
The difference is there, but have no means to detect it, except manual testing.

hardskulls · 2023-10-15T18:14:24Z

No, wait I do see the difference when I compare asm output for different splitter versions.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

How to implement ParallelIterator for a custom Range? #1086

How to implement ParallelIterator for a custom Range? #1086

hardskulls commented Sep 3, 2023

cuviper commented Sep 18, 2023

hardskulls commented Oct 4, 2023

cuviper commented Oct 4, 2023

hardskulls commented Oct 6, 2023

cuviper commented Oct 6, 2023

hardskulls commented Oct 11, 2023

cuviper commented Oct 11, 2023

hardskulls commented Oct 13, 2023

cuviper commented Oct 13, 2023

hardskulls commented Oct 14, 2023

hardskulls commented Oct 15, 2023

hardskulls commented Oct 15, 2023

hardskulls commented Oct 15, 2023 •

edited

How to implement ParallelIterator for a custom Range? #1086

How to implement ParallelIterator for a custom Range? #1086

Comments

hardskulls commented Sep 3, 2023

cuviper commented Sep 18, 2023

hardskulls commented Oct 4, 2023

cuviper commented Oct 4, 2023

hardskulls commented Oct 6, 2023

cuviper commented Oct 6, 2023

hardskulls commented Oct 11, 2023

cuviper commented Oct 11, 2023

hardskulls commented Oct 13, 2023

cuviper commented Oct 13, 2023

hardskulls commented Oct 14, 2023

hardskulls commented Oct 15, 2023

hardskulls commented Oct 15, 2023

hardskulls commented Oct 15, 2023 • edited

hardskulls commented Oct 15, 2023 •

edited