
Integrate ndarray-parallel and make rayon an optional feature #563

Merged · 24 commits · Dec 3, 2018

Commits
ce45726
FEAT: Add ndarray::parallel module and optional feature rayon
bluss Nov 24, 2018
f59969a
FEAT: Parallel, make the extension array and zip methods inherent
bluss Nov 24, 2018
a4d60e7
FEAT: Switch parallel to rayon's IntoParallelIterator traits
bluss Nov 24, 2018
9fbb90f
DOC: Update docs for integrated ndarray::parallel
bluss Nov 24, 2018
4317698
DOC: Update parallel doc
bluss Nov 24, 2018
1da56d8
FIX: Use where clause in Zip's parallel methods
bluss Nov 24, 2018
49f79f4
FIX: Use where clauses in Zip's regular methods
bluss Nov 24, 2018
cd7b6c8
FIX: Use where clauses in Zip's parallel trait impls
bluss Nov 24, 2018
3c6df12
DOC: Clarify parallel doc
bluss Nov 24, 2018
e8bd68b
DOC: Add more notices about crate feature rayon
bluss Nov 24, 2018
8a868eb
TEST: Move in all the tests from parallel
bluss Nov 24, 2018
c8febb0
MAINT: Add rayon to docs features: show in docs.rs and run on travis
bluss Nov 24, 2018
81b0906
MAINT: Drop old parallel crate from travis tests
bluss Nov 24, 2018
68b751e
TEST: Move benchmarks from parallel to main crate
bluss Nov 24, 2018
82f5cb7
FIX: Use $crate in par_azip and add to the ndarray::parallel prelude
bluss Nov 30, 2018
566d28b
DOC: Minor edits to ndarray::parallel docs
bluss Dec 1, 2018
6f1108c
FIX: Rename ext_traits to impl_par_methods
bluss Dec 3, 2018
219dab0
DOC: Improve docs for the parallel array methods
bluss Dec 3, 2018
558a2d2
FIX: Move parallel ArrayBase methods up close to the other mapping me…
bluss Dec 3, 2018
74aa19e
FIX: Remove unused $name parameter in parallel macro
bluss Dec 3, 2018
e53c273
DOC: Use backticks in ndarray::parallel mod docs
bluss Dec 3, 2018
d19585e
DOC: Edit doc for ndarray::parallel::prelude
bluss Dec 3, 2018
b3c8fe7
DOC: Update main module doc and readme for ndarray::parallel
bluss Dec 3, 2018
b677c77
TEST: Drop num_cpus as benchmark dev-dependency
bluss Dec 3, 2018
4 changes: 3 additions & 1 deletion Cargo.toml
@@ -31,6 +31,8 @@ num-traits = "0.2"
num-complex = "0.2"
itertools = { version = "0.7.0", default-features = false }

rayon = { version = "1.0.3", optional = true }

# Use via the `blas` crate feature!
cblas-sys = { version = "0.1.4", optional = true, default-features = false }
blas-src = { version = "0.2.0", optional = true, default-features = false }
@@ -57,7 +59,7 @@ test-blas-openblas-sys = ["blas"]
test = ["test-blas-openblas-sys"]

# This feature is used for docs
docs = ["serde-1"]
docs = ["serde-1", "rayon"]

[profile.release]
[profile.bench]
5 changes: 5 additions & 0 deletions README.rst
@@ -52,6 +52,11 @@ your `Cargo.toml`.
- Optional, compatible with Rust stable
- Enables serialization support for serde 1.0

- ``rayon``

- Optional, compatible with Rust stable
- Enables parallel iterators, parallelized methods and ``par_azip!``.

- ``blas``

- Optional and experimental, compatible with Rust stable
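For illustration, a downstream crate would opt in to the new feature through its own `Cargo.toml`; a minimal sketch (the version number is an assumption, not taken from this PR):

```toml
[dependencies]
ndarray = { version = "0.12", features = ["rayon"] }
```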
157 changes: 157 additions & 0 deletions benches/par_rayon.rs
@@ -0,0 +1,157 @@
#![cfg(feature="rayon")]
#![feature(test)]

extern crate rayon;

extern crate ndarray;
extern crate itertools;

use ndarray::prelude::*;
use ndarray::parallel::prelude::*;

extern crate test;
use test::Bencher;

use ndarray::Zip;

const EXP_N: usize = 256;
const ADDN: usize = 512;

use std::cmp::max;

fn set_threads() {
// Consider setting a fixed number of threads here, for example to avoid
// oversubscribing on hyperthreaded cores.
// let n = 4;
// let _ = rayon::ThreadPoolBuilder::new().num_threads(n).build_global();
}

#[bench]
fn map_exp_regular(bench: &mut Bencher)
{
let mut a = Array2::<f64>::zeros((EXP_N, EXP_N));
a.swap_axes(0, 1);
bench.iter(|| {
a.mapv_inplace(|x| x.exp());
});
}

#[bench]
fn rayon_exp_regular(bench: &mut Bencher)
{
set_threads();
let mut a = Array2::<f64>::zeros((EXP_N, EXP_N));
a.swap_axes(0, 1);
bench.iter(|| {
a.view_mut().into_par_iter().for_each(|x| *x = x.exp());
});
}

const FASTEXP: usize = EXP_N;

#[inline]
fn fastexp(x: f64) -> f64 {
let x = 1. + x/1024.;
x.powi(1024)
}

#[bench]
fn map_fastexp_regular(bench: &mut Bencher)
{
let mut a = Array2::<f64>::zeros((FASTEXP, FASTEXP));
bench.iter(|| {
a.mapv_inplace(|x| fastexp(x))
});
}

#[bench]
fn rayon_fastexp_regular(bench: &mut Bencher)
{
set_threads();
let mut a = Array2::<f64>::zeros((FASTEXP, FASTEXP));
bench.iter(|| {
a.view_mut().into_par_iter().for_each(|x| *x = fastexp(*x));
});
}

#[bench]
fn map_fastexp_cut(bench: &mut Bencher)
{
let mut a = Array2::<f64>::zeros((FASTEXP, FASTEXP));
let mut a = a.slice_mut(s![.., ..-1]);
bench.iter(|| {
a.mapv_inplace(|x| fastexp(x))
});
}

#[bench]
fn rayon_fastexp_cut(bench: &mut Bencher)
{
set_threads();
let mut a = Array2::<f64>::zeros((FASTEXP, FASTEXP));
let mut a = a.slice_mut(s![.., ..-1]);
bench.iter(|| {
a.view_mut().into_par_iter().for_each(|x| *x = fastexp(*x));
});
}

#[bench]
fn map_fastexp_by_axis(bench: &mut Bencher)
{
let mut a = Array2::<f64>::zeros((FASTEXP, FASTEXP));
bench.iter(|| {
for mut sheet in a.axis_iter_mut(Axis(0)) {
sheet.mapv_inplace(fastexp)
}
});
}

#[bench]
fn rayon_fastexp_by_axis(bench: &mut Bencher)
{
set_threads();
let mut a = Array2::<f64>::zeros((FASTEXP, FASTEXP));
bench.iter(|| {
a.axis_iter_mut(Axis(0)).into_par_iter()
.for_each(|mut sheet| sheet.mapv_inplace(fastexp));
});
}

#[bench]
fn rayon_fastexp_zip(bench: &mut Bencher)
{
set_threads();
let mut a = Array2::<f64>::zeros((FASTEXP, FASTEXP));
bench.iter(|| {
Zip::from(&mut a).into_par_iter().for_each(|(elt, )| *elt = fastexp(*elt));
});
}

#[bench]
fn add(bench: &mut Bencher)
{
let mut a = Array2::<f64>::zeros((ADDN, ADDN));
let b = Array2::<f64>::zeros((ADDN, ADDN));
let c = Array2::<f64>::zeros((ADDN, ADDN));
let d = Array2::<f64>::zeros((ADDN, ADDN));
bench.iter(|| {
azip!(mut a, b, c, d in {
*a += b.exp() + c.exp() + d.exp();
});
});
}

#[bench]
fn rayon_add(bench: &mut Bencher)
{
set_threads();
let mut a = Array2::<f64>::zeros((ADDN, ADDN));
let b = Array2::<f64>::zeros((ADDN, ADDN));
let c = Array2::<f64>::zeros((ADDN, ADDN));
let d = Array2::<f64>::zeros((ADDN, ADDN));
bench.iter(|| {
par_azip!(mut a, b, c, d in {
*a += b.exp() + c.exp() + d.exp();
});
});
}
1 change: 0 additions & 1 deletion scripts/all-tests.sh
@@ -11,7 +11,6 @@ cargo test --verbose --no-default-features
cargo test --release --verbose --no-default-features
cargo build --verbose --features "$FEATURES"
cargo test --verbose --features "$FEATURES"
cargo test --manifest-path=parallel/Cargo.toml --verbose
cargo test --manifest-path=serialization-tests/Cargo.toml --verbose
cargo test --manifest-path=blas-tests/Cargo.toml --verbose
CARGO_TARGET_DIR=target/ cargo test --manifest-path=numeric-tests/Cargo.toml --verbose
13 changes: 10 additions & 3 deletions src/lib.rs
@@ -55,11 +55,8 @@
//! needs matching memory layout to be efficient (with some exceptions).
//! + Efficient floating point matrix multiplication even for very large
//! matrices; can optionally use BLAS to improve it further.
//! + See also the [`ndarray-parallel`] crate for integration with rayon.
//! - **Requires Rust 1.30**
//!
//! [`ndarray-parallel`]: https://docs.rs/ndarray-parallel
//!
//! ## Crate Feature Flags
//!
//! The following crate feature flags are available. They are configured in your
@@ -68,6 +65,9 @@
//! - `serde-1`
//! - Optional, compatible with Rust stable
//! - Enables serialization support for serde 1.0
//! - `rayon`
//! - Optional, compatible with Rust stable
//! - Enables parallel iterators, parallelized methods and [`par_azip!`].
//! - `blas`
//! - Optional and experimental, compatible with Rust stable
//! - Enable transparent BLAS support for matrix multiplication.
@@ -87,6 +87,9 @@
#[cfg(feature = "serde-1")]
extern crate serde;

#[cfg(feature="rayon")]
extern crate rayon;

#[cfg(feature="blas")]
extern crate cblas_sys;
#[cfg(feature="blas")]
@@ -1333,6 +1336,10 @@ impl<A, S, D> ArrayBase<S, D>
}


// parallel methods
#[cfg(feature="rayon")]
pub mod parallel;

mod impl_1d;
mod impl_2d;
mod impl_dyn;
85 changes: 85 additions & 0 deletions src/parallel/impl_par_methods.rs
@@ -0,0 +1,85 @@

use {
Dimension,
NdProducer,
Zip,
ArrayBase,
DataMut,
};

use parallel::prelude::*;


/// # Parallel methods
///
/// These methods require crate feature `rayon`.
impl<A, S, D> ArrayBase<S, D>
where S: DataMut<Elem=A>,
D: Dimension,
A: Send + Sync,
{
/// Parallel version of `map_inplace`.
///
/// Modify the array in place by calling `f` by mutable reference on each element.
///
/// Elements are visited in arbitrary order.
pub fn par_map_inplace<F>(&mut self, f: F)
where F: Fn(&mut A) + Sync + Send
{
self.view_mut().into_par_iter().for_each(f)
}

/// Parallel version of `mapv_inplace`.
///
/// Modify the array in place by calling `f` by **v**alue on each element.
/// The array is updated with the new values.
///
/// Elements are visited in arbitrary order.
pub fn par_mapv_inplace<F>(&mut self, f: F)
where F: Fn(A) -> A + Sync + Send,
A: Clone,
{
self.view_mut().into_par_iter()
.for_each(move |x| *x = f(x.clone()))
}
}




// Zip

macro_rules! zip_impl {
($([$($p:ident)*],)+) => {
$(
#[allow(non_snake_case)]
impl<D, $($p),*> Zip<($($p,)*), D>
where $($p::Item : Send , )*
$($p : Send , )*
D: Dimension,
$($p: NdProducer<Dim=D> ,)*
{
/// The `par_apply` method for `Zip`.
///
/// This is a shorthand for using `.into_par_iter().for_each()` on
/// `Zip`.
///
/// Requires crate feature `rayon`.
pub fn par_apply<F>(self, function: F)
where F: Fn($($p::Item),*) + Sync + Send
{
self.into_par_iter().for_each(move |($($p,)*)| function($($p),*))
}
}
)+
}
}

zip_impl!{
[P1],
[P1 P2],
[P1 P2 P3],
[P1 P2 P3 P4],
[P1 P2 P3 P4 P5],
[P1 P2 P3 P4 P5 P6],
}
54 changes: 54 additions & 0 deletions src/parallel/into_impls.rs
@@ -0,0 +1,54 @@
use {Array, ArcArray, Dimension, ArrayView, ArrayViewMut};

use super::prelude::IntoParallelIterator;
use super::Parallel;

/// Requires crate feature `rayon`.
impl<'a, A, D> IntoParallelIterator for &'a Array<A, D>
where D: Dimension,
A: Sync
{
type Item = &'a A;
type Iter = Parallel<ArrayView<'a, A, D>>;
fn into_par_iter(self) -> Self::Iter {
self.view().into_par_iter()
}
}

// This is allowed: goes through `.view()`
/// Requires crate feature `rayon`.
impl<'a, A, D> IntoParallelIterator for &'a ArcArray<A, D>
where D: Dimension,
A: Sync
{
type Item = &'a A;
type Iter = Parallel<ArrayView<'a, A, D>>;
fn into_par_iter(self) -> Self::Iter {
self.view().into_par_iter()
}
}

/// Requires crate feature `rayon`.
impl<'a, A, D> IntoParallelIterator for &'a mut Array<A, D>
where D: Dimension,
A: Sync + Send
{
type Item = &'a mut A;
type Iter = Parallel<ArrayViewMut<'a, A, D>>;
fn into_par_iter(self) -> Self::Iter {
self.view_mut().into_par_iter()
}
}

// This is allowed: goes through `.view_mut()`, which is unique access
/// Requires crate feature `rayon`.
impl<'a, A, D> IntoParallelIterator for &'a mut ArcArray<A, D>
where D: Dimension,
A: Sync + Send + Clone,
{
type Item = &'a mut A;
type Iter = Parallel<ArrayViewMut<'a, A, D>>;
fn into_par_iter(self) -> Self::Iter {
self.view_mut().into_par_iter()
}
}