Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Zip and azip! behavior related to inputted NdProducers #453

Open
rcarson3 opened this issue May 17, 2018 · 7 comments
Open

Zip and azip! behavior related to inputted NdProducers #453

rcarson3 opened this issue May 17, 2018 · 7 comments
Labels
good first issue A good issue to start contributing to ndarray!

Comments

@rcarson3
Copy link
Contributor

Based on trying to help in Issue #452 , I found some interesting behavior regarding either Zip or azip!. I've given examples down below

    let a = arr3(&[[[ 0,  1,  2],
                [ 3,  4,  5]],
               [[ 6,  7,  8],
                [ 9, 10, 11]]]);
    let inner0 = a.lanes(Axis(0));

    let trial = arr1(&[0,0,0,0,0,0]);
    let tr_iter = trial.axis_iter(Axis(0));

    //The below will error out when compiling.
    azip!(ref output (inner0), ref trial (tr_iter) in {println!("{:?}, {:?}",output, trial); });

The compiler returns the following error:

error[E0271]: type mismatch resolving `<ndarray::iter::AxisIter<'_, {integer}, ndarray::Dim<[usize; 0]>> as ndarray::IntoNdProducer>::Dim == ndarray::Dim<[usize; 2]>`
   --> src/main.rs:141:5
    |
141 |     azip!(ref output (inner0), ref trial (tr_iter) in {println!("{:?}, {:?}",output, trial); });
    |     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ expected an array with a fixed size of 1 elements, found one with 2 elements
    |
    = note: expected type `ndarray::Dim<[usize; 1]>`
               found type `ndarray::Dim<[usize; 2]>`
    = note: this error originates in a macro outside of the current crate

error[E0599]: no method named `apply` found for type `ndarray::Zip<(ndarray::iter::Lanes<'_, {integer}, ndarray::Dim<[usize; 2]>>, ndarray::iter::AxisIter<'_, {integer}, ndarray::Dim<[usize; 0]>>), ndarray::Dim<[usize; 2]>>` in the current scope
   --> src/main.rs:141:5
    |
141 |     azip!(ref output (inner0), ref trial (tr_iter) in {println!("{:?}, {:?}",output, trial); });
    |     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    |
    = note: this error originates in a macro outside of the current crate

However, if I were to roll what inner0 outputs directly, I can get it to work no problem as seen below:

    let trial = arr1(&[0,0,0,0,0,0]);
    let tr_iter = trial.axis_iter(Axis(0));

    let b = arr2(&[[0,1,2,3,4,5],[6,7,8,9,10,11]]);
    let b_iter = b.axis_iter(Axis(1));
    //The compiler is perfectly okay with this code.
    azip!(ref output (b_iter), ref trial (tr_iter) in {println!("{:?}, {:?}",output, trial); });

I feel that either method should work, since we contain the same number of "items" to iterate over in each version. So, I was curious if this is simply an error/bug in the code or is there a design choice behind one method failing to compile and the other one working.

@jturner314
Copy link
Member

The key thing here is that NdProducer is an n-dimensional generalization of the Iterator trait in the standard library. azip!/Zip operate in an n-dimensional way on types that implement NdProducer.

When zipping together producers with azip!/Zip, they must have the same shape. In your first example, the array a has shape (2, 2, 3), so the shape of inner0 is (2, 3) since axis 0 is removed by .lanes(). The shape of trial is (6,), so the shape of tr_iter is (6,) since .axis_iter() iterates over axis 0 of length 6. Both producers have 6 elements, but they have different shapes. You can verify this with the following (.raw_dim() is a hidden method of NdProducer):

extern crate ndarray;

use ndarray::prelude::*;
use ndarray::{NdProducer, arr3};

fn main() {
    let a = arr3(&[[[0, 1, 2], [3, 4, 5]], [[6, 7, 8], [9, 10, 11]]]);
    let inner0 = a.lanes(Axis(0));
    println!("shape of inner0 = {:?}", inner0.raw_dim());
    for item in inner0 {
        println!("item = {}", item);
    }

    let trial = arr1(&[0, 0, 0, 0, 0, 0]);
    let tr_iter = trial.axis_iter(Axis(0));
    println!("shape of tr_iter = {:?}", tr_iter.raw_dim());
    for item in tr_iter {
        println!("item = {}", item);
    }
}

which prints

shape of inner0 = [2, 3]
item = [0, 6]
item = [1, 7]
item = [2, 8]
item = [3, 9]
item = [4, 10]
item = [5, 11]
shape of tr_iter = [6]
item = 0
item = 0
item = 0
item = 0
item = 0
item = 0

If you expand the azip! macro to

Zip::from(inner0).and(tr_iter).apply(|output, trial| {
    println!("{:?}, {:?}", output, trial);
});

the first compilation error is a bit clearer:

error[E0271]: type mismatch resolving `<ndarray::iter::AxisIter<'_, {integer}, ndarray::Dim<[usize; 0]>> as ndarray::IntoNdProducer>::Dim == ndarray::Dim<[usize; 2]>`
  --> src/main.rs:22:23
   |
22 |     Zip::from(inner0).and(tr_iter).apply(|output, trial| {
   |                       ^^^ expected an array with a fixed size of 1 elements, found one with 2 elements
   |
   = note: expected type `ndarray::Dim<[usize; 1]>`
              found type `ndarray::Dim<[usize; 2]>`

It's an error to try to add tr_iter to the Zip instance with .and() since the Zip instance has dimension 2 (created from inner0) instead of dimension 1 (required to match tr_iter).

Note that although Zip doesn't flatten the producers, you can do so manually. Lanes implements IntoIterator and AxisIter implements Iterator, so you can flatten the Lanes producer with .into_iter() and .zip() it with the AxisIter producer:

for (output, trial) in inner0.into_iter().zip(tr_iter) {
    println!("{:?}, {:?}", output, trial);
}

which prints

[0, 6] shape=[2], strides=[6], layout=Custom (0x0), 0 shape=[], strides=[], layout=C | F (0x3)
[1, 7] shape=[2], strides=[6], layout=Custom (0x0), 0 shape=[], strides=[], layout=C | F (0x3)
[2, 8] shape=[2], strides=[6], layout=Custom (0x0), 0 shape=[], strides=[], layout=C | F (0x3)
[3, 9] shape=[2], strides=[6], layout=Custom (0x0), 0 shape=[], strides=[], layout=C | F (0x3)
[4, 10] shape=[2], strides=[6], layout=Custom (0x0), 0 shape=[], strides=[], layout=C | F (0x3)
[5, 11] shape=[2], strides=[6], layout=Custom (0x0), 0 shape=[], strides=[], layout=C | F (0x3)

Your second example works because b_iter has shape (6,), just like tr_iter. Writing similar code to what I provided above yields:

shape of tr_iter = [6]
item = 0
item = 0
item = 0
item = 0
item = 0
item = 0
shape of b = [6]
item = [0, 6]
item = [1, 7]
item = [2, 8]
item = [3, 9]
item = [4, 10]
item = [5, 11]

Why is NdProducer an n-dimensional trait separate from Iterator? The docs say "This separation is needed because the producer represents a multidimensional set of items, it can be split along a particular axis for parallelization, and it has no fixed correspondance to a sequence." IMO, the most import reason it's important for producers to keep track of their shape instead of just flattening everything is that doing so helps avoid bugs. Flattening should be an explicit operation. Consider this example:

#[macro_use]
extern crate ndarray;

fn main() {
    let a = array![[1, 2, 3], [4, 5, 6]];
    let b = array![[1, 2], [3, 4], [5, 6]];
    azip!(a, b in {
        println!("{} {}", a, b);
    });
}

How should the elements of a and b be zipped together? The arrays have the same length but different shapes. If azip! naively flattened them, the zip would be weird and confusing. For example, index (1, 0) of a would be paired with with index (1, 1) of b. Instead, azip! panics in this case because the producers have different shapes. If the user actually wanted flat iterators, they could do so explicitly, such as using .iter() and the standard library's .zip() in this case.

More discussion about Zip is available here.

I do think the docs could be improved to make this clearer. For example, the docs for azip! should say "Panics if any of the producers are not of the same shape." instead of "Panics if any of the arrays are not of the same shape." Do you have any other suggestions?

@rcarson3
Copy link
Contributor Author

rcarson3 commented May 18, 2018

I think actually including examples that fail in the documentation wouldn't be a bad idea. By doing it this way, you can highlight what you mean by if any of the producers are not of the same shape.

Also, I will say that thinking about how .lanes and the other methods work has got me thinking that a few new methods might be useful to add to the crate. So, it might be useful to have a method slightly more general than .merge_axes that allows one to merge more than just two axes at a time. I could see it being beneficial to being able to merge 3 or more at a time depending on ones application. The applications I'm thinking you would be doing this with the outer most dimensions so you could then quickly iterate through some internal tensor or vector data. Along with this method, it might not hurt to have a macro or method like numpy's squeeze that removes all of the axes that have a dimension of 1.

Another method that might be useful would work kinda similar to the above, but it would allow one to create an iterator that merges multiple outer axes together reducing the dimensionality of the array, and then iterating over a specified axis.

Edit: I guess the above method could also work as long as the axes are next to each other as well.

In my mind it would have a behavior something like:

extern crate ndarray;

use ndarray::prelude::*;
fn main() {
//Simply a 2x2x3x1
    let mut a = Array::from_shape_vec((2,2,3,1).f(), vec![0, 1, 2, 3 , 4, 5, 6, 7, 8, 9, 10, 11]).unwrap();
//merge_multi_axes_iter is the new iterator method
//First term here are the axes that are to be merged together
//Second term is the axis we want to iterate over.
    let new_iter = a.merge_multi_axes_iter([1,2,3], Axis(1));
    for i in new_iter {
        println!("{:?}", i);
     }
}

The equivalent code using Ndarray today.

extern crate ndarray;

use ndarray::prelude::*;
fn main() {
    let mut a = Array::from_shape_vec((2,2,3,1).f(), vec![0, 1, 2, 3 , 4, 5, 6, 7, 8, 9, 10, 11]).unwrap();

    a.merge_axes(Axis(3), Axis(2));
    a.merge_axes(Axis(2), Axis(1));

    
    let a = a.remove_axis(Axis(3));
    let a = a.remove_axis(Axis(2));

    let a_iter = a.axis_iter(Axis(1));

    for i in a_iter {
        println!("{:?}", i);
    }
}

It would then print the same stuff we saw in your comment for either inner0 or b. I could see this really being useful in something similar to a finite element method where it's not uncommon to store either all of your vector or tensorial data at the quadrature points and the elements. So you could end up with a data structure for either stress or strain with the following dimensions 3x3xnqptsxnelems. You would often then operate over all of the quadrature/integration points over every element at once doing whatever operations are needed. If you think either of these might be of useful to include in 'ndarray' I can take a stab at making a prototype of them.

@LukeMathWalker LukeMathWalker added the good first issue A good issue to start contributing to ndarray! label Mar 18, 2019
@stokhos
Copy link
Contributor

stokhos commented Apr 17, 2021

I'd like to work on this issue. Is there anything specific that I need to be aware of ?

@rcarson3
Copy link
Contributor Author

Oh man looking back on this issue after 3 years, I definitely would not have had time to work on a prototype of the functions I'd envisioned between finishing grad school and starting a new job hahaha.

So, the maintainers might have some thoughts on this, but I would probably start with Arrays without custom strides initially just to keep things easy. It's been a while since I've normally worked with Rust, since my day to day job has me working with c++, so I would take that into consideration when reading my initial thoughts on this issue.

For the merge_axes type methods, I would probably first make a requirement that all the axes to be merged should be continuous aka right next to each other. For the input related to which axes to merge, it probably wouldn't hurt to sort the input first and then check to make sure that all values are within the # of axes. For an output, I would imagine the easiest initial implementation could consume the input and return a type ArrayD. It might also be interesting to have something like a merge_axes that returns an ArrayView that way for whatever your doing you don't consume the original array, but I'm not too sure how much of a pain that might be to implement. Imagine if you get an ArrayView version working it could probably be used in that proposed merge_multi_axes_iter function. However, it's been a long time since I've worked with Rust iterators, so I'm probably missing something.

@bluss
Copy link
Member

bluss commented Apr 21, 2021

This issue needs more design, potentially. For rcarson3's 2018 example #453 (comment) it looks good to to just use to_shape/into_shape with Order::F here when they arrive in smarter versions #390 #982?

@rcarson3
Copy link
Contributor Author

@bluss quickly glancing over #982 I'd say what you have listed in there would probably solve the issue I listed earlier especially since it appears that they can returns views of the original data structure.

@bluss
Copy link
Member

bluss commented Apr 23, 2021

Ok, so then current into_shape can also do this, it's just a bit harder to use/understand, and will be better with the improvements in #982 or follow-ups (into_shape and to_shape). Everything around logical order before/after the reshaping should become clearer.

So apart from the shaping, what remains in this issue seems to be:

  • Documentation and example clarifications
  • Should there be an explicit flattening operation (to 1D?), or is documentation enough? Current flattening like into_shape, to_shape (coming) and iterators could be enough. If not enough, describe the design before implementing. 🙂

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
good first issue A good issue to start contributing to ndarray!
Projects
None yet
Development

No branches or pull requests

5 participants