Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature Request: partitions #611

Open
bruderjakob17 opened this issue Apr 22, 2022 · 5 comments · May be fixed by #637
Open

Feature Request: partitions #611

bruderjakob17 opened this issue Apr 22, 2022 · 5 comments · May be fixed by #637

Comments

@bruderjakob17
Copy link

bruderjakob17 commented Apr 22, 2022

It would be neat to have a way of iterating over all possible partitions of some vector. Example usages:

all_partitions(&vec![7,8,9]) should iterate over

[[7,8,9]]
[[7,8],[9]]
[[7,9],[8]]
[[7],[8,9]]
[[7],[8],[9]]

while partitions(&vec![6,7,8,9], 2) should iterate over

[[6],[7,8,9]]
[[6,7],[8,9]]
[[6,8],[7,9]]
[[6,9],[7,8]]
[[6,7,8],[9]]
[[6,7,9],[8]]
[[6,8,9],[7]]

Remark: I asked this question on StackOverflow, and implemented a version of all_partitions(). I am happy to create a PR if this is something that could belong to the itertools package.

@phimuemue
Copy link
Member

Hi @bruderjakob17. I think this might be a neat addition.

I think I understand the idea of your algorithm (though I did not in-depth-review it), and I think we can go with it (we do similar index mapping in combinations).

Some remarks on the code on stackoverflow before you dive in (copied in so you have it at hand):

use itertools::zip;

// Based on https://stackoverflow.com/a/30898130/4322240. Read first if something here is unclear!

// Representation of a state in which the iterator can be.
// elements: contains the elements to be partitioned.
// index_map: in terms of the linked post, index_map[i] = p_i (but 0-indexed)
// initialized: marks if the Partition element was just initialized and hence
//     was not generated by next()
pub struct Partition<'a, T> where T: Copy {
    elements: &'a Vec<T>,
    index_map: Vec<usize>,
    initialized: bool, 
}
impl<'a, T> Partition<'a, T> where T: Copy {
    fn smallest_viable(v: &'a Vec<T>) -> Partition<'a, T> {
        Partition::<T> { elements: v, index_map: vec![0; v.len()], initialized: true }
    }
    fn is_viable(&self) -> bool {
        self.elements.len() == self.index_map.len()
        && (self.index_map.is_empty() || self.index_map[0] == 0)
        && (1..self.index_map.len())
                .all(|x| (0..x).any(|y| self.index_map[x] <= self.index_map[y] + 1))
    }
    fn is_incrementable(&self, index: usize) -> bool {
        index <= self.index_map.len()
        && index > 0
        // p_i <= n does not need to be checked, as it is guaranteed if the invariant
        // "for all j > 0 there exists i < j s.t. p_j <= p_i + 1" is satisfied
        && (0..index).any(|x| self.index_map[x] == self.index_map[index])
    }
    fn increment(&mut self, index: usize) {
        self.index_map[index] += 1;
        for x in index + 1..self.index_map.len() {
            self.index_map[x] = 0;
        }
    }
    fn create_partition(&self) -> Vec<Vec<T>> {
        if let Some(&max_index) = self.index_map.iter().max(){
            return (0..=max_index).map(|x| zip(self.elements, &self.index_map)
                    .filter(|(_e,i)| **i == x).map(|(e,_i)| *e).collect())
                    .collect();
        } else {
            return Vec::new();
        }
    }
}
impl<'a, T> Iterator for Partition<'a, T> where T: Copy {
    type Item = Vec<Vec<T>>;

    fn next(&mut self) -> Option<Self::Item> {
        if self.initialized {
            self.initialized = false;
            return Some(self.create_partition());
        }
        for index in (0..self.index_map.len()).rev() {
            if self.is_incrementable(index) {
                self.increment(index);
                return Some(self.create_partition());
            }
        }
        return None;
    }
}

pub fn partitions<'a, T>(v: &'a Vec<T>) -> Partition<'a, T> where T: Copy {
    Partition::smallest_viable(v)
}
  • In the past, we found the combinatorial algorithms tricky wrt indexing issues, and we observed that it makes lives easier when certain methods are inlined. It allows you to quickly spot connections that you'd otherwise have to reconstruct from reading several methods.
    In this particular case, I think we should inline (at least) is_incrementable and increment. If I am not mistaken it trivially justifies elimination of index <= self.index_map.len() from is_incrementable, and the index>0 check should easily allow us to start looping at 1 instead of 0, making the check redundant.
  • create_partition's runtime looks quadratic to me. We could try to reach linear here be e.g. scanning once, and allocating the Vecs beforehand, just pushing one element at a time into the respective Vec.
  • I am not sure if we can get rid of the self.initialized special casing here... Sometimes iterators require that sort of border case handling.
  • If you have a authoritative source of the algorithm (wikipedia would be enough for my taste), we could include this and refer to it, so that a reader can easily map the source code to the algorithm.

Regarding all of the above: I might be wrong with my assessment, so please apply your judgement where necessary.

@bruderjakob17
Copy link
Author

bruderjakob17 commented Jul 31, 2022

Hi @phimuemue :)

First of all, thanks for all your comments and sorry for the late reply. Also, in the meantime I implemented a version where one can iterate also over all partitions over a specific size (the approach is similar, only that viable sequences have the additional requirement that there need to be k different p_i (where k is the number of classes into which the vec is partitioned)). I will just post the modified code below:

pub struct Partition<'a, T> where T: Copy {
    elements: &'a Vec<T>,
    index_map: Vec<usize>,
    number_of_classes: Option<usize>,
    initialized: bool,
}
impl<'a, T> Partition<'a, T> where T: Copy {
    fn smallest_viable(v: &'a Vec<T>, k: Option<usize>) -> Partition<'a, T> {
        let initial_index_map = match k {
            None => vec![0; v.len()],
            Some(k) => {
                let mut initial_index_map = vec![0; v.len() - k];
                initial_index_map.append(&mut (0..k).collect());
                initial_index_map
            }
        };
        Partition::<T> { elements: v, index_map: initial_index_map, number_of_classes: k, initialized: true }
    }
    fn is_incrementable(&self, index: usize) -> bool {
        index < self.index_map.len()
        && index > 0
        && (0..index).any(|x| self.index_map[x] == self.index_map[index])
        // additional constraint: when only partitions into number_of_classes subsets are required
        && (self.number_of_classes == None || self.index_map[index] + 1 < self.number_of_classes.unwrap())
    }
    fn increment(&mut self, index: usize) { //TODO BUGFIXES
        self.index_map[index] += 1;
        match self.number_of_classes {
            None => {
                for x in index + 1..self.index_map.len() {
                    self.index_map[x] = 0;
                }
            },
            Some(k) => {
                let m = self.index_map[0..=index].iter().map(|x| *x).max().unwrap();
                // now, append 0...0(m+1)(m+2)...(k - 1).
                // note that k - 1 = m + (k - 1 - m).
                // hence, (k - 1 - m) is the length of the tail
                let first_nonzero_index = self.index_map.len() - (k - 1 - m);
                for x in index + 1..first_nonzero_index {
                    self.index_map[x] = 0;
                }
                for x in first_nonzero_index..self.index_map.len() {
                    self.index_map[x] = m + x - first_nonzero_index + 1;
                }
            }
        }
    }
    fn create_partition(&self) -> Vec<Vec<T>> {
        if let Some(&max_index) = self.index_map.iter().max(){
            return (0..=max_index).map(|x| zip(self.elements, &self.index_map)
                    .filter(|(_e,i)| **i == x).map(|(e,_i)| *e).collect())
                    .collect();
        } else {
            return Vec::new();
        }
    }
}
impl<'a, T> Iterator for Partition<'a, T> where T: Copy {
    type Item = Vec<Vec<T>>;

    fn next(&mut self) -> Option<Self::Item> {
        if self.initialized {
            self.initialized = false;
            return Some(self.create_partition());
        }
        for index in (0..self.index_map.len()).rev() {
            if self.is_incrementable(index) {
                self.increment(index);
                return Some(self.create_partition());
            }
        }
        return None;
    }
}
pub fn partitions<'a, T>(v: &'a Vec<T>, k: Option<usize>) -> Partition<'a, T> where T: Copy {
    Partition::smallest_viable(v, k)
}
  • Regarding the inlining: if I understand inlining correctly, the code of the iterator implementation should then look like this (also with your suggested improvements):
impl<'a, T> Iterator for Partition<'a, T> where T: Copy {
    type Item = Vec<Vec<T>>;

    fn next(&mut self) -> Option<Self::Item> {
        if self.initialized {
            self.initialized = false;
            return Some(self.create_partition());
        }
        for index in (1..self.index_map.len()).rev() {
            if 
                (0..index).any(|x| self.index_map[x] == self.index_map[index])
                // additional constraint: when only partitions into number_of_classes subsets are required
                && (self.number_of_classes == None || self.index_map[index] + 1 < self.number_of_classes.unwrap())
            {
                self.index_map[index] += 1;
                match self.number_of_classes {
                    None => {
                        for x in index + 1..self.index_map.len() {
                            self.index_map[x] = 0;
                        }
                    },
                    Some(k) => {
                        let m = self.index_map[0..=index].iter().map(|x| *x).max().unwrap();
                        // now, append 0...0(m+1)(m+2)...(k - 1).
                        // note that k - 1 = m + (k - 1 - m).
                        // hence, (k - 1 - m) is the length of the tail
                        let first_nonzero_index = self.index_map.len() - (k - 1 - m);
                        for x in index + 1..first_nonzero_index {
                            self.index_map[x] = 0;
                        }
                        for x in first_nonzero_index..self.index_map.len() {
                            self.index_map[x] = m + x - first_nonzero_index + 1;
                        }
                    }
                }
                return Some(self.create_partition());
            }
        }
        return None;
    }
}
  • Regarding your second comment: I think your solution is not only faster, but also easier to read. It should look like this then:
fn create_partition(&self) -> Vec<Vec<T>> {
        if let Some(&max_index) = self.index_map.iter().max() {
            let mut partition_classes = vec![Vec::new(); max_index + 1];
            for i in 0..self.index_map.len(){
                partition_classes[self.index_map[i]].push(self.elements[i]);
            }
            return partition_classes;
        } else {
            return Vec::new();
        }
    }
  • I am also not sure about the special case handling. I am no expert in iterators (this is the first iterator I ever wrote in Rust) and just added it because the first partition always was missing.
  • Sadly, I did not find anything about this on Wikipedia. The only source I have found for this topic was the StackOverflow-post.

As a final comment: here is the code above with your first two suggestions:

pub struct Partition<'a, T> where T: Copy {
    elements: &'a Vec<T>,
    index_map: Vec<usize>,
    number_of_classes: Option<usize>,
    initialized: bool,
}
impl<'a, T> Partition<'a, T> where T: Copy {
    fn smallest_viable(v: &'a Vec<T>, k: Option<usize>) -> Partition<'a, T> {
        let initial_index_map = match k {
            None => vec![0; v.len()],
            Some(k) => {
                let mut initial_index_map = vec![0; v.len() - k];
                initial_index_map.append(&mut (0..k).collect());
                initial_index_map
            }
        };
        Partition::<T> { elements: v, index_map: initial_index_map, number_of_classes: k, initialized: true }
    }
    fn create_partition(&self) -> Vec<Vec<T>> {
        if let Some(&max_index) = self.index_map.iter().max() {
            let mut partition_classes = vec![Vec::new(); max_index + 1];
            for i in 0..self.index_map.len(){
                partition_classes[self.index_map[i]].push(self.elements[i]);
            }
            return partition_classes;
        } else {
            return Vec::new();
        }
    }
}
impl<'a, T> Iterator for Partition<'a, T> where T: Copy {
    type Item = Vec<Vec<T>>;

    fn next(&mut self) -> Option<Self::Item> {
        if self.initialized {
            self.initialized = false;
            return Some(self.create_partition());
        }
        for index in (1..self.index_map.len()).rev() {
            if 
                (0..index).any(|x| self.index_map[x] == self.index_map[index])
                // additional constraint: when only partitions into number_of_classes subsets are required
                && (self.number_of_classes == None || self.index_map[index] + 1 < self.number_of_classes.unwrap())
            {
                self.index_map[index] += 1;
                match self.number_of_classes {
                    None => {
                        for x in index + 1..self.index_map.len() {
                            self.index_map[x] = 0;
                        }
                    },
                    Some(k) => {
                        let m = self.index_map[0..=index].iter().map(|x| *x).max().unwrap();
                        // now, append 0...0(m+1)(m+2)...(k - 1).
                        // note that k - 1 = m + (k - 1 - m).
                        // hence, (k - 1 - m) is the length of the tail
                        let first_nonzero_index = self.index_map.len() - (k - 1 - m);
                        for x in index + 1..first_nonzero_index {
                            self.index_map[x] = 0;
                        }
                        for x in first_nonzero_index..self.index_map.len() {
                            self.index_map[x] = m + x - first_nonzero_index + 1;
                        }
                    }
                }
                return Some(self.create_partition());
            }
        }
        return None;
    }
}
pub fn partitions<'a, T>(v: &'a Vec<T>, k: Option<usize>) -> Partition<'a, T> where T: Copy {
    Partition::smallest_viable(v, k)
}

@phimuemue
Copy link
Member

Hi @bruderjakob17, just want to let you know that I did not forget this.

The new implementation imho already looks much cleaner. If you want to pursue this further, please do the following:

  • Remove number_of_classes: If there is demand, we can still add this functionality later. I think most users will just want to have the partitions - and should not pay the run-time cost of checking the Option. (I know, the check is fast, but as a guideline, we try to avoid dynamic stuff as far as possible.) Moreover (and this is my foremost argument) it makes the code (and, consequently, review) much simpler.
  • Please also inline smallest_viable.
  • Use a custom type for a partition (i.e. say type Partition<T>=Vec<Vec<T>> and use Item=Partition<T>). Reason: I think there may be more compact representations for partitions than a vector of vectors, and I'd l ike to keep the possibility to switch to another implementation in the future.

If you want, I think you can then also open a PR (which usually gets a bit more feedback).

I do not have much free time right now, but you can be assured that I'd really like to have this feature.

@bruderjakob17
Copy link
Author

Hi @phimuemue ,

Thanks again for your comments. I simplified (and commented) the code again. Into which branch should I merge my PR?

@phimuemue
Copy link
Member

Hi @bruderjakob17, cool. Please rebase against the branch master (at the time of writing this, this is 7a27408), and then open a PR.

If you do this, please ansure that no unrelated (formatting) changes are included in the PR, and - if needed - split your changes into several commits.

@bruderjakob17 bruderjakob17 linked a pull request Aug 12, 2022 that will close this issue
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants