Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

_split_col_subsets ignores columns when non-monotonic #92

Open
MarcAntoineSchmidtQC opened this issue Jul 22, 2021 · 5 comments
Open

_split_col_subsets ignores columns when non-monotonic #92

MarcAntoineSchmidtQC opened this issue Jul 22, 2021 · 5 comments

Comments

@MarcAntoineSchmidtQC
Copy link
Member

Maybe this is not the right function for what I am looking for. I expected this function to be perfect for the __getitem__ column subsetting but it's not working if the columns are not monotonic. PR #91 depends on fixing this issue.

Minimal working example:

import quantcore.matrix as mx
import numpy as np

a = mx.DenseMatrix(np.arange(16, dtype=float).reshape(4, 4))
sm = mx.SplitMatrix(matrices=[a])

sm._split_col_subsets([0, 2, 3])
# Correct - Out[6]: ([array([0, 1, 2], dtype=int32)], [array([0, 2, 3], dtype=int32)], 3)

ms._split_col_subsets([0, 3, 2])
# Incorrect - Out[9]: ([array([0, 1], dtype=int32)], [array([0, 3], dtype=int32)], 3)
# Expected: ([array([0, 1, 2], dtype=int32)], [array([0, 3, 2], dtype=int32)], 3)
@MarcAntoineSchmidtQC
Copy link
Member Author

@tbenthompson, if you could look into this I would like to get your opinion on whether we should spend the time fixing the _split_col_subsets method or write a new method for the column subsetting.

@tbenthompson
Copy link
Collaborator

Yes, you're correct that _split_col_subsets currently assumes the input columns are ordered monotonically. One thing we could do is sort the columns first, then call _split_col_subsets and then "unsort" the outputs. That would probably be the easiest thing to do. In order to minimize the amount of Cython, it might be nice to implement that sort/unsort step in a python wrapper function rather than directly in the Cython code.

I could help with this tomorrow or soon. Let me know if that would be helpful.

@tbenthompson
Copy link
Collaborator

tbenthompson commented Jul 22, 2021

I actually ran into this same issue from the other "end" recently when I fixed some bugs with subsets that were not monotonically specified. So this same bug is a problem if either:

  1. The cols to choose aren't monotonic. Or
  2. The subset_cols_indices aren't monotonic.

I fixed the subset_cols_indices problem by correcting the function that was creating the SplitMatrix and then requiring that input indices be monotonic.

Sorry about not realizing at the time that the cols problem would also be an issue!

@MarcAntoineSchmidtQC
Copy link
Member Author

My PR currently contains a fix.

My solution was to create a column_map attribute to the SplitMatrix object. It stores the mapping from index location to actual location in the matrices.
Then, creating a _split_col_subsets-like function that supports unordered columns and duplicate column was much more easy. I also modified the getcol method to use this.

We can discuss this in the coming days. Downside to this is that we need to store an array of size n_cols x 2, but I think it's worth it.

@tbenthompson
Copy link
Collaborator

@MarcAntoineSchmidtQC did this get fixed or did it get dropped when you closed your PR? Is it worth reviving just this fix?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants