Improve efficiency of __getitem__ #101

Open
MarcAntoineSchmidtQC opened this issue Jul 29, 2021 · 0 comments
Currently, our approach for some of the __getitem__ methods is inefficient. For example, column subsetting for CategoricalMatrix converts the full matrix to a csc_matrix.

Here's a list to update with potential improvements:

  • DenseMatrix: nothing to do. Already optimized with np.ndarray
  • SparseMatrix: nothing to do. Already optimized with sps.csc_matrix
  • CategoricalMatrix:
    • row: nothing to do, trivial
    • column: create a SparseMatrix from only the selected columns/rows, instead of converting the full matrix first
  • SplitMatrix:
    • Test thoroughly all the potential ways to index
  • StandardizedMatrix:
    • Not sure whether column subsetting works with only one row
  • Write docstrings for expected behavior
  • Write tests covering all expected behavior
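
As a rough illustration of the CategoricalMatrix column item above, here is a minimal sketch of building the CSC arrays for only the selected columns, without materializing the full one-hot matrix. It assumes the categorical data is stored as one integer code per row; the function name `categorical_column_subset_csc`, its parameters, and the requirement that `cols` contain no duplicates are all assumptions for illustration, not the library's actual implementation.

```python
import numpy as np


def categorical_column_subset_csc(codes, n_categories, cols):
    """Hypothetical helper: CSC arrays for selected one-hot columns.

    codes        -- integer category code per row (assumed representation)
    n_categories -- total number of categories (columns of the full matrix)
    cols         -- selected column indices, assumed to be unique
    """
    codes = np.asarray(codes, dtype=np.intp)
    cols = np.asarray(cols, dtype=np.intp)
    n_rows = codes.shape[0]

    # Map each category to its position in the output, or -1 if not selected.
    position = np.full(n_categories, -1, dtype=np.intp)
    position[cols] = np.arange(len(cols))

    # Output column for every row; rows in unselected categories are dropped.
    new_col = position[codes]
    keep = new_col >= 0
    rows = np.flatnonzero(keep)
    out_cols = new_col[keep]

    # Group row indices by output column; a stable sort keeps rows ascending.
    order = np.argsort(out_cols, kind="stable")
    indices = rows[order]
    counts = np.bincount(out_cols, minlength=len(cols))
    indptr = np.concatenate(([0], np.cumsum(counts)))
    data = np.ones(len(indices))
    return data, indices, indptr, (n_rows, len(cols))


# Example: 5 rows, 3 categories, select categories 2 and 0 (in that order).
data, indices, indptr, shape = categorical_column_subset_csc(
    codes=[0, 2, 1, 2, 0], n_categories=3, cols=[2, 0]
)
```

The returned arrays follow the standard CSC layout, so they could be passed to `scipy.sparse.csc_matrix((data, indices, indptr), shape=shape)`. The work is proportional to the number of rows rather than to the size of the full one-hot matrix.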