Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Clearer documentation for ac.tl.lsi #87

Open
alexlenail opened this issue Jan 9, 2023 · 1 comment
Open

Clearer documentation for ac.tl.lsi #87

alexlenail opened this issue Jan 9, 2023 · 1 comment
Labels
enhancement New feature or request
Milestone

Comments

@alexlenail
Copy link

It looks like the function ac.tl.lsi is mostly just a call to scipy.sparse.linalg.svds.

cell_embeddings, svalues, peaks_loadings = svds(adata.X, k=n_comps)

I think this function expects that you've run ac.pp.tfidf beforehand, but that isn't mentioned in the docstring of the function / the API docs.

This is especially confusing because Muon's normalization docs read:

One of them is constructing term-document matrix from the original count matrix. This is typically followed by singular value decomposition (SVD) — the same technique that conventional principal component analysis (PCA) uses — to generate latent components, and these two steps together are referred to as latent semantic indexing (LSI).

From that explanation, you would expect a function called lsi() to run tf-idf() and then svd() -- so would take a raw counts matrix as input. But it seems like the function lsi() expects you to have already run tf-idf().

Clarification of the docs would help here.

@alexlenail alexlenail added the enhancement New feature or request label Jan 9, 2023
@gtca gtca modified the milestones: v0.1.3, v0.1.4 Jan 9, 2023
@gtca
Copy link
Collaborator

gtca commented Jan 9, 2023

Thanks for noting that, @alexlenail!

It might be also worthwhile to repurpose ac.tl.lsi() to run both TF-IDF and SVD while adding ac.tl.svd().
This should also make it more intuitive for the users of other commonly used toolkits for scATAC-seq processing.

We're currently refactoring the ATAC module, and that might be a welcome change to make these steps more intuitive.
If you have more thoughts on this or other matters, please do share, whether here or e.g. on Zulip (https://scverse.org/join/).

@gtca gtca modified the milestones: v0.1.4, v0.1.7 Apr 9, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

2 participants