Access to Diffusion Map methods (and other embedding methods) as Scikit-Learn style API #3054
Comments
I’m not categorically against it, but could you describe what issues you encounter trying to use scanpy on the data you want to use it on? E.g. naively, I’d think you’d just wrap your matrix in an AnnData object, then run

    >>> import scanpy as sc
    >>> adata = sc.AnnData(my_matrix)  # shape: (n_observations, n_variables)
    >>> sc.tl.diffmap(adata)
    ValueError: You need to run `pp.neighbors` first to compute a neighborhood graph.

Then you just follow that advice and

    >>> sc.pp.neighbors(adata)
    >>> sc.tl.diffmap(adata)
    >>> adata
    AnnData object ...
        uns: diffmap_evals
        obsm: X_diffmap

Alternatively you read the docs: The
… and where the results are pushed:
so you just take them out again:

    eigenvecs, eigenvals = adata.obsm['X_diffmap'], adata.uns['diffmap_evals']
Thanks for the explanation and walkthrough on where everything is located and how to access it! This is actually very useful and makes it easier to navigate the AnnData object.

Having access to a scikit-learn style API would be useful for combining scanpy with other sklearn-compatible methods. The biggest thing is the .transform method to project new samples into the diffusion space. I've been trying to figure out how to implement this on my own but I hit a snag: https://stackoverflow.com/questions/78486471/how-to-add-a-transform-method-to-project-new-observations-into-an-existing-spac

pyDiffMap has an implementation of Nystroem out-of-sample extensions, used to calculate the values of the diffusion coordinates at each given point. The backend implementations of the algorithms are different, so I'm not sure if I can just port this method over.

It would also be great if said sklearn API had an option for custom transformers. It looks like this was already implemented, but having direct access to a standalone model object w/ this capability would be incredibly useful! Nothing like this exists for DiffusionMaps right now. I'm trying to implement it myself but I also hit a snag when trying to generalize the transformer objects to build connectivity graphs: https://stackoverflow.com/questions/78486997/how-to-reproduce-kneighbors-graphinclude-self-true-using-kneighborstransfor

Any help on this front would be amazing, especially if I could just use it directly w/ scanpy, as this is my preferred analysis package (I actually started to deprecate my own software suite https://github.com/jolespin/soothsayer because scanpy worked so well). I work quite a bit in both microbial ecology and single-cell transcriptomics, using scanpy for both. I'm trying to push the microbial ecology community to start using this software, as the problems being solved are very similar.
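For readers unfamiliar with the Nystroem out-of-sample extension mentioned above, here is a rough sketch of the idea in its textbook form. It assumes a plain row-normalized kernel $k$ over the $n$ training points $x_1, \dots, x_n$; the symbols are illustrative, and this is not a description of scanpy's backend, which builds its transition matrix from the neighbors graph with its own normalization.

```latex
\begin{aligned}
P_{ij} &= \frac{k(x_i, x_j)}{\sum_{m=1}^{n} k(x_i, x_m)},
  \qquad P\,\psi_\ell = \lambda_\ell\,\psi_\ell, \\
p(y, x_j) &= \frac{k(y, x_j)}{\sum_{m=1}^{n} k(y, x_m)}, \\
\hat{\psi}_\ell(y) &= \frac{1}{\lambda_\ell} \sum_{j=1}^{n} p(y, x_j)\,\psi_\ell(x_j).
\end{aligned}
```

Setting $y = x_i$ gives $\hat{\psi}_\ell(x_i) = \psi_\ell(x_i)$, so the extension agrees with the fitted embedding on the training points. Small eigenvalues $\lambda_\ell$ amplify noise in the extension, which is one reason to keep only the leading components.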
What kind of feature would you like to request?
Additional function parameters / changed functionality / changed defaults?
Please describe your wishes
It would be extremely helpful if the embedding manifold tools had a scikit-learn style API.
For example, https://pydiffmap.readthedocs.io/en/master/reference/diffusion_map.html
Having .fit, .transform, and .fit_transform would make the robust implementations in the backend of ScanPy a lot more accessible for users. Right now, the usage feels a bit restrictive, and I'm having difficulty leveraging the power of the methods when my use case is not part of a workflow similar to those in the tutorials.
I'm trying to use the code in the backend of ScanPy to implement this API myself, but ScanPy is an extremely confusing package for an outside developer. There are nested functions and checks for even simple steps (many of which handle edge cases, making the package robust).
More specifically, I'm trying to use the ScanPy implementation of Diffusion Maps as I would use those from pyDiffMap or the spectral clustering from Sklearn.
I would like to be able to fit a model with data, pickle it, and then transform new samples based on the fitted model. This would provide a useful interface for users looking for a nonlinear alternative to PCA.
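To make the requested interface concrete, here is a minimal sketch of what a fit / transform / fit_transform object could look like. Everything in it is an assumption for illustration: the class name `DiffusionMapSketch`, the `epsilon` bandwidth parameter, and the dense Gaussian kernel are hypothetical, and the code uses only NumPy rather than scanpy's backend. New samples are projected with a Nystroem-style extension.

```python
import numpy as np


class DiffusionMapSketch:
    """Hypothetical sklearn-style diffusion map (dense Gaussian kernel, NumPy only)."""

    def __init__(self, n_components=2, epsilon=1.0):
        self.n_components = n_components
        self.epsilon = epsilon  # kernel bandwidth (assumed parameter name)

    def _kernel(self, A, B):
        # Pairwise squared Euclidean distances, clipped at 0 against round-off.
        sq = (A**2).sum(axis=1)[:, None] + (B**2).sum(axis=1)[None, :] - 2.0 * A @ B.T
        return np.exp(-np.maximum(sq, 0.0) / self.epsilon)

    def fit(self, X):
        X = np.asarray(X, dtype=float)
        K = self._kernel(X, X)
        d = K.sum(axis=1)
        # S = D^-1/2 K D^-1/2 is symmetric and shares eigenvalues with P = D^-1 K.
        S = K / np.sqrt(np.outer(d, d))
        evals, evecs = np.linalg.eigh(S)
        # Sort descending and skip the trivial eigenvalue 1 (constant eigenvector of P).
        order = np.argsort(evals)[::-1][1 : self.n_components + 1]
        self.eigenvalues_ = evals[order]
        # Convert eigenvectors of S into right eigenvectors of P.
        self.eigenvectors_ = evecs[:, order] / np.sqrt(d)[:, None]
        self.X_fit_ = X
        return self

    def transform(self, X_new):
        # Nystroem-style extension: psi(y) = (1 / lambda) * sum_j p(y, x_j) * psi(x_j).
        K = self._kernel(np.asarray(X_new, dtype=float), self.X_fit_)
        P = K / K.sum(axis=1, keepdims=True)
        return (P @ self.eigenvectors_) / self.eigenvalues_

    def fit_transform(self, X):
        # Returns the unscaled eigenvectors as coordinates of the training points.
        return self.fit(X).eigenvectors_
```

The fitted state is just three NumPy arrays, so an instance should round-trip through the standard `pickle` module as long as the class is importable where it is loaded. Calling `transform` on the training data reproduces `fit_transform` up to floating-point error, which is a handy sanity check for any out-of-sample extension.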