UMAP-rs not efficient #2

Open
jianshu93 opened this issue Nov 10, 2023 · 0 comments
Dear Cell-ranger team,

It seems scan-rs still uses the rather old vantage-point tree data structure for nearest neighbor search, a key step of both UMAP and t-SNE. More recent proximity-graph-based algorithms (e.g., HNSW, NSG) can be much faster while remaining accurate in terms of recall. More importantly, they can be parallelized efficiently. Beyond the nearest neighbor search step, the other UMAP steps, including cross-entropy optimization and embedding-space initialization, are all single-threaded and therefore slow for large datasets with millions or billions of samples (datasets of that scale will soon be easy to obtain). I think the non-linear dimensionality reduction step could be improved and accelerated further.
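
To illustrate the parallelization point only, here is a minimal sketch of a k-nearest-neighbor graph build in which the queries are split across threads. It is an assumption-laden toy: it uses exact brute-force search from the standard library rather than HNSW or NSG, it is not how scan-rs works, and all names (`sq_dist`, `knn_one`, `knn_graph_parallel`) are hypothetical.

```rust
use std::thread;

/// Squared Euclidean distance between two equal-length vectors.
fn sq_dist(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum()
}

/// Exact k nearest neighbors of point `i` (excluding itself), by brute force.
/// Hypothetical helper; a graph index such as HNSW would replace this step.
fn knn_one(data: &[Vec<f32>], i: usize, k: usize) -> Vec<usize> {
    let mut cand: Vec<(f32, usize)> = data
        .iter()
        .enumerate()
        .filter(|(j, _)| *j != i)
        .map(|(j, p)| (sq_dist(&data[i], p), j))
        .collect();
    // Sort by distance, breaking ties by index so the result is deterministic.
    cand.sort_by(|a, b| a.0.total_cmp(&b.0).then(a.1.cmp(&b.1)));
    cand.into_iter().take(k).map(|(_, j)| j).collect()
}

/// Build the k-NN graph, splitting the queries across up to `n_threads`
/// scoped threads. Each thread writes to its own disjoint slice of the output,
/// so no locking is needed.
fn knn_graph_parallel(data: &[Vec<f32>], k: usize, n_threads: usize) -> Vec<Vec<usize>> {
    let n = data.len();
    let mut out: Vec<Vec<usize>> = vec![Vec::new(); n];
    if n == 0 {
        return out;
    }
    let threads = n_threads.max(1);
    let chunk = ((n + threads - 1) / threads).max(1);
    thread::scope(|s| {
        for (c, slot) in out.chunks_mut(chunk).enumerate() {
            s.spawn(move || {
                for (off, row) in slot.iter_mut().enumerate() {
                    *row = knn_one(data, c * chunk + off, k);
                }
            });
        }
    });
    out
}
```

Because each query is independent, the result is the same for any thread count. The same per-query independence is what lets the search phase of graph-based indexes be spread across cores.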

Thanks,

Jianshu
