Different clustering results when setting metrics = "cosine", using callable cosine function and direct cos distance input. #3021

Open
2 of 3 tasks
zby3 opened this issue Apr 22, 2024 · 0 comments

Please make sure these conditions are met

  • I have checked that this issue has not already been reported.
  • I have confirmed this bug exists on the latest version of scanpy.
  • (optional) I have confirmed this bug exists on the main branch of scanpy.

What happened?

The following three ways of using cosine similarity with sc.pp.neighbors, each followed by Leiden clustering, give different results:

  1. set metric = "cosine" in sc.pp.neighbors()
  2. use a self-written callable cosine similarity function in KNeighborsTransformer and pass it to the transformer option of sc.pp.neighbors()
  3. compute a cosine similarity matrix up front, then use it to populate adata.obsp['connectivities']

Option 1 generates 85 clusters, option 2 generates 170 clusters and option 3 generates 183 clusters.
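One thing that may be worth checking when comparing the three options: cosine similarity and cosine distance are not the same quantity. Similarity is largest for vectors pointing in the same direction, while a distance should be smallest for them, and the usual conversion is distance = 1 - similarity. The sketch below uses only NumPy; the helper names `cosine_similarity_pair` and `cosine_distance_pair` are hypothetical and are not part of scanpy or scikit-learn. Whether this difference explains the cluster counts reported here is an assumption, not a confirmed diagnosis.

```python
import numpy as np


def cosine_similarity_pair(a, b):
    """Cosine similarity of two 1-D vectors (hypothetical helper)."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def cosine_distance_pair(a, b):
    """Cosine distance, defined here as 1 - cosine similarity (hypothetical helper)."""
    return 1.0 - cosine_similarity_pair(a, b)


a = np.array([1.0, 0.0])
b = np.array([2.0, 0.0])  # same direction as a
c = np.array([0.0, 3.0])  # orthogonal to a

# Similarity ranks b above c; distance ranks b below c.
print("similarity:", cosine_similarity_pair(a, b), cosine_similarity_pair(a, c))
print("distance:  ", cosine_distance_pair(a, b), cosine_distance_pair(a, c))
```

A nearest-neighbor search that expects a distance but receives a similarity would treat `c` as closer to `a` than `b` is, which reverses the intended ordering.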

Minimal code sample

import scanpy as sc
from sklearn.neighbors import KNeighborsTransformer
import numpy as np
from numpy.linalg import norm
from sklearn.metrics.pairwise import cosine_similarity
adata = sc.datasets.pbmc68k_reduced()

# Option 1: use the built-in cosine metric option
sc.pp.neighbors(adata, n_neighbors=15, n_pcs=0, metric="cosine")
sc.tl.umap(adata, random_state=42)
sc.tl.leiden(adata, resolution=10)
clusters = np.array(adata.obs["leiden"]).astype(int)
print('num of clusters: ' + str(len(set(clusters))))

# Option 2: use a callable cosine similarity metric
def cos_distance(A, B):
    # computes dot(A, B) / (|A| * |B|), i.e. the cosine similarity, and returns it as a float
    cosine = np.dot(A, B) / (norm(A) * norm(B))
    return cosine

transformer = KNeighborsTransformer(n_neighbors=15, metric=cos_distance)
sc.pp.neighbors(adata, transformer=transformer, n_pcs=0)
sc.tl.umap(adata, random_state=42)
sc.tl.leiden(adata, resolution=10)
clusters = np.array(adata.obs["leiden"]).astype(int)
print('num of clusters: ' + str(len(set(clusters))))

# Option 3: use a precomputed matrix (the output of sklearn's cosine_similarity)
dis_mat = cosine_similarity(adata.X)

# take the 15 neighbors per row from the dense matrix, then build UMAP-style connectivities
tmp = sc.neighbors._common._get_indices_distances_from_dense_matrix(dis_mat, n_neighbors=15)
adata.obsp["connectivities"] = sc.neighbors._connectivity.umap(
    knn_indices=tmp[0],
    knn_dists=tmp[1],
    n_obs=dis_mat.shape[0],
    n_neighbors=15,
)
adata.uns["neighbors"] = {"connectivities_key": "connectivities", "params": {"method": None}}
sc.tl.umap(adata, random_state=42)
sc.tl.leiden(adata, resolution=10)
clusters = np.array(adata.obs["leiden"]).astype(int)
print('num of clusters: ' + str(len(set(clusters))))

Error output

num of clusters: 85
num of clusters: 170
num of clusters: 183
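To narrow down where the three runs diverge, it may help to compare the neighbor sets rather than the final cluster counts. The sketch below is a NumPy-only illustration on random data, not on pbmc68k_reduced: it selects the k smallest entries per row from a cosine distance matrix and from a cosine similarity matrix and measures how much the two neighbor sets overlap. The function `knn_indices_smallest` is a hypothetical stand-in for a generic "k smallest values per row" search; it is assumed, not verified, that the scanpy and scikit-learn code paths above select neighbors in a comparable way.

```python
import numpy as np


def knn_indices_smallest(matrix, k):
    """Indices of the k smallest off-diagonal entries in each row (hypothetical helper)."""
    m = np.array(matrix, dtype=float)  # work on a copy
    np.fill_diagonal(m, np.inf)        # never pick a point as its own neighbor
    return np.argsort(m, axis=1)[:, :k]


rng = np.random.default_rng(0)
X = rng.normal(size=(50, 10))          # toy data, 50 observations

unit = X / np.linalg.norm(X, axis=1, keepdims=True)
sim = unit @ unit.T                    # cosine similarity matrix
dist = 1.0 - sim                       # cosine distance matrix

k = 5
nn_from_dist = knn_indices_smallest(dist, k)
nn_from_sim = knn_indices_smallest(sim, k)  # similarity treated as if it were a distance

# Mean fraction of neighbors shared between the two selections.
overlap = np.mean(
    [len(set(a) & set(b)) / k for a, b in zip(nn_from_dist, nn_from_sim)]
)
print("mean neighbor overlap:", overlap)
```

On the real data, the same comparison could be run on the neighbor indices behind each option; a low overlap would suggest the graphs differ before Leiden is ever called.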

Versions

-----
anndata     0.10.6
scanpy      1.10.1
-----
IPython             8.22.2
PIL                 10.2.0
asttokens           NA
console_thrift      NA
cycler              0.12.1
cython_runtime      NA
dateutil            2.9.0
decorator           5.1.1
executing           2.0.1
h5py                3.10.0
igraph              0.11.4
jedi                0.19.1
joblib              1.3.2
kiwisolver          1.4.5
legacy_api_wrap     NA
leidenalg           0.10.2
llvmlite            0.42.0
matplotlib          3.8.3
mpl_toolkits        NA
natsort             8.4.0
numba               0.59.1
numpy               1.26.4
packaging           24.0
pandas              2.2.1
parso               0.8.3
pickleshare         0.7.5
pkg_resources       NA
prompt_toolkit      3.0.42
psutil              5.9.0
pure_eval           0.2.2
pydev_console       NA
pydev_ipython       NA
pydevconsole        NA
pydevd_file_utils   NA
pydevd_plugins      NA
pydevd_tracing      NA
pygments            2.17.2
pynndescent         0.5.11
pyparsing           3.1.2
pytz                2024.1
scipy               1.12.0
session_info        1.0.0
six                 1.16.0
sklearn             1.4.1.post1
stack_data          0.6.2
texttable           1.7.0
threadpoolctl       3.4.0
tqdm                4.66.2
traitlets           5.14.2
typing_extensions   NA
umap                0.5.5
wcwidth             0.2.13
-----
Python 3.11.7 (main, Dec 15 2023, 12:09:56) [Clang 14.0.6 ]
macOS-14.3.1-arm64-arm-64bit
-----
Session information updated at 2024-04-22 09:58

@zby3 zby3 added the Bug 🐛 label Apr 22, 2024