sc.pp.neighbors(..., knn=True, n_neighbors=k) does not threshold the adjacency #3014

Open
3 tasks done
mkarikom opened this issue Apr 18, 2024 · 0 comments
mkarikom commented Apr 18, 2024

Please make sure these conditions are met

  • I have checked that this issue has not already been reported.
  • I have confirmed this bug exists on the latest version of scanpy.
  • (optional) I have confirmed this bug exists on the main branch of scanpy.

What happened?

According to the pp.neighbors() docs we have:

    knn
        If `True`, use a hard threshold to restrict the number of neighbors to
        `n_neighbors`, that is, consider a knn graph. Otherwise, use a Gaussian
        Kernel to assign low weights to neighbors more distant than the
        `n_neighbors` nearest neighbor.

However, when knn=True, the adjacency stored in adata.obsp[adata.uns['neighbors']['connectivities_key']] has many more than n_neighbors non-zero entries for some cells.
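
One possible explanation, which is an assumption and not confirmed in this report, is that the connectivities are a symmetrized version of the directed kNN graph. In a directed kNN graph every point has exactly k outgoing edges, but a point can appear in the neighbor lists of many other points, so after symmetrization its degree can exceed k. The sketch below illustrates this with plain NumPy on four made-up 1-D points; `knn_adjacency` is a hypothetical helper and not part of scanpy.

```python
import numpy as np


def knn_adjacency(points: np.ndarray, k: int) -> np.ndarray:
    """Directed boolean kNN adjacency: row i marks the k nearest neighbors of point i."""
    diff = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)  # a point is not its own neighbor
    nearest = np.argsort(dist, axis=1)[:, :k]
    adj = np.zeros(dist.shape, dtype=bool)
    np.put_along_axis(adj, nearest, True, axis=1)
    return adj


# Four made-up points on a line, chosen so that the nearest neighbors are unambiguous.
points = np.array([[0.0], [1.0], [3.0], [6.0]])
k = 1

directed = knn_adjacency(points, k)
symmetric = directed | directed.T  # keep an edge if either endpoint selected the other

out_degree = directed.sum(axis=1)
sym_degree = symmetric.sum(axis=1)
print(out_degree)  # [1 1 1 1]
print(sym_degree)  # [1 2 2 1]
```

Every row of the directed graph has exactly k entries, while two of the four points end up with degree 2 once the graph is symmetrized. Whether this is what happens inside sc.pp.neighbors would need to be confirmed by the maintainers.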

Minimal code sample

import urllib.request

import numpy as np
import scanpy as sc

# load the data
h5_data = "https://datasets.cellxgene.cziscience.com/6ff309fa-e9f6-405d-b24e-3c35528f154e.h5ad"
urllib.request.urlretrieve(h5_data, "/tmp/data.h5ad")
adata = sc.read_h5ad("/tmp/data.h5ad")

# compute the neighbor graph with k=10 and binarize the connectivities
k = 10
sc.pp.neighbors(adata, n_neighbors=k, n_pcs=40, random_state=42, knn=True)
connectivities = adata.obsp[adata.uns['neighbors']['connectivities_key']]
adjacency = (connectivities.todense() > 0).astype(np.int32)
print(f"adjacency matrix (k={k}) shape: {adjacency.shape}")

# check whether any cell has more than k neighbors
max_neighbors = np.max(adjacency.sum(axis=0))
print(f"Max neighbors={max_neighbors}")

Error output

adjacency matrix (k=10) shape: (1011, 1011)
Max neighbors=91
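
The same check can be run without converting the matrix to dense, which matters for larger datasets. The following is a minimal sketch using only SciPy; `neighbor_counts` is a hypothetical helper, and the toy matrix with made-up weights stands in for the real connectivities.

```python
import numpy as np
from scipy import sparse


def neighbor_counts(graph) -> np.ndarray:
    """Count the non-zero entries in each row of a sparse graph."""
    csr = sparse.csr_matrix(graph, copy=True)  # copy so the caller's matrix is untouched
    csr.eliminate_zeros()  # drop explicitly stored zeros so they are not counted as edges
    return csr.getnnz(axis=1)


# Toy 4-cell graph standing in for a connectivities matrix (made-up weights).
toy = sparse.csr_matrix(
    np.array(
        [
            [0.0, 1.0, 0.0, 0.0],
            [1.0, 0.0, 0.5, 0.0],
            [0.0, 0.5, 0.0, 1.0],
            [0.0, 0.0, 1.0, 0.0],
        ]
    )
)

counts = neighbor_counts(toy)
print(counts, counts.max())
```

On the real data this would be called as neighbor_counts(adata.obsp[adata.uns['neighbors']['connectivities_key']]). Running it on adata.obsp[adata.uns['neighbors']['distances_key']] as well would show whether the two matrices differ in their per-cell neighbor counts.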

Versions

-----
anndata     0.10.6
scanpy      1.9.8
-----
Bio                         1.83
MOODS                       NA
PIL                         10.2.0
absl                        NA
anyio                       NA
argcomplete                 NA
arrow                       1.3.0
asttokens                   NA
astunparse                  1.6.3
attr                        23.2.0
attrs                       23.2.0
babel                       2.14.0
biothings_client            0.3.1
bpnetlite                   0.6.0
cattr                       NA
cattrs                      NA
certifi                     2024.02.02
cffi                        1.16.0
charset_normalizer          3.3.2
cloudpickle                 3.0.0
colorama                    0.4.6
colorlog                    NA
comm                        0.2.2
cycler                      0.12.1
cython_runtime              NA
dateutil                    2.9.0.post0
debugpy                     1.8.1
decorator                   5.1.1
defusedxml                  0.7.1
dill                        0.3.8
dragonnfruit                0.3.1
exceptiongroup              1.2.0
executing                   2.0.1
fastjsonschema              NA
filelock                    3.13.1
fqdn                        NA
fsspec                      2024.3.1
goatools                    1.3.11
google                      NA
h5py                        3.10.0
hdf5plugin                  4.4.0
idna                        3.6
igraph                      0.11.4
ipykernel                   6.29.3
ipywidgets                  8.1.2
isoduration                 NA
jedi                        0.19.1
jinja2                      3.1.3
joblib                      1.3.2
json5                       0.9.24
jsonpointer                 2.4
jsonschema                  4.21.1
jsonschema_specifications   NA
jupyter_events              0.9.0
jupyter_server              2.13.0
jupyterlab_server           2.25.4
kiwisolver                  1.4.5
leidenalg                   0.10.2
llvmlite                    0.42.0
markupsafe                  2.1.5
matplotlib                  3.6.2
mpl_toolkits                NA
msgpack                     1.0.8
mudata                      0.2.3
muon                        0.1.5
mygene                      3.2.2
natsort                     8.4.0
nbformat                    5.10.3
networkx                    3.2.1
numba                       0.59.1
numexpr                     2.9.0
numpy                       1.26.4
optree                      0.10.0
optuna                      3.6.0
overrides                   NA
packaging                   24.0
pandas                      1.5.3
pandas_flavor               NA
parso                       0.8.3
patsy                       0.5.6
pingouin                    0.5.4
pkg_resources               NA
platformdirs                4.2.0
plotly                      5.20.0
prometheus_client           NA
prompt_toolkit              3.0.43
psutil                      5.9.8
pure_eval                   0.2.2
pyBigWig                    0.3.22
pyarrow                     15.0.2
pychromvar                  0.0.4
pycparser                   2.21
pydev_ipython               NA
pydevconsole                NA
pydevd                      2.9.5
pydevd_file_utils           NA
pydevd_plugins              NA
pydevd_tracing              NA
pydot                       2.0.0
pyfaidx                     0.8.1.1
pygments                    2.17.2
pyjaspar                    3.0.0
pynndescent                 0.5.11
pyparsing                   3.1.2
pysam                       0.22.0
pythonjsonlogger            NA
pytz                        2024.1
ray                         2.10.0
referencing                 NA
requests                    2.31.0
requests_cache              1.2.0
rfc3339_validator           0.1.4
rfc3986_validator           0.1.1
rich                        NA
rpds                        NA
scipy                       1.12.0
seaborn                     0.13.2
send2trash                  NA
session_info                1.0.0
setproctitle                1.2.2
simplejson                  3.19.2
sitecustomize               NA
six                         1.16.0
sklearn                     1.4.1.post1
sniffio                     1.3.1
stack_data                  0.6.3
statsmodels                 0.14.1
swig_runtime_data4          NA
tabulate                    0.9.0
tensorboard                 2.16.2
texttable                   1.7.0
threadpoolctl               3.4.0
torch                       2.2.1+cu121
torchgen                    NA
tornado                     6.4
tqdm                        4.66.2
traitlets                   5.14.2
typing_extensions           NA
umap                        0.5.5
uri_template                NA
url_normalize               1.4.3
urllib3                     2.2.1
uvloop                      0.19.0
wcwidth                     0.2.13
webcolors                   1.13
websocket                   1.7.0
wget                        3.2
xarray                      2024.2.0
yaml                        6.0.1
zmq                         25.1.2
zoneinfo                    NA
-----
IPython             8.22.2
jupyter_client      8.6.1
jupyter_core        5.7.2
jupyterlab          4.1.5
notebook            7.1.2
-----
Python 3.10.12 (main, Nov 20 2023, 15:14:05) [GCC 11.4.0]
Linux-6.5.0-27-generic-x86_64-with-glibc2.35
-----
Session information updated at 2024-04-18 18:58