atac.pp.scopen fails while allocating a second array after computing scopen but before writing result to disk #92

Open
alexlenail opened this issue Feb 3, 2023 · 4 comments
Labels
bug Something isn't working

Comments

@alexlenail

alexlenail commented Feb 3, 2023

Is there a way around this? I don't think the original scopen project requires this second allocation (https://github.com/CostaLab/scopen/blob/master/vignettes/epiScanpy.ipynb).

02/03/2023 15:54:25, iteration:  484, violation:  0.00052132
02/03/2023 15:55:08, iteration:  485, violation:  0.00051939
02/03/2023 15:55:49, iteration:  486, violation:  0.00051745
02/03/2023 15:56:29, iteration:  487, violation:  0.00051554
02/03/2023 15:57:10, iteration:  488, violation:  0.00051364
02/03/2023 15:57:53, iteration:  489, violation:  0.00051179
02/03/2023 15:58:41, iteration:  490, violation:  0.00050995
02/03/2023 15:59:27, iteration:  491, violation:  0.00050813
02/03/2023 16:00:12, iteration:  492, violation:  0.00050633
02/03/2023 16:00:58, iteration:  493, violation:  0.00050454
02/03/2023 16:01:45, iteration:  494, violation:  0.00050276
02/03/2023 16:02:29, iteration:  495, violation:  0.00050102
02/03/2023 16:03:12, iteration:  496, violation:  0.00049927
02/03/2023 16:03:53, iteration:  497, violation:  0.00049755
02/03/2023 16:04:38, iteration:  498, violation:  0.00049584
02/03/2023 16:05:20, iteration:  499, violation:  0.00049414
[total time:  6h 6m 3s ]
Traceback (most recent call last):
  File "/home/gridsan/lenail/.conda/envs/py39/lib/python3.9/site-packages/anndata/_io/utils.py", line 214, in func_wrapper
    return func(elem, key, val, *args, **kwargs)
  File "/home/gridsan/lenail/.conda/envs/py39/lib/python3.9/site-packages/anndata/_io/specs/registry.py", line 175, in write_elem
    _REGISTRY.get_writer(dest_type, t, modifiers)(f, k, elem, *args, **kwargs)
  File "/home/gridsan/lenail/.conda/envs/py39/lib/python3.9/site-packages/anndata/_io/specs/registry.py", line 24, in wrapper
    result = func(g, k, *args, **kwargs)
  File "/home/gridsan/lenail/.conda/envs/py39/lib/python3.9/site-packages/anndata/_io/specs/methods.py", line 307, in write_basic
    f.create_dataset(k, data=elem, **dataset_kwargs)
  File "/home/gridsan/lenail/.conda/envs/py39/lib/python3.9/site-packages/h5py/_hl/group.py", line 161, in create_dataset
    dsid = dataset.make_new_dset(group, shape, dtype, data, name, **kwds)
  File "/home/gridsan/lenail/.conda/envs/py39/lib/python3.9/site-packages/h5py/_hl/dataset.py", line 48, in make_new_dset
    data = base.array_for_new_object(data, specified_dtype=dtype)
  File "/home/gridsan/lenail/.conda/envs/py39/lib/python3.9/site-packages/h5py/_hl/base.py", line 118, in array_for_new_object
    data = np.asarray(data, order="C", dtype=as_dtype)
numpy.core._exceptions._ArrayMemoryError: Unable to allocate 240. GiB for an array with shape (65627, 491773) and data type float64
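
For reference, the 240 GiB figure in the error follows directly from the reported shape and dtype. The snippet below is a minimal sketch of that arithmetic; the helper name `dense_nbytes` and the itemsize table are illustrative and not part of muon, anndata, or h5py.

```python
import math

# Bytes per element for the dtypes discussed here (illustrative table).
ITEMSIZE = {"float64": 8, "float32": 4}

def dense_nbytes(shape, dtype="float64"):
    """Bytes needed to hold a dense array of the given shape and dtype."""
    return math.prod(shape) * ITEMSIZE[dtype]

shape = (65627, 491773)  # shape reported in the traceback above
gib_f64 = dense_nbytes(shape, "float64") / 2**30
gib_f32 = dense_nbytes(shape, "float32") / 2**30
print(f"float64: {gib_f64:.1f} GiB, float32: {gib_f32:.1f} GiB")
```

Since the error is raised while allocating a new array of this size, the process would need room for that array in addition to whatever copy of the matrix it already holds.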
@alexlenail alexlenail added the bug Something isn't working label Feb 3, 2023
@gtca
Collaborator

gtca commented Feb 6, 2023

Hey @alexlenail,

Thanks for reporting. I think this happens because the current interface imputes the matrix by default.

It also seems that scOpen's interfaces have been reworked since the interface in muon.atac was written, so I'll try to upgrade the interface in muon.atac as well.

One thing to note here is that scOpen itself has --no-impute=False as a default argument and is generally proposed as an imputation method. Following this issue, I would be more inclined not to perform imputation by default and to focus on the latent space instead, but I'd be curious to hear what you think about that.
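
To illustrate why the imputed matrix is so much larger than the latent space, here is a toy sketch. It assumes a factorization of the form X ≈ W·H with a small rank, in the spirit of a matrix-factorization method; the variable names, orientation, and sizes are assumptions for illustration and do not reproduce scOpen's or muon's actual code.

```python
import numpy as np

rng = np.random.default_rng(0)
n_peaks, n_cells, rank = 2000, 300, 10  # toy sizes, far smaller than the issue's data

W = rng.random((n_peaks, rank))  # hypothetical peak loadings
H = rng.random((rank, n_cells))  # hypothetical low-dimensional cell representation

# "Imputation" here means materializing the full dense product.
imputed = W @ H

print("imputed:", imputed.shape, imputed.nbytes, "bytes")
print("latent: ", H.shape, H.nbytes, "bytes")
print("ratio:  ", imputed.nbytes // H.nbytes)
```

Under these assumptions the dense product is larger than the latent matrix by a factor of n_peaks / rank, which is why skipping imputation by default would avoid the large allocation.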

@gtca
Collaborator

gtca commented Feb 6, 2023

To comment on the issue title: I don't think muon.atac.pp.scopen writes anything to disk...

@alexlenail
Author

I ran scopen to impute my ATAC data using the scopen package directly, and it did not cause a memory error, so I think muon may be allocating more arrays than it needs to?
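
One possible source of a second array is the `np.asarray(data, order="C", dtype=as_dtype)` call at the bottom of the traceback. The sketch below shows, on tiny arrays, when that call returns the input unchanged and when it has to allocate a copy. Which case applied in this issue is not known from the thread, so treat this as an illustration of NumPy's behaviour rather than a diagnosis.

```python
import numpy as np

a_c = np.zeros((4, 3), dtype=np.float64)  # already C-ordered float64
a_f = np.asfortranarray(a_c)              # same values, Fortran (column-major) order
a_32 = a_c.astype(np.float32)             # same values, different dtype

same = np.asarray(a_c, order="C", dtype=np.float64)      # no conversion needed
from_f = np.asarray(a_f, order="C", dtype=np.float64)    # layout change forces a copy
from_32 = np.asarray(a_32, order="C", dtype=np.float64)  # dtype change forces a copy

print(same is a_c)                       # input returned as-is
print(np.shares_memory(a_f, from_f))     # new buffer allocated
print(np.shares_memory(a_32, from_32))   # new buffer allocated
```

If the matrix being written were Fortran-ordered or had a different dtype, the conversion would need a full extra buffer of the size reported in the error.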

@gtca
Collaborator

gtca commented Feb 21, 2023

I believe imputation is performed by default via the main interface (see here), but scopen_dr(), which was introduced after the interface in muon was written, does not perform imputation.

We'll upgrade the interface!
