Wrongly ordered DotPlot totals in scanpy 1.10.1 with Pandas 1.x #3062

Open
2 of 3 tasks
rgoya opened this issue May 16, 2024 · 3 comments · May be fixed by #3101

Comments

@rgoya

rgoya commented May 16, 2024

Please make sure these conditions are met

  • I have checked that this issue has not already been reported.
  • I have confirmed this bug exists on the latest version of scanpy.
  • (optional) I have confirmed this bug exists on the main branch of scanpy.

What happened?

In scanpy-1.9.8 DotPlots the default ordering of categories is alphabetical, adjusting to what was requested via groupby. This also worked when multiple columns were requested, eliminating the need to manually compose the alphabetical ordering of all existing combinations of observations in the plot.

The default ordering in scanpy>=1.10.0 DotPlots has changed, and resulting plots display wrong data:

  • Ordering is no longer alphabetical. It seems that the categories are being ordered as if a dendrogram had been requested.
  • Additionally, when adding totals with add_totals(), the bar plots with cell counts do follow the default alphabetical ordering, making the plot display wrong data (!).
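The first symptom can be checked without looking at the figure. The sketch below assumes that the row index of `dp.dot_color_df` (the attribute used in the workaround later in this thread) reflects the order in which the dots are drawn. `is_alphabetical` is a hypothetical helper, not part of scanpy.

```python
def is_alphabetical(labels) -> bool:
    """Return True if the labels are already in sorted (alphabetical) order."""
    labels = list(labels)
    return labels == sorted(labels)


# Usage with a DotPlot object such as the `dp` from the minimal code sample
# (assumption: dp.dot_color_df.index holds the plotted category order):
#
#     print(is_alphabetical(dp.dot_color_df.index))
```

If this reports `False` while the totals still look alphabetical, the dots and the bars are likely using different orders.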

The example below shows the misbehaviour using the example in https://scanpy.readthedocs.io/en/stable/generated/scanpy.pl.dotplot.html

With the code sample below, this is the expected plot from scanpy-1.9.8 (same result as in the URL above):
[Image: dot plot produced with scanpy 1.9.8]

and here is the erroneous result with scanpy-1.10.1 and 1.10.0 (wrong ordering, mismatching totals):
[Image: dot plot produced with scanpy 1.10.1]

Minimal code sample

import scanpy as sc

pbmc = sc.datasets.pbmc68k_reduced()

markers = {'T-cell': 'CD3D', 'B-cell': 'CD79A', 'myeloid': 'CST3'}

dp = sc.pl.dotplot(pbmc, markers, 'bulk_labels', return_fig=True)
dp.add_totals().style(dot_edge_color='black', dot_edge_lw=0.5).show()

Error output

(Error output is a bad plot, included in the description above.)

Versions

-----
anndata     0.10.7
scanpy      1.10.1
-----
IPython             8.13.2
PIL                 10.0.0
asciitree           NA
asttokens           NA
astunparse          1.6.3
backcall            0.2.0
cffi                1.15.1
cloudpickle         2.2.1
colorama            0.4.4
cycler              0.10.0
cython_runtime      NA
cytoolz             0.12.0
dask                2023.10.1
dateutil            2.8.2
decorator           5.1.1
defusedxml          0.7.1
dill                0.3.6
dot_parser          NA
entrypoints         0.4
exceptiongroup      1.1.1
executing           1.2.0
fasteners           0.17.3
flytekitplugins     NA
gmpy2               2.1.2
google              NA
h5py                3.8.0
icu                 2.11
igraph              0.11.2
jedi                0.19.1
jinja2              3.1.2
joblib              1.2.0
kiwisolver          1.4.4
legacy_api_wrap     NA
leidenalg           0.10.2
llvmlite            0.42.0
lz4                 4.3.2
markupsafe          2.1.2
matplotlib          3.8.3
mpl_toolkits        NA
mpmath              1.3.0
msgpack             1.0.5
natsort             8.3.1
numba               0.59.1
numcodecs           0.11.0
numexpr             2.7.3
numpy               1.26.4
packaging           23.1
pandas              1.5.3
parso               0.8.3
pexpect             4.8.0
pickleshare         0.7.5
plotly              5.14.1
prompt_toolkit      3.0.38
psutil              5.9.5
ptyprocess          0.7.0
pure_eval           0.2.2
pyarrow             10.0.1
pydot               1.4.2
pygments            2.15.1
pyparsing           3.0.9
pyteomics           NA
pytz                2023.3.post1
scipy               1.13.0
session_info        1.0.0
setuptools          67.7.2
setuptools_scm      NA
six                 1.16.0
sklearn             1.2.2
stack_data          0.6.2
sympy               1.11.1
tblib               1.7.0
texttable           1.6.7
threadpoolctl       3.1.0
tlz                 0.12.0
toolz               0.11.2
torch               2.1.1
torchgen            NA
tqdm                4.65.0
traitlets           5.9.0
typing_extensions   NA
wcwidth             0.2.6
xxhash              NA
yaml                5.4.1
zarr                2.14.2
zc                  NA
zipp                NA
zoneinfo            NA
-----
Python 3.10.10 | packaged by conda-forge | (main, Mar 24 2023, 20:17:34) [Clang 14.0.6 ]
macOS-14.4.1-x86_64-i386-64bit
-----
Session information updated at 2024-05-15 18:46
@rgoya added the "Bug 🐛" and "Triage 🩺" labels on May 16, 2024
@flying-sheep
Member

flying-sheep commented May 16, 2024

Hi, thanks for the report!

Note that the plots in the documentation are generated on the fly when the documentation is built. The plot you currently see on https://scanpy.readthedocs.io/en/stable/generated/scanpy.pl.dotplot.html has therefore been created with scanpy 1.10.1.

Must be a dependency issue, I’ll try to reproduce with the environment you provided.

/edit: I can reproduce it with that environment:

environment.yml
name: scanpy-3062
channels:
  - conda-forge
dependencies:
- ipykernel

- python==3.10.10
- anndata==0.10.7
- scanpy==1.10.1
- IPython==8.13.2
- pillow==10.0.0
- astunparse==1.6.3
- backcall==0.2.0
- cffi==1.15.1
- cloudpickle==2.2.1
- colorama==0.4.4
- cycler==0.10.0
- cytoolz==0.12.0
- dask==2023.10.1
#- dateutil==2.8.2
- decorator==5.1.1
- defusedxml==0.7.1
- dill==0.3.6
- entrypoints==0.4
- exceptiongroup==1.1.1
- executing==1.2.0
- fasteners==0.17.3
- gmpy2==2.1.2
- h5py==3.8.0
#- icu==2.11
- python-igraph==0.11.2
- jedi==0.19.1
- jinja2==3.1.2
- joblib==1.2.0
- kiwisolver==1.4.4
- leidenalg==0.10.2
- llvmlite==0.42.0
- lz4==4.3.2
- markupsafe==2.1.2
- matplotlib==3.8.3
- mpmath==1.3.0
#- msgpack==1.0.5
- natsort==8.3.1
- numba==0.59.1
- numcodecs==0.11.0
- numexpr==2.7.3
- numpy==1.26.4
- packaging==23.1
- pandas==1.5.3
- parso==0.8.3
- pexpect==4.8.0
- pickleshare==0.7.5
- plotly==5.14.1
- prompt_toolkit==3.0.38
- psutil==5.9.5
- ptyprocess==0.7.0
- pure_eval==0.2.2
- pyarrow==10.0.1
- pydot==1.4.2
- pygments==2.15.1
- pyparsing==3.0.9
- pytz==2023.3.post1
- scipy==1.13.0
#- session_info==1.0.0
#- setuptools==67.7.2
- six==1.16.0
- scikit-learn==1.2.2
- stack_data==0.6.2
- sympy==1.11.1
- tblib==1.7.0
- texttable==1.6.7
- threadpoolctl==3.1.0
#- tlz==0.12.0
- toolz==0.11.2
#- pytorch==2.1.1
- tqdm==4.65.0
- traitlets==5.9.0
- wcwidth==0.2.6
#- yaml==5.4.1
- zarr==2.14.2

@flying-sheep
Member

flying-sheep commented May 16, 2024

OK, pretty sure this is because your environment uses pandas 1.5.

You can circumvent it for now by setting dp.categories_order = dp.dot_color_df.index:

@flying-sheep changed the title from "DotPlots in scanpy-1.10.1 have wrong ordering and display wrong data." to "Wrongly ordered DotPlot totals in scanpy 1.10.1 with Pandas 1.x" on May 16, 2024
@flying-sheep added the "Area - Plotting 🌺" label and removed the "Triage 🩺" label on May 16, 2024
@rgoya
Author

rgoya commented May 16, 2024

Thanks for the quick response, @flying-sheep!

I can confirm that updating to pandas 2.2.2 does fix this. I totally missed this possibility; it's not clear to me why the dots would change ordering but the totals wouldn't (maybe scanpy relies on default pandas behaviour that changed between 1.x and 2.x?). That said, pandas 2.x unfortunately breaks some dependencies in our environment, so I'll either pin scanpy or use your workaround.
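For environments that cannot move to pandas 2.x, a small guard can flag when the workaround may be needed. This is a sketch under the assumption, taken only from this thread, that the problem shows up with pandas 1.5.3 and not with 2.2.2; other versions were not tested here. `needs_dotplot_workaround` is a hypothetical helper.

```python
from importlib.metadata import PackageNotFoundError, version


def major_version(version_string: str) -> int:
    """Extract the major component from a version string such as '1.5.3'."""
    return int(version_string.split(".")[0])


def needs_dotplot_workaround(pandas_version: str) -> bool:
    """Assumption from this thread: pandas 1.x is affected, 2.x is not."""
    return major_version(pandas_version) < 2


try:
    installed = version("pandas")
except PackageNotFoundError:
    installed = None  # pandas is not installed in this environment

if installed is not None and needs_dotplot_workaround(installed):
    print(f"pandas {installed} detected: consider setting dp.categories_order explicitly")
```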

Regarding the ordering and the issue title change: maybe a nit, but it's my understanding that the default ordering is alphabetical (which makes perfect sense as a default!). If this is correct, then I'd suggest that what is wrongly ordered is not the totals but the categories themselves.

Given this, the workaround that gives me the expected behaviour would be dp.categories_order = dp.dot_color_df.index.sort_values():
[Image: dot plot produced with this workaround]

@flying-sheep flying-sheep linked a pull request Jun 6, 2024 that will close this issue
3 tasks
@flying-sheep flying-sheep self-assigned this Jun 6, 2024