Wrongly ordered DotPlot totals in `scanpy` 1.10.1 with Pandas 1.x #3062

rgoya · 2024-05-16T01:48:11Z

Please make sure these conditions are met

I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of scanpy.
(optional) I have confirmed this bug exists on the main branch of scanpy.

What happened?

In scanpy-1.9.8 DotPlots the default ordering of categories is alphabetical, adjusting to what was requested via groupby. This also worked when multiple columns were requested, eliminating the need to manually compose the alphabetical ordering of all existing combinations of observations in the plot.

The default ordering in scanpy>=1.10.0 DotPlots has changed, and resulting plots display wrong data:

Ordering is no longer alphabetical. It seems that the categories are being ordered as if a dendrogram had been requested.
Additionally, when adding totals with add_totals(), the bar plots with cell counts do follow the default alphabetical ordering, making the plot display wrong data (!).
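
To make the second point concrete, here is a toy illustration in plain Python. It does not use scanpy internals; the labels, counts, and row order are made up for the example. It only shows why totals laid out alphabetically next to rows drawn in a different order end up attached to the wrong rows, and that pairing by label avoids this.

```python
from collections import Counter

# Hypothetical per-cell labels (not taken from pbmc68k_reduced).
labels = ["B", "A", "C", "A", "B", "A"]

# Hypothetical, non-alphabetical order in which the plot draws its rows.
row_order = ["C", "A", "B"]

# Totals are computed independently and laid out alphabetically.
totals = Counter(labels)
alphabetical = sorted(totals)

# Pairing by position attaches each count to whatever row sits at that index.
by_position = list(zip(row_order, (totals[k] for k in alphabetical)))

# Pairing by label keeps each count with its own row.
by_label = [(k, totals[k]) for k in row_order]

print(by_position)  # counts shown next to the wrong rows
print(by_label)     # counts shown next to the matching rows
```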

The example below shows the misbehaviour using the example in https://scanpy.readthedocs.io/en/stable/generated/scanpy.pl.dotplot.html

Using the code example below, here is the expected plot with scanpy-1.9.8 (same result as in the URL above):

and here is the erroneous result with scanpy-1.10.1 and 1.10.0 (wrong ordering, mismatching totals):

Minimal code sample

import scanpy as sc

pbmc = sc.datasets.pbmc68k_reduced()

markers = {'T-cell': 'CD3D', 'B-cell': 'CD79A', 'myeloid': 'CST3'}

dp = sc.pl.dotplot(pbmc, markers, 'bulk_labels', return_fig=True)
dp.add_totals().style(dot_edge_color='black', dot_edge_lw=0.5).show()

Error output

(Error output is a bad plot, included in the description above.)

Versions

-----
anndata     0.10.7
scanpy      1.10.1
-----
IPython             8.13.2
PIL                 10.0.0
asciitree           NA
asttokens           NA
astunparse          1.6.3
backcall            0.2.0
cffi                1.15.1
cloudpickle         2.2.1
colorama            0.4.4
cycler              0.10.0
cython_runtime      NA
cytoolz             0.12.0
dask                2023.10.1
dateutil            2.8.2
decorator           5.1.1
defusedxml          0.7.1
dill                0.3.6
dot_parser          NA
entrypoints         0.4
exceptiongroup      1.1.1
executing           1.2.0
fasteners           0.17.3
flytekitplugins     NA
gmpy2               2.1.2
google              NA
h5py                3.8.0
icu                 2.11
igraph              0.11.2
jedi                0.19.1
jinja2              3.1.2
joblib              1.2.0
kiwisolver          1.4.4
legacy_api_wrap     NA
leidenalg           0.10.2
llvmlite            0.42.0
lz4                 4.3.2
markupsafe          2.1.2
matplotlib          3.8.3
mpl_toolkits        NA
mpmath              1.3.0
msgpack             1.0.5
natsort             8.3.1
numba               0.59.1
numcodecs           0.11.0
numexpr             2.7.3
numpy               1.26.4
packaging           23.1
pandas              1.5.3
parso               0.8.3
pexpect             4.8.0
pickleshare         0.7.5
plotly              5.14.1
prompt_toolkit      3.0.38
psutil              5.9.5
ptyprocess          0.7.0
pure_eval           0.2.2
pyarrow             10.0.1
pydot               1.4.2
pygments            2.15.1
pyparsing           3.0.9
pyteomics           NA
pytz                2023.3.post1
scipy               1.13.0
session_info        1.0.0
setuptools          67.7.2
setuptools_scm      NA
six                 1.16.0
sklearn             1.2.2
stack_data          0.6.2
sympy               1.11.1
tblib               1.7.0
texttable           1.6.7
threadpoolctl       3.1.0
tlz                 0.12.0
toolz               0.11.2
torch               2.1.1
torchgen            NA
tqdm                4.65.0
traitlets           5.9.0
typing_extensions   NA
wcwidth             0.2.6
xxhash              NA
yaml                5.4.1
zarr                2.14.2
zc                  NA
zipp                NA
zoneinfo            NA
-----
Python 3.10.10 | packaged by conda-forge | (main, Mar 24 2023, 20:17:34) [Clang 14.0.6 ]
macOS-14.4.1-x86_64-i386-64bit
-----
Session information updated at 2024-05-15 18:46


flying-sheep · 2024-05-16T11:34:56Z

Hi, thanks for the report!

Note that the plots in the documentation are generated on the fly when building the documentation. The plot you currently see on https://scanpy.readthedocs.io/en/stable/generated/scanpy.pl.dotplot.html has therefore been created with scanpy 1.10.1

Must be a dependency issue, I’ll try to reproduce with the environment you provided.

/edit: I can reproduce it with that environment:

environment.yml

name: scanpy-3062
channels:
  - conda-forge
dependencies:
- ipykernel

- python==3.10.10
- anndata==0.10.7
- scanpy==1.10.1
- IPython==8.13.2
- pillow==10.0.0
- astunparse==1.6.3
- backcall==0.2.0
- cffi==1.15.1
- cloudpickle==2.2.1
- colorama==0.4.4
- cycler==0.10.0
- cytoolz==0.12.0
- dask==2023.10.1
#- dateutil==2.8.2
- decorator==5.1.1
- defusedxml==0.7.1
- dill==0.3.6
- entrypoints==0.4
- exceptiongroup==1.1.1
- executing==1.2.0
- fasteners==0.17.3
- gmpy2==2.1.2
- h5py==3.8.0
#- icu==2.11
- python-igraph==0.11.2
- jedi==0.19.1
- jinja2==3.1.2
- joblib==1.2.0
- kiwisolver==1.4.4
- leidenalg==0.10.2
- llvmlite==0.42.0
- lz4==4.3.2
- markupsafe==2.1.2
- matplotlib==3.8.3
- mpmath==1.3.0
#- msgpack==1.0.5
- natsort==8.3.1
- numba==0.59.1
- numcodecs==0.11.0
- numexpr==2.7.3
- numpy==1.26.4
- packaging==23.1
- pandas==1.5.3
- parso==0.8.3
- pexpect==4.8.0
- pickleshare==0.7.5
- plotly==5.14.1
- prompt_toolkit==3.0.38
- psutil==5.9.5
- ptyprocess==0.7.0
- pure_eval==0.2.2
- pyarrow==10.0.1
- pydot==1.4.2
- pygments==2.15.1
- pyparsing==3.0.9
- pytz==2023.3.post1
- scipy==1.13.0
#- session_info==1.0.0
#- setuptools==67.7.2
- six==1.16.0
- scikit-learn==1.2.2
- stack_data==0.6.2
- sympy==1.11.1
- tblib==1.7.0
- texttable==1.6.7
- threadpoolctl==3.1.0
#- tlz==0.12.0
- toolz==0.11.2
#- pytorch==2.1.1
- tqdm==4.65.0
- traitlets==5.9.0
- wcwidth==0.2.6
#- yaml==5.4.1
- zarr==2.14.2

flying-sheep · 2024-05-16T12:03:40Z

OK, pretty sure this is because your environment uses pandas 1.5

You can circumvent it for now by setting dp.categories_order = dp.dot_color_df.index:

rgoya · 2024-05-16T17:39:19Z

Thanks for the quick response, @flying-sheep!

I can confirm that updating to pandas 2.2.2 does fix this. I totally missed this possibility; it's not clear to me why the dots would change ordering, but the totals wouldn't (maybe scanpy relies on default pandas behaviour that changed between 1.x and 2.x?). That said, pandas 2.x unfortunately breaks some dependencies in our environment, so I'll either pin scanpy or use your workaround.

Regarding the ordering and the issue title change: maybe a nit, but it's my understanding that the default ordering is alphabetical (which makes perfect sense as a default!). If this is correct, then I'd suggest that the wrong ordering is not the totals, but the categories themselves.

Given this, the workaround that gives me the expected behaviour would be dp.categories_order = dp.dot_color_df.index.sort_values():
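
Both variants of the workaround can be wrapped in a small helper. This is a sketch only: `align_category_order` is a hypothetical name, and the stand-in object below (with made-up labels and values) exists just so the snippet runs without scanpy. One caveat: on a categorical index, `sort_values()` follows the category order rather than strict string order, so the result is alphabetical only if the categories themselves are.

```python
from types import SimpleNamespace

import pandas as pd


def align_category_order(dp, alphabetical=True):
    """Pin dp.categories_order to the index of dp.dot_color_df.

    `dp` is anything exposing `dot_color_df` and `categories_order`,
    e.g. the object returned by sc.pl.dotplot(..., return_fig=True).
    """
    order = dp.dot_color_df.index
    if alphabetical:
        order = order.sort_values()
    dp.categories_order = order
    return dp


# Stand-in object so the sketch runs without scanpy; labels and values are made up.
fake_dp = SimpleNamespace(
    dot_color_df=pd.DataFrame({"CD3D": [0.1, 0.9, 0.4]}, index=["c", "a", "b"]),
    categories_order=None,
)

print(list(align_category_order(fake_dp).categories_order))
print(list(align_category_order(fake_dp, alphabetical=False).categories_order))
```

With a real plot, the helper returns the same object it was given, so calls such as `add_totals()` and `show()` can be chained after it.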

rgoya added the Bug 🐛 and Triage 🩺 labels May 16, 2024

flying-sheep changed the title ~~DotPlots in scanpy-1.10.1 have wrong ordering and display wrong data.~~ Wrongly ordered DotPlot totals in scanpy 1.10.1 with Pandas 1.x May 16, 2024

flying-sheep added the Area - Plotting 🌺 label and removed the Triage 🩺 label May 16, 2024

flying-sheep linked a pull request Jun 6, 2024 that will close this issue

Fix dotplot totals for pandas 1.x #3101

Open

3 tasks

flying-sheep self-assigned this Jun 6, 2024
