swifter.groupby() does not support dropna=False #202
Comments
Hey @yangyxt Thanks for raising this issue. I tried to look into it and test with a synthetic dataframe. I included a NaN in the groups and didn't encounter this issue. Looking more closely at your error message, it looks as though you may have a tuple in your groupby column.
Can you check if the column |
Hi @jmcarpenter2
Some of these requirements seem very arbitrary, so it may just be a sporadic error. Below is a script that produces the error. I have tested it on two different machines. However, I have also had other scripts that produced the error on one machine, but not the other.
Output:
|
Thank you for this very clear and reproducible code and logging! I will look into this shortly |
I tried running this code locally and did not run into the issue. The only major difference I see between our environments is that yours is Windows. I am going to start testing this code on Windows machines as part of my CI. This is also related to #175, #148, and potentially #176. |
Added Windows CI, but it didn't uncover anything :/ |
I found that a swifter groupby-apply chain raises an error when it sorts the index if I set dropna=False in the groupby step.
Here is the error log:
Traceback (most recent call last):
  File "/paedyl01/disk1/yangyxt/ngs_scripts/acmg_automated_anno.py", line 76, in wrapper
    result = func(*args, **kwargs)
  File "/paedyl01/disk1/yangyxt/ngs_scripts/acmg_automated_anno.py", line 484, in BP2_PM3_compound_with_patho
    return df.swifter.groupby([gene_col], as_index=False, dropna=False).apply(check_compound_per_gene,
  File "/home/yangyxt/anaconda3/envs/dask/lib/python3.9/site-packages/swifter/swifter.py", line 661, in apply
    return self._ray_apply(func, *args, **kwds)
  File "/home/yangyxt/anaconda3/envs/dask/lib/python3.9/site-packages/swifter/swifter.py", line 650, in _ray_apply
    return pd.concat(ray.get(apply_chunks), axis=self._axis).sort_index()
  File "/home/yangyxt/anaconda3/envs/dask/lib/python3.9/site-packages/pandas/util/_decorators.py", line 311, in wrapper
    return func(*args, **kwargs)
  File "/home/yangyxt/anaconda3/envs/dask/lib/python3.9/site-packages/pandas/core/frame.py", line 6447, in sort_index
    return super().sort_index(
  File "/home/yangyxt/anaconda3/envs/dask/lib/python3.9/site-packages/pandas/core/generic.py", line 4685, in sort_index
    indexer = get_indexer_indexer(
  File "/home/yangyxt/anaconda3/envs/dask/lib/python3.9/site-packages/pandas/core/sorting.py", line 94, in get_indexer_indexer
    indexer = nargsort(
  File "/home/yangyxt/anaconda3/envs/dask/lib/python3.9/site-packages/pandas/core/sorting.py", line 417, in nargsort
    indexer = non_nan_idx[non_nans.argsort(kind=kind)]
TypeError: '<' not supported between instances of 'int' and 'tuple'
ERROR:2022-09-28 13:40:29,310:wrapper:83:Exception raised in main_anno_process. exception: '<' not supported between instances of 'int' and 'tuple'
The DataFrame passed to swifter.groupby() has an ordinary numerical index, from 0 to len(df) - 1.
The groupby column might have some rows with NA values, and I do wish to keep them. I guess that's why this issue happened. I'm not sure whether this can be fixed or optimized. Please take a look.
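For reference, the final TypeError is the ordinary Python error raised when a sort has to order an int against a tuple. The snippet below is only a minimal sketch of that failure in plain Python, without swifter, ray, or pandas. The `labels` list and the `label_types` helper are made up for illustration and are an assumption about what the concatenated index might contain, not something taken from swifter's code.

```python
# Hypothetical index labels standing in for the concatenated chunk results:
# mostly plain integers, plus one tuple label (an assumption for illustration).
labels = [2, 0, (1, "geneA"), 1]

message = None
try:
    # Sorting needs "<" between every pair it compares, and Python 3
    # defines no ordering between int and tuple.
    sorted(labels)
except TypeError as exc:
    message = str(exc)
    print(message)


def label_types(values):
    """Return the sorted names of the distinct label types (illustrative helper)."""
    return sorted({type(v).__name__ for v in values})


print(label_types(labels))  # ['int', 'tuple']
```

If a check like this on the real index reports more than one label type, that would be consistent with sort_index() failing in the traceback above.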