`stack_anomalous` inside `groupby` breaks `as_index=False` #192

kmdalton · 2022-10-19T14:44:34Z

When called inside a `groupby.apply` context, `stack_anomalous` seems to override the `as_index=False` setting: the grouping key is added to the index of the returned dataset as an extra level named `None`.

import numpy as np
import reciprocalspaceship as rs

cell = [34., 45., 98., 90., 90., 90.]
spacegroup = 19
dmin = 4.
repeats = 10

h,k,l = rs.utils.generate_reciprocal_asu(cell, spacegroup, dmin, anomalous=True).T

ds = None
for i in range(repeats):
    _ds = rs.DataSet({
        "H" : h,
        "K" : k,
        "L" : l,
        "I" : np.random.random(len(h)),
        "SIGI" : np.random.random(len(h)),
    }, cell=cell, spacegroup=spacegroup, merged=True).infer_mtz_dtypes()
    _ds['repeat'] = i
    if ds is not None:
        ds = rs.concat((ds, _ds))
    else:
        ds = _ds

ds = ds.set_index(['H', 'K', 'L'])

print(f"Before: {ds.index}")

# Somehow calling `stack_anomalous` overrides `as_index=False`
result = ds.groupby('repeat', as_index=False).apply(lambda x: x.stack_anomalous())

print(f"After: {result.index}")

This gives the following output:

Before: MultiIndex([(-8, -3, -5),
            (-8, -3, -4),
            (-8, -3, -3),
            (-8, -3, -2),
            (-8, -3, -1),
            (-8, -2, -7),
            (-8, -2, -6),
            (-8, -2, -5),
            (-8, -2, -4),
            (-8, -2, -3),
            ...
            ( 8,  2,  4),
            ( 8,  2,  5),
            ( 8,  2,  6),
            ( 8,  2,  7),
            ( 8,  3,  0),
            ( 8,  3,  1),
            ( 8,  3,  2),
            ( 8,  3,  3),
            ( 8,  3,  4),
            ( 8,  3,  5)],
           names=['H', 'K', 'L'], length=24470)
After: MultiIndex([(0, -8, -3, -5),
            (0, -8, -3, -4),
            (0, -8, -3, -3),
            (0, -8, -3, -2),
            (0, -8, -3, -1),
            (0, -8, -2, -7),
            (0, -8, -2, -6),
            (0, -8, -2, -5),
            (0, -8, -2, -4),
            (0, -8, -2, -3),
            ...
            (9, -8, -2, -3),
            (9, -8, -2, -4),
            (9, -8, -2, -5),
            (9, -8, -2, -6),
            (9, -8, -2, -7),
            (9, -8, -3, -1),
            (9, -8, -3, -2),
            (9, -8, -3, -3),
            (9, -8, -3, -4),
            (9, -8, -3, -5)],
           names=[None, 'H', 'K', 'L'], length=44610)

The `repeat` column also still persists as a regular column in the resulting dataset.

kmdalton · 2022-10-19T14:55:32Z

Is this a bug? Maybe this is just what pandas does when `groupby.apply` returns a dataframe of a different length?
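
One way to probe that hypothesis without reciprocalspaceship is a plain-pandas sketch. Everything here is illustrative: `drop_first` is a hypothetical stand-in for `stack_anomalous` (all it shares with it is that it returns a frame whose index differs from the group's), and the toy `H`, `K`, `L` values are made up. It assumes a reasonably recent pandas, where `groupby` accepts `group_keys`; newer versions may also emit a deprecation warning about `apply` operating on the grouping column.

```python
import pandas as pd

# Toy frame indexed by H, K, L with a "repeat" grouping column.
df = pd.DataFrame(
    {"repeat": [0, 0, 1, 1], "I": [1.0, 2.0, 3.0, 4.0]},
    index=pd.MultiIndex.from_tuples(
        [(1, 0, 0), (0, 1, 0), (1, 0, 0), (0, 1, 0)],
        names=["H", "K", "L"],
    ),
)


def drop_first(group):
    # Hypothetical stand-in for stack_anomalous: the returned frame
    # has a different length (and index) than the group passed in.
    return group.iloc[1:]


# Default group_keys=True: an extra index level is prepended
# even though as_index=False was requested.
with_keys = df.groupby("repeat", as_index=False).apply(drop_first)
print(with_keys.index.names)

# group_keys=False: the per-group results are concatenated as they are.
without_keys = df.groupby(
    "repeat", as_index=False, group_keys=False
).apply(drop_first)
print(without_keys.index.names)
```

If the first result shows the same unnamed leading level as in the report, the behavior likely comes from pandas rather than from `stack_anomalous`, and passing `group_keys=False` (or calling `droplevel(0)` on the result) may be a usable workaround.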

kmdalton added and then removed the `bug` (Something isn't working) label on Oct 19, 2022
