BUG: Different behavior from .agg("mean") and .agg(["mean"]) on a groupby df with a datetime64[ns] column #47166

leodtprojectsd · 2022-05-30T08:53:53Z

Pandas version checks

I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandas.
I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

from pandas import Timestamp
import pandas as pd 

print ("pandas Version:",  pd.__version__)
#dataframe
df = pd.DataFrame.from_dict({'filename': ['03_', '03_', '03_', '05_', '05_', '05_', '05_', '05_', '08_', '08_'], 
 'date_time': [Timestamp('2022-05-24 12:10:56'), Timestamp('2022-05-24 12:11:24'), Timestamp('2022-05-24 12:11:51'), 
               Timestamp('2022-05-24 12:41:54'), Timestamp('2022-05-24 12:42:21'), Timestamp('2022-05-24 12:42:49'),
               Timestamp('2022-05-24 12:43:16'), Timestamp('2022-05-24 12:43:44'), Timestamp('2022-05-24 12:57:30'), 
               Timestamp('2022-05-24 12:57:58')],
  'r': [80466.36, 71467.12, 72641.21, 76961.35, 86747.23, 81995.81, 74451.46, 69401.51, 73670.12, 78180.65]})

print ("df column types: ", df.info(),)

print ('\nWorks with: df.groupby(["filename"]).agg(["mean"])\n', df.groupby(["filename"]).agg(["mean"]))
print ('\nNot working with: df.groupby(["filename"]).agg("mean")\n', df.groupby(["filename"]).agg("mean"))
print ('\nNot working with: df.groupby(["filename"]).mean()\n', df.groupby(["filename"]).mean())


OUT: 
pandas Version: 1.3.5
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10 entries, 0 to 9
Data columns (total 3 columns):
 #   Column     Non-Null Count  Dtype         
---  ------     --------------  -----         
 0   filename   10 non-null     object        
 1   date_time  10 non-null     datetime64[ns]
 2   r          10 non-null     float64       
dtypes: datetime64[ns](1), float64(1), object(1)
memory usage: 368.0+ bytes
df column types:  None

Works with: df.groupby(["filename"]).agg(["mean"]) #See date_time column appearing
                              date_time          r
                                  mean       mean
filename                                         
03_      2022-05-24 12:11:23.666666752  74858.230
05_      2022-05-24 12:42:48.800000000  77911.472
08_      2022-05-24 12:57:44.000000000  75925.385

Not working with: df.groupby(["filename"]).agg("mean") #date_time column is gone
                   r
filename           
03_       74858.230
05_       77911.472
08_       75925.385

Not working with: df.groupby(["filename"]).mean() #date_time column is gone
                   r
filename           
03_       74858.230
05_       77911.472
08_       75925.385

Issue Description

I expected the same behavior from

df.groupby(["filename"]).agg(["mean"])
df.groupby(["filename"]).agg("mean")
df.groupby(["filename"]).mean()

Instead, when used with a df that has a datetime64[ns] column, only .agg(["mean"]) keeps that column, while .agg("mean") and .mean() drop it.

Expected Behavior

I expect agg(["mean"]), agg("mean"), and mean() to behave the same.

Installed Versions

INSTALLED VERSIONS
------------------
commit : 66e3805
python : 3.7.13.final.0
python-bits : 64
OS : Linux
OS-release : 5.4.188+
Version : #1 SMP Sun Apr 24 10:03:06 PDT 2022
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8

pandas : 1.3.5
numpy : 1.21.6
pytz : 2022.1
dateutil : 2.8.2
pip : 21.1.3
setuptools : 57.4.0
Cython : 0.29.30
pytest : 3.6.4
hypothesis : None
sphinx : 1.8.6
blosc : None
feather : 0.4.1
xlsxwriter : None
lxml.etree : 4.2.6
html5lib : 1.0.1
pymysql : None
psycopg2 : 2.7.6.1 (dt dec pq3 ext lo64)
jinja2 : 2.11.3
IPython : 5.5.0
pandas_datareader: 0.9.0
bs4 : 4.6.3
bottleneck : 1.3.4
fsspec : None
fastparquet : None
gcsfs : None
matplotlib : 3.2.2
numexpr : 2.8.1
odfpy : None
openpyxl : 3.0.10
pandas_gbq : 0.13.3
pyarrow : 6.0.1
pyxlsb : None
s3fs : None
scipy : 1.4.1
sqlalchemy : 1.4.36
tables : 3.7.0
tabulate : 0.8.9
xarray : 0.20.2
xlrd : 1.1.0
xlwt : 1.3.0
numba : 0.51.2
None


guyrt · 2022-05-31T14:56:05Z

FWIW, in v1.2.5 none of these options operate on the datetime64[ns] column! @leodtprojectsd are you working on a PR for this bug? If not, I'd like to work on one.

rhshadrach · 2022-05-31T21:54:52Z

Thanks for the report! When using list or dict in agg, the DataFrame is broken up into Series before each function is applied. What you're seeing is the difference in numeric_only between DataFrame.groupby(...).mean and Series.groupby(...).mean. See:

https://pandas.pydata.org/pandas-docs/dev/user_guide/groupby.html#automatic-exclusion-of-nuisance-columns
https://pandas.pydata.org/docs/reference/api/pandas.core.groupby.GroupBy.mean.html
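To illustrate the per-column splitting described above, here is a minimal sketch that applies mean to each column as its own Series groupby and then reassembles the pieces. It is only an approximation of the idea, not the actual pandas internals; the small frame and the name per_column are made up for the example.

```python
import pandas as pd

# Small illustrative frame (not the data from the report).
df = pd.DataFrame({
    "filename": ["03_", "03_", "05_", "05_"],
    "date_time": pd.to_datetime([
        "2022-05-24 12:10:00", "2022-05-24 12:12:00",
        "2022-05-24 12:40:00", "2022-05-24 12:44:00",
    ]),
    "r": [1.0, 3.0, 10.0, 20.0],
})

grouped = df.groupby("filename")

# Take the mean of each non-key column as a Series groupby,
# then glue the resulting Series back together side by side.
per_column = pd.concat(
    {col: grouped[col].mean() for col in ["date_time", "r"]},
    axis=1,
)
print(per_column)
```

Because each column is reduced on its own as a Series, the datetime64[ns] column is averaged rather than dropped, which matches what .agg(["mean"]) shows in the report.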

You can get the same result with

print(df.groupby("filename").agg("mean", numeric_only=False))

So I think that makes this a duplicate of #46560.
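For anyone landing here later, a small sketch of how to avoid depending on the default: pass numeric_only explicitly. The default for DataFrame groupby reductions has changed across pandas releases, so this assumes a version in which GroupBy.mean accepts numeric_only and can average datetime columns. The frame and variable names are invented for the example.

```python
import pandas as pd

# Small illustrative frame (not the data from the report).
df = pd.DataFrame({
    "filename": ["03_", "03_", "05_"],
    "date_time": pd.to_datetime([
        "2022-05-24 12:10:00", "2022-05-24 12:12:00", "2022-05-24 12:40:00",
    ]),
    "r": [1.0, 3.0, 10.0],
})

grouped = df.groupby("filename")

# Only numeric columns are reduced; date_time is left out.
numeric_means = grouped.mean(numeric_only=True)

# All non-key columns are reduced, including the datetime column.
all_means = grouped.mean(numeric_only=False)

print(list(numeric_means.columns))
print(list(all_means.columns))
```

Spelling out numeric_only makes the call mean the same thing regardless of which default the installed pandas uses.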

rhshadrach · 2022-05-31T21:57:35Z

I'm going to close this as a duplicate - @guyrt and @leodtprojectsd please reply here if you believe I've missed something and happy to reopen.

leodtprojectsd · 2022-06-01T07:16:53Z

@guyrt, I wasn't working on it, but I think @rhshadrach's reply covers it, thanks!

leodtprojectsd added the Bug and Needs Triage labels May 30, 2022

leodtprojectsd changed the title ~~BUG:~~ BUG: Different behavior from .agg("mean") and .agg(["mean"]) on a groupby df with a datetime64[ns] column May 30, 2022

rhshadrach added the Groupby, Nuisance Columns, and Duplicate Report labels and removed the Needs Triage label May 31, 2022

rhshadrach closed this as not planned May 31, 2022
