REGR: Groupby first/last/nth treats None as an observation #38330

rhshadrach · 2020-12-06T16:13:18Z

closes BUG: GroupBy.first does not skip missing values in string-valued columns #38286
tests added / passed
passes black pandas
passes git diff upstream/master -u -- "*.py" | flake8 --diff
whatsnew entry

This is a regression from 1.0.x to 1.1.x, introduced by #33462. If it doesn't make the 1.2 rc, I'm not sure whether it should go into 1.2 during the rc phase or wait until 1.3.

Had to decide on some edge-case behaviors in odd situations with missing values, see #38286 (comment)

jreback

pls find out when this was originally changed

this was on purpose

rhshadrach · 2020-12-06T16:21:48Z

@jreback This was originally changed in #33462 so that

pd.DataFrame({"id": ["a"], "value": [None]}, dtype=object).groupby("id").first()

returns None rather than np.nan. That behavior is retained here.
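
As a rough illustration of the behavior under discussion (a sketch only, not the PR's test; the extra rows and the group "b" are made up for illustration), `first()` and `last()` are expected to skip `None` in object columns once the regression is fixed, while a group with no observations still yields a missing value:

```python
import pandas as pd

# Hypothetical data: group "a" mixes None with a real value,
# group "b" contains only None.
df = pd.DataFrame(
    {"id": ["a", "a", "a", "b"], "value": [None, "x", None, None]},
    dtype=object,
)

grouped = df.groupby("id")["value"]

# With the regression fixed, None is skipped like any other missing value.
first_a = grouped.first()["a"]
last_a = grouped.last()["a"]

# A group without observations still reports a missing value. The thread
# says None is returned; pd.isna is used here so the sketch does not
# depend on the exact sentinel.
first_b = grouped.first()["b"]

print(first_a, last_a, pd.isna(first_b))
```

On the affected 1.1.x releases before 1.1.5, `first_a` would be expected to come back as `None` instead, which is the regression reported in #38286.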

jreback · 2020-12-06T16:25:54Z

so you think that the original issues are wrong? this is deleting that test case and reverting

simonjayhawkins · 2020-12-06T16:25:56Z

"This is a regression from 1.0.x to 1.1.x"

if this is the case, should go in 1.1.5.

rhshadrach · 2020-12-06T16:29:28Z

@jreback I believe this reverts the fix in #33462 and solves the issue there in the proper manner, along with the regression it caused. The original test case still exists; it is the first parameter to test_first_last_with_none.

rhshadrach · 2020-12-06T16:29:55Z

I can restore the original test and add the new one for the regression if that's preferable.

jreback · 2020-12-06T16:47:03Z

yes if this also doesn't change the original issue then leave the tests

ok for 1.1.5

rhshadrach · 2020-12-06T17:08:38Z

@jreback test restored, added to whatsnew 1.1.5

jreback · 2020-12-06T17:11:46Z

doc/source/whatsnew/v1.1.5.rst

@@ -27,6 +27,7 @@ Fixed regressions
 - Fixed regression in :meth:`DataFrame.fillna` not filling ``NaN`` after other operations such as :meth:`DataFrame.pivot` (:issue:`36495`).
 - Fixed performance regression in ``df.groupby(..).rolling(..)`` (:issue:`38038`)
 - Fixed regression in :meth:`MultiIndex.intersection` returning duplicates when at least one of the indexes had duplicates (:issue:`36915`)
+- Fixed regression in :meth:`.GroupBy.first`, :meth:`.GroupBy.last`, and :meth:`.GroupBy.nth` where ``None`` was considered a non-NA value (:issue:`38286`)


this doesn't affect nth right?

Indeed, thanks. Fixed.
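
For contrast, a minimal sketch consistent with the review note that `nth` is unaffected: with its default arguments, `nth` selects rows by position and does not skip missing values, so a leading `None` comes back as-is. The data is hypothetical, and results are read through `.tolist()` because the index of the `nth` result differs between pandas versions.

```python
import pandas as pd

# Hypothetical single group whose first row is None.
df = pd.DataFrame({"id": ["a", "a"], "value": [None, "x"]}, dtype=object)
grouped = df.groupby("id")["value"]

# nth(0) picks the first row of each group, missing or not.
nth_values = grouped.nth(0).tolist()

# first() picks the first non-missing value of each group.
first_values = grouped.first().tolist()

print(nth_values, first_values)
```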

jreback

can u run some asvs / perf for this
i am worried we are converting to object type

rhshadrach · 2020-12-06T18:34:16Z

@jreback asv run with asv continuous -f 1.1 upstream/master HEAD -b ^groupby resulted in BENCHMARKS NOT SIGNIFICANTLY CHANGED.

Linux py39 failure is unrelated:

FAILED pandas/tests/io/pytables/test_store.py::TestHDFStore::test_append_with_data_columns
= 1 failed, 143282 passed, 5482 skipped, 1091 xfailed, 4 warnings in 1132.31s (0:18:52) =

jreback · 2020-12-06T18:46:52Z

thanks for the quick patch @rhshadrach

simonjayhawkins · 2020-12-06T18:49:41Z

@meeseeksdev backport 1.1.x


amalcgcg · 2020-12-07T17:56:28Z

We just upgraded to 1.1.5 this morning, and it works great! Thank you for the blazing fast discussion, decision, fix, merge, and backport!

BUG: Groupby first/last/nth treats None as an observation

46a80a7

rhshadrach changed the title ~~BUG: Groupby first/last/nth treats None as an observation~~ REGR: Groupby first/last/nth treats None as an observation Dec 6, 2020

rhshadrach added the Groupby, Missing-data (np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate), and Regression (Functionality that used to work in a prior pandas version) labels Dec 6, 2020

jreback requested changes Dec 6, 2020


rhshadrach added 2 commits December 6, 2020 11:55

Reverted test changes, whatsnew

f93a8bd

Reverted test changes, whatsnew

7753f39

jreback reviewed Dec 6, 2020


jreback added this to the 1.1.5 milestone Dec 6, 2020

Remove nth from whatsnew

54c1a0b

jreback reviewed Dec 6, 2020


jreback approved these changes Dec 6, 2020


jreback merged commit 55e3bff into pandas-dev:master Dec 6, 2020


lumberbot-app bot added the Still Needs Manual Backport label Dec 6, 2020

simonjayhawkins pushed a commit to simonjayhawkins/pandas that referenced this pull request Dec 6, 2020

Backport PR pandas-dev#38330: REGR: Groupby first/last/nth treats None as an observation

08cf43f

simonjayhawkins mentioned this pull request Dec 6, 2020

Backport PR #38330: REGR: Groupby first/last/nth treats None as an observation #38333

Merged

simonjayhawkins removed the Still Needs Manual Backport label Dec 6, 2020

simonjayhawkins added a commit that referenced this pull request Dec 6, 2020

Backport PR #38330: REGR: Groupby first/last/nth treats None as an observation (#38333) Co-authored-by: Richard Shadrach <45562402+rhshadrach@users.noreply.github.com>

247ecef

rhshadrach deleted the groupby_nth_obj branch December 7, 2020 21:55

dependabot bot mentioned this pull request Mar 16, 2021

Bump pandas from 1.1.4 to 1.1.5 kotamya/pandas-estat#82

Merged
