Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Filter out numeric_only warnings from pandas #9496

Merged
merged 13 commits into from Sep 15, 2022

Conversation

jrbourbeau
Copy link
Member

This PR is a temporary workaround for #9471.

@jrbourbeau jrbourbeau self-assigned this Sep 14, 2022
@jrbourbeau jrbourbeau mentioned this pull request Sep 14, 2022
7 tasks
Copy link
Member Author

@jrbourbeau jrbourbeau left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, so I think this is ready for a review now. The goal of this PR is to filter out all of the numeric_only deprecation warnings that will start being emitted in pandas=1.5. Note that this doesn't resolve #9471 -- it's just a bandaid. However, given pandas=1.5 is scheduled for release on Monday, I think this is an okay bandaid to apply. We will need to go back and more fully support the numeric_only changes in follow-up though.

@jrbourbeau jrbourbeau changed the title [WIP] Filter out numeric_only warnings from pandas Filter out numeric_only warnings from pandas Sep 14, 2022
axis=axis,
skipna=skipna,
split_every=split_every,
out=out,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the min and max cases we are not passing numeric_only, because for these cases are part of " that users get a deprecation messages if (a) the defaults are used and (b) the result would change based on the new default (False) in 2.0." ? citing this comment pandas-dev/pandas#46560 (comment)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Locally I saw strange test failures when passing numeric_only here. Additionally, we shouldn't need to pass numeric_only through as the @_numeric_only decorator should handle selecting only numeric data for us automatically

@@ -2348,6 +2402,7 @@ def std(
skipna=skipna,
ddof=ddof,
enforce_metadata=False,
numeric_only=numeric_only,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a comment for line 2680.
kurtosis_shape = num._meta_nonempty.values.var(axis=0).shape since we are checking the meta_nonempty, don't we need the check_numeric_only_deprecation here too?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we're using .values to get a NumPy array we shouldn't trigger the warning from pandas

@@ -2671,7 +2731,7 @@ def sem(self, axis=None, skipna=True, ddof=1, split_every=False, numeric_only=No
result.divisions = (self.columns.min(), self.columns.max())
return result

def quantile(self, q=0.5, axis=0, method="default"):
def quantile(self, q=0.5, axis=0, numeric_only=True, method="default"):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at pandas to check for consistency. this default will be a no_default in 1.5, and it says in the future it'll be False. see https://pandas.pydata.org/docs/dev/reference/api/pandas.DataFrame.quantile.html

Do we need to warn here too?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's not warn here now as numeric_only=False isn't supported in Dask today. Let's handle this in follow-up work though

@@ -1216,7 +1248,7 @@ def test_reductions_frame_dtypes_numeric_only():
}
)

ddf = dd.from_pandas(df, 3)
ddf = dd.from_pandas(df, 1)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this cosmetic, or there is a reason to only have one partition?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, good catch. I changed npartitions to 1 while debugging, but forgot to change it back. Will fix that

@ncclementi
Copy link
Member

Leaving this here for reference pandas-dev/pandas#46560

@jrbourbeau
Copy link
Member Author

Thanks for reviewing @ncclementi!

As a side note, @ian-r-rose mentioned offline that he was fine with the changes here but thought it would be nice to be more explicit about passing the name of the method called in _getattr_numeric_only. I agree and 3a79096 should handle that accordingly.

dask/dataframe/core.py Outdated Show resolved Hide resolved
Copy link
Collaborator

@ian-r-rose ian-r-rose left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @jrbourbeau!

@ian-r-rose
Copy link
Collaborator

Looks good to me, everything is passing except for other known upstream issues that are in flight in other PRs

@ian-r-rose ian-r-rose merged commit 1a8533f into dask:main Sep 15, 2022
@jrbourbeau jrbourbeau deleted the numeric_only-warnings branch September 16, 2022 01:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants