Add DataFrame and Series median method #9483

Merged
merged 8 commits into from Sep 13, 2022

Conversation

jrbourbeau
Member

@jrbourbeau jrbourbeau commented Sep 12, 2022

This PR adds new median and median_approximate methods to dask.dataframe.

Previously, there was hesitation about using approximate algorithms in a DataFrame.median method, since that could confuse users who expect dask and pandas to always return the same result. This PR proposes that median be used only where we can provide an exact median calculation (i.e. when axis=1 or when there's only a single partition in the DataFrame/Series). Otherwise, median raises an informative error directing users to median_approximate, which internally uses the existing approximate quantiles implementation.
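
For illustration only, the dispatch rule described above can be sketched in plain Python. `ToyPartitionedSeries` is a hypothetical stand-in, not a dask class, and its `median_approximate` uses a crude median-of-partition-medians placeholder rather than dask's approximate quantiles implementation. The `axis=1` case is omitted because the toy object is one-dimensional.

```python
from statistics import median as exact_median


class ToyPartitionedSeries:
    """Hypothetical stand-in for a partitioned Series (not dask's class)."""

    def __init__(self, partitions):
        self.partitions = [list(p) for p in partitions]

    @property
    def npartitions(self):
        return len(self.partitions)

    def median(self):
        # An exact result is only offered when all data sits in one partition.
        if self.npartitions == 1:
            return exact_median(self.partitions[0])
        # Otherwise refuse, and point the caller at the approximate method.
        raise NotImplementedError(
            "Exact median is unavailable across multiple partitions; "
            "use median_approximate() instead."
        )

    def median_approximate(self):
        # Placeholder approximation: median of the per-partition medians.
        # This is an assumption made for the sketch, not dask's algorithm.
        return exact_median(exact_median(p) for p in self.partitions if p)


single = ToyPartitionedSeries([[1, 2, 3, 4, 100]])
multi = ToyPartitionedSeries([[1, 2, 3], [4, 5, 6]])

print(single.median())  # exact, single partition
try:
    multi.median()
except NotImplementedError as err:
    print(err)  # directs the caller to median_approximate()
print(multi.median_approximate())
```

The point of the split is that a call to `median` either returns the exact value or fails loudly, so an approximate result is only ever returned when the caller asks for it by name.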

Closes #4362, supersedes #3819

TODO:

  • Add median methods to dd.Series
  • Add median to groupby aggregations (to reduce the scope of this PR, let's handle this as a follow-up)
  • Update Dask DataFrame API docs

@github-actions github-actions bot added the documentation Improve or add to documentation label Sep 12, 2022
@jrbourbeau jrbourbeau mentioned this pull request Sep 13, 2022
@jrbourbeau jrbourbeau changed the title [WIP] Add DataFrame.median method Add DataFrame.median method Sep 13, 2022
Collaborator

@ian-r-rose ian-r-rose left a comment

One small suggestion, otherwise this looks good to me

dask/dataframe/tests/test_dataframe.py (resolved)
@ian-r-rose
Collaborator

Test failure is a known flaky one #8816

@jrbourbeau jrbourbeau changed the title Add DataFrame.median method Add DataFrame and Series median method Sep 13, 2022
@jrbourbeau
Member Author

Thanks for reviewing @ian-r-rose!

Labels
dataframe documentation Improve or add to documentation
Development

Successfully merging this pull request may close these issues.

median function for dask dataframe
2 participants