Fix flaky test_dataframe_aggregations_multilevel #9701

Merged (4 commits, Dec 1, 2022)

@rjzamora (Member) commented Nov 29, 2022

I haven't been able to reproduce the recent groupby CI failures locally. However, these errors may be caused by `_mul_cols` being passed an empty group within `apply`. This PR adds a possible fix in case that is the root cause.
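A minimal sketch of how an empty group could behave differently, assuming the helper resets its index from a plain list of zeros (the pattern `result.index = [0] * len(result)` quoted later in this thread). The function `reset_index_with_list` is a hypothetical stand-in, not code from dask:

```python
import pandas as pd


def reset_index_with_list(df):
    # Hypothetical stand-in for a helper applied to each group.
    out = df.copy()
    # For an empty frame this assigns an empty list, which carries
    # no information about the intended (integer) index dtype.
    out.index = [0] * len(out)
    return out


non_empty = reset_index_with_list(pd.DataFrame({"a": [1, 2]}))
empty = reset_index_with_list(pd.DataFrame({"a": pd.Series([], dtype="int64")}))

print(non_empty.index.dtype)  # an integer dtype
print(empty.index.dtype)      # not an integer dtype
```

If the per-group results are later concatenated, a mismatch like this is one plausible source of intermittent failures, though the thread does not confirm that this is the exact mechanism.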

@rjzamora added the dataframe and bug (Something is broken) labels Nov 29, 2022
@rjzamora changed the title from "[Experiment] Try to fix groupby-CI failures" to "Fix groupby-CI failures" Nov 29, 2022
@rjzamora marked this pull request as ready for review November 29, 2022 23:46
@rjzamora (Member, Author)

Update: It seems like this may fix the problem. I just added a comment and am re-running CI.

@jrbourbeau (Member) left a comment

Woo, thanks @rjzamora! Just to clarify, this should close #8795, correct?

dask/dataframe/groupby.py (outdated, resolved)
Co-authored-by: James Bourbeau <jrbourbeau@users.noreply.github.com>
@rjzamora (Member, Author)

> Just to clarify, this should close #8795, correct?

I think so, but not completely sure. Honestly not sure what else may be causing the problem.

@jrbourbeau (Member)

I see a similar thing happening here

def _drop_duplicates_reindex(df):
    # Fix index in a groupby().apply() context
    # https://github.com/dask/dask/issues/8137
    # https://github.com/pandas-dev/pandas/issues/43568
    result = df.drop_duplicates()
    result.index = [0] * len(result)
    return result

I'll suggest we update there too and then merge this PR in.

The change here seems consistent with fixing #8795 and also straightforward. We can always reopen #8795 if we see the flaky test failure again.
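For illustration, the update suggested above might look like the sketch below. It assumes the change is to build the index with an explicit integer dtype so that an empty result still gets an integer index. The name `drop_duplicates_reindex_sketch` is hypothetical, and the code merged in this PR may differ:

```python
import numpy as np
import pandas as pd


def drop_duplicates_reindex_sketch(df):
    # Hypothetical variant of the helper quoted above.
    result = df.drop_duplicates()
    # Assumed change: an array of zeros with an explicit integer dtype
    # keeps the index integer-typed even when `result` is empty.
    result.index = np.zeros(len(result), dtype=int)
    return result


deduped = drop_duplicates_reindex_sketch(pd.DataFrame({"a": [1, 1, 2]}))
deduped_empty = drop_duplicates_reindex_sketch(
    pd.DataFrame({"a": pd.Series([], dtype="int64")})
)
```

With this variant, both `deduped.index` and `deduped_empty.index` have an integer dtype, so empty and non-empty groups produce consistently typed results.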

@jrbourbeau (Member) left a comment

Thanks @rjzamora!

@jrbourbeau changed the title from "Fix groupby-CI failures" to "Fix flaky test_dataframe_aggregations_multilevel" Dec 1, 2022
@jrbourbeau merged commit f309f9f into dask:main Dec 1, 2022
@rjzamora deleted the fix-_mul_cols branch December 1, 2022 02:20
@hendrikmakait mentioned this pull request Jan 13, 2023
Labels
bug Something is broken dataframe
2 participants