Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Raise errors for unsupported operations on certain types #15712

Merged
merged 12 commits into from
May 21, 2024

Conversation

galipremsagar
Copy link
Contributor

@galipremsagar galipremsagar commented May 10, 2024

Description

Fixes: #15668

This PR raises errors for groupby operations on un-supported types.

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

@galipremsagar galipremsagar added bug Something isn't working breaking Breaking change labels May 10, 2024
@galipremsagar galipremsagar self-assigned this May 10, 2024
@github-actions github-actions bot added the cuDF (Python) Affects Python cuDF API. label May 10, 2024
@galipremsagar galipremsagar marked this pull request as ready for review May 17, 2024 15:06
@galipremsagar galipremsagar requested a review from a team as a code owner May 17, 2024 15:06
@@ -167,6 +167,46 @@ cdef class GroupBy:
included_aggregations_i = []
col_aggregations = []
for agg in aggs:
str_agg = str(agg)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question: Is it not enough to raise the TypeError if the if valid_aggregations == "ALL" or agg_obj.kind in valid_aggregations: condition below fails?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally yes, but that is a very loose condition for many operations vs dtype matrix we have and libcudf seems to be silently returning empty columns for those. We need to dig deeper for other types aswell.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK. I suppose this PR is a good enough intermediate solution until we can centeralize this in get_valid_aggregation or similar.

@galipremsagar
Copy link
Contributor Author

/merge

@galipremsagar galipremsagar added the 5 - Ready to Merge Testing and reviews complete, ready to merge label May 21, 2024
@rapids-bot rapids-bot bot merged commit 6b1248e into rapidsai:branch-24.06 May 21, 2024
69 checks passed
@galipremsagar galipremsagar mentioned this pull request May 21, 2024
3 tasks
rapids-bot bot pushed a commit that referenced this pull request May 22, 2024
After we made our groupby fail more aggressively for unsupported types in #15712, `Groupby.collect` started to fail on string column, where this isn't a supported aggregation on string column in pandas and this method doesn't exist in pandas Groupby, hence this PR suggest the alternative equivalent and deprecates the API to be removed in next release.

Authors:
  - GALI PREM SAGAR (https://github.com/galipremsagar)
  - Richard (Rick) Zamora (https://github.com/rjzamora)

Approvers:
  - Matthew Roeschke (https://github.com/mroeschke)
  - Richard (Rick) Zamora (https://github.com/rjzamora)

URL: #15808
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
5 - Ready to Merge Testing and reviews complete, ready to merge breaking Breaking change bug Something isn't working cuDF (Python) Affects Python cuDF API.
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

[BUG] Groupby operations should fail on un-supported types instead of passing silently
2 participants