New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Enable column projection for groupby slicing #9667
Conversation
dask/dataframe/groupby.py
Outdated
|
||
# Check if we can project columns | ||
projection = None | ||
if isinstance(self._slice, (str, list)): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What other types are valid here?
EDIT: Aside from None
, given that this is an optional kwarg
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The slice=
kwarg is set by the key in a DataFrameGroupBy.__getitem__
call. Therefore, I suppose it can be anything representing a column selection. I opted to keep the projection simple here for now, but we certainly can generalize this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hopefully the latest code supports all (reasonable) self._slice
types (numpy scalar, str, list, tuple, index-like, and series-like)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think tuples are just as common as lists. We're also handling this for by
.
If we include these suggestions, I'm +1 for merging. I think this is a really nice improvement! Thanks @rjzamora
Co-authored-by: Florian Jetter <fjetter@users.noreply.github.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After #9442, dask will insert a
getitem
operation to enable column projection for operations likegroupby().agg
. As pointed out in this external commit, dask does not yet capture a simplegroupby().[<slice>]
case.This PR simply adds a
getitem
operation for groupby slicing.pre-commit run --all-files