pivot_table with aggfunc='min'? #9722
-
Hey @sorenwacker. That's a good question. Pivot tables, in my mind, are meant to be visually consumed. If your larger-than-memory dask dataframe can be aggregated, pivoted, and then viewed, it'd be better to do the aggregation in dask, materialize to pandas, and then pivot the pandas dataframe. In other words, a pivoted, out-of-memory dataframe is just as difficult to visually consume as a non-pivoted out-of-memory dataframe, so it makes more sense to continue to operate on/aggregate the data in "tidy" format until you can bring it into memory. For that reason, I don't think pivot tables have ever been a high priority for the development team. That said, the team is generally receptive to PRs that improve feature parity with pandas. In fact, here's one such case where someone noticed that pivot_table didn't support something they needed, and here's their PR where it was added: #8649
-
Hi,
I came across this post on Stack Overflow, where someone asked whether it is possible to do a pivot_table with aggfunc='min'. That was in August 2019, and this feature has not yet found its way into the repository. I don't know if such a feature request was ever created, but I wonder why that feature was never implemented. Is there a technical barrier that makes it particularly difficult to implement the minimum function for a pivot_table aggregation in dask? If someone knows the details, I would be interested to learn what these difficulties are. Maybe someone can point me to a resource to read, like a discussion or documentation, if such things are covered there in enough detail to understand the issue.