-
Notifications
You must be signed in to change notification settings - Fork 154
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Incorrect output for nested reducers #3417
Comments
Another example, this time for >>> DT = dt.Frame(G=[1,1,1,2,2,2], V=[None, None, None, None, 3, 5])
>>> DT[:, dt.countna(dt.mean(f.V)), dt.by(f.G)] # wrong output
| G V
| int32 int64
-- + ----- -----
0 | 1 3
1 | 2 0
[2 rows x 2 columns] Expected output | G V
| int32 int64
-- + ----- -----
0 | 1 1
1 | 2 0
[2 rows x 2 columns] |
This was referenced Mar 19, 2023
oleksiyskononenko
added a commit
that referenced
this issue
Mar 21, 2023
Chained reducers is a pretty rare use case, because the second reducer gets a scalar (per group) as an input. Some SQL engines don't even allow that. However, we never seems to forbid chaining explicitly, at the same time, we had no tests and didn't think such a use case through. As a result, we either produced wrong results (see #3417) or even segfaulted when chaining was involved. In this PR we fix all the known related issues and add some tests. We also - refactor `FExpr` reducers to inherit from `FExpr_ReduceUnary`; - improve reducers logic with respect to grouped/non-grouped frames. WIP for #3417
oleksiyskononenko
added
bug
Any bugs / errors in datatable; however for severe bugs use [segfault] label
groupby
Group-by functionality and Reducers
labels
Mar 21, 2023
This has been resolved for |
oleksiyskononenko
changed the title
[BUG] Incorrect output for nested aggregation calls
Incorrect output for nested reducers
Mar 21, 2023
samukweku
pushed a commit
that referenced
this issue
Mar 27, 2023
Chained reducers is a pretty rare use case, because the second reducer gets a scalar (per group) as an input. Some SQL engines don't even allow that. However, we never seems to forbid chaining explicitly, at the same time, we had no tests and didn't think such a use case through. As a result, we either produced wrong results (see #3417) or even segfaulted when chaining was involved. In this PR we fix all the known related issues and add some tests. We also - refactor `FExpr` reducers to inherit from `FExpr_ReduceUnary`; - improve reducers logic with respect to grouped/non-grouped frames. WIP for #3417
closing this as nested reduction is fixed: DT[:, dt.mean(dt.sum(f.C0))]
Out[3]:
| C0
| float64
-- + -------
0 | 3
[1 row x 1 column] |
It is only fixed for FExprs and still persists for Exprs, I guess it should only be closed once we get rid of Exprs. |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Nesting aggregation calls sometimes produces incorrect results
The expected behavior is
datatable version: 1.1
python version: 3.9
operating system: linux
The text was updated successfully, but these errors were encountered: