Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Incorrect output for nested reducers #3417

Open
samukweku opened this issue Jan 31, 2023 · 4 comments
Open

Incorrect output for nested reducers #3417

samukweku opened this issue Jan 31, 2023 · 4 comments
Assignees
Labels
bug Any bugs / errors in datatable; however for severe bugs use [segfault] label groupby Group-by functionality and Reducers

Comments

@samukweku
Copy link
Collaborator

samukweku commented Jan 31, 2023

Nesting aggregation calls sometimes produces incorrect results

>>> from datatable import dt, f
>>> DT = dt.Frame([1, 2])
>>> DT[:, dt.mean(dt.sum(f.C0))]
   |         C0
   |    float64
-- + ----------
 0 | 2.8823e+18
[1 row x 1 column]
>>> DT[:, dt.sum(dt.mean(f.C0))]
   |          C0
   |     float64
-- + -----------
 0 | 2.31584e+77
[1 row x 1 column]

The expected behavior is

>>> DT[:, dt.mean(dt.sum(f.C0))]
   |    C0
   | float64
-- + -----
 0 |     3
[1 row x 1 column]
>>> DT[:, dt.sum(dt.mean(f.C0))]
   |      C0
   | float64
-- + -------
 0 |     1.5
[1 row x 1 column]

datatable version: 1.1
python version: 3.9
operating system: linux

@samukweku
Copy link
Collaborator Author

samukweku commented Mar 14, 2023

Another example, this time for countna():

>>> DT = dt.Frame(G=[1,1,1,2,2,2], V=[None, None, None, None, 3, 5])
>>> DT[:, dt.countna(dt.mean(f.V)), dt.by(f.G)] # wrong output
   |     G      V
   | int32  int64
-- + -----  -----
 0 |     1      3
 1 |     2      0
[2 rows x 2 columns]

Expected output

   |     G      V
   | int32  int64
-- + -----  -----
 0 |     1      1
 1 |     2      0
[2 rows x 2 columns]

oleksiyskononenko added a commit that referenced this issue Mar 21, 2023
Chained reducers is a pretty rare use case, because the second reducer gets a scalar (per group) as an input. Some SQL engines don't even allow that. However, we never seems to forbid chaining explicitly, at the same time, we had no tests and didn't think such a use case through. As a result, we either produced wrong results (see #3417) or even segfaulted when chaining was involved. In this PR we fix all the known related issues and add some tests. We also

- refactor `FExpr` reducers to inherit from `FExpr_ReduceUnary`;
- improve reducers logic with respect to grouped/non-grouped frames.

WIP for #3417
@oleksiyskononenko oleksiyskononenko added this to the Release 1.1.0 milestone Mar 21, 2023
@oleksiyskononenko oleksiyskononenko added bug Any bugs / errors in datatable; however for severe bugs use [segfault] label groupby Group-by functionality and Reducers labels Mar 21, 2023
@oleksiyskononenko oleksiyskononenko self-assigned this Mar 21, 2023
@oleksiyskononenko
Copy link
Contributor

This has been resolved for FExprs. Once all the reducers are converted from Expr to FExpr the problem should be gone

@oleksiyskononenko oleksiyskononenko changed the title [BUG] Incorrect output for nested aggregation calls Incorrect output for nested reducers Mar 21, 2023
samukweku pushed a commit that referenced this issue Mar 27, 2023
Chained reducers is a pretty rare use case, because the second reducer gets a scalar (per group) as an input. Some SQL engines don't even allow that. However, we never seems to forbid chaining explicitly, at the same time, we had no tests and didn't think such a use case through. As a result, we either produced wrong results (see #3417) or even segfaulted when chaining was involved. In this PR we fix all the known related issues and add some tests. We also

- refactor `FExpr` reducers to inherit from `FExpr_ReduceUnary`;
- improve reducers logic with respect to grouped/non-grouped frames.

WIP for #3417
@samukweku
Copy link
Collaborator Author

closing this as nested reduction is fixed:

DT[:, dt.mean(dt.sum(f.C0))]
Out[3]: 
   |      C0
   | float64
-- + -------
 0 |       3
[1 row x 1 column]

@oleksiyskononenko
Copy link
Contributor

It is only fixed for FExprs and still persists for Exprs, I guess it should only be closed once we get rid of Exprs.

@samukweku samukweku reopened this Apr 24, 2023
@st-pasha st-pasha removed this from the Release 1.1.0 milestone Nov 14, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Any bugs / errors in datatable; however for severe bugs use [segfault] label groupby Group-by functionality and Reducers
Projects
None yet
Development

No branches or pull requests

3 participants