Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Having mutliple aggregates in select errors #137

Open
JustusAdam opened this issue Sep 11, 2019 · 5 comments
Open

Having mutliple aggregates in select errors #137

JustusAdam opened this issue Sep 11, 2019 · 5 comments
Labels
bug Something isn't working question Further information is requested

Comments

@JustusAdam
Copy link
Collaborator

Noria seems not to allow for a select query with two simultaneous aggregations.

I set up this very simple table:

CREATE TABLE tab (x int, y int, PRIMARY KEY(x));

And a query with two aggregations:

VIEW test: 
    SELECT count(y), sum(y) 
    FROM tab 
    WHERE x = ? 
    GROUP BY x;

Which throws a cannot group by aggregation column in
noria-server/dataflow/src/ops/grouped/aggregate.rs:27:9.

By contrast if the select is only count(x) or sum(x) it works just fine.
I should also note that it also errors on count(*), sum(y).

My question would be whether this query is correct actually should work?

My guess is that it doesn't due to how the query graph is constructed. (One
aggregate becomes the successor of the other and does not get the initial set of
records/has its automatic group/state key include the computed column from the
ancestor.)

@JustusAdam JustusAdam added bug Something isn't working question Further information is requested labels Sep 11, 2019
@jonhoo
Copy link
Contributor

jonhoo commented Sep 11, 2019

I think the underlying issue here is that the query as written requires a single aggregation that computes multiple columns, which isn't currently implemented. The workaround would be to compute the two values separately and then JOIN them on x.

@ms705
Copy link
Member

ms705 commented Sep 11, 2019

The more technical reason is that this will result in a chain of two aggregation operators, both of which need y as an input column. The query planner lays them out in sequence, so the first operator produces output columns (x, count(y)). Note that this does not include y. Noria detects this and "pulls" y through the operator as a group column, so that it produces (x, y, count(y)). However, this (correctly) triggers the assertion, since y now shows up both as a group column and as the aggregated column.

The proper fix is for the query planner to compute these aggregations separately and concatenate the resulting columns (an operator we don't support yet), or to generate a single operator that does both aggregations at the same time (probably how you got here ;-) ).

EDIT: my earlier proposed nested view workaround doesn't work for the same reason (it would have to group by the aggregation column), so @jonhoo's join-based workaround is currently the only way to do this.

@JustusAdam
Copy link
Collaborator Author

Thanks that's pretty much what I expected. I kinda want to express a similar pattern in my project so I'll have to figure out a way to teach the internals to do this. I'll let you know if I make any progress.

Anyway feel free to close this if you like.

@JustusAdam
Copy link
Collaborator Author

The workaround proposed by jon worked, thanks.

@jonhoo jonhoo reopened this Oct 29, 2019
@jonhoo
Copy link
Contributor

jonhoo commented Oct 29, 2019

I'm going to leave this open since it is something we should fix.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working question Further information is requested
Projects
None yet
Development

No branches or pull requests

3 participants