Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Overlapping partial key error with count(*) #145

Open
JustusAdam opened this issue Oct 29, 2019 · 2 comments
Open

Overlapping partial key error with count(*) #145

JustusAdam opened this issue Oct 29, 2019 · 2 comments

Comments

@JustusAdam
Copy link
Collaborator

JustusAdam commented Oct 29, 2019

So I am trying to run this query (follows at the end), which I split into multiple subviews to circumvent some of the current quirks of the SQL noria supports.

Its rather long, I apologize, but because I don't really understand what the error is trying to tell me I am unable to reduce the example.

The actual failure occurs in noria-server/src/controller/migrate/materialization/mod.rs:606:41 and reads

Oct 29 10:58:28.895 CRIT partially overlapping partial indices, conflict: 1, cols: [Some(0), Some(1)], child: 16, pcols: [0], parent: 16

The part of the query affected is the aggregation node produced by the count(*) in pageview_counts.

And here is the query graph, as dumped by the system:
fail.dot.pdf

I previously had the error that it could not find a bogokey to aggregate the count(*) over. I then changed pageview_counts1 to also SELECT ts2, which I believe is what it is now using for the count(*).

It may also be that the conflict is between the GROUP BY, which is over user_id ts1and the actual key used for result lookups which is justuser_id`.

Let me know if you have any idea how to fix this or what I could be using as a workaround.

CREATE TABLE clicks
(user_id int,
 pagetype int,
 ts int);

-- Workaround because tables cant join on themselves
clicks2:
SELECT *
FROM clicks;

candidate_paths0:
SELECT
  c1.user_id,
  c1.ts as ts1,
  c2.ts as ts2, 
FROM
  clicks c1 JOIN
  clicks2 c2 ON c1.user_id = c2.user_id
WHERE
  c1.pagetype = 0 AND
  c2.pagetype = 1;

candidate_paths:
SELECT
  user_id,
  ts1,
  ts2
FROM
  candidate_paths0
WHERE
  ts1 < ts2
ORDER BY
  user_id, ts1, ts2
;

matching_paths:
SELECT
  user_id, max(ts1) as ts1, ts2
FROM candidate_paths
GROUP BY user_id, ts2;

pageview_counts0:
SELECT c.user_id, ts1, ts2, ts
FROM
  clicks c JOIN
  matching_paths ON c.user_id = matching_paths.user_id;

pageview_counts1:
SELECT
  user_id,
  ts1,
  ts2
FROM
  pageview_counts0
WHERE
  ts1 <= ts AND
  ts2 >= ts;

pageview_counts:
SELECT
  user_id,
  count(*) as pageview_count
FROM
  pageview_counts1
GROUP BY
  user_id, ts1;

VIEW
clickstream_ana:
SELECT
  user_id,
  sum(pageview_count)
FROM pageview_counts
WHERE user_id = ?;
@jonhoo
Copy link
Contributor

jonhoo commented Oct 29, 2019

I'm currently away at SOSP, so can't dig too much into this at the moment, but that error was added a while ago when we realized that having two partial indexes that share some number of columns can lead to some weird upquery behavior. I can't immediately remember why though. In theory multiple indexes should be entirely independent, and shoudn't interact. It might be that there's some code that looks for any index that contains a column, rather than an index what consists only of the lookup column. You could try to comment that check out and see if things still fail in weird ways?

@JustusAdam
Copy link
Collaborator Author

Okay. That's definitely something I can try. I have a test suite set up to check the results, so that should be good.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants