Live queries (push notifications) #143

Open
mitar opened this issue Oct 28, 2019 · 7 comments

Comments

mitar commented Oct 28, 2019

I am moving this out of the broad issue #111 into a separate issue.

@mjjansen mentioned push notifications. And @jonhoo replied:

Push notifications (basically, pushing parts of the data-flow to the client) is something that's definitely on our radar, and was actually one of the motivations for using data-flow in the first place. Data-flow is so amenable to distribution that in theory this should just be a matter of moving some of the data-flow nodes to a client machine. In practice it gets a little more tricky though. We don't have an implementation of it currently, and it's not at the top of our roadmap, but it is a feature we'd love to see!

I commented:

So +1 for push notifications (or I would say live query, I think this is the more common term). I do not think Noria has to provide any web API here, just expose things through Rust API, and then users can hook their own logic in Rust to push them to websockets or whatever.

And @jonhoo replied:

So, push notifications are tricky because they imply full materialization everywhere, which comes at a steep cost. There might be a good way to register interest in keys and then subscribe to updates for those keys, but that's not something we're actively working on. Might be a neat additional feature to add eventually though — it shouldn't be too hard, as most of the infrastructure is already there.

Author

mitar commented Oct 28, 2019

push notifications are tricky because they imply full materialization everywhere

Hm, I am not sure I follow here. Why would that be the case? Doesn't it imply full materialization only for active "live queries"?

But couldn't we treat the client as the place where the fully materialized view lives, so that Noria does not even have to store the materialized view itself and only keeps computing changes and pushing them to the client?

One common use case where live queries get tricky (and the reason I got interested in Noria's support for changing queries at runtime) is pagination and infinite scroll. With existing systems that support live queries, you have to re-create the live query with new page limits. With Noria you would only update the limits on the existing query. This would also limit what has to be fully materialized to what is actually shown to the user in the browser window (plus maybe a bit extra to compensate for latency). As the user scrolls, Noria could be told which values it can forget about and which ones to keep up to date. This would translate well to the UI: values outside the visible area could stay displayed but would stop updating. (Of course, if you want to allow ctrl+f search in the browser over the latest values, you would have to keep updating them.)

One important aspect in practice is that not everything has to update in real time. To my understanding, you already process updates in batches inside Noria. The same could apply to a live query: updates sent to the client would be batched. If I could define the expected batching interval for each subscription to a live query, I could control and limit resource utilization. If I need a refresh every 10 seconds, then you do not really have to update the materialized view more often than that (unless another client queries the same data/view).
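
A minimal Python sketch of the per-subscription batching interval described above. Everything here is hypothetical illustration: `BatchingSubscription`, `on_delta`, and `maybe_flush` are not part of Noria's API, and the clock is passed in explicitly so the behavior is easy to follow.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, List


@dataclass
class BatchingSubscription:
    """Hypothetical: buffers deltas and sends them at most once per `interval` seconds."""
    interval: float
    send: Callable[[List[Any]], None]
    _pending: List[Any] = field(default_factory=list)
    _last_flush: float = 0.0

    def on_delta(self, delta: Any, now: float) -> None:
        self._pending.append(delta)
        self.maybe_flush(now)

    def maybe_flush(self, now: float) -> None:
        # A real system would also call this from a timer so that
        # trailing deltas are not held back indefinitely.
        if self._pending and now - self._last_flush >= self.interval:
            self.send(self._pending)
            self._pending = []
            self._last_flush = now


sent = []
sub = BatchingSubscription(interval=10.0, send=sent.append)
sub.on_delta("a", now=1.0)   # buffered
sub.on_delta("b", now=4.0)   # buffered
sub.on_delta("c", now=10.0)  # interval elapsed: one batch of three deltas
```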

I think another question here is backpressure. What happens when the client (or the connection to the client) cannot consume updates as fast as you generate them? With internal materialized views you may not encounter that, since you can write updates to the underlying database store, but over the network such limits can be hit. Noria could detect this and also slow down how often it brings the materialized views up to date.
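
One possible way to handle a slow client, sketched in Python under my own assumptions (this is not something Noria provides): keep only the latest value per key in a bounded outbox, and report back to the producer when the outbox is full so it can slow down. `CoalescingOutbox`, `offer`, and `drain` are hypothetical names.

```python
from collections import OrderedDict


class CoalescingOutbox:
    """Hypothetical bounded buffer between the view and a slow client."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._latest = OrderedDict()  # key -> most recent value not yet sent

    def offer(self, key, value):
        """Returns False when the producer should back off."""
        if key in self._latest:
            self._latest[key] = value  # coalesce: overwrite the stale update
            return True
        if len(self._latest) >= self.capacity:
            return False  # full: signal backpressure upstream
        self._latest[key] = value
        return True

    def drain(self, n):
        """Hands up to n pending (key, value) pairs to the client, oldest key first."""
        out = []
        while self._latest and len(out) < n:
            out.append(self._latest.popitem(last=False))
        return out


box = CoalescingOutbox(capacity=2)
accepted = [box.offer("a", 1), box.offer("b", 2), box.offer("a", 3), box.offer("c", 4)]
drained = box.drain(10)
```

Coalescing only makes sense when the client wants the latest state rather than every intermediate delta.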

I find this an interesting read. It is the complete opposite of your design, though. :-)

Contributor

jonhoo commented Oct 29, 2019

Sorry, when I say "full materialization", I mean full materialization for the live query, not full materialization as in "all results of all queries". Where this distinction comes up is if you have a query like

SELECT x FROM y WHERE z = ?

Full materialization implies that the results are kept for all values of z, whereas partial materialization would mean that results are only kept for the values of z that the application has queried for. Noria's partial materialization discards writes for keys that have not been read, which means that it does not produce any deltas for them, so the output stream of updates would not have all changes! The reason for this is that we need the current state for many operators in order to produce the delta. Consider something like a count operator. If the count for z = 7 isn't known, then we cannot produce the updated count when we receive a new input with z = 7.
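
The count example can be sketched in a few lines of Python. This is a toy model of the behavior described above, not Noria's implementation; `PartialCount` and its methods are hypothetical names.

```python
class PartialCount:
    """Toy count operator that keeps state only for keys that have been read."""

    def __init__(self):
        self.state = {}  # z -> count, for materialized keys only

    def read(self, key, base_rows):
        # A read miss fills the hole by recomputing from the base rows.
        if key not in self.state:
            self.state[key] = sum(1 for z in base_rows if z == key)
        return self.state[key]

    def on_insert(self, key):
        # Without the current count there is nothing to update,
        # so the write is dropped and no delta is produced.
        if key not in self.state:
            return None
        old = self.state[key]
        self.state[key] = old + 1
        return (key, old, old + 1)


base_rows = [7, 7, 3]
op = PartialCount()


def insert(z):
    base_rows.append(z)
    return op.on_insert(z)


first = insert(7)               # z = 7 was never read: no delta
count = op.read(7, base_rows)   # the read materializes z = 7
second = insert(7)              # now a delta can be produced
```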

As for pagination and infinite scroll, that's somewhere where I think we'll need support for cursors. In theory this shouldn't be too hard to add to Noria, but it's not something that we currently have.

The other concerns you list (batching and backpressure) are both good points, and would become relevant once we add support for client-side streams. This is something that's been on our roadmap for a while, but the research interests just haven't quite lined up with them yet. That is, there are more important issues to fix first from a research perspective (like fault tolerance and more robust sharding).

Author

mitar commented Oct 29, 2019

Thanks for the explanation. I am still not sure why full materialization would be needed, rather than just partial materialization for the particular values of z the user is requesting in their query. So why do live queries imply full materialization everywhere? I think they imply non-partial materialization only when using aggregation operators that also need information about rows not directly exposed to the user in their query (like min and max).

In my experience with other similar systems, the hardest problem is in fact handling order + limit, because every time a row outside the limit changes, you still have to check whether it now falls inside the limit, given the order. So in that case I think you would need something like full materialization of the whole unlimited query, even if only a small subset is really part of the result. I am not sure if you are implementing it this way though; it looks inefficient. :-)
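
To make the order + limit problem concrete, here is a small Python sketch of the naive approach: keep every row sorted and expose only the first k. `TopK` is a hypothetical name, and this is the straightforward strategy described above, not a claim about how Noria does or would do it.

```python
import bisect


class TopK:
    """Naive ORDER BY ... LIMIT k: all rows are kept, only the first k are visible."""

    def __init__(self, k):
        self.k = k
        self.rows = []  # every row, sorted ascending, not just the visible ones

    def window(self):
        return self.rows[:self.k]

    def insert(self, row):
        """Returns True if the visible window changed."""
        before = self.window()
        bisect.insort(self.rows, row)
        return self.window() != before

    def delete(self, row):
        """Returns True if the visible window changed."""
        before = self.window()
        i = bisect.bisect_left(self.rows, row)
        if i < len(self.rows) and self.rows[i] == row:
            self.rows.pop(i)
        return self.window() != before


top = TopK(k=2)
for r in [10, 20, 30, 40]:
    top.insert(r)

outside = top.insert(50)    # lands outside the limit: window unchanged
promoted = top.delete(10)   # 30 moves in, and it had to be stored to do so
after_delete = top.window()
```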

As for pagination and infinite scroll, that's somewhere where I think we'll need support for cursors.

Hm, are you sure? I thought that by the nature of Noria, cursors would not really be needed. It would just be a change to an existing query, for example to increase the limit.

Contributor

jonhoo commented Oct 31, 2019

Right, so that was my proposal — a mechanism would have to be added for the application to register interest in particular values for z. Once that's in place, then they could listen for a stream of updates to the materialization of z. I guess my phrasing wasn't great: I meant to say that currently streaming implies full materialization, or we'd need some mechanism for saying "materialize this key and then give me its stream".
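
A rough Python sketch of the "materialize this key and then give me its stream" idea, for the query `SELECT x FROM y WHERE z = ?`. The interface is entirely hypothetical (`KeyedStream`, `subscribe`, and the `("+", x)` delta shape are made up for illustration); no such mechanism exists in Noria today.

```python
from collections import defaultdict


class KeyedStream:
    """Toy view over (z, x) rows that streams updates only for subscribed keys."""

    def __init__(self, base_rows):
        self.base_rows = base_rows          # list of (z, x) tuples
        self.state = {}                     # z -> list of x, materialized keys only
        self.listeners = defaultdict(list)  # z -> callbacks

    def subscribe(self, z, callback):
        # Registering interest materializes the key first...
        if z not in self.state:
            self.state[z] = [x for (k, x) in self.base_rows if k == z]
        self.listeners[z].append(callback)
        # ...and returns a snapshot; later changes arrive via the callback.
        return list(self.state[z])

    def insert(self, z, x):
        self.base_rows.append((z, x))
        if z in self.state:  # writes to unmaterialized keys produce nothing
            self.state[z].append(x)
            for cb in self.listeners[z]:
                cb(("+", x))


view = KeyedStream([(7, "a"), (3, "b")])
updates = []
snapshot = view.subscribe(7, updates.append)
view.insert(7, "c")  # streamed to the subscriber
view.insert(3, "d")  # nobody asked for z = 3: nothing is emitted
```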

Yup, order + limit is tricky, and requires at the very least some kind of support for range indices, which Noria does not currently have (although it is work in progress — cc @jonathanGB). I think effectively the way to implement it is using cursors. The external interface might not need to mention cursors (like in DB2 and friends), but internally it'd likely be implemented as such. You need some kind of representation of how far through a resultset a query has gotten, and where it should continue for subsequent reads.
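
As an illustration of "how far through a resultset a query has gotten", here is a minimal Python sketch of a keyset-style cursor over a sorted result. It assumes unique sort keys, and `next_page` is a hypothetical helper, not part of any existing interface.

```python
import bisect


def next_page(sorted_rows, cursor, limit):
    """Returns (page, new_cursor); the cursor is the last sort key already returned."""
    start = 0 if cursor is None else bisect.bisect_right(sorted_rows, cursor)
    page = sorted_rows[start:start + limit]
    new_cursor = page[-1] if page else cursor
    return page, new_cursor


rows = [1, 3, 5, 7, 9]
page1, cur = next_page(rows, None, 2)

# A row inserted before the cursor does not shift the next page,
# which it would with a plain numeric offset.
bisect.insort(rows, 2)
page2, cur = next_page(rows, cur, 2)
```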

Author

mitar commented Oct 31, 2019

I think effectively the way to implement it is using cursors.

I agree. Given how common this use case (pagination or infinite scroll) is in my experience with live queries, having a special case for it would probably be useful.

Author

mitar commented Nov 5, 2019

One thing I just remembered is that there is a middle ground here: maybe live queries could work by just notifying the client that the view has changed. The client could then go back and simply re-query. Given how Noria works, that re-query would be fast and cheap (compared with other systems, where you have to recompute the whole thing). That would also let clients throttle how often they get updates from the server.
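
A minimal client-side sketch of this notify-then-requery idea in Python. The names are hypothetical (`ThrottledView`, `notify_changed`), and `query` stands in for whatever call re-reads the view.

```python
class ThrottledView:
    """Hypothetical client cache: re-queries only when notified, and not too often."""

    def __init__(self, query, min_interval):
        self.query = query                # callable that re-reads the view
        self.min_interval = min_interval  # seconds between re-queries
        self.dirty = True
        self.last_fetch = None
        self.cached = None

    def notify_changed(self):
        # Called when the server says "the view changed"; no data is carried.
        self.dirty = True

    def get(self, now):
        due = self.last_fetch is None or now - self.last_fetch >= self.min_interval
        if self.dirty and due:
            self.cached = self.query()
            self.last_fetch = now
            self.dirty = False
        return self.cached


data = [1]
calls = []


def query():
    calls.append(1)
    return list(data)


view = ThrottledView(query, min_interval=5.0)
r0 = view.get(now=0.0)   # first read always queries
data.append(2)
view.notify_changed()
r1 = view.get(now=1.0)   # throttled: still the old result
r2 = view.get(now=5.0)   # interval elapsed: re-query
```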

Contributor

jonhoo commented Nov 6, 2019

@mitar It's true that Noria could provide that feature too, though that still has the same issue: Noria won't know when a key that hasn't been materialized changes, and so also couldn't inform clients of that case.
