Consistency between new write and materialized view #136

Open · a12one opened this issue Sep 7, 2019 · 2 comments

a12one commented Sep 7, 2019

I learned about Noria from the Two Sigma talk, and I find it extremely interesting; it could potentially be a great fit for my use case.

If I understand correctly, the materialized view (cache) is eventually consistent with new writes, but not atomically: the re-calculation of the materialized view happens asynchronously with respect to the write operation (on a best-effort basis)? If that is the case, may I ask whether every write operation (or transaction) triggers a re-calculation of the cache? Or does the re-calculation inside Noria run on its own interval (and if so, how long?), checking whether a re-calculation is needed and working at its own pace, to avoid the backlog that could build up (e.g. at peak time) when writes arrive more frequently than the view can be re-calculated?

I understand from another GitHub issue that Noria is still a research prototype and probably not as mature as MySQL etc. for general production use, but may I ask which subset of the system/features is mature enough to use with production data? Thank you very much!

jonathanGB (Member) commented

Take the following with a grain of salt; I'm not an author of this project :)

Yes, the materialized view is eventually consistent. Every write is stored durably, then propagated through the graph to the internal and materialized views that depend on the new information. In the meantime, reads can access stale data, which is assumed to be fine for normal usage. Note as well that because the views are partially materialized (i.e. entries in the cache can be evicted to maintain a reasonable memory footprint), read queries to a view might need to launch upqueries to fetch information from further up the graph, which I believe is done asynchronously.

The underlying storage of the cache uses evmap (the double hashmap discussed in the Two Sigma presentation). As mentioned in its README:

Specifically, readers only see the operations that preceded the last call to WriteHandle::refresh by a writer. This lets writers decide how stale they are willing to let reads get. They can refresh the map after every write to emulate a regular concurrent HashMap, or they can refresh only occasionally to reduce the synchronization overhead at the cost of stale reads.

For read-heavy workloads, the scheme used by this module is particularly useful. Writers can afford to refresh after every write, which provides up-to-date reads, and readers remain fast as they do not need to ever take locks.

From this description, I would tend to believe that Noria refreshes after every write, but I might very well be wrong about this. At least, we know that both usages are possible.
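To make the refresh semantics concrete, here is a minimal sketch of the visibility model the quoted README describes, using the evmap crate roughly as it existed around 2019 (method names such as refresh and get_and changed in later versions, so treat this as illustrative rather than exact):

```rust
// Minimal sketch (assumed evmap ~5.x API): writes become visible to readers
// only after the writer calls refresh(), which swaps the two maps.
fn main() {
    // evmap::new() hands back a lock-free read handle and a write handle.
    let (r, mut w) = evmap::new();

    // The write lands on the writer's side of the double map...
    w.insert("article-1", 42);

    // ...but readers don't see it until refresh() is called.
    assert_eq!(r.get_and("article-1", |votes| votes.len()), None);

    // refresh() publishes all pending writes to readers at once.
    w.refresh();
    assert_eq!(r.get_and("article-1", |votes| votes.len()), Some(1));
}
```

Refreshing after every write gives readers an up-to-date view; refreshing less often trades staleness for lower synchronization overhead, as the README says.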

For more details, I would recommend reading the paper!

jonhoo (Contributor) commented Sep 8, 2019

Hi! Jonathan's summary is essentially correct, though let me try to give a more direct answer to your question. The materialized views in Noria are not re-calculated in the same way as in traditional systems that provide materialized views. Specifically, we do not "refresh" the view at some fixed frequency, or when certain events happen. Instead, every update incrementally updates all dependent views, so the delay between a write and when it becomes visible should only be the (relatively short) time it takes for the write to go through the query operators and reach the view in question. You can essentially consider this the same as refreshing the view on every write, though in practice it's more efficient than that because a) writes are processed in batches, and b) the view is updated incrementally rather than fully re-executed.
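To illustrate the difference, here is a toy sketch of incremental view maintenance (not Noria's actual code): each write is turned into a delta that adjusts only the affected entries of a dependent COUNT view, instead of re-running the full query.

```rust
use std::collections::HashMap;

// Hypothetical per-key COUNT view (think "votes per article"). Each write
// produces a delta that touches only the affected key, rather than
// re-executing the whole aggregation.
enum Delta {
    Insert(String), // a new base-table row with this group-by key
    Delete(String), // a removed base-table row with this group-by key
}

fn apply(view: &mut HashMap<String, i64>, delta: Delta) {
    match delta {
        Delta::Insert(key) => *view.entry(key).or_insert(0) += 1,
        Delta::Delete(key) => {
            if let Some(count) = view.get_mut(&key) {
                *count -= 1;
            }
        }
    }
}

fn main() {
    let mut vote_counts: HashMap<String, i64> = HashMap::new();
    // Writes flow through as deltas; the view stays current as a side effect
    // of processing each write, with no periodic full recomputation.
    apply(&mut vote_counts, Delta::Insert("article-1".into()));
    apply(&mut vote_counts, Delta::Insert("article-1".into()));
    apply(&mut vote_counts, Delta::Delete("article-1".into()));
    assert_eq!(vote_counts["article-1"], 1);
}
```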
