User defined functions - wasm? #144

Open
mitar opened this issue Oct 28, 2019 · 4 comments

@mitar

mitar commented Oct 28, 2019

In #111 I have seen mention of user defined functions. I think for Noria it probably makes sense that there are two kinds of them:

  • Some which are defined in terms of primitive data-flow operations. I think the existing internal views you can make are a case of that, but a slightly richer way to express those (or just more basic operations) would probably be useful.
  • Opaque imperative functions that one could define from the app. For those, I would suggest that Noria simply use wasm as the language for defining them, instead of getting into the hell of supporting a wide range of custom languages. One would then only have to declare some properties of the opaque function (deterministic vs. not) so that the dataflow can be computed correctly.

Am I missing something here? Are there some fundamental issues with providing support for UDFs? What metadata about a function would Noria need? Typing information for inputs and outputs?
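To make the metadata question concrete, here is a minimal sketch of what such a declaration could look like. Every name in it (`ColumnType`, `UdfSignature`, `accepts`) is hypothetical and chosen for illustration; none of it is Noria's actual API or type system.

```rust
/// Hypothetical column types a UDF could declare (an assumption, not Noria's types).
#[derive(Debug, Clone, PartialEq)]
enum ColumnType {
    Int,
    Real,
    Text,
}

/// Hypothetical metadata that could accompany an opaque (e.g. wasm) UDF.
#[derive(Debug, Clone, PartialEq)]
struct UdfSignature {
    name: String,
    /// Declared types of the input columns, in order.
    inputs: Vec<ColumnType>,
    /// Declared type of the single output column.
    output: ColumnType,
    /// True if the same inputs always produce the same output.
    deterministic: bool,
}

impl UdfSignature {
    /// Returns true if the supplied argument types match the declared inputs exactly.
    fn accepts(&self, args: &[ColumnType]) -> bool {
        self.inputs.as_slice() == args
    }
}
```

A declaration like this only covers typing and determinism; it says nothing about the other properties a dataflow operator might have to satisfy.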

@jonhoo
Contributor

jonhoo commented Oct 31, 2019

The semantic requirements for operators in Noria are currently somewhat ill-defined, and getting them right in the presence of partial state is pretty finicky. We think it should absolutely be doable; it would just require fully working out the contract such operators need to adhere to. For example, we know that all Noria operators must be commutative over their inputs (e.g., no SHA1 operator), and we know that they must be deterministic.

@ms705
Member

ms705 commented Oct 31, 2019

@JustusAdam is working on UDF support for his master's thesis, with UDFs written in a subset of Rust.

As @jonhoo points out, the challenge with UDFs is to check that they are compatible with (a) incremental execution, (b) the invariants of the partially-stateful dataflow model, and (c) our implementation invariants (e.g., a streaming SHA1 doesn't work because its sequence of outputs does not commute over the order of input arrivals; note that a normal, one-off SHA1 on a column value is fine).
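The ordering point can be sketched with the standard library alone. The example below does not use SHA1: `streaming_digest` is a stand-in that chains FNV-1a-style mixing steps, picked only because it is order-sensitive in the same way a streaming hash is. All function names are illustrative assumptions.

```rust
/// Order-insensitive: addition commutes, so the arrival order of inputs does not matter.
fn running_sum(inputs: &[u64]) -> u64 {
    inputs.iter().fold(0u64, |acc, x| acc.wrapping_add(*x))
}

/// Order-sensitive stand-in for a streaming hash (FNV-1a-style mixing, NOT SHA1):
/// each step folds the next value into the digest of everything seen so far.
fn streaming_digest(inputs: &[u64]) -> u64 {
    inputs
        .iter()
        .fold(0xcbf29ce484222325u64, |acc, x| (acc ^ *x).wrapping_mul(0x100000001b3))
}

/// One-off digest of a single column value: the result depends only on that value,
/// so it is unaffected by the order in which rows arrive.
fn one_off_digest(value: u64) -> u64 {
    streaming_digest(&[value])
}
```

With these definitions, `running_sum(&[1, 2, 3])` and `running_sum(&[3, 1, 2])` agree, while `streaming_digest(&[1, 2])` and `streaming_digest(&[2, 1])` differ. `one_off_digest` applied per row gives the same per-row results in any arrival order.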

@mitar
Author

mitar commented Oct 31, 2019

Hm, you want to check that they are compatible? Not just require the author of the UDF to ensure that?

I find wasm really cool for such embedded operations. One can compile into it from many other languages, and it is then easy to embed it into Rust and just run it. That avoids building your own custom Rust-like language and requiring everyone to write in that. From the app's perspective, I can then write my app in JavaScript, connect to Noria, and write my UDFs in JavaScript as well, compiling them down to wasm and giving them to Noria to run. So from the app's perspective everything can be in the same language. Pretty cool.

@JustusAdam
Collaborator

The choice of language is by far the smallest problem here, and the fact that my prototype uses a custom Rust dialect has nothing to do with whether or not a UDF will work in Noria. (It has to do instead with optimisation and parallelisation.)

In the end it will always fall to the user to ensure that custom operators conform to the semantics Noria requires. However, this necessitates providing a proper interface for users to program against and a set of rules the operator has to obey, neither of which currently exists or has been written down. I think that was what @jonhoo was trying to say. I'll be talking about the requirements for custom operators in my thesis; however, it won't be comprehensive enough for all types of UDF yet. I have implemented both grouping UDFs (with state) and pure functions (i.e. 1 input, 1 output, no side effects). Multi-tuple functions with or without state (i.e. table functions in Noria terms) are much more difficult to get right and would at least require good documentation. I think after I've handed in my thesis we should talk about to what extent we want to integrate the facilities I created into Noria.
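As a rough illustration of the two kinds mentioned above, the sketch below contrasts a pure function with a grouping operator that keeps state and updates it incrementally. It is not taken from the thesis prototype or from Noria: `Delta`, `GroupedSum`, and `on_delta` are hypothetical names, and representing updates as signed deltas is an assumption made for this example.

```rust
use std::collections::HashMap;

/// Pure function: one input, one output, no side effects.
fn shout(s: &str) -> String {
    s.to_uppercase()
}

/// Hypothetical signed update: `positive` adds a row to the group,
/// `!positive` retracts a previously added row.
struct Delta {
    key: String,
    value: i64,
    positive: bool,
}

/// Grouping UDF with state: keeps one running total per group key.
#[derive(Default)]
struct GroupedSum {
    totals: HashMap<String, i64>,
}

impl GroupedSum {
    /// Applies one update incrementally and returns the group's new total.
    fn on_delta(&mut self, d: &Delta) -> i64 {
        let total = self.totals.entry(d.key.clone()).or_insert(0);
        if d.positive {
            *total += d.value;
        } else {
            *total -= d.value;
        }
        *total
    }
}
```

Because the sum can undo a retraction by subtracting, the stored total stays correct without recomputing the group from scratch. An operator whose state cannot be reversed this way would need a different strategy.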
