Add WASM UDF implementation in Rust #11351
Conversation
During the development of this patch I noticed an issue that isn't tied to this patch itself, but to Rust as a whole: the behavior on OOM errors.
Force-pushed from e7e8bf9 to d14861d
configure.py
Outdated
@@ -1893,7 +1892,7 @@ def configure_abseil(build_dir, mode, mode_config):
pool = console
description = TEST {mode}
rule rust_lib.{mode}
command = CARGO_HOME=build/{mode}/rust/.cargo cargo build --release --manifest-path=rust/Cargo.toml --target-dir=build/{mode}/rust -p ${{pkg}}
command = CARGO_HOME=build/{mode}/.cargo RUSTFLAGS="--cfg=rustix_use_libc" cargo build --manifest-path=rust/Cargo.toml --target-dir=build/{mode} --profile=rust-{mode}
This RUSTFLAGS value fixes a weird issue where rustix - a rust crate that wasmtime depends on - returns 0 as the system page size when using linux_raw instead of libc as the backend for system calls.
In case it helps, I believe this problem is fixed by bytecodealliance/rustix#385, which is included in rustix 0.35.9.
Thanks for the info. I updated the dependencies and confirmed that everything now works without this flag.
CI state
Love it! @piodul FYI, a fair portion of Rust code to review
@@ -9,3 +9,30 @@ inc = { path = "inc", version = "0.1.0" }

[lib]
crate-type = ["staticlib"]

[profile.rust-dev]
Nitpick: maybe the prefix should indicate that this is scylla-specific - scylla-dev instead of rust-dev? rust- might suggest that it's some kind of official standard, and it isn't.
rust- has the benefit that the directory containing the build files also has rust in its name (currently it's $builddir/{mode}/rust-{mode}), but I think something like builddir/{}/rust/scylla-{mode} would be just as good. On the other hand, maybe a comment would suffice?
a comment would suffice for me
Comment added in the rebase
if let Ok(new_layout) = std::alloc::Layout::from_size_align(new_size, 64 * 1024) {
    let new_ptr = match self.layout {
        Some(layout) => unsafe { std::alloc::realloc(self.ptr, layout, new_layout.size()) },
        None => unsafe { std::alloc::alloc(new_layout) },
just to confirm - is this whole thing done simply to force wasm to use host malloc/free for allocations, so that it gets automatically handled by the Seastar allocator?
Yes, otherwise the memory will be allocated with mmap, which we don't override in seastar.
In this case, let's explicitly explain it in the comment, along with mentioning mmap. Technically, nothing guarantees that std::alloc doesn't switch to mmap one day.
(or, just use an unsafe malloc/free pair directly to be sure)
Oh wait, it's an unsafe block anyway. In this case I strongly suggest you just forward-declare malloc, realloc, free, whatever's needed, and use them directly. Then we're 100% sure there won't be any mmap sneaking in.
And in case we have strict layout requirements, like a 64k page for wasm, perhaps instead of malloc we should just go with aligned_alloc (https://linux.die.net/man/3/aligned_alloc)
With this implementation it uses aligned_alloc (or malloc for small allocations), which we properly override in Seastar indeed, and it will most likely stay that way due to the 64k alignment, because mmap doesn't exactly let you choose a multi-page alignment anyway. But in theory, on certain hugepage configurations it may be beneficial to use mmap over aligned_alloc, so in my opinion we should be prepared for that.
Ok, I used aligned_alloc and free in the new rebase. I've also added a comment about mmap.
I removed the std::alloc::realloc because it should probably not be used here anyway, considering it doesn't keep the alignment.
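The approach settled on above can be sketched roughly as follows (hypothetical names, not the PR's actual allocator code): forward-declare the C allocator and call it directly, so the memory is guaranteed to come from functions Seastar overrides, with no chance of std::alloc routing to mmap behind our backs.

```rust
use std::os::raw::c_void;

const WASM_PAGE_SIZE: usize = 64 * 1024;

// Forward-declare the host C allocator; calling it directly guarantees
// the allocations go through functions Seastar overrides.
extern "C" {
    fn aligned_alloc(alignment: usize, size: usize) -> *mut c_void;
    fn free(ptr: *mut c_void);
}

// Allocate a whole number of 64 KiB wasm pages. The size is a multiple
// of the alignment, as C11 aligned_alloc requires.
fn alloc_wasm_pages(pages: usize) -> *mut c_void {
    unsafe { aligned_alloc(WASM_PAGE_SIZE, pages * WASM_PAGE_SIZE) }
}

fn free_wasm_pages(ptr: *mut c_void) {
    unsafe { free(ptr) }
}
```

Note there is deliberately no realloc here: as observed above, realloc does not preserve alignment, so growing means allocate, copy, free.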
if (eptr) {
    co_await coroutine::return_exception_ptr(eptr);
}
co_await coroutine::maybe_yield();
😌
I see you do yield here. Why are the others different? Need comments.
I don't yield in other places mostly because they should finish quickly; the other cases are also not used in future-returning functions, so yielding there would require quite a few more changes - I'll add comments.
Force-pushed from eb8d1c0 to acdb049
CI state
Force-pushed from 4bc21e3 to 7185307
I've rebased onto master, added a comment and removed some unused code.
CI state
@wmitros for starters, please also run
"errno-dragonfly",
"libc",
"winapi",
]
Does rust have the equivalent of ccache that can store cached build results outside the build directory, so that rm -rf build; ninja will be fast? I'm thinking of our builders, which start from a clean directory each time.
Mozilla's sccache works with rust: https://github.com/mozilla/sccache . Disclaimer: I have never used it personally so I can't tell how well it works in practice.
Ok. Let's start using it when we see rust compilation times having an impact.
The incremental option is supposed to help with rebuild times, but it does not help with the rm -rf build; ninja case, and I only enabled it for the dev mode, as it might make the final binary less optimized.
I might just try sccache, because in the rm -rf build; ninja case the rust compilation is very impactful (although this could have been expected, as rust is the only thing being rebuilt then).
I tried sccache and it helps a lot with the rm -rf build; ninja case. I don't expect we want to add it as one of our dependencies, but rather explain how to use it in some docs. I'll need to research it a bit more, so maybe we can do this as a follow-up.
We install ccache into the dbuild environment so we may as well install sccache. Certainly it could be in a separate pull request.
I see it even supports C++, so if it's better in some way, we can replace ccache.
}
Poll::<Result<wasmtime::Instance>>::Ready(Err(e)) => {
    return Err(anyhow!("Failed to instantiate module: {:?}", e));
}
Why does instantiate_async() return a future?
If it's in order to yield, then aren't we wasting the opportunity by looping? We should return to the caller and let them yield too.
(instantiating an instance is less important than execution, since it's rarer, but still)
According to the documentation, we return a future here only to potentially yield when we "asynchronously invoke the wasm start function in case it calls any imported function which is an asynchronous host function". The only imports come from WASI, but I don't think they are used in the start function (actually I couldn't find the start function implementation - it is also possible that there is none).
If we also go with #11351 (comment) no imports will be async at all, so it won't be possible that this future yields in the middle
self.wasmtime_memory.size(&store.wasmtime_store)
}
fn grow(self: &mut Memory, store: &mut Store, delta: u64) -> Result<u64> {
    self.wasmtime_memory.grow(&mut store.wasmtime_store, delta)
What does wasmtime use to allocate memory?
(Need to understand if it will be accounted for in seastar memory management)
Aside from the WebAssembly Memory allocator that uses the seastar aligned_alloc, the rust code uses the seastar allocator as its global allocator (though currently this is an implementation detail that could possibly change), but wasmtime (the rust crate) still mostly uses mmap - in particular to allocate memory during module compilation.
I skimmed through wasmtime code and it indeed mmaps a lot. Some of it is actual file-backed mapping, but often it's used as a glorified malloc with MAP_ANONYMOUS.
We could technically add an opt-in Seastar feature which makes external mmap-based libraries more Seastar friendly:
- Override mmap to fall back to the original behavior, except when MAP_ANONYMOUS is set, in which case we just call aligned_alloc with a page size boundary, which effectively gives us pages. Then, we can apply the requested protection with a separate mprotect call.
- munmap gets a little complicated, because I don't know any reliable way of checking if a page is file-backed or anonymous, except for examining /proc/self/maps output. But if we assume that the number of anonymous mappings is relatively small, perhaps adding a hashmap overhead is acceptable if it means that we regain the ability to allocate everything with the Seastar allocator. Then, we could just track our anonymous pages ourselves and override the munmap behavior to a regular free if the page is tracked as "ours".

@avikivity Does it sound too obscure to you, or is it feasible? The main question is how many more mmap-ed libraries we're likely to integrate.
Won't file-backed mmap() cause blocking memory accesses and stalls?
Not just stalls, also OOM since we already allocated all memory for our own use.
Overriding ::mmap is likely to end up with a lot of tears; it's too fundamental and not designed to be overridden.
Can we patch wasmtime (upstream) not to use mmap? It seems like premature optimization. The system memory allocator will call mmap when the conditionals call for it.
Perhaps the issue is the need to make the memory executable later with mprotect() (the typical jit operation is to first mmap with PROT_WRITE, then mprotect to PROT_EXEC).
If so, perhaps we can pass callbacks to wasmtime for mmap/mprotect/munmap so we can intercept them. The default callbacks would just call the system mmap/mprotect/munmap.
Completely removing mmap from wasmtime seems difficult, but maybe it is not essential after all.
I inspected the wasmtime code and found that the mmaps are used in:
- the default Linear Memory allocator - which we have replaced
- the Pooling Memory Allocator (where we preallocate all the memory for all the instances that we can later use) - which we do not use
- the copy-on-write implementation of the initialization of Linear Memories - which is only used in the aforementioned Pooling Memory Allocator
- the allocation of the stacks for asynchronous calls - this could probably be replaced with malloc
- the allocation of the alternate signal stack - this could also probably be replaced with malloc
- the compilation of a module from a file - which we currently do not use
- the allocation of the jit code to be later mprotected with PROT_EXEC - these mmaps are all with MAP_ANONYMOUS but I'm not sure we can help it (according to the mprotect manual, the behavior of mprotect() is unspecified if it is applied to a region of memory that was not obtained via mmap())
the allocation of the jit code to be later mprotected with PROT_EXEC - these mmaps are all with MAP_ANONYMOUS but I'm not sure we can help it (according to the mprotect manual, the behavior of mprotect() is unspecified if it is applied to a region of memory that was not obtained via mmap())
All memory in the seastar allocator is obtained via mmap(). So we should be able to mprotect memory coming from it. But we have to be very careful to make sure the result of mprotect doesn't spill over to memory used by other objects, and to remove it when no longer needed.
if new_size == 0 {
    new_ptr = ptr::null_mut()
} else {
    new_ptr = unsafe { aligned_alloc(WASM_PAGE_SIZE, new_size) };
This answers my previous question.
aligned_alloc isn't defined if new_size isn't a multiple of WASM_PAGE_SIZE, is it guaranteed by the caller?
When I look at our aligned_alloc override, it looks like we align the size up when necessary, but I see that in the linux specification it's as you described.
I'd expect the size here to be aligned, but I don't see it specified directly, so I think I'll just use memalign here.
note that memalign is deprecated
(you could just align the new size yourself)
I'll align the size myself then
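Rounding the size up by hand is a one-liner for power-of-two alignments; a sketch (`align_up` is an illustrative name, not code from the PR):

```rust
const WASM_PAGE_SIZE: usize = 64 * 1024;

// Round `size` up to the next multiple of `align` (a power of two), so
// the value passed to aligned_alloc satisfies the C11 requirement that
// the requested size be a multiple of the alignment.
fn align_up(size: usize, align: usize) -> usize {
    debug_assert!(align.is_power_of_two());
    (size + align - 1) & !(align - 1)
}
```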
@@ -6,8 +6,6 @@
 * SPDX-License-Identifier: AGPL-3.0-or-later
 */

#ifdef SCYLLA_ENABLE_WASMTIME
Please check that the series continues building without wasmtime.
Wasmtime is now downloaded as part of the compilation of the .rs files, so it should never be missing, but it may not work correctly on ARM (it didn't work when we used libwasmtime.a) - I'll try to check this soon. If it doesn't work on ARM I'll bring the SCYLLA_ENABLE_WASMTIME back.
Wait what? Didn't we add it to the toolchain?
In any case it may not be available on all architectures.
We did add Rust to the toolchain, but we didn't add all the crates that will be used in this patch.
Actually, we should probably discuss how we want to deal with downloading the new crates.
Currently, a dependency will be downloaded if it's missing during the compilation (it won't be re-downloaded if it's already present).
If we don't want that, we could update the toolchain every time a dependency is changed.
We could also update the toolchain only periodically, and some dependencies might still need to be downloaded during compilation.
I didn't really see benefits to the latter options, because after the first build there is basically no difference between them.
Actually, where do we want to keep the downloaded crates? Cargo uses $HOME/.cargo by default.
To test if everything works with ARM this would help a lot.
lang/wasm.cc
Outdated
rets->push_i32(0);

auto fut = wasmtime::get_func_future(store, *malloc_func, *argv, *rets);
while(!fut->resume());
Code style: space after while, it isn't a function.
Also a comment indicating that we need to yield here.
If we set need_thread, then it may be enough to call seastar::thread::maybe_yield() here.
I can modify the operator() in the init(_nullable)_arg_visitor so that it returns a future; then we can use the maybe_yield like when executing the UDF.
If this is just pushing arguments, maybe unneeded. But comments are definitely needed.
lang/wasm.cc
Outdated
argv->push_i32((int32_t)val.i64());
auto rets = wasmtime::get_val_vec();
auto free_fut = wasmtime::get_func_future(store, *free_func, *argv, *rets);
while(!free_fut->resume());
Ditto.
Looks good. Please add tests that prove we yield on a long loop.
configure.py
Outdated
@@ -675,6 +675,14 @@ def find_headers(repodir, excluded_dirs):
'raft/log.cc',
Typo in the first commit message: "in this pach" -> "in this patch"
configure.py
Outdated
'rust/Cargo.lock',
'rust/inc/src/lib.rs',
'rust/inc/Cargo.toml',
]
It would be nice if the detection of the *.rs files were automatic. When working with a rust-only project, cargo usually detects dependent source files automatically and knows what and when to recompile. IMO we should consider it as an improvement in a followup.
I did also consider this but had a problem with incorporating it into the ninja build rules.
But now that I look at it, maybe we can simply add | always to the build rule for librust_combined.a and remove this list altogether - this doesn't cost us anything, because if cargo determines no files changed, it will not compile them again.
It even looks like this modification doesn't make rust build literally always, but only when there's a dependency on librust_combined.a.
rust/wasmtime_bindings/src/lib.rs
Outdated
loop {
    match inst_fut.as_mut().poll(&mut ctx) {
        Poll::<Result<wasmtime::Instance>>::Pending => {}
Is the turbofish needed here? It makes the line quite long and verbose. Would Poll::Pending work here instead?
It works without it. I'll also remove it for the Poll::Ready case.
rust/wasmtime_bindings/Cargo.toml
Outdated
[dependencies]
cxx = { version = "1.0.73", features = ["c++20"] }
wasmtime = { version = "0.40.0", features = ["async"] }
wasmtime-wasi = { version = "0.40.0", features = ["tokio"] }
Why do we need the tokio feature? We aren't running a tokio runtime, and we would rather avoid code which assumes running in the tokio runtime context (it will panic if it tries to access stuff from tokio).
The tokio feature was the only alternative to the sync feature, so it looked like using it was the only option for the async wasi, but now that I think about it, we might not need it here and can just use the sync WASI - I'll test if the yielding works without it.
rust/wasmtime_bindings/src/lib.rs
Outdated
}

fn get_abi(instance: &Instance, store: &mut Store, memory: &Memory) -> Result<u32> {
    match instance
Style suggestion: the function returns a Result<>, so you can use the ? operator and some methods on Option to make the code flatter. For example:
let export = instance
    .wasmtime_instance
    .get_export(&mut store.wasmtime_store, "_scylla_abi")
    .ok_or_else(|| anyhow!("ABI version export not found - please export `_scylla_abi` in the wasm module"))?;
let global = global
    .get(&mut store.wasmtime_store)
    .ok_or_else(|| anyhow!("Exported object _scylla_abi is not a global"))?;
// ...
Other functions would benefit from this conversion, too.
I also added some map_err in the same style in the rebase.
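The flattening suggested above generalizes to any chain of Option/Result steps: convert each with ok_or_else/map_err and bail out early with ?. A standalone sketch (hypothetical types standing in for the wasmtime objects, and String errors standing in for anyhow):

```rust
use std::collections::HashMap;

// Stand-in for the instance's export table; in the real bindings this
// would be a wasmtime lookup returning typed export objects.
fn get_abi(exports: &HashMap<String, String>) -> Result<u32, String> {
    // Each fallible step converts its Option/Result into an error and
    // short-circuits with `?`, keeping the happy path flat.
    let raw = exports
        .get("_scylla_abi")
        .ok_or_else(|| {
            "ABI version export not found - please export `_scylla_abi` in the wasm module"
                .to_string()
        })?;
    raw.parse::<u32>()
        .map_err(|e| format!("Exported object _scylla_abi is not a number: {e}"))
}
```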
replica/database.cc
Outdated
@@ -361,7 +361,7 @@ database::database(const db::config& cfg, database_config dbcfg, service::migrat
, _feat(feat)
, _shared_token_metadata(stm)
, _sst_dir_semaphore(sst_dir_sem)
, _wasm_engine(std::make_unique<wasm::engine>())
, _wasm_engine(std::make_unique<rust::Box<wasmtime::Engine>>(wasmtime::create_engine()))
Isn't rust::Box something like a std::unique_ptr already? Would it be possible to just use rust::Box directly and remove the double indirection?
I tried using the rust::Box alone in the rebase, and it looks like it works for now.
"v128",
"externref",
"funcref",
"externref",
Why was the order changed here?
It matched the order of enum elements in lang/wasmtime.hh
Lines 492 to 497 in 41b91e3
/// WebAssembly's `v128` type from the simd proposal
V128,
/// WebAssembly's `externref` type from the reference types
ExternRef,
/// WebAssembly's `funcref` type from the reference types
FuncRef,
rust/wasmtime_bindings/src/lib.rs
Outdated
wasmtime_engine: wasmtime::Engine,
}

fn create_engine() -> Result<Box<Engine>> {
am I right to assume that Result gets translated by cxx to produce a function that returns a T if the result is Ok, and throws an exception in case of Err?
CI state
There seems to be a build error:
Weird, is this something you even changed in your patch?
Yes, this is something I added to seastar and it became usable only recently, with the last seastar update.
CI state
This time it seems to be a legitimate failure in the
Looks like the reference count for a module went below zero. In debug mode, Rust panics on under/overflow.
Yes, I found that while adding the "handle" for modules there was one place where I didn't remove the manual reference counting. I removed it and now the references are counted only in the handle constructor/destructor.
One last suggestion from my side, otherwise LGTM.
// Wasmtime instances hold references to modules, so the module can only be dropped
// when all instances are dropped. For a given module, we can have at most one
// instance for each scheduling group.
// This function is called each time a new instance is created for a given module.
// If there were no instances for the module before, i.e. this module was not
// compiled, the module is compiled and the size of the compiled code is added
// to the total size of compiled code. If the total size of compiled code exceeds
// the maximum size as a result of this, the function will evict modules until
// there is enough space for the new module. If it is not possible, the function
// will throw an exception. If this function succeeds, the counter of instances
// for the module is increased by one.
void track_module_ref(wasmtime::Module& module, wasmtime::Engine& engine);

// This function is called each time an instance for a given module is dropped.
// If the counter of instances for the module reaches zero, the module is dropped
// and the size of the compiled code is subtracted from the total size of compiled code.
void remove_module_ref(wasmtime::Module& module) noexcept;

// When a WASM UDF is executed, a separate stack is first allocated for it.
// This stack is used by the WASM code and it is not tracked by the seastar allocator.
// This function will evict cached modules until the stack can be allocated. If enough
// memory can't be freed, the function will throw an exception.
void reserve_wasm_stack();

// This function should be called after a WASM UDF finishes execution. Its stack is then
// destroyed and this function accounts for the freed memory.
void free_wasm_stack() noexcept;
Nit: Is there a reason why those methods are public and not private? They seem to be internal to how the instance cache works. The methods track_module_ref/remove_module_ref should now probably be used exclusively by module_handle, so they could also be made private and module_handle could become a friend of instance_cache.
Makes sense, added in rebase
CI state
Rust's cargo caches downloaded sources in ~/.cargo. However, dbuild won't provide access to this directory since it's outside the source directory. Prepare for sharing the cargo cache between the host and the dbuild environment by: - Creating the cache if it doesn't already exist. This is likely if the user only builds in a dbuild environment. - Propagating the cache directory as a mounted volume. - Respecting the CARGO_HOME override.
Currently, the rust build system in Scylla creates a separate static library for each included rust package. This could cause duplicate symbol issues when linking against multiple libraries compiled from rust. This issue is fixed in this patch by creating a single static library to link against, which combines all rust packages implemented in Scylla. The Cargo.lock for the combined build is now tracked, so that all users of the same scylla version also use the same versions of imported rust modules. Additionally, the rust package implementation and usage docs are modified to be compatible with the build changes. This patch also adds a new header file 'rust/cxx.hh' that contains definitions of additional rust types available in c++.
A cargo profile is created for each of the build modes: dev, debug, sanitize, release and coverage. The names of the cargo profiles are prefixed with "rust-" because cargo does not allow separate "dev" and "debug" profiles. The main difference between the profiles is their optimization levels, which correlate to the levels used in configure.py. The debug info is stripped only in the dev mode, and only this mode uses "incremental" compilation to speed it up.
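Such per-mode profiles could look roughly like the fragment below. This is a hypothetical sketch of the scheme described in the commit message; the `Cargo.toml` in the tree is authoritative for the actual names, opt-levels, and settings.

```toml
# Illustrative per-mode Cargo profiles; values are assumptions, not the
# patch's actual settings.
[profile.rust-dev]
inherits = "dev"
opt-level = 1
strip = "debuginfo"   # debug info stripped only in the dev mode
incremental = true    # only dev uses incremental compilation

[profile.rust-release]
inherits = "release"
opt-level = 3
incremental = false
```

Prefixing the names ("rust-dev" instead of "dev") sidesteps cargo's restriction that "dev" and "release" are built-in profiles with reserved semantics.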
The C++ bindings provided by wasmtime are lacking a crucial capability: asynchronous execution of the wasm functions. This forces us to stop the execution of the function after a short time to prevent increasing the latency. Fortunately, this feature is implemented in the native language of Wasmtime - Rust. Support for Rust was recently added to Scylla, so we can implement the async bindings ourselves, which is done in this patch. The bindings expose all the objects necessary for creating and calling wasm functions. The majority of code implemented in Rust is a translation of code that was previously present in C++. Types exported from Rust are currently required to be defined by the same crate that contains the bridge using them, so wasmtime types can't be exported directly. Instead, for each class that was supposed to be exported, a wrapper type is created, where its first member is the wasmtime class. Note that the members are not visible from C++ anyway, the difference only applies to Rust code. Aside from wasmtime types and methods, two additional types are exported with some associated methods. - The first one is ValVec, which is a wrapper for a rust Vec of wasmtime Vals. The underlying vector is required by wasmtime methods for calling wasm functions. By having it exported we avoid multiple conversions from a Val wrapper to a wasmtime Val, as would be required if we exported a rust Vec of Val wrappers (the rust Vec itself does not require wrappers if the type it contains is already wrapped) - The second one is Fut. This class represents a computation that may or may not be ready. We're currently using it to control the execution of wasm functions from C++. This class exposes one method: resume(), which returns a bool that signals whether the computation is finished or not. Signed-off-by: Wojciech Mitros <wojciech.mitros@scylladb.com>
This patch replaces all dependencies on the wasmtime C++ bindings with our new ones. The wasmtime.hh and wasm_engine.hh files are deleted. The libwasmtime.a library is no longer required by configure.py. The SCYLLA_ENABLE_WASMTIME macro is removed and wasm udfs are now compiled by default on all architectures. In terms of implementation, most of the code using wasmtime was moved to the Rust source files. The remaining code uses names from the new bindings (which are mostly unchanged). Most wasmtime objects are now stored as a rust::Box<>, to make them compatible with rust lifetime requirements. Signed-off-by: Wojciech Mitros <wojciech.mitros@scylladb.com>
The new implementation for WASM UDFs allows executing the UDFs in pieces. This commit adds a test asserting that the UDF is in fact divided and that each of the execution segments takes no longer than 1ms.
Different users may require different limits for their UDFs. This patch allows them to configure the size of their wasm cache, the maximum size of individual instances stored in the cache, the time after which the instances are evicted, the fuel that all wasm UDFs are allowed to consume before yielding (for the control of latency), the fuel that wasm UDFs are allowed to consume in total (to allow performing longer computations in the UDF without detecting an infinite loop), and the hard limit on the size of UDFs that are executed (to avoid large allocations).
The wasmtime runtime allocates memory for the executable code of the WASM programs using mmap and not the seastar allocator. As a result, the memory that Scylla actually uses becomes not only the memory preallocated for the seastar allocator but the sum of that and the memory allocated for executable code by the WASM runtime. To keep limiting the memory used by Scylla, we measure how much memory the WASM programs use, and if they use too much, compiled WASM UDFs (modules) that are currently not in use are evicted to make room. To evict a module it is required to evict all instances of this module (the underlying implementation of modules and instances uses shared pointers to the executable code). For this reason, we add reference counts to modules. Each instance using a module is a reference. When an instance is destroyed, a reference is removed. If all references to a module are removed, the executable code for this module is deallocated. The eviction of a module is actually achieved by eviction of all its references. When we want to free memory for a new module we repeatedly evict instances from the wasm_instance_cache using its LRU strategy until some module loses all its instances. This process may not succeed if the instances currently in use (so not in the cache) use too much memory - in this case the query also fails. Otherwise the new module is added to the tracking system. This strategy may evict some instances unnecessarily, but evicting modules should not happen frequently, and any more efficient solution requires an even bigger intervention into the code.
The main source of big allocations in the WASM UDF implementation is the WASM Linear Memory. We do not want Scylla to crash even if a memory allocation for the WASM Memory fails, so we assert that an exception is thrown instead. The wasmtime runtime does not actually fail on an allocation failure (assuming the memory allocator does not abort and returns nullptr instead - which our seastar allocator does). What happens then depends on the failed allocation handling of the code that was compiled to WASM. If the original code threw an exception or aborted, the resulting WASM code will trap. To make sure that we can handle the trap, we need to allow wasmtime to handle SIGILL signals, because that is what is used to carry information about WASM traps. The new test uses a special WASM Memory allocator that fails after n allocations, and the allocations include both memory growth instructions in WASM, as well as growing memory manually using the wasmtime API. Signed-off-by: Wojciech Mitros <wojciech.mitros@scylladb.com>
Before the changes introducing the new wasmtime bindings we relied on a downloaded static library, libwasmtime.a. Now that the bindings are introduced, we do not rely on it anymore, so all references to it can be removed.
The last rebase only fixed a conflict with master. Assuming I didn't mess up the last nitpick, we got an LGTM from @piodul, so can you take a (maybe final) look @avikivity? |
CI state
This series adds the implementation and usage of rust wasmtime bindings.
The WASM UDFs introduced by this patch are interruptable and use memory allocated using the seastar allocator.
This series includes #11102 (the first two commits) because #11102 required disabling wasm UDFs completely. This patch disables them in the middle of the series, and enables them again at the end.
After this patch, `libwasmtime.a` can be removed from the toolchain. This patch also removes the workaround for #9387, but it hasn't been tested with ARM yet - if the ARM test causes issues I'll revert this part of the change.