Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Relocatable module experiment #890

Draft
wants to merge 14 commits into
base: main
Choose a base branch
from
Draft

Relocatable module experiment #890

wants to merge 14 commits into from

Conversation

dcodeIO
Copy link
Member

@dcodeIO dcodeIO commented Oct 9, 2019

Experimental branch investigating #887 that doesn't work because we can't have (data ...) and (elem ...) offsets in the form someConstantGlobal + someConstantValue. See also this comment. Pinned here for reference, even though it's not possible currently.

@sampaioletti
Copy link

sampaioletti commented Oct 9, 2019

Just to validate this as a future option (when supported by the spec) I'll pull in a reference the the design doc

https://github.com/WebAssembly/design/blob/master/Modules.md#initializer-expression

In the future, operators like i32.add could be added to allow more expressive base + offset load-time calculations.

@sampaioletti
Copy link

sampaioletti commented Oct 9, 2019

After rereading https://github.com/WebAssembly/design/blob/master/DynamicLinking.md (for the 100th time) I feel like this may be possible with the current spec

I played around with it a little, and your pr is currently generating the following for two globals

(data (i32.add (global.get $__memory_base)(i32.const 8)) "\16\00\00\00\01\00\00\00\01\00\00\00\16\00\00\00a\00p\00p\001\00 \00g\00l\00o\00b\00a\00l\00")
(data (i32.add (global.get $__memory_base)(i32.const 48)) "@\00\00\00\01\00\00\00\01\00\00\00@\00\00\00t\00h\00i\00s\00 \00i\00s\00 \00a\00n\00o\00t\00h\00e\00r\00 \00g\00l\00o\00b\00a\00l\00 \00f\00r\00o\00m\00 \00a\00p\00p\001\00")

I dug through a couple of implementations and it appears the the data segments are additive/relative by default, so you only need this

(data (global.get $__memory_base) "\16\00\00\00\01\00\00\00\01\00\00\00\16\00\00\00a\00p\00p\001\00 \00g\00l\00o\00b\00a\00l\00")
(data (i32.const 48) "@\00\00\00\01\00\00\00\01\00\00\00@\00\00\00t\00h\00i\00s\00 \00i\00s\00 \00a\00n\00o\00t\00h\00e\00r\00 \00g\00l\00o\00b\00a\00l\00 \00f\00r\00o\00m\00 \00a\00p\00p\001\00")

I've done a quick translation of the above using wat2wasm, and it actually seemed to function correctly?

I set my first import of memory_base to 8 to mimic what you did in your generated code and it built/ran fine. I'm sure I'm missing something, but it looks promising to me.

Thoughts?

@dcodeIO
Copy link
Member Author

dcodeIO commented Oct 9, 2019

I dug through a couple of implementations and it appears the the data segments are additive/relative by default

Afaik these are absolute, unfortunately. If your module works, I either assume because __memory_base = 0 or it just so happens to run fine with corrupt memory.

@sampaioletti
Copy link

Yeah you were correct they are absolute.. I do have good initial results merging the data statements(by hand) into one large statement with the imported offset.. But I haven't figured out if there is a way to have binaryen generate that.

@dcodeIO
Copy link
Member Author

dcodeIO commented Oct 9, 2019

That's a great idea!

@dcodeIO
Copy link
Member Author

dcodeIO commented Oct 9, 2019

Gave that a try and seems to be working. Requires importing a suitable memory as well, though, because the compiler doesn't know the correct initial size in pages (assumes ___memory_base = 0). Function table is not yet implemented as this requires changing the respective Binaryen API to also take indexes instead of a zero-based array.

@dcodeIO
Copy link
Member Author

dcodeIO commented Oct 9, 2019

Thinking one step further: It won't be possible to use the same workaround for the table since we can't "merge" table elements like memory segments. Unfortunate.

@sampaioletti
Copy link

Very cool, I'll play around with a bit and see where it goes. Thanks!

@sampaioletti
Copy link

So far looks quite good..but i have the issue of not really needing tables for our project (yet?) but from a memory side seems to work a treat. I'll integrate that with my original shared-runtime example and see what the next issue becomes

@dcodeIO
Copy link
Member Author

dcodeIO commented Oct 9, 2019

The table becomes important as soon as a function is passed around by reference, like when providing a callback to Array#forEach or similar. It's possible to write code without that ofc, but still too common to make relocatable modules a general compiler feature.

@dcodeIO
Copy link
Member Author

dcodeIO commented Oct 10, 2019

Regarding #891: The compiler didn't yet relocate inlined offsets, which it does now, but leads to various issues in these cases. Accessing staticArray[0] for example will compile a static @inline string that cannot actually be inlined anymore since it's a computed offset now. That's where it becomes more complicated :)

@sampaioletti
Copy link

I couldn't sleep thinking about this so I put some more time into it I do apologize this isn't compatible with your current commit, as it would have been to much work to rebase my repo (at the moment).

I am able to build, pass validation and perform the following

const someStaticStuff: i32[] = [0];

someStaticStuff;

//verifies that the ptr is > memory_base and that the strings match
@external("env","log")
declare function log<T>(obj:T,length:i32):void
//verifies that the ptr is greater than memory_base
@external("env","checkPtr")
declare function checkPtr<T>(obj:T):void
const test1="relocatable"
export function main():void{
    someStaticStuff[0]=1
    checkPtr(someStaticStuff)
    log(test1,test1.length)
    const test2="relocatable"    
    log(test2,test2.length)
    let test3=String.UTF16.encode(test2)
    log(test3,test2.length)
}

sampaioletti@10aca34

Please have a look if you dont mind I'd like your advice on if you would prefer the way you did your last commit, if so I'll spend some time to update before I make a PR

@dcodeIO
Copy link
Member Author

dcodeIO commented Oct 10, 2019

Also still thinking how to do this. I guess a better approach to my previous is to make the relocation opaque to the module's logic so we can keep using constants and so on, with the downside that any pointer leaving the module must be relocated by the loader / RT. Last change makes it so that even if we have a const foo = "somestring";, the global foo will hold the non-relocated pointer, which is fine as long as all subsequent loads and stores add the necessary offset. The missing piece there is that a shared runtime must account for this in that a normal call to __retain receives a non-relocated pointer which the RT must relocate upon the whole shared memory.

@dcodeIO
Copy link
Member Author

dcodeIO commented Oct 10, 2019

However, that makes objects from one module non-interoperable with other modules without adding relocation on the boundary, which will become problematic once modules can call into each other without the help of a loader. At this point everything would go through a loader, though. In fact, the loader would sometimes even pass negative pointers around this way, that become properly absolute in the domain of a relocated module again, for example if the calling module is located before the called module in linear memory. Hmm hmm...

One can think of this as each module having its own virtual zero-based address space, not knowing that its memory region is actually being relocated to somewhere else, with the loader taking all the necessary steps to make pointers interoperable between modules. Let's say we have module A and B in this order in linear memory, if module A calls an export of module B with a static pointer, the loader would make that ptr_star = memory_base_a + ptr - memory_base_b and pass it to B. When B loads or stores to it, it will compute ptr_star + memory_base_b again, effectively making it an access relative to A's memory.

@sampaioletti
Copy link

sampaioletti commented Oct 10, 2019

I continued down the path I started to see how far I could get and was able to get it working with my shared-rt example. Take a look and see what you think..i also updated my relocatable fork if you would like to only see the changes to the compiler for relocatable. (keep in mind my relocatable is out of sync with yours)

https://github.com/sampaioletti/assemblyscript/tree/shared-rt-sp-relocatable/examples/shared-rt

I also created a DESIGN.md trying to start explaining how shared-rt works

I need to develop some tests and start look`ing into tables...but I feel like I'm still making progress.

Oh and the README in shared-rt is out of date so dont read it (:

@sampaioletti
Copy link

#890 (comment)

I've been playing with tables and I can't get the compiler to generate more than one elem section? It always just appends to the first.. what would cause it to generate more than one line so I can test some things?

@dcodeIO
Copy link
Member Author

dcodeIO commented Oct 11, 2019

I've been playing with tables and I can't get the compiler to generate more than one elem section? It always just appends to the first..

That's how it's currently generated, yeah. Might have something to do with how Binaryen handles this. The API there is BinaryenSetFunctionTable that takes an array of functions with no way yet to specify an offset. A combined elem is actually good news though, because we can make it work by implementing a start offset on the Binaryen side.

(elem (i32.const 0) $null $start:empty~anonymous|0 $start:empty~anonymous|1)
     ; ^ use __table_base

Also, your design doc aligns well with my thoughts :)

@dcodeIO
Copy link
Member Author

dcodeIO commented Oct 11, 2019

Last commit implements table relocation using a custom Binaryen build including WebAssembly/binaryen#2380.

@dcodeIO
Copy link
Member Author

dcodeIO commented Oct 11, 2019

Remaining challenge is to obtain memory_size / table_size prior to instantiation. Without that, we don't know the parameters of the imported memory and table. We can't even reliably instantiate a module in some sort of sandbox to determine these since doing so already requires an imported memory and table that must both be large enough. I don't see a good solution to this yet.

@dcodeIO
Copy link
Member Author

dcodeIO commented Oct 11, 2019

One way to expose this information might be to hijack the proposed dylink section (while it's not available) by adding an API to Binaryen that lets us emit arbitrary custom sections, and use it by means of WebAssembly.Module.customSections today.

@dcodeIO
Copy link
Member Author

dcodeIO commented Oct 12, 2019

Last commit hijacks the dylink section as mentioned using latest Binaryen. Removes the memory_size and table_size exports in favor of evaluating the section, which we can do before instantiation. Should theoretically be sufficient now to build a working dynamic linker.

@sampaioletti
Copy link

sampaioletti commented Oct 12, 2019 via email

@dcodeIO
Copy link
Member Author

dcodeIO commented Oct 14, 2019

So, regarding a shared runtime, it appears that with each module having its own set of runtime types along the respective compiler generated logic to visit these, that garbage collection has to remain local to the module at least partly, requiring changing a few APIs to make this possible. Affected APIs are __release and __collect, which call __visit_members, and __visit itself. The difference to the static runtime would be that there's another argument when these are called on the shared runtime, a reference to __visit_members, and that calling __collect globally would do a collection within each module, in turn triggering a collection with the respective visitor. At least that's what I figured so far.

The general problem that remains is that everything but hard coded types (ArrayBuffer, String) are incompatible between modules and that some sort of serialization is required for everything else.

@dcodeIO
Copy link
Member Author

dcodeIO commented Oct 14, 2019

Last commit has a shared runtime concept with an overview here. It still remains to be seen if this works, though :)

@sampaioletti
Copy link

@dcodeIO thats looking real good, I still need to explore a few things and I would like to write some tests that support our use cases. As I mentioned before I reworked the compiler test runner to allow child modules to be instantiated. Is that a direction you would like to go, or are we flirting with more of an 'integration' test and should make a separate runner i.e. test:integration -- shared-rt or...

I at least want to do my internal testing in a way that I could eventually (without much rework) contribute them if they benefit the project

@dcodeIO
Copy link
Member Author

dcodeIO commented May 25, 2020

This draft aged surprisingly well, and it might still be feasible to return to relocation eventually. Keeping it open just in case. (see)

@dcodeIO dcodeIO added vacuumed and removed vacuumed labels May 25, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants