Mozlandia JS

Servo meeting with the JS team

String representation types (kmc/SimonSapin)
(missing) C API woes (jdm)
- C++ linkage
- Overloads
- Bitfields
- Option builder objects
SpiderMonkey upgrade (jdm)
- Requirement for NSPR threads
- Missing Proxy trace hook
- Hooks to enumerate GC objects (thingGCRooters, autoGCRooters)
Promises (jdm)
JSRuntime slimming (jdm)
Current status of native objects in JS heap (jdm)
Rust GC hook requirements to integrate with collector/moving GC (pcwalton)

Upgrade blockers

jdm: 2.5-year-old SM in Servo. Biggest problem is no C API. Workarounds by extern blocks with headers to remove name mangling. Glue file with whatever C APIs we need. Bitfields make it tough because there's no portable canonical impl. Option builder patterns we work around by setting options for specific things. Methods that take C++ objects we can't represent at all; auto_id_vector is a killer and we have no solution.
jdm: Additionally, an NSPR thread per runtime is hard because we don't have a thread.
john: Thread per?
jdm: There's the replacement thing. The runtime assumes that the first thread you call JS_Init on is your main thread (TLS key, etc.) and if you do it elsewhere, that doesn't happen.
jorendorff: Need to abstract away from threads and let you provide the APIs. Used to have it, but don't have it now.
jdm: In the modern (April) one, proxy class doesn't have trace hooks, which we need. Has that changed?
jorendorff: Trace hooks are still absent.
jdm: Tough to integrate types like heap and root into Rust. Having them compatible with the C++ representation so we can put them in the same data structures is tough. If there were hooks that let us tell you about the rooters, we could.
john: There are callbacks to say trace roots in the embedding.
terrence: That should work fine.
jdm: Why was I using the rooters?
terrence: That's how we do it. Can't incrementalize that.
sfink: have to pass in handles.
larsberg: Callbacks will stay around?
terrence: Yes.
jdm: There are classes that rely on autogc stuff that you can build in C++ but we have to build manually in Rust.
terrence: Which ones?
jdm: auto_id_vector. It registers itself as a collection. Tough because it's so easy to move them.
terrence: need binary compatibility with the vector class. Not sure how to do that? C API to vector?
john: Can build your own versions and then trace them.
sfink: Need APIs that accept our versions. And unsafe versions.
jdm: Lots of things that can't be passed through a C API.
sfink: That's a reasonable general strategy.
terrence: They derive from vector, so you should be able to just trace them yourself, cast at the C level, and push them in. Still horrid, but would work.
jdm: Good to have a workaround. That's what's stopping us right now. I'm making lots of changes to the header files right now. Not sure what the right solution is there?
sfink: About to namespace all the things
jdm: All those require a glue_cpp thing.
sfink: Yes.
terrence: Can you bind directly to C++?
jdm: Overloads.
pcwalton: Templates are the hard part. Can handle overloads.
jdm: Have to instantiate each template at each type and wrap them.
pcwalton: In theory, we could write rust bindgen things to make that easier. Uses clang, and could make it understand some simple C++ name mangling? Maybe bindgen our way out of it...
jorendorff: It's the structures, etc. that are hard.
waldo: Bitfield problems are just for compile-time options.
jdm: Root type.
waldo: Root doesn't have it anymore.
terrence: Could maybe remove auto_vec stuff. Think we only use it in one place.
waldo: General problem of wanting to fill up a vector and have it automanaged.
jorendorff: What is the pattern we'll use here? What does Rust do? Reason we have these classes in C++ is for RAII.
pcwalton: Rust does the same thing. But, it's different because it's not C++. The C++ destructor doesn't translate to a Rust destructor because they're a little different. The destructors are pretty much the same, but you can't convert C++ to Rust due to semantic differences.
jorendorff: Rust should have primitives for wrapping C++ destructor-style stuff.
pcwalton: Ugh. Reinventing COM.
waldo: More than just RAII. Want to be able to insert more stuff, too for auto binding stuff. It's messy and manual.
jdm: I've written the Rust thing that puts itself in a global thing, with a GC hook, etc...
john: Sounds perfect; we should just accept that.
terrence: Should make a version that takes the item and length.
john: Wrap it in an auto_id_vector, and then it'd be rooted for the duration of the call.
jdm: That would work around it, at least.
terrence: It's a hard problem: two high-level languages talking through a low-level language.
jorendorff: Maybe another room with just 4-5 people and a whiteboard...
jdm: Mainly wanted to bring this up.
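
[ed. sketch: the "Rust thing that puts itself in a global thing, with a GC hook" that jdm describes could look roughly like the following. This is a minimal illustration, not Servo's actual code. `GcThing`, `RootedVec` and `trace_roots` are hypothetical names, and the real trace callback would be registered with SpiderMonkey rather than called by hand. The vector lives on the heap so that moving the guard, which is the hazard raised above, does not change the registered address.]

```rust
use std::cell::RefCell;

// Hypothetical stand-in for a GC thing (a jsid or JSObject* in real code).
pub type GcThing = usize;

thread_local! {
    // Per-thread registry of live rooted vectors (heap addresses, so stable).
    static ROOTED_VECS: RefCell<Vec<*mut Vec<GcThing>>> = RefCell::new(Vec::new());
}

/// RAII guard: registers its heap-allocated vector on creation and
/// unregisters it on drop. The guard itself may be moved freely.
pub struct RootedVec {
    items: *mut Vec<GcThing>,
}

impl RootedVec {
    pub fn new() -> RootedVec {
        let items = Box::into_raw(Box::new(Vec::new()));
        ROOTED_VECS.with(|r| r.borrow_mut().push(items));
        RootedVec { items }
    }

    pub fn push(&mut self, thing: GcThing) {
        // Safety: `items` came from Box::into_raw and is freed only in Drop.
        unsafe { (*self.items).push(thing) }
    }
}

impl Drop for RootedVec {
    fn drop(&mut self) {
        let ptr = self.items;
        // Unregister first, then free the allocation.
        ROOTED_VECS.with(|r| r.borrow_mut().retain(|p| *p != ptr));
        unsafe { drop(Box::from_raw(ptr)) }
    }
}

/// What an embedder's "trace extra roots" callback might do:
/// visit every item of every registered vector.
pub fn trace_roots(mut visit: impl FnMut(GcThing)) {
    ROOTED_VECS.with(|r| {
        for &vec_ptr in r.borrow().iter() {
            // Safety: pointers are removed from the registry before being freed.
            let items: &Vec<GcThing> = unsafe { &*vec_ptr };
            for &thing in items.iter() {
                visit(thing);
            }
        }
    });
}
```

[A vector built this way could then be handed to an API that takes "the item and length", as terrence suggests, while staying rooted for the duration of the call.]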

Threading

jorendorff: WHAT are you doing?
jdm: We have a native thread with a JSRuntime. Unrelated to the thread we call JS_Init on.
terrence: How does that even work w/o TLS?
jdm: Many threads use JS. Each have their own Runtime.
jorendorff: Rust tasks?
jdm: Native threads. All JS threads are native threads.
jorendorff: No more user-scheduled tasks? Great. That's easier.
terrence: Strict thread affinity?
jdm: Yes. I'll get back to you on what the problem was there.

Many JS Runtimes

jdm: Are they fat?
terrence: Yes, but slimming.
jdm: Per top-level domain.
terrence: Use small chunk size. Will be hundreds on some browser domains. We have one runtime + workers, so it's capped at 21, so we don't worry on desktop. On b2g, we can do a smaller chunk size (256KB instead of 1MB), which lowers the runtime to that. Start decommitted, so take address space, but the GC space is decommitted.

Allocating Rust objects in the JS heap

jdm: Can we fuse reflectors and the JS objects there? What's the status of native objects that can be traced by the GC?
terrence: Now have proxies for it. JSObject has multiple derivations, but is moving where it has a proxy for the main one there and all of the stuff is built off of it.
dherman: You mean a C++ impl proxy?
terrence: Yes. Lets you put whatever you want under the surface of the C++ impl.
waldo: Ideally, you'll have normal objects and then exotic objects...
jorendorff: Hrm, what does that mean? I think you just wanted to land support for allocating actual space in the JS object for rust stuff?
jdm: I want to know if I can allocate memory in Rust that the GC knows how to trace with minimal overhead? So we could allocate a separate...
jorendorff: We have some separate types.
terrence: We have some extra work we're doing.
sfink: Trace objects let you do that.
terrence: Function also does that. Array also has some support for that.
jorendorff: In terms of an external API, though, that's not going to work for them. I can't do this at all today.
waldo: He wants an object with some set of associated reserved slots.
jorendorff: Not a callback; a declarative data description?
waldo: That's for us to describe.
jorendorff: Just "and some bytes at the end" would help.
sfink: We need that anyway.
terrence: We're suggesting use reserved slots to fake it, but doesn't work for a billion DOM objects.
jdm: Want to have things be as slim as possible.
terrence: Have two words right now. Shape+height. Useful for proxies as well.
jorendorff: Sometimes, you don't want a JSObject at all; just want to use your GC.
jdm: That's what we want.
pcwalton: Yes, we need the interop somehow...
terrence: We do have a GC

allocating part 2

jorendorff: Will you work on this?
jdm: I enjoy it, but not sure if it’s there.
jorendorff: Is Rust in shape for GC pointers to be added?

String representation issues

simonsapin: Rust is UTF8. Our HTML parser and CSS parser are too. So, we wonder whether we should use UTF8 in the DOM as well, use something compatible with JS strings, or whether SM could also support UTF8?
johan: Ton of work. Can template a lot of the stuff and make it faster…
waldo: All about the indexing performance, which may or may not matter for you. Right now, we do have simple functions that are for one of the impulse…
sfink: Kind of a big deal for embedders. They really want UTF8. I’m not volunteering to do it, but we need it…
terrence: Could get network straight into a string, which would be good. Right now, we scan and pick Latin1 or 2-byte. That might be good enough? If you use JSStrings, we can fix them later for off-the-wire UTF8. Most things are Latin1…
john: Is this more than just # of bytes per character and about JS?
simonsapin: Yes, in JS, strings are sequences of 16-bit units. They’re not necessarily valid UTF16: they can contain unpaired surrogates, so converting to UTF8 is lossy. I wrote a “WTF8” that is UTF8 + unpaired surrogates. You could encode UCS2 to it and back losslessly.
terrence: Anything at the API level is a 2-byte char. Under, it’s just a matter of indexing. UTF8 has no fixed-size code points.
simonsapin: The WTF8 trick is about correctness; the other problem is performance. Could do tricks like remembering the index for simple offsets (to handle benchmarks). But it’s a question of whether web content does random indexing?
johan: Crypto benchmarks
waldo: OTOH, it’s purely readonly memory. So, why aren’t they using int16 arrays or something? There’s somebody who will be slowed down by this.
simonsapin: Code should be using arraybuffers anyway
zwarich: Can always convert to utf16 when somebody calls .at.
terrence: One scan, remember the index, etc. We don’t do it today because it’s mostly latin1 today. Unless you go to baidu it’s different, but those are 4-byte
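
[ed. sketch: the "remember the index" trick mentioned here could be approximated as below. This is an illustration under assumptions, not SpiderMonkey or Servo code. `IndexedUtf8` and `code_unit_at` are hypothetical names. The string is stored as UTF8, and a one-entry cache maps the last looked-up 16-bit index to its byte offset, so sequential access does not rescan from the start.]

```rust
use std::cell::Cell;

/// UTF-8 text with a one-entry cache from UTF-16 index to byte offset.
pub struct IndexedUtf8 {
    text: String,
    // (UTF-16 unit index, byte offset) of the character found by the last
    // successful lookup; always on a character boundary.
    last: Cell<(usize, usize)>,
}

impl IndexedUtf8 {
    pub fn new(text: &str) -> IndexedUtf8 {
        IndexedUtf8 { text: text.to_string(), last: Cell::new((0, 0)) }
    }

    /// Return the 16-bit code unit at `index`, or None when out of range.
    pub fn code_unit_at(&self, index: usize) -> Option<u16> {
        let (mut unit, mut byte) = self.last.get();
        if index < unit {
            // The cache is past the requested index: rescan from the start.
            unit = 0;
            byte = 0;
        }
        for ch in self.text[byte..].chars() {
            let n = ch.len_utf16();
            if index < unit + n {
                // Remember where this character starts for the next lookup.
                self.last.set((unit, byte));
                let mut buf = [0u16; 2];
                return Some(ch.encode_utf16(&mut buf)[index - unit]);
            }
            unit += n;
            byte += ch.len_utf8();
        }
        None
    }
}
```

[With this cache a forward loop over all indices costs one pass in total, while random or backward access still pays a rescan, which is the case the crypto benchmarks would exercise.]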
johan: In FF, we copy strings to pass between the DOM and JS. I’m not sure if it’s really important right now.
simonsapin: UTF8 DOM and do the bindings?
terrence: Our DOM is UTF16… we copy already in FF.
simonsapin: So we could just do the conversions.
terrence: Can we just forget about this, do the copy, and move on? When I first joined 7 years ago, we decided not to for FF, to save things.
waldo: Things have changed in 7 years. UTF8 has changed a lot.
pcwalton: There are definitely higher-priority issues in Servo.
zwarich: The memory overhead is hard. In webkit, the webcorestring can point to the bytes in the resource so it can be passed into JS without the copies. The HTML parser still copies the string in that case.
waldo: I think our parser only takes in char16s. No reason it has to; we can always add more templates, double our parsers, etc.
terrence: Can we get UTF16 and go straight to a buffer?
zwarich: Never. latin, MS cyrillic, or UTF8.
simonsapin: What should we do?
larsberg: Sounds like we’re copying.
waldo: That’s fine.
zwarich: Technically, copying can change the space complexity…
terrence: It’s true in Gecko today.
jack: We’re trying to get better here in Servo, right?
waldo: Enjoy writing it!
larsberg: If we made it, would you even land it?
waldo: If it didn’t regress existing code.
jorendorff: Especially if it’s a small change in complexity, like landing a change in CharAt.
terrence: It’s pretty templated.
simonsapin: Is it necessary to preserve unpaired surrogates in the DOM?
waldo: WebIDL has been removing that at API layers. ACID3 had a test for it, though. In pages themselves, you’d have to ask boris or dbaron. You can write them out in your HTML…
terrence: We’ll round-trip it.
simonsapin: Should we try to preserve it?
waldo: I think you kinda have to.
sfink: Can’t just not do it.
simonsapin: hsivonen was saying maybe not…
waldo: He might be more aggressive about bringing sanity.
terrence: Yeah, we can’t break ACID3.
jorendorff: Could try it in FF, and find out it doesn’t work, and save a bunch of time.
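
[ed. sketch: the lossless round trip that simonsapin describes for “WTF8” (UTF8 + unpaired surrogates) can be illustrated as follows. This is a simplified sketch and not his implementation. `encode_wtf8` and `decode_wtf8` are hypothetical names, and the decoder assumes its input came from the encoder, so it does no validation.]

```rust
/// Encode 16-bit units (possibly with unpaired surrogates) as WTF-8-style bytes.
pub fn encode_wtf8(units: &[u16]) -> Vec<u8> {
    let mut out = Vec::with_capacity(units.len());
    let mut i = 0;
    while i < units.len() {
        let u = units[i] as u32;
        i += 1;
        // A lead surrogate followed by a trail surrogate becomes one code point.
        let cp = if (0xD800..0xDC00).contains(&u)
            && i < units.len()
            && (0xDC00..0xE000).contains(&(units[i] as u32))
        {
            let lo = units[i] as u32;
            i += 1;
            0x10000 + ((u - 0xD800) << 10) + (lo - 0xDC00)
        } else {
            u // a BMP code point, or an unpaired surrogate kept as-is
        };
        match cp {
            0..=0x7F => out.push(cp as u8),
            0x80..=0x7FF => {
                out.push(0xC0 | (cp >> 6) as u8);
                out.push(0x80 | (cp & 0x3F) as u8);
            }
            0x800..=0xFFFF => {
                out.push(0xE0 | (cp >> 12) as u8);
                out.push(0x80 | ((cp >> 6) & 0x3F) as u8);
                out.push(0x80 | (cp & 0x3F) as u8);
            }
            _ => {
                out.push(0xF0 | (cp >> 18) as u8);
                out.push(0x80 | ((cp >> 12) & 0x3F) as u8);
                out.push(0x80 | ((cp >> 6) & 0x3F) as u8);
                out.push(0x80 | (cp & 0x3F) as u8);
            }
        }
    }
    out
}

/// Decode bytes produced by `encode_wtf8` back to the original 16-bit units.
pub fn decode_wtf8(bytes: &[u8]) -> Vec<u16> {
    let cont = |k: usize| (bytes[k] as u32) & 0x3F; // payload of a continuation byte
    let mut out = Vec::new();
    let mut i = 0;
    while i < bytes.len() {
        let b0 = bytes[i] as u32;
        let (cp, len) = if b0 < 0x80 {
            (b0, 1)
        } else if b0 < 0xE0 {
            (((b0 & 0x1F) << 6) | cont(i + 1), 2)
        } else if b0 < 0xF0 {
            (((b0 & 0x0F) << 12) | (cont(i + 1) << 6) | cont(i + 2), 3)
        } else {
            (((b0 & 0x07) << 18) | (cont(i + 1) << 12) | (cont(i + 2) << 6) | cont(i + 3), 4)
        };
        i += len;
        if cp >= 0x10000 {
            // Split a supplementary code point back into a surrogate pair.
            let c = cp - 0x10000;
            out.push(0xD800 + (c >> 10) as u16);
            out.push(0xDC00 + (c & 0x3FF) as u16);
        } else {
            out.push(cp as u16);
        }
    }
    out
}
```

[For well-formed input the output is ordinary UTF8. Only unpaired surrogates produce the extra three-byte sequences that strict UTF8 forbids, which is what would let the DOM preserve them.]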

GC Integration with Rust

pcwalton: Right now, we use a bunch of smart pointers to integrate with SM GC.
terrence: What’s the state of the Rust GC?
pcwalton: It never existed in reality. There’s just SM’s GC in Rust.
terrence: So you just use Rust hooks…
pcwalton: We don’t have them, and that’s the problem. We can do copy constructors / destructors. We have these handles (like root_temporary). They’re a lot of pain because it’s hard to make them memory safe.
jdm: In the presence of moving values.
pcwalton: Move semantics in Rust.
zwarich: no move constructors in rust; we just always move.
pcwalton: Handles today are both verbose and have a perf issue.
waldo: Double indirection probably doesn’t matter in practice. It just makes people worry about it.
pcwalton: Challenge is that LLVM doesn’t support GC…
zwarich: Stack scanning from Azul just landed! Precise roots, etc.
pcwalton: Great! The Azul people added exact stack scanning to LLVM. LLVM’s GC is that you provide a plugin that tells you the stack addresses and registers for the roots at all safe points.
waldo: And types?
pcwalton: No. Assumes there’s one type and you can figure it out. That’s plenty for our purposes for JS Objects, but not Rust in general. You have to emit tables the GC can read for each stack frame to determine pointers. Question: How can we integrate that with SM?
terrence: yes!
jorendorff: The stack maps LLVM emits now, if they’re positions you can write as well, can you move?
pcwalton: Yes.
john: Can add a callback, walk the stack map, and be done.
terrence: Need to check if it’s in the nursery or not…
jorendorff: He wants an address he can just call to say “here’s a root.”
terrence: We can add an API to make that easy to use.
jorendorff: We’ll call you when we’re doing GC, let you know, you give us the stack maps, etc. It’s just a chunk of C.
jdm: We use that API for root collections.
sfink: How are you doing heap write barriers?
jdm: We duplicate the heap class. It’s gross. So, so gross.
pcwalton: Probably have to do that. Could eliminate rooted_root and temporary?
jdm: That would be exciting!
larsberg: Fantastic.
sfink: Similar to what we need.
terrence: Still need a handle to call API methods in SM, since we need that. But we don’t care where it is, where it points, etc. We’ll just follow blindly and assume it’s rooted.
pcwalton: So, root and temporary could be gone?
jdm: Yes, equivalent of rooted type and persistent rooted type.
sfink: may need persistent equivalent rooted type, since we need it elsewhere.
terrence: Persistent list, etc. doesn’t seem important…
jdm: Just need to hold pointers across calls. We can write compiler plugins that do stack analysis, too.
terrence: GCs are rough.
pcwalton: Should help with the LLVM integration.
sfink: Weak maps?
jdm: Not using them. Got around the hash table keys moving by keying on the Rust hash maps, since those don’t move…
terrence: There are some tricks to work around it. There are API hooks to get around it. Callback exists for you to do fixups.
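
[ed. sketch: the callback flow discussed above ("we'll call you when we're doing GC, you give us the stack maps") might be modelled like this. Everything here is a toy under stated assumptions. `GcPtr`, `StackMap`, `enumerate_roots` and `toy_moving_gc` are hypothetical names, and no real LLVM stack map or SpiderMonkey API is used. The point it shows is that roots are reported as writable slot addresses, so a moving collector can update them in place.]

```rust
/// Hypothetical GC pointer; real code would use a JSObject pointer.
pub type GcPtr = usize;

/// Toy stack map: the addresses of the stack slots that hold GC pointers.
pub struct StackMap {
    pub slots: Vec<*mut GcPtr>,
}

/// Embedder side: report every root slot to the collector.
pub fn enumerate_roots(map: &StackMap, visit: &mut dyn FnMut(*mut GcPtr)) {
    for &slot in &map.slots {
        visit(slot);
    }
}

/// Collector side: a toy moving GC that relocates each rooted pointer and
/// writes the new address back through the reported slot.
pub fn toy_moving_gc(map: &StackMap, relocate: impl Fn(GcPtr) -> GcPtr) {
    enumerate_roots(map, &mut |slot: *mut GcPtr| {
        // Safety: the caller guarantees every slot points at a live GcPtr.
        unsafe { *slot = relocate(*slot) }
    });
}
```

[Because the collector only needs the slot address, it does not care where the slot lives, which matches terrence's remark that SM would "just follow blindly and assume it's rooted".]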

C API

jdm: Allow binding to as much C++ things as possible from Rust?
terrence: That or a lot of manual effort maintaining bindings.
jdm: Could we land it in SM, maybe?
waldo: Tests for it, included in SM?
terrence: Otherwise, we’ll break it all the time.
waldo: Things break if they don’t have tests.
jdm: The glue code is just non-mangled C functions that call things that are. It just needs to get built, no?
sfink: That’s the test.
jdm: Would like to move that away from each of our SM upgrades and into the SM code. Maybe JS_not_nice_friend_API?
terrence: JS_enemy_API. I like it.
sfink: Give us something that tests it out, and then we will land it.
jdm: As long as we don’t silently break, it’s fine? Fair enough.
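
[ed. sketch: the glue pattern jdm describes, flat C-ABI functions wrapping C++ methods that Rust cannot call directly, has roughly this shape on the Rust side. All names (`CompileOptions`, `glue_*`, `Options`) are hypothetical. In a real build the `glue_*` functions would be defined in a C++ glue file and declared in Rust inside an `extern "C"` block, and calling them would be unsafe. They are written in Rust here only so the sketch builds without SpiderMonkey.]

```rust
use std::os::raw::c_int;

/// Stand-in for a C++ options object normally configured via builder methods.
#[repr(C)]
pub struct CompileOptions {
    pub strict: bool,
    pub line: c_int,
}

// Stand-ins for the C++ glue file: one flat C-ABI function per operation,
// taking plain pointers and integers only.
pub extern "C" fn glue_new_options() -> *mut CompileOptions {
    Box::into_raw(Box::new(CompileOptions { strict: false, line: 1 }))
}

pub extern "C" fn glue_set_strict(opts: *mut CompileOptions, on: c_int) {
    unsafe { (*opts).strict = on != 0 }
}

pub extern "C" fn glue_get_strict(opts: *const CompileOptions) -> c_int {
    unsafe { (*opts).strict as c_int }
}

pub extern "C" fn glue_free_options(opts: *mut CompileOptions) {
    unsafe { drop(Box::from_raw(opts)) }
}

/// Safe Rust wrapper over the glue: owns the pointer and frees it on drop.
pub struct Options(*mut CompileOptions);

impl Options {
    pub fn new() -> Options {
        Options(glue_new_options())
    }

    pub fn set_strict(&mut self, on: bool) {
        glue_set_strict(self.0, on as c_int)
    }

    pub fn strict(&self) -> bool {
        glue_get_strict(self.0) != 0
    }
}

impl Drop for Options {
    fn drop(&mut self) {
        glue_free_options(self.0)
    }
}
```

[A test that merely builds and calls each glue function would be enough to catch the silent breakage discussed above.]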

Promises

jdm: Promises and event queue. For context, know they’re in gecko. Would be nice to know they’re coming in SM eventually… but, if that’s not true, we’re in trouble. ServiceWorkers, etc. are all based on this.
waldo: Don’t have a timeframe. Need an event loop / event queue abstraction.
jdm: 6 months?
waldo: No.
terrence: Still trying to figure things out on the spec side where there are two event loops (browser & SM).
jdm: So, no promises in 2015. [ed. note, this refers to Promises in SpiderMonkey, separate from Firefox and reusable by Servo, and nothing related to code that runs inside of Firefox]

NSPR

sfink: we have an emulation layer for it, and I think we’re going to ditch it?
jdm: Interesting choice…
waldo: I haven’t been watching.
jdm: That would make me sad. nspr was hard to integrate in our build.
terrence: It is a condition variable, though, so even though it’s small it’s still hard and important.
sfink: made it easier to build nspr, which removed a lot of the problems. Should be aware of it.

Data objects

jdm: The other bits of open questions were that you’d be open to data objects.
waldo: Brian (Hackett) will make that possible. Probably by Q2.
sfink: Unwrapping stuff in slots?
waldo: Thought he was going for hunk-o-memory.
sfink: Yeah, he’s pushed that for array buffers in the past. Probably want to talk directly to brian.

Trace hooks

jdm: Missing trace hooks on proxy from your end? I’ve added it back. Is that correct or not?
sfink: Talk with Eric. Assume it’s just not there because we don’t need it. cc efaust

GC rooters

jdm: GC rooters & auto rooters solution is to use the callbacks and our own mechanisms.
sfink: and make them look like auto vectors
