Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

JS Symbols round trip as Strings if sent to Ruby and back #215

Open
hackmud-dtr opened this issue Aug 16, 2021 · 12 comments
Open

JS Symbols round trip as Strings if sent to Ruby and back #215

hackmud-dtr opened this issue Aug 16, 2021 · 12 comments

Comments

@hackmud-dtr
Copy link
Contributor

I've noticed there are a few issues with how Symbols are marshaled between JS and Ruby. The most obvious issue being: if you take a JS Symbol, send it to Ruby, and then back into JS, you get it back as a string, not a symbol. There's a few other related issues (including at least one hard crash), but I'm hesitant to try to fix these without double-checking what the project wants to do with symbols in general. They are clearly being handled wrong (from JS's point of view, at least) right now, but I don't know if that behavior is being relied upon somewhere, or is deliberate rather than accidental, so I'd like to sanity check this before I spend time implementing a fix.

Looking at this from a purely JS-centric perspective, I believe the problem is that JS symbols are being marshaled into Ruby symbols, and JS symbols and Ruby symbols are completely at cross purposes: JS symbols are meant to be primitives that use instance-equality: Symbol("foo")==Symbol("foo") is false in JS because they are not the same instance. Trying to serialize most JS symbols is sort of perverse, they are pretty much deliberately made to not serialize. Ruby Symbols on the other hand seem to be glorified interned strings, except that if you ask for the symbol :foo you always get the same instance. Which is completely the opposite of JS symbols, but somewhat similar to JS's Symbol.for() function. So there's a pretty big mismatch there. And then since Ruby symbols are just special strings, when you try to send them back into JS, they wind up converting to JS strings rather than back into Symbols as they should.

From a JS perspective, the ideal situation would be:

  1. Symbols round-trip as symbols, not as strings

  2. Symbols created with Symbol.for or that are well-known symbols (such as Symbol.iterator) round trip as themselves: i.e. the before-value and the after value == each other. (v8 keeps track of it's Symbol.for'd symbols, so it shouldn't be a problem to differentiate those from generic symbols)

  3. Other symbols (i.e. Symbol('foo') ) could maintain equality before and after the round trip or not (I honestly don't know what "correct" here is because JS symbols aren't really meant to be serialized). Alternatively, other symbols could just fail to marshal since trying to serialize an entity that by definition is unserializable is just weird.

I'd be happy to take a stab at implementing the above to get Symbols handled properly (and then I can start to address other symbol related problems), but as I said, I want to double check that this isn't working against the project's goals before I spend too much time on this. And if anyone has a strong preference for what to do with 'generic' symbols like Symbol("foo"), I'd love to hear it; as I said, it is not at all clear to me what "correct" behavior is there; everything will seem wrong to someone. My inclination is "do whatever is easiest" which is almost certainly going to be "they round trip as symbols, but do not maintain equality" (which makes sense: if you round trip {} to Ruby and back, it stays an object, but does not == it's pre-round-trip self)

Here is a very simple example showing some of these issues:

require 'mini_racer'

context = MiniRacer::Context.new

context.attach('round_trip',proc{|a| a});

context.eval <<~JS
    var sym_gen=Symbol('foo');
    var sym_for=Symbol.for('foo');
    var sym_wk= Symbol.iterator;

    var sym_gen_rt=round_trip(sym_gen);
    var sym_for_rt=round_trip(sym_for);
    var sym_wk_rt= round_trip(sym_wk)

    var object={};
    var object_rt=round_trip(object);
JS


puts context.eval 'typeof sym_gen_rt' # expect "symbol", get "string"; alternatively, expect invalid conversion error
puts context.eval 'typeof sym_for_rt' # expect "symbol", get "string"
puts context.eval 'typeof sym_wk_rt'  # expect "symbol", get "string"

# The following are all currently false because of round-tripping as strings.
# The expectation values are what I'd expect if that were fixed and they round-tripped as symbols.
puts context.eval 'sym_gen_rt===sym_gen'  # expect (I have no idea), get false
puts context.eval 'sym_for_rt===sym_for'  # expect true, get false
puts context.eval 'sym_wk_rt===sym_wk'    # expect true, get false

puts context.eval 'object_rt === object'  # expect false, get false, included to show symbols breaking equality isn't unprecedented
@SamSaffron
Copy link
Collaborator

What a pickle.

@seanmakesgames any thoughts here?

We could return a JSSymbol type object instead of Ruby Symbol if we want to handle round tripping. The sad thing though is that it would hurt usability for people who are not round-tripping.

Also we would be extra careful here cause we don't want to hold refs to objects inside the V8 VM and there are complications with multi V8 VMs running. (pass a symbol from a different context to current context)

I am not sure there is any great solution here, short of not round tripping these things.

Also, keep in mind, we don't round trip functions and all sorts of other things, so this is going to be part of a much bigger pickle.

@nightpool
Copy link
Contributor

nightpool commented Aug 17, 2021 via email

@nightpool
Copy link
Contributor

if we're NOT holding onto the reference, I'm not sure that it makes sense to round trip symbols at all. There's no sensible way to compare Symbol('foo') and Symbol('foo') if they're not the same instance reference.

@hackmud-dtr
Copy link
Contributor Author

hackmud-dtr commented Aug 17, 2021

Yeah, my idea was to not touch ruby symbols at all -- they stay strings. Map JS symbols to some ruby class that stores details. GC shoudn't be a huge problem if we go with the "don't maintain equality" route, which I am slowly coming around to think makes the most sense.

Symbols made with Symbol.for are guaranteed by v8 to never be GC'd -- they are shared even across contexts/realms (per spec). Ditto for "well-known" symbols like Symbol.iterator -- they are shared across all contexts/realms. (I.e. the expectation is that if you did Symbol.for('foo') in one context and Symbol.for('foo') in another context, they should be the exact same element in memory)

My rough implementation idea would basically be this:

The ruby-side class would store the "description" of the symbol (i.e. the string that it currently round trips to), plus a field saying whether the symbol was made with Symbol.for (can query v8 to determine that), or is a Well-Known symbol like Symbol.iterator (again, can query v8 for that), or is a "generic" symbol (for lack of a better term -- a symbol that isn't one of the other 2 categories). So conceptually, you might have {name:"foo",type:"Symbol.for"}, {name:"Symbol.iterator",type:"well-known"}, {name:"foo",type:"generic"} (though realistically, i wouldn't use strings for type).

So, when going JS to Ruby, you just ask v8, is this a Symbol.for symbol? Is this a well known symbol? If not, it's generic. Never actually store the JS symbol anywhere in ruby at all (so no refs keeping things alive)

When going back ruby to JS, if the instance says it's Symbol.for, we can talk to v8 to get the copy from it's cache of those (or, if it's a new process, make it create a new one). Then any Symbol.for("same description") in JS will get the same instance, which is expected. For well-known symbols, we can again ask v8 for it's instances and use them. For generic symbols, we just make a new one (which won't preserve equality).

So on the ruby side, it is just saving the (ruby) string name/description, plus a flag saying which type of symbol it is. No attempt to hold onto any JS memory at all. And when going the other direction, we use v8's caches to get the proper symbol if applicable, and just create a new one if not.

For people who like current behavior (i.e. treat it just as a string), the name/description property of the hypothetical new ruby class would have the same value that JS Symbols currently map to, so they'd just need to grab that field rather than using the string raw. A breaking change for them, but a pretty minimal one I'd hope.

@seanmakesgames
Copy link
Contributor

Great stuff in here!
From the hackmud perspective, we don't actually use / need any of the features of symbols. What we are looking for as an embedder is well defined and stable behavior.

I like dtr's plan here, but would also be totally fine with going back to not serializing js symbols to ruby side (nil) or making them strings to match the one-way of ruby symbols to js, and then documenting the behavior in the readme. Then, wait for someone else who reports this behavior and requests the feature (and their specific usage / use-case). That way we aren't implementing something that complicates the surface area and memory management for a feature we aren't going to use.

Sean

@hackmud-dtr
Copy link
Contributor Author

Yeah. If no one is actually complaining about symbols being weird, and the current behavior works for most people using Mini-Racer (or at least, doesn't cause problems), leaving it is (obviously) significantly easier than fixing it. I just noticed how weirdly (from a JS perspective) symbols were handled when I was looking at another symbol bug, and didn't feel comfortable touching that without knowing if the current behavior was intentional or accidental. If no one cares one way or the other, then I can fix the other bug without changing too much. And if there's consensus that making symbols work better (again, from JS's perspective), I'm happy to do that first instead.

My read on the thread so far is basically: approximately no one actually uses symbols for much, so this is probably more work than they deserve until someone actually needs them. Given how weird JS symbols are, "no one actually uses them" would not surprise me in the slightest. They are a feature in search of a problem, if you ask me. But if someone actually needs them, I can take a stab at making this work any time.

@hackmud-dtr
Copy link
Contributor Author

One last thing: after doing some investigation, it seems (Ruby) symbols created via rb_intern_str (which is used when a JS symbol is sent to Ruby) are never eligible for Ruby's GC. I might be wrong here, I only dabble in Ruby, but a friend who is much more knowledgeable than I says that rb_intern_str creates immortal symbols.

So if memory management / leaks is a concern, it is worth noting that right now, every JS symbol sent to Ruby effectively leaks the associated Ruby symbol, since that will never get cleaned up as far as I can tell. It shouldn't hold onto any v8 memory, so that side should all clean up fine (though if the JS symbol was made with Symbol.for, the same 'never deleted' issue happens on the JS side for basically the same reasons).

@seanmakesgames
Copy link
Contributor

Again, my vote is for well defined and stable behavior. In the current implementation, I would probably turn off the functionality of symbols on js side altogether to prevent their marshalling.

@SamSaffron
Copy link
Collaborator

are never eligible for Ruby's GC

This is changed a while back, I think it was either in 2.5 or 2.4. These symbols are not eternal, they get collected.

@Fayti1703
Copy link
Contributor

Symbols created by rb_str_intern can get collected. Symbols created by rb_intern_str cannot.

@SamSaffron
Copy link
Collaborator

Yikes, well lets get this API call fixed. Care to post a PR?

@SamSaffron
Copy link
Collaborator

nevermind, just pushed a fix @Fayti1703, thanks for alerting me about it!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants