JavaScript engines use UTF-16 internally, but Strings are UTF-8 encoded by default #2117

Merged (2 commits) on Dec 12, 2020

ggrossetie (Member)

Reasoning

Buffer.byteLength('foo') // 3
Buffer.byteLength('écrire') // 7

'foo'.$bytesize() // should be 3, not 6
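
To make the mismatch concrete: a JavaScript string is stored as UTF-16 code units, so counting 2 bytes per unit gives 6 for 'foo', whereas the UTF-8 byte count (what `Buffer.byteLength` reports by default) is 3. The sketch below uses the standard `TextEncoder`, which always encodes to UTF-8, instead of Node's `Buffer` so it also runs in browsers; `utf8ByteLength` and `utf16ByteLength` are illustrative helper names, not part of Opal.

```javascript
// TextEncoder always produces UTF-8 bytes.
function utf8ByteLength(str) {
  return new TextEncoder().encode(str).length;
}

// String.prototype.length counts UTF-16 code units; each unit occupies 2 bytes.
function utf16ByteLength(str) {
  return str.length * 2;
}

// '\u00e9crire' is "écrire" written with a precomposed é (U+00E9).
for (const word of ['foo', '\u00e9crire']) {
  console.log(word, utf8ByteLength(word), utf16ByteLength(word));
}
// foo 3 6
// écrire 7 12
```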

I've been able to enable 22 specs 🎉
Most notably, the String#bytesize method now works as expected.

On the other hand, I had to disable 2 specs: one for String#intern and one for String#to_sym.
On closer inspection, though, they were only passing by luck, because the encoding attribute on the String primitive was UTF-16LE.
In fact, String#to_sym always returned a UTF-16LE Symbol, even when the String was not UTF-16LE encoded.

Implementation

force_encoding should only update the encoding attribute; it should not change how the string is actually encoded. As a result, bytesize and each_byte should not rely on the encoding attribute.
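
In Ruby itself, relabeling leaves the bytes alone: `"écrire".force_encoding("UTF-16LE").bytesize` still returns 7. The following illustrative model (a plain object holding a `Uint8Array` plus an encoding label; `rubyString`, `forceEncoding` and `bytesize` are hypothetical names, not Opal's code) shows why the byte count has to come from the stored bytes rather than from the label.

```javascript
// Hypothetical model: a Ruby string as raw bytes plus an encoding label.
function rubyString(str) {
  return { bytes: new TextEncoder().encode(str), encoding: 'UTF-8' };
}

// force_encoding semantics: swap the label in place, keep the bytes untouched.
function forceEncoding(rstr, label) {
  rstr.encoding = label;
  return rstr;
}

// bytesize is derived from the stored bytes, never from the label.
function bytesize(rstr) {
  return rstr.bytes.length;
}

const s = rubyString('\u00e9crire'); // "écrire"
console.log(bytesize(s)); // 7
forceEncoding(s, 'UTF-16LE');
console.log(s.encoding, bytesize(s)); // UTF-16LE 7
```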

I've introduced an "internal encoding" that is updated when encode is called (but not when force_encoding is called). bytesize and each_byte now rely on the "internal encoding" instead of the "encoding" attribute.
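
A minimal sketch of that design, assuming a Ruby-like wrapper around a native JavaScript string. `RubyLikeString`, `internalEncoding`, `BYTE_ENCODERS` and the camelCase method names are hypothetical stand-ins for Opal's `encode`, `force_encoding`, `bytesize` and `each_byte`, and only UTF-8 and UTF-16LE byte layouts are modeled.

```javascript
// Hypothetical sketch: names are illustrative, not Opal's actual implementation.
function utf8Bytes(str) {
  return Array.from(new TextEncoder().encode(str));
}

function utf16leBytes(str) {
  const bytes = [];
  for (let i = 0; i < str.length; i++) {
    const unit = str.charCodeAt(i); // one UTF-16 code unit (0..0xFFFF)
    bytes.push(unit & 0xff, unit >> 8); // little-endian: low byte first
  }
  return bytes;
}

// Only these two internal encodings are modeled.
const BYTE_ENCODERS = { 'UTF-8': utf8Bytes, 'UTF-16LE': utf16leBytes };

class RubyLikeString {
  constructor(value, encoding = 'UTF-8') {
    this.value = value;               // the underlying JavaScript (UTF-16) string
    this.encoding = encoding;         // the label Ruby code sees via String#encoding
    this.internalEncoding = encoding; // how the bytes are actually laid out
  }

  // encode transcodes: the new string gets a new label AND a new byte layout.
  encode(target) {
    return new RubyLikeString(this.value, target);
  }

  // force_encoding only relabels the receiver; the byte layout is unchanged.
  forceEncoding(label) {
    this.encoding = label;
    return this;
  }

  // bytesize and each_byte consult the internal encoding, never the label.
  bytes() {
    return BYTE_ENCODERS[this.internalEncoding](this.value);
  }

  bytesize() {
    return this.bytes().length;
  }

  eachByte(callback) {
    this.bytes().forEach((byte) => callback(byte));
  }
}

const s = new RubyLikeString('\u00e9crire'); // "écrire"
console.log(s.bytesize());                           // 7
console.log(s.encode('UTF-16LE').bytesize());        // 12
console.log(s.forceEncoding('UTF-16LE').bytesize()); // 7: relabeled, bytes unchanged
```

Keeping two fields separates what Ruby code is told (the label) from how bytes are actually computed, which mirrors the distinction Ruby draws between `encode` and `force_encoding`.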

/cc @mojavelinux

hmdne (Member) commented Oct 2, 2020

This is a very good idea and a good step towards more Ruby compatibility.

I am (very slowly) preparing a patch to improve compatibility with Ruby's Marshal format. This change will come in handy there and should make a lot more tests pass.

ggrossetie (Member, Author)

@elia Do you have an opinion on it?

elia (Member) left a comment

@Mogztter thanks for fixing this! Can you rebase this on master? (I was able to do that myself.) After that, we can merge 💪

ggrossetie (Member, Author) commented Dec 12, 2020

> thanks for fixing this, can you rebase this on master? (I was able to do that myself)

Thanks!
bundle install seems to be failing on CI (GitHub Actions). I forced a new build in case it's an intermittent failure; we'll see...

EDIT: I found the root cause: rubyjs/libv8#310

Ugh, I think it's because Bundler 2.2 has just come out, and it's fetching the build-from-source gem rather than the prepackaged binary.
