Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix cli default external encoding #332

Merged
merged 1 commit into from Nov 6, 2020
Merged

Conversation

casperisfine
Copy link
Contributor

I believe this might be a MRI bug, but it's not confirmed yet. Either way even if it's confirmed we'll need a workaround for a while.

When using bootsnap precompile in some docker containers, the source files are improperly parsed, e.g:

APPROXIMATIONS = {
  "Þ" => "Th",
  "ß" => "ss",
}
p APPROXIMATIONS
root@f7234bf937db:/app# ruby /tmp/utils.rb 
{"\u00DE"=>"Th", "\u00DF"=>"ss"}
root@f7234bf937db:/app# ruby -e 'RubyVM::InstructionSequence.compile_file("/tmp/utils.rb")'
/tmp/utils.rb:2: warning: key "" is duplicated and overwritten on line 3
Traceback (most recent call last):
	1: from -e:1:in `<main>'
-e:1:in `compile_file': /tmp/utils.rb:2: invalid multibyte char (US-ASCII) (SyntaxError)
/tmp/utils.rb:2: invalid multibyte char (US-ASCII)
/tmp/utils.rb:3: invalid multibyte char (US-ASCII)
/tmp/utils.rb:3: invalid multibyte char (US-ASCII)

It appears that InstructionSequence.compile_file doesn't properly default to UTF-8 for source file encoding like the main parser does. On some containers I've managed to fix it with LANG=en_US.UTF-8, but on some others it don't work.

I suppose there's an inconsistency on how Encoding.default_external is initialized (cc @XrXr @tenderlove).

In the meantime this can be easily worked around by setting Encoding.default_external if it is nil

@tenderlove
Copy link
Contributor

It appears that InstructionSequence.compile_file doesn't properly default to UTF-8 for source file encoding like the main parser does.

This doesn't surprise me and also it sounds like a bug in MRI

@casperisfine
Copy link
Contributor Author

Hum, weird. When I was reproducing locally , setting default_external worked, but testing on our CI against this branch I still have the issue. I'll have to dig this more.

@casperisfine
Copy link
Contributor Author

So I've identified the MRI bug and wrote a failing test case: Shopify/ruby@8b001a0

The issue is that Kernel.load use the command line arguments to decide the encoding, whereas InstructionSequence.compile_file doesn't. However it's not trivial to fix.

@casperisfine casperisfine merged commit 1aae379 into master Nov 6, 2020
@casperisfine
Copy link
Contributor Author

Issue opened upstream: https://bugs.ruby-lang.org/issues/17308

@casperisfine casperisfine deleted the fix-precompile-encoding branch November 9, 2020 08:43
@casperisfine casperisfine temporarily deployed to rubygems November 10, 2020 09:17 Inactive
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants