Error when using special characters in Gemfile: "incompatible character encodings: UTF-8 and ASCII-8BIT (Encoding::CompatibilityError)" #1376

Closed
ivoanjo opened this issue Jun 22, 2018 · 13 comments
@ivoanjo
Contributor

ivoanjo commented Jun 22, 2018

Hello there!

  • TruffleRuby version: truffleruby 1.0.0-rc2, like ruby 2.4.4, GraalVM CE Native [x86_64-linux]
  • Linux version: Linux maruhime 4.13.0-45-generic #50-Ubuntu SMP Wed May 30 08:23:18 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
  • OS version: Ubuntu 17.10

I was trying out my oddly-named Persistent-💎 gem, where I use the 💎 in several method and class names, and unfortunately running bundle install gives me a truffleruby: incompatible character encodings: UTF-8 and ASCII-8BIT (Encoding::CompatibilityError).

I was able to create the smallest test case for a failure ever: just create a Gemfile with the following:

💎

and here is the result:

$ bundler -v
Bundler version 1.16.2
$ bundle install
truffleruby: incompatible character encodings: UTF-8 and ASCII-8BIT (Encoding::CompatibilityError)

on MRI I get:

$ bundle install

[!] There was an error parsing `Gemfile`: Undefined local variable or method `💎' for Gemfile. Bundler cannot continue.

 #  from /home/knuckles/ruby/persistent-dmnd/Gemfile:1
 #  -------------------------------------------
 >  💎
 #  -------------------------------------------

(And of course, on MRI, doing bundle install with the full normal gem works).

@eregon
Member

eregon commented Jun 22, 2018

Hello,
Thank you for the nice bug report!

I can reproduce the issue.
The fun bit is this is actually failing while trying to add 💎 to the error output 😄

@eregon
Member

eregon commented Jun 22, 2018

OK, one difference is

instance_eval "\xF0\x9F\x92\x8E\n".b
# or
instance_eval "💎".b

gives on MRI

-e:1:in `instance_eval': undefined local variable or method `"\xF0\x9F\x92\x8E"' for main:Object (NameError)
	from -e:1:in `instance_eval'

$ ruby -e 'e = (instance_eval "\xF0\x9F\x92\x8E\n".b rescue $!); p e.message'         
"undefined local variable or method `\"\\xF0\\x9F\\x92\\x8E\"' for main:Object"
$ ruby -e 'e = (instance_eval "\xF0\x9F\x92\x8E\n".b rescue $!); p e.message.encoding'
#<Encoding:US-ASCII>

while on TruffleRuby it gives:

(eval):1:in `<top (required)>': wrong constant name � (NameError)
	from -e:1:in `instance_eval'
	from -e:1:in `<main>'

$ ruby -e 'e = (instance_eval "\xF0\x9F\x92\x8E\n".b rescue $!); p e.message'
"wrong constant name ðÂ\u009FÂ\u0092Â\u008E"
$ ruby -e 'e = (instance_eval "\xF0\x9F\x92\x8E\n".b rescue $!); p e.message.encoding'
#<Encoding:UTF-8>
$ ruby -e 'e = (instance_eval "\xF0\x9F\x92\x8E\n".b rescue $!); p e.message.valid_encoding?'
true

So basically on MRI the missing method is handled by Bundler::Plugin::DSL#method_missing inside Bundler, but on TruffleRuby it's a missing constant and that just bubbles up higher.
With that, the description of the error is:

TruffleRuby:
[:description, "There was an error parsing `Gemfile`: wrong constant name ðÂ\u009FÂ\u0092Â\u008E", #<Encoding:UTF-8>]
MRI:
[:description, "There was an error parsing `Gemfile`: Undefined local variable or method `\xF0\x9F\x92\x8E' for Gemfile", #<Encoding:ASCII-8BIT>]

And the binary encoding of the description seems quite lucky for Bundler, and it's there because the method_missing is defined like:

      def method_missing(name, *args)
        raise PluginGemfileError, "Undefined local variable or method `#{name}' for Gemfile" unless Bundler::Dsl.method_defined? name
      end

and the passed name is a Symbol with a binary encoding, :"\xF0\x9F\x92\x8E" (which sounds a bit weird, maybe it's for compatibility, :💎 is a valid UTF-8 Symbol).
If name was actually passed as a UTF-8 Symbol, then MRI would fail with the same Encoding::CompatibilityError as us, when trying to show the line with the error.
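To make the point about the Symbol's encoding concrete, here is a minimal sketch in plain Ruby. The helper gemfile_error is hypothetical and only copies the message format from the method_missing quoted above; it is not Bundler's code. It shows that interpolating a binary Symbol yields a binary message, while the UTF-8 Symbol yields a UTF-8 message.

```ruby
# Symbol built from the raw emoji bytes tagged as binary, like the name
# passed to method_missing in the comment above.
binary_name = "\xF0\x9F\x92\x8E".b.to_sym
# The same character as a regular UTF-8 Symbol.
utf8_name = "\u{1F48E}".to_sym

# Hypothetical helper: only the message format is taken from the thread.
def gemfile_error(name)
  "Undefined local variable or method `#{name}' for Gemfile"
end

binary_message = gemfile_error(binary_name)
utf8_message = gemfile_error(utf8_name)

puts binary_name.encoding    # binary (ASCII-8BIT)
puts utf8_name.encoding      # UTF-8
puts binary_name == utf8_name # false: same bytes, different encodings
puts binary_message.encoding # binary, inherited from the interpolated name
puts utf8_message.encoding   # UTF-8
```

The two Symbols are distinct objects even though their bytes are identical, which is why the encoding of name decides the encoding of the whole description.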

So this works by luck, in lib/bundler/dsl.rb we have

m = "\n[!] " # UTF-8, CR-7BIT

# If description is binary, then m becomes binary too
# if not it stays UTF-8, but becomes CR-VALID (not only US-ASCII)
m << description
...

# If m is already BINARY, all fine
# If m is UTF-8 CR-VALID and we append a BINARY string => Encoding::CompatibilityError
m << lines[line_numer] # the same as m << "\xF0\x9F\x92\x8E\n".b

m is originally UTF-8 with only US-ASCII characters.
So if a BINARY String is appended when m only contains US-ASCII characters, then m becomes BINARY and further BINARY String appends are fine.
But otherwise, if the first thing appended to m is a UTF-8 String with non-US-ASCII characters (such as an error description containing 💎 or other non-ASCII characters), then a later append of a BINARY String raises Encoding::CompatibilityError.
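The two outcomes can be reproduced outside Bundler with a short sketch. The helper build_message is hypothetical and only mirrors the order of appends described above; the two descriptions stand in for the MRI and TruffleRuby error texts.

```ruby
# Raw bytes of the emoji line, tagged as binary like a Gemfile read in "rb" mode.
BINARY_LINE = "\xF0\x9F\x92\x8E\n".b

# Hypothetical helper mirroring the appends discussed above; not Bundler's code.
def build_message(description)
  m = String.new("\n[!] ", encoding: Encoding::UTF_8) # UTF-8, ASCII-only content
  m << description # either switches m to binary or makes it non-ASCII UTF-8
  m << BINARY_LINE # raises if m is UTF-8 with non-ASCII characters
  m
end

# Binary description, as produced on MRI via the binary Symbol.
binary_description = "Undefined local variable or method `\xF0\x9F\x92\x8E' for Gemfile".b
# UTF-8 description containing a non-ASCII character.
utf8_description = "wrong constant name \u{1F48E}"

lucky = build_message(binary_description)
puts lucky.encoding # m became binary, so the second append was accepted

begin
  build_message(utf8_description)
rescue Encoding::CompatibilityError => e
  puts "#{e.class}: #{e.message}"
end
```

The first call succeeds only because the description happens to be binary; the second call fails on the final append.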

Anyway, the first thing we need to do is figure out why TruffleRuby thinks 💎 is a constant and not a local variable/method call.

@ivoanjo
Contributor Author

ivoanjo commented Jun 23, 2018

Thanks for the awesome deep dive and quick answer! 🎉

@eregon
Member

eregon commented Jun 23, 2018

Sorry for so much text 😄
I tried to take notes as I figured things out, since there is probably more than one issue lurking in there.
The main reason why these Unicode characters don't work seamlessly is Bundler seems to read the Gemfile as a String with binary encoding: https://github.com/bundler/bundler/blob/9f7bf0ac3ab8d995e3a274cec3c292a5203f4534/lib/bundler.rb#L409-L411
https://github.com/bundler/bundler/blob/9f7bf0ac3ab8d995e3a274cec3c292a5203f4534/lib/bundler/dsl.rb#L46

If this was UTF-8, we would just deal with valid Strings, but here it means the Gemfile is just a sequence of bytes, and it's unknown how to interpret them as characters.
I'm somewhat surprised eval/instance_eval agrees to parse binary strings, because it cannot really know what is meant by non-US-ASCII characters.
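As a small illustration of what reading in binary mode means, the sketch below writes the one-line 💎 Gemfile to a temporary file and reads it back twice. The helper read_both_ways and the temporary file name are assumptions for the example; only the File.open(file, "rb", &:read) call shape comes from the Bundler code discussed in this thread.

```ruby
require "tempfile"

# Hypothetical helper: writes the given bytes to a temp file and reads them
# back once in binary mode and once tagged as UTF-8.
def read_both_ways(bytes)
  Tempfile.create("Gemfile") do |f|
    f.binmode
    f.write(bytes)
    f.flush
    binary = File.open(f.path, "rb", &:read)    # tagged ASCII-8BIT
    utf8 = File.read(f.path, encoding: "UTF-8") # tagged UTF-8
    [binary, utf8]
  end
end

binary, utf8 = read_both_ways("\xF0\x9F\x92\x8E\n".b)

puts binary.encoding            # binary (ASCII-8BIT)
puts utf8.encoding              # UTF-8
puts binary.bytes == utf8.bytes # true: the bytes are identical
puts binary == utf8             # false: the encodings make them incomparable
```

The bytes on disk are the same either way; only the encoding tag differs, and that tag is what later decides whether appends and comparisons succeed.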

I think it's more important to make bundle install work in Persistent💎, and that seems to trigger a slightly different issue, namely the # encoding: UTF-8 at the top of persistent-dmnd.gemspec seems to be ignored when using eval, and therefore we just read it as binary and the constant name is different than the one declared in lib/persistent_dmnd/version.rb, and therefore the constant is not found.

@ivoanjo
Contributor Author

ivoanjo commented Jun 23, 2018

Sorry for so much text

No, please do! 👍

I think it's more important to make bundle install work in Persistent💎, and that seems to trigger a slightly different issue, namely the # encoding: UTF-8 at the top of persistent-dmnd.gemspec seems to be ignored when using eval

Shouldn't it default to UTF-8 even without the encoding: UTF-8 pragma?

I believe that from Ruby 2.0 it's the default; I've only added it to the gem so as to also support 1.9.3 (I have a lot of hacks to support all the Ruby versions I can -- this gem stretches Ruby/JRuby in really funny ways).

@eregon
Member

eregon commented Jun 23, 2018

Shouldn't it default to UTF-8 even without the encoding: UTF-8 pragma?

Ruby defaults to UTF-8 for running the main script and for require/load, but here Bundler reads the file with File.open(file, "rb", &:read) and then evals it, so I think the magic comment is really what saves it here for interpreting the Persistent💎 constant.
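A minimal sketch of the effect of the magic comment on eval of a binary String follows. The variable names are illustrative. The case without the magic comment is left without an expected result, since its behavior differs between implementations according to this thread.

```ruby
# Source code as a binary String, as if read with File.open(file, "rb", &:read).
# The \x escapes put the raw emoji bytes inside the single-quoted inner literal.
with_magic = "# encoding: UTF-8\n'\xF0\x9F\x92\x8E'.encoding".b
without_magic = "'\xF0\x9F\x92\x8E'.encoding".b

# With the magic comment, the inner literal is parsed as UTF-8.
magic_result = eval(with_magic)
puts magic_result

# Without it, the result is implementation-dependent (it may even raise),
# so the outcome is only printed, not assumed.
plain_result =
  begin
    eval(without_magic)
  rescue StandardError, ScriptError => e
    e.class
  end
puts plain_result
```

This is the reason the # encoding: UTF-8 line in the gemspec matters once the file content reaches eval as a binary String.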

MRI also fails bundle install if the magic comment is removed, using latest Bundler 1.16.2:

$ chruby 2.4.4
$ gem install bundler
$ bundle install
#<Encoding:ASCII-8BIT>
Required ruby-2.5.1 is not installed.
#<Encoding:ASCII-8BIT>
/home/eregon/.gem/ruby/2.4.4/gems/bundler-1.16.2/lib/bundler/dsl.rb:575:in `to_s': incompatible character encodings: UTF-8 and ASCII-8BIT (Encoding::CompatibilityError)
	from /home/eregon/.gem/ruby/2.4.4/gems/bundler-1.16.2/lib/bundler/dsl.rb:51:in `to_s'
	from /home/eregon/.gem/ruby/2.4.4/gems/bundler-1.16.2/lib/bundler/dsl.rb:51:in `message'
	from /home/eregon/.gem/ruby/2.4.4/gems/bundler-1.16.2/lib/bundler/dsl.rb:51:in `rescue in eval_gemfile'
	from /home/eregon/.gem/ruby/2.4.4/gems/bundler-1.16.2/lib/bundler/dsl.rb:56:in `eval_gemfile'
	from /home/eregon/.gem/ruby/2.4.4/gems/bundler-1.16.2/lib/bundler/dsl.rb:12:in `evaluate'
	from /home/eregon/.gem/ruby/2.4.4/gems/bundler-1.16.2/lib/bundler/definition.rb:35:in `build'
	from /home/eregon/.gem/ruby/2.4.4/gems/bundler-1.16.2/lib/bundler.rb:135:in `definition'
	from /home/eregon/.gem/ruby/2.4.4/gems/bundler-1.16.2/lib/bundler/cli/install.rb:62:in `run'
	from /home/eregon/.gem/ruby/2.4.4/gems/bundler-1.16.2/lib/bundler/cli.rb:224:in `block in install'
	from /home/eregon/.gem/ruby/2.4.4/gems/bundler-1.16.2/lib/bundler/settings.rb:136:in `temporary'
	from /home/eregon/.gem/ruby/2.4.4/gems/bundler-1.16.2/lib/bundler/cli.rb:223:in `install'
	from /home/eregon/.gem/ruby/2.4.4/gems/bundler-1.16.2/lib/bundler/vendor/thor/lib/thor/command.rb:27:in `run'
	from /home/eregon/.gem/ruby/2.4.4/gems/bundler-1.16.2/lib/bundler/vendor/thor/lib/thor/invocation.rb:126:in `invoke_command'
	from /home/eregon/.gem/ruby/2.4.4/gems/bundler-1.16.2/lib/bundler/vendor/thor/lib/thor.rb:387:in `dispatch'
	from /home/eregon/.gem/ruby/2.4.4/gems/bundler-1.16.2/lib/bundler/cli.rb:27:in `dispatch'
	from /home/eregon/.gem/ruby/2.4.4/gems/bundler-1.16.2/lib/bundler/vendor/thor/lib/thor/base.rb:466:in `start'
	from /home/eregon/.gem/ruby/2.4.4/gems/bundler-1.16.2/lib/bundler/cli.rb:18:in `start'
	from /home/eregon/.gem/ruby/2.4.4/gems/bundler-1.16.2/exe/bundle:30:in `block in <top (required)>'
	from /home/eregon/.gem/ruby/2.4.4/gems/bundler-1.16.2/lib/bundler/friendly_errors.rb:124:in `with_friendly_errors'
	from /home/eregon/.gem/ruby/2.4.4/gems/bundler-1.16.2/exe/bundle:22:in `<top (required)>'
	from /home/eregon/.gem/ruby/2.4.4/bin/bundle:23:in `load'
	from /home/eregon/.gem/ruby/2.4.4/bin/bundle:23:in `<main>'

Could you verify that in your environment actually?
I'd like to rule out that it's something due to my environment.

@ivoanjo
Contributor Author

ivoanjo commented Jun 23, 2018

I stand corrected, indeed the gemspec needs the # encoding: UTF-8, and I can confirm your result.

I went back to check and when I added support for Ruby 1.9.3 this wasn't needed: https://gitlab.com/ivoanjo/persistent-dmnd/commit/29d0977df8429ed1617f07d325b3b6169dc6154e. So I dug a little deeper and it seems this was introduced in the latest bundler point release:

$ ruby -v
ruby 2.5.1p57 (2018-03-29 revision 63029) [x86_64-linux]
$ git diff
diff --git a/persistent-dmnd.gemspec b/persistent-dmnd.gemspec
index 72c50a6..e4b3dad 100644
--- a/persistent-dmnd.gemspec
+++ b/persistent-dmnd.gemspec
@@ -1,4 +1,3 @@
-# encoding: UTF-8
 
 # Persistent-💎: Ruby gem for easily creating immutable data structures
 # Copyright (c) 2017 Ivo Anjo <ivo.anjo@ist.utl.pt>
$ bundle -v
Bundler version 1.16.1
$ bundle install
Warning: the running version of Bundler (1.16.1) is older than the version that created the lockfile (1.16.2). We suggest you upgrade to the latest version of Bundler by running `gem install bundler`.
Using rake 12.3.1
Using bundler 1.16.1
Using byebug 10.0.2
Using coderay 1.1.2
Using concurrent-ruby 1.0.5
Using diff-lcs 1.3
Using ffi 1.9.25
Using hamster 3.0.0
Using method_source 0.9.0
Using persistent-dmnd 1.0.2 from source at `.`
Using pry 0.11.3
Using pry-byebug 3.6.0
Using rspec-support 3.7.1
Using rspec-core 3.7.1
Using rspec-expectations 3.7.0
Using rspec-mocks 3.7.0
Using rspec 3.7.0
Using rufo 0.3.1
Bundle complete! 9 Gemfile dependencies, 18 gems now installed.
Use `bundle info [gemname]` to see where a bundled gem is installed.
$ gem install bundler
Fetching: bundler-1.16.2.gem (100%)
Successfully installed bundler-1.16.2
Parsing documentation for bundler-1.16.2
Installing ri documentation for bundler-1.16.2
Done installing documentation for bundler after 10 seconds
1 gem installed
$ bundle -v
Bundler version 1.16.2
$ bundle install
Traceback (most recent call last):
# ...
.rvm/gems/ruby-2.5.1/gems/bundler-1.16.2/lib/bundler/dsl.rb:575:in `to_s': incompatible character encodings: UTF-8 and ASCII-8BIT (Encoding::CompatibilityError)

So I decided to try bundler 1.16.1 with truffleruby and indeed it works:

$ ruby -v
truffleruby 1.0.0-rc2, like ruby 2.4.4, GraalVM CE Native [x86_64-linux]
$ bundle -v
Bundler version 1.16.1
$ bundle install
Warning: the running version of Bundler (1.16.1) is older than the version that created the lockfile (1.16.2). We suggest you upgrade to the latest version of Bundler by running `gem install bundler`.
Using rake 12.3.1
Using bundler 1.16.1
Using byebug 10.0.2
Using coderay 1.1.2
Using concurrent-ruby 1.0.5
Using diff-lcs 1.3
Using ffi 1.9.25
Using hamster 3.0.0
Using method_source 0.9.0
Using persistent-dmnd 1.0.2 from source at `.`
Using pry 0.11.3
Using pry-byebug 3.6.0
Using rspec-support 3.7.1
Using rspec-core 3.7.1
Using rspec-expectations 3.7.0
Using rspec-mocks 3.7.0
Using rspec 3.7.0
Using rufo 0.3.1
Bundle complete! 9 Gemfile dependencies, 18 gems now installed.
Use `bundle info [gemname]` to see where a bundled gem is installed.

I could also run the specs and saw a few broken ones related to concurrent-ruby; I'll need to check whether I should report them here or there.

So it appears this issue only happens with bundler 1.16.2.

@eregon
Member

eregon commented Jun 23, 2018

Thank you for confirming!
I think this might be worthwhile to report to Bundler, as it is arguably a regression.
Could you report a bug to Bundler about the magic encoding comment now being needed and potentially breaking Gemfile/gemspecs using non-ASCII characters without the magic encoding comment?

In the meantime, I now have a fix locally so TruffleRuby reads the magic comment and bundle install for Persistent💎 works even on Bundler 1.16.2 🎉
I still need to integrate it and check a few things with @nirvdrum.

Please file another issue for the broken specs related to concurrent-ruby, to make it easier to track.

@ivoanjo
Contributor Author

ivoanjo commented Jun 24, 2018

Sounds good, I'll report the upstream issues.

It's really nice to see that otherwise TruffleRuby seems to handle the emoji really well -- I was half expecting to trigger a few corner cases there 😉 😈

@ivoanjo
Contributor Author

ivoanjo commented Jun 24, 2018

Concurrent-ruby issues should be fixed with ruby-concurrency/concurrent-ruby#734

@eregon eregon self-assigned this Jun 24, 2018
@eregon
Member

eregon commented Jun 25, 2018

The fixes are now in master: 9f48ac1...e732478 , including extensive specs for the magic comments and a few encoding-related fixes. They will be in the next release.

This fixes bundle install for Persistent💎 with Bundler 1.16.2.

For the 💎 Gemfile, I have another set of changes to fail early when trying to eval a binary String with no magic encoding comment. That still needs review and still differs from MRI, but at least the error message should be clear.

@eregon eregon added this to the 1.0.0-rc3 milestone Jun 30, 2018
@eregon
Member

eregon commented Jul 3, 2018

The main issue of running bundle install in Persistent💎 was fixed in TruffleRuby 1.0.0-rc3.

@eregon eregon closed this as completed Jul 3, 2018
@eregon
Member

eregon commented Jul 6, 2018

For information, bb6c24e is the new change that fails early when trying to eval a binary String with no magic encoding comment (mentioned above), and avoids mis-encoding binary strings.
