Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CGI.unescapeHTML("\xFF&") gives ArgumentError: invalid byte sequence in UTF-8. #6342

Closed
tae8838 opened this issue Jul 27, 2020 · 8 comments
Closed

Comments

@tae8838
Copy link

tae8838 commented Jul 27, 2020

Environment Information

jruby 9.2.11.1 (2.5.7) 2020-03-25 b1f55b1a40 OpenJDK 64-Bit Server VM 25.252-b09 on 1.8.0_252-b09 +jit [darwin-x86_64]
macOS 10.15.5

Expected Behavior

$ ruby -v
ruby 2.5.7p206 (2019-10-01 revision 67816) [x86_64-darwin19]
$ irb
irb(main):001:0> require 'CGI'
=> true
irb(main):002:0> CGI.unescapeHTML("\xFF&")
=> "\xFF&"

Actual Behavior

$ ruby -v
jruby 9.2.11.1 (2.5.7) 2020-03-25 b1f55b1a40 OpenJDK 64-Bit Server VM 25.252-b09 on 1.8.0_252-b09 +jit [darwin-x86_64]
$ irb
irb(main):001:0> require 'cgi'
=> true
irb(main):002:0> CGI.unescapeHTML("\xFF&")
Traceback (most recent call last):
        8: from /Users/tae/.rbenv/versions/jruby-9.2.11.1/bin/irb:13:in `<main>'
        7: from org/jruby/RubyKernel.java:1189:in `catch'
        6: from org/jruby/RubyKernel.java:1189:in `catch'
        5: from org/jruby/RubyKernel.java:1442:in `loop'
        4: from org/jruby/RubyKernel.java:1048:in `eval'
        3: from (irb):2:in `evaluate'
        2: from /Users/tae/.rbenv/versions/jruby-9.2.11.1/lib/ruby/stdlib/cgi/util.rb:96:in `unescapeHTML'
        1: from org/jruby/RubyString.java:3126:in `gsub'
ArgumentError (invalid byte sequence in UTF-8)
irb(main):003:0>

I checked similar issues like: #6093 but it doesn't seem relevant since the issue is solved in my jruby version.

Thank you!

@headius
Copy link
Member

headius commented Jul 29, 2020

Same on 9.2 HEAD:

$ jruby -rcgi -e 'CGI.unescapeHTML("\xFF&")'
ArgumentError: invalid byte sequence in UTF-8
          gsub at org/jruby/RubyString.java:3157
  unescapeHTML at /Users/headius/projects/jruby/lib/ruby/stdlib/cgi/util.rb:96
        <main> at -e:1

Could be missing some logic from the CRuby version.

@headius
Copy link
Member

headius commented Jul 29, 2020

This appears to have regressed due to our merging #6097 into JRuby 9.2.11.0.

That change also does not appear to have made it into the JRuby fork of Ruby, where we maintain our copy of the standard library.

@headius
Copy link
Member

headius commented Jul 29, 2020

Actually I am wrong... that PR was for escapeHTML. This may be a similar issue with a similar fix.

@headius
Copy link
Member

headius commented Jul 29, 2020

Big question for me is why CRuby does not have this issue. I will try to look into the root cause so we can avoid shipping a patched cgi/util.rb.

@headius
Copy link
Member

headius commented Jul 29, 2020

A similar patch for unescapeHTML does appear to resolve this, fwiw:

diff --git a/lib/ruby/stdlib/cgi/util.rb b/lib/ruby/stdlib/cgi/util.rb
index daf81c3bf5..993ace57d9 100644
--- a/lib/ruby/stdlib/cgi/util.rb
+++ b/lib/ruby/stdlib/cgi/util.rb
@@ -93,7 +93,8 @@ module CGI::Util
                 when Encoding::ISO_8859_1; 256
                 else 128
                 end
-    string.gsub(/&(apos|amp|quot|gt|lt|\#[0-9]+|\#[xX][0-9A-Fa-f]+);/) do
+    string = string.b
+    string.gsub!(/&(apos|amp|quot|gt|lt|\#[0-9]+|\#[xX][0-9A-Fa-f]+);/) do
       match = $1.dup
       case match
       when 'apos'                then "'"
@@ -119,6 +120,7 @@ module CGI::Util
         "&#{match};"
       end
     end
+    string.force_encoding enc
   end
 
   # Synonym for CGI::escapeHTML(str)

@ahorek
Copy link
Contributor

ahorek commented Jul 29, 2020

Big question for me is why CRuby does not have this issue.

CRuby has an alternative implementation in C, that doesn't have this issue

there's also a java implementation in jruby
https://github.com/jruby/jruby/blob/master/core/src/main/java/org/jruby/ext/cgi/escape/CGIEscape.java
but it's disabled due to this bug #4678

@headius
Copy link
Member

headius commented Jul 29, 2020

@ahorek Yeah I'm catching back up with all this and realizing that when I "reverted" the "fix" for #4531 I basically just disabled the native logic altogether.

The reason we differ from CRuby is that the optimized logic allows this case through, while the unoptimized logic does not, and due to 3d8847e we never use the optimized logic.

The Ruby code is actually incorrect here, because it does not allow cases like this through where the native extension does. I think the fix provided by @ahorek should be applied to the Ruby code in CRuby as well as in our version, since CRuby without the extension fails exactly the same way:

$ rvm ruby-2.5.7 do ruby -rcgi -e 'module CGI::Escape; def unescapeHTML(*); super; end; end; CGI.unescapeHTML("\xFF&")'
Traceback (most recent call last):
	3: from -e:1:in `<main>'
	2: from -e:1:in `unescapeHTML'
	1: from /Users/headius/.rvm/rubies/ruby-2.5.7/lib/ruby/2.5.0/cgi/util.rb:93:in `unescapeHTML'
/Users/headius/.rvm/rubies/ruby-2.5.7/lib/ruby/2.5.0/cgi/util.rb:93:in `gsub': invalid byte sequence in UTF-8 (ArgumentError)

The issues with the prepend + extend logic from the extension still exist in 9.2 HEAD. They prevent super from working in the native extension, resulting in these errors when calling the escaping methods with a non-ASCII-compatible encoding:

  1) Error:
CGIUtilTest#test_cgi_escapeHTML:CP50220:
NoMethodError: super: no superclass method `escapeHTML' for CGI:Class
Did you mean?  escape_html
               unescape_html
               escapeHTML
               unescapeHTML
    org/jruby/RubyBasicObject.java:1715:in `method_missing'
    org/jruby/ext/cgi/escape/CGIEscape.java:359:in `escapeHTML'
    org/jruby/ext/cgi/escape/CGIEscape.java:359:in `escapeHTML'
    /Users/headius/projects/jruby/test/mri/cgi/test_cgi_util.rb:132:in `block in CGIUtilTest'

etc

@tae8838
Copy link
Author

tae8838 commented Aug 3, 2020

@ahorek @headius I ran into this issue because of a bug I had in my app. I just fixed bug and everything is now working fine. If this is the expected behavior then I think we can close the issue. Thank you for looking into it!

@ahorek ahorek closed this as completed Feb 9, 2022
@enebo enebo added this to the Invalid or Duplicate milestone Mar 23, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants