URI#display_uri raises ArgumentError: invalid byte sequence in UTF-8 #224

roback opened this issue Jan 29, 2016 · 2 comments

roback commented Jan 29, 2016

Addressable::URI#display_uri raises ArgumentError when called on the URL http://example.com%C2. The same happens for http://%D5.example.com.

I get the same error both with and without IDNA (the first trace below fails in unencode, the second in idna/native.rb):

> Addressable::URI.parse("http://example.com%C2").display_uri
ArgumentError: invalid byte sequence in UTF-8
    from /.gem/ruby/2.2.3/gems/addressable-2.4.0/lib/addressable/uri.rb:432:in `gsub'
    from /.gem/ruby/2.2.3/gems/addressable-2.4.0/lib/addressable/uri.rb:432:in `unencode'
    from /.gem/ruby/2.2.3/gems/addressable-2.4.0/lib/addressable/uri.rb:530:in `normalize_component'
    from /.gem/ruby/2.2.3/gems/addressable-2.4.0/lib/addressable/uri.rb:1079:in `normalized_host'
    from /.gem/ruby/2.2.3/gems/addressable-2.4.0/lib/addressable/uri.rb:1177:in `normalized_authority'
    from /.gem/ruby/2.2.3/gems/addressable-2.4.0/lib/addressable/uri.rb:2078:in `normalize'
    from /.gem/ruby/2.2.3/gems/addressable-2.4.0/lib/addressable/uri.rb:2103:in `display_uri'
    from (irb):1
> Addressable::URI.parse("http://example.com%C2").display_uri
ArgumentError: invalid byte sequence in UTF-8
    from /.gem/ruby/2.2.3/gems/addressable-2.4.0/lib/addressable/idna/native.rb:36:in `split'
    from /.gem/ruby/2.2.3/gems/addressable-2.4.0/lib/addressable/idna/native.rb:36:in `to_ascii'
    from /.gem/ruby/2.2.3/gems/addressable-2.4.0/lib/addressable/uri.rb:1072:in `normalized_host'
    from /.gem/ruby/2.2.3/gems/addressable-2.4.0/lib/addressable/uri.rb:1177:in `normalized_authority'
    from /.gem/ruby/2.2.3/gems/addressable-2.4.0/lib/addressable/uri.rb:2078:in `normalize'
    from /.gem/ruby/2.2.3/gems/addressable-2.4.0/lib/addressable/uri.rb:2103:in `display_uri'
    from (irb):1

The cause seems to be that Addressable::URI.unencode turns the percent-escapes in these URLs into raw bytes that are not valid UTF-8, and Ruby's string methods then refuse to operate on the result:

url = Addressable::URI.unencode("http://%D5.example.com")
# => "http://\xD5.example.com"
url.split(".")
# ArgumentError: invalid byte sequence in UTF-8
#     from (irb):10:in `split'
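
The failure mode above can be sketched in plain Ruby without the gem. This is a minimal illustration, not Addressable's implementation: percent_decode is a hypothetical helper that decodes %XX escapes byte by byte and then labels the result as UTF-8, and String#scrub is one possible way for a caller to get a string that is safe to split.

```ruby
# Hypothetical helper (not part of Addressable): decode %XX escapes at the
# byte level, then tag the resulting bytes as UTF-8 without validating them.
def percent_decode(str)
  str.b.gsub(/%(\h\h)/) { $1.hex.chr }.force_encoding(Encoding::UTF_8)
end

decoded = percent_decode("http://%D5.example.com")

# 0xD5 starts a two-byte UTF-8 sequence, but the next byte is ".",
# so the string is tagged UTF-8 yet is not valid UTF-8.
puts decoded.valid_encoding?   # prints "false"

# One way to make the string safe for split, gsub, and similar methods:
# replace every invalid byte sequence with a placeholder.
safe = decoded.scrub("?")
puts safe                      # prints "http://?.example.com"
p safe.split(".")              # prints ["http://?", "example", "com"]
```

Checking valid_encoding? before calling string methods, or scrubbing first, avoids the ArgumentError at the cost of losing the original bytes in the displayed form.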

sporkmonger (Owner) commented Aug 7, 2018

These are some gross URIs. 😝

That said, I'm not sure this is a bug. Given what display_uri is supposed to do, this is legitimately an exceptional condition: there is no way to correctly render that hostname as a UTF-8 string. However, http://example.com%C2, gross as it is, is I think actually a valid URI, so raising an invalid URI exception doesn't seem correct either. That makes me think this behavior may actually be correct, if perhaps a little surprising. The grammar for a registered name allows arbitrary percent-encoded bytes:

reg-name = *( unreserved / pct-encoded / sub-delims )
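
To make the point concrete, here is a rough sketch that checks a host string against the reg-name rule quoted above, using the definitions of unreserved, pct-encoded and sub-delims from RFC 3986. REG_NAME and reg_name? are names made up for this illustration and are not part of Addressable.

```ruby
# reg-name = *( unreserved / pct-encoded / sub-delims )
# unreserved  = ALPHA / DIGIT / "-" / "." / "_" / "~"
# pct-encoded = "%" HEXDIG HEXDIG
# sub-delims  = "!" / "$" / "&" / "'" / "(" / ")" / "*" / "+" / "," / ";" / "="
REG_NAME = /\A(?:[A-Za-z0-9\-._~]|%\h\h|[!$&'()*+,;=])*\z/

# Hypothetical helper: true if the host is syntactically a valid reg-name.
def reg_name?(host)
  REG_NAME.match?(host)
end

puts reg_name?("example.com%C2")   # prints "true": %C2 is a well-formed escape
puts reg_name?("%D5.example.com")  # prints "true"
puts reg_name?("example.com%C")    # prints "false": truncated escape
puts reg_name?("exa mple.com")     # prints "false": space is not allowed
```

Both hosts from this issue pass the syntactic check even though their decoded bytes are not valid UTF-8, which is why an invalid URI error would be the wrong exception here.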

dentarg (Collaborator) commented Oct 23, 2022

This doesn't reproduce anymore, so I'm closing this:

irb(main):004:0> Addressable::VERSION::STRING
=> "2.8.1"
irb(main):005:0> Addressable::URI.parse("http://example.com%C2").display_uri
=> #<Addressable::URI:0x86c4 URI:http://example.com%C2/>
irb(main):006:0> Addressable::URI.unencode("http://%D5.example.com")
=> "http://\xD5.example.com"

Probably due to the changes made in #459

dentarg closed this as completed Oct 23, 2022