Bug in normalized_host in Addressable (ArgumentError: invalid byte sequence in UTF-8) #62
I just wanted to see what caused the error :) Addressable first runs unencode_component, and then split fails on the result:
> Addressable::URI.unencode_component("some_site.net%C2")
=> "some_site.net\xC2"
> "some_site.net\xC2".split("")
ArgumentError: invalid byte sequence in UTF-8
from (pry):35:in `split'
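For context on why the decoded host is rejected: 0xC2 is the lead byte of a two-byte UTF-8 sequence, so on its own it is incomplete. A minimal sketch in plain Ruby, with no Addressable involved; the `utf8` lambda and the variable names are only illustrative:

```ruby
# Tag raw bytes as UTF-8 without validating them (illustrative helper).
utf8 = ->(bytes) { bytes.dup.force_encoding(Encoding::UTF_8) }

# 0xC2 opens a two-byte UTF-8 sequence and needs a continuation byte
# (0x80..0xBF) after it. On its own it is not valid UTF-8.
complete  = utf8.call("some_site.net\xC2\xA9") # \xC2\xA9 is the copyright sign
truncated = utf8.call("some_site.net\xC2")

puts complete.valid_encoding?  # true
puts truncated.valid_encoding? # false

# Regexp-based methods such as gsub refuse to operate on the broken string.
message =
  begin
    truncated.gsub(/x/, "y")
    nil
  rescue ArgumentError => error
    error.message
  end
puts message.inspect # "invalid byte sequence in UTF-8"
```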
@roback what about when not using the idn-ruby gem?
Removed the idn-ruby gem:
> Twingly::URL.parse("http://some_site.net%C2")
ArgumentError: invalid byte sequence in UTF-8
from /Users/mattias/.gem/ruby/2.2.3/gems/addressable-2.4.0/lib/addressable/uri.rb:432:in `gsub'
from /Users/mattias/.gem/ruby/2.2.3/gems/addressable-2.4.0/lib/addressable/uri.rb:432:in `unencode'
from /Users/mattias/.gem/ruby/2.2.3/gems/addressable-2.4.0/lib/addressable/uri.rb:530:in `normalize_component'
from /Users/mattias/.gem/ruby/2.2.3/gems/addressable-2.4.0/lib/addressable/uri.rb:1079:in `normalized_host'
from /Users/mattias/.gem/ruby/2.2.3/gems/addressable-2.4.0/lib/addressable/uri.rb:1177:in `normalized_authority'
from /Users/mattias/.gem/ruby/2.2.3/gems/addressable-2.4.0/lib/addressable/uri.rb:2078:in `normalize'
from /Users/mattias/.gem/ruby/2.2.3/gems/addressable-2.4.0/lib/addressable/uri.rb:2103:in `display_uri'
from /Users/mattias/repos/twingly-url/lib/twingly/url.rb:38:in `internal_parse'
from /Users/mattias/repos/twingly-url/lib/twingly/url.rb:26:in `parse'
from (irb):1
The same as above, but with the idn-ruby gem installed:
> Twingly::URL.parse("http://some_site.net%C2")
ArgumentError: invalid byte sequence in UTF-8
from /Users/mattias/.gem/ruby/2.2.3/gems/addressable-2.4.0/lib/addressable/idna/native.rb:36:in `split'
from /Users/mattias/.gem/ruby/2.2.3/gems/addressable-2.4.0/lib/addressable/idna/native.rb:36:in `to_ascii'
from /Users/mattias/.gem/ruby/2.2.3/gems/addressable-2.4.0/lib/addressable/uri.rb:1072:in `normalized_host'
from /Users/mattias/.gem/ruby/2.2.3/gems/addressable-2.4.0/lib/addressable/uri.rb:1177:in `normalized_authority'
from /Users/mattias/.gem/ruby/2.2.3/gems/addressable-2.4.0/lib/addressable/uri.rb:2078:in `normalize'
from /Users/mattias/.gem/ruby/2.2.3/gems/addressable-2.4.0/lib/addressable/uri.rb:2103:in `display_uri'
from /Users/mattias/repos/twingly-url/lib/twingly/url.rb:38:in `internal_parse'
from /Users/mattias/repos/twingly-url/lib/twingly/url.rb:26:in `parse'
from (irb):1
Perhaps we can use a combination of valid_encoding? and Addressable::URI.unencode_component:
url = "http://some_site.net%C2"
url.valid_encoding?
# => true
unencoded_url = Addressable::URI.unencode_component(url)
# => "http://some_site.net\xC2"
unencoded_url.valid_encoding?
# => false
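The idea could be wrapped in a small predicate. This is only a sketch in plain Ruby: `percent_decode` and `decodes_to_valid_utf8?` are hypothetical helpers that stand in for `Addressable::URI.unencode_component`; they are not part of Addressable or twingly-url.

```ruby
# Hypothetical stand-in for Addressable::URI.unencode_component:
# replace each %XX with the raw byte, then tag the result as UTF-8
# without validating it.
def percent_decode(str)
  str.b.gsub(/%(\h\h)/) { Regexp.last_match(1).hex.chr }
     .force_encoding(Encoding::UTF_8)
end

# True only if the string is valid both before and after decoding.
def decodes_to_valid_utf8?(str)
  str.valid_encoding? && percent_decode(str).valid_encoding?
end

puts decodes_to_valid_utf8?("http://some_site.net%C2")    # false
puts decodes_to_valid_utf8?("http://some_site.net%C3%A5") # true
```

Note that this decodes the whole URL in one pass, whereas Addressable works on each component separately, so the two are not guaranteed to agree in every case.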
It wasn't that simple :(
Opened sporkmonger/addressable#224
Not a fan of this solution, but it works. I cannot come up with a better one 😢
diff --git a/lib/twingly/url.rb b/lib/twingly/url.rb
index 97635e7..50ade75 100644
--- a/lib/twingly/url.rb
+++ b/lib/twingly/url.rb
@@ -35,6 +35,8 @@ module Twingly
scheme = addressable_uri.scheme
raise Twingly::URL::Error::ParseError unless scheme =~ ACCEPTED_SCHEMES
+ guard_against_addressable_bug(addressable_uri)
+
public_suffix_domain = PublicSuffix.parse(addressable_uri.display_uri.host)
raise Twingly::URL::Error::ParseError if public_suffix_domain.nil?
@@ -56,6 +58,18 @@ module Twingly
end
end
+ # Workaround for the following bug in addressable:
+ # https://github.com/sporkmonger/addressable/issues/224
+ def guard_against_addressable_bug(addressable_uri)
+ addressable_uri.display_uri
+ rescue ArgumentError => error
+ if error.message.include?("invalid byte sequence in UTF-8")
+ raise Twingly::URL::Error::ParseError
+ end
+
+ raise
+ end
+
private :new, :internal_parse, :to_addressable_uri
end
diff --git a/spec/lib/twingly/url_spec.rb b/spec/lib/twingly/url_spec.rb
index 729f847..ded80b2 100644
--- a/spec/lib/twingly/url_spec.rb
+++ b/spec/lib/twingly/url_spec.rb
@@ -27,6 +27,7 @@ def invalid_urls
"http://xn--t...-/",
"http://xn--...-",
"leather beltsbelts for menleather beltmens beltsleather belts for menmens beltbelt bucklesblack l...",
+ "some_site.net%C2",
]
end
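The shape of the workaround in the diff, reduced to a self-contained sketch: translate only the one known ArgumentError into the library's own error and re-raise everything else untouched. `UrlParseError` and `guard_against_invalid_bytes` are hypothetical names standing in for `Twingly::URL::Error::ParseError` and `guard_against_addressable_bug`, and the block replaces the call to `display_uri` so the example runs without Addressable.

```ruby
# Hypothetical stand-in for Twingly::URL::Error::ParseError.
class UrlParseError < StandardError; end

# Runs the block and converts only the "invalid byte sequence" ArgumentError
# into UrlParseError. Any other ArgumentError is re-raised unchanged.
def guard_against_invalid_bytes
  yield
rescue ArgumentError => error
  raise UrlParseError if error.message.include?("invalid byte sequence in UTF-8")

  raise
end

broken = "some_site.net\xC2".dup.force_encoding(Encoding::UTF_8)

begin
  guard_against_invalid_bytes { broken.gsub(/x/, "y") }
rescue UrlParseError
  puts "translated to UrlParseError"
end

begin
  guard_against_invalid_bytes { Integer("abc") }
rescue ArgumentError => error
  puts "re-raised unchanged: #{error.class}"
end
```

Matching on the message text is brittle, which is presumably why the author is "not a fan" of it, but it keeps unrelated ArgumentErrors visible instead of hiding them behind a parse error.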
Temporary fix for #62. Related to sporkmonger/addressable#224
Just to make this issue clearer, the bug is in Addressable's normalized_host:
[32] pry(main)> Addressable::VERSION::STRING
=> "2.4.0"
[33] pry(main)> Addressable::URI.heuristic_parse("http://some_site.net%C2").normalized_host
ArgumentError: invalid byte sequence in UTF-8
from /Users/dentarg/.gem/ruby/2.2.5/gems/addressable-2.4.0/lib/addressable/uri.rb:432:in `gsub'
[34] pry(main)> wtf
Exception: ArgumentError: invalid byte sequence in UTF-8
--
0: /Users/dentarg/.gem/ruby/2.2.5/gems/addressable-2.4.0/lib/addressable/uri.rb:432:in `gsub'
1: /Users/dentarg/.gem/ruby/2.2.5/gems/addressable-2.4.0/lib/addressable/uri.rb:432:in `unencode'
2: /Users/dentarg/.gem/ruby/2.2.5/gems/addressable-2.4.0/lib/addressable/uri.rb:530:in `normalize_component'
3: /Users/dentarg/.gem/ruby/2.2.5/gems/addressable-2.4.0/lib/addressable/uri.rb:1079:in `normalized_host'
4: (pry):22:in `__pry__'
No change for addressable 2.5.0:
$ pry
[1] pry(main)> require "addressable"
=> true
[2] pry(main)> Addressable::VERSION::STRING
=> "2.5.0"
[3] pry(main)> Addressable::URI.heuristic_parse("http://some_site.net%C2").normalized_host
ArgumentError: invalid byte sequence in UTF-8
from /Users/dentarg/.gem/ruby/2.2.5/gems/addressable-2.5.0/lib/addressable/idna/native.rb:36:in `split'
[4] pry(main)> wtf?
Exception: ArgumentError: invalid byte sequence in UTF-8
--
0: /Users/dentarg/.gem/ruby/2.2.5/gems/addressable-2.5.0/lib/addressable/idna/native.rb:36:in `split'
1: /Users/dentarg/.gem/ruby/2.2.5/gems/addressable-2.5.0/lib/addressable/idna/native.rb:36:in `to_ascii'
2: /Users/dentarg/.gem/ruby/2.2.5/gems/addressable-2.5.0/lib/addressable/uri.rb:1092:in `normalized_host'
3: (pry):3:in `__pry__'
4: /Users/dentarg/.gem/ruby/2.2.5/gems/pry-0.10.4/lib/pry/pry_instance.rb:355:in `eval'
5: /Users/dentarg/.gem/ruby/2.2.5/gems/pry-0.10.4/lib/pry/pry_instance.rb:355:in `evaluate_ruby'
6: /Users/dentarg/.gem/ruby/2.2.5/gems/pry-0.10.4/lib/pry/pry_instance.rb:323:in `handle_line'
7: /Users/dentarg/.gem/ruby/2.2.5/gems/pry-0.10.4/lib/pry/pry_instance.rb:243:in `block (2 levels) in eval'
8: /Users/dentarg/.gem/ruby/2.2.5/gems/pry-0.10.4/lib/pry/pry_instance.rb:242:in `catch'
9: /Users/dentarg/.gem/ruby/2.2.5/gems/pry-0.10.4/lib/pry/pry_instance.rb:242:in `block in eval'
$ pry
[1] pry(main)> require "addressable"
=> true
[2] pry(main)> Addressable::URI.heuristic_parse("http://some_site.net%C2").normalized_host
ArgumentError: invalid byte sequence in UTF-8
from /Users/dentarg/.gem/ruby/2.2.5/gems/addressable-2.5.0/lib/addressable/uri.rb:440:in `gsub'
[3] pry(main)> wtf?
Exception: ArgumentError: invalid byte sequence in UTF-8
--
0: /Users/dentarg/.gem/ruby/2.2.5/gems/addressable-2.5.0/lib/addressable/uri.rb:440:in `gsub'
1: /Users/dentarg/.gem/ruby/2.2.5/gems/addressable-2.5.0/lib/addressable/uri.rb:440:in `unencode'
2: /Users/dentarg/.gem/ruby/2.2.5/gems/addressable-2.5.0/lib/addressable/uri.rb:536:in `normalize_component'
3: /Users/dentarg/.gem/ruby/2.2.5/gems/addressable-2.5.0/lib/addressable/uri.rb:1099:in `normalized_host'
4: (pry):2:in `__pry__'
5: /Users/dentarg/.gem/ruby/2.2.5/gems/pry-0.10.4/lib/pry/pry_instance.rb:355:in `eval'
6: /Users/dentarg/.gem/ruby/2.2.5/gems/pry-0.10.4/lib/pry/pry_instance.rb:355:in `evaluate_ruby'
7: /Users/dentarg/.gem/ruby/2.2.5/gems/pry-0.10.4/lib/pry/pry_instance.rb:323:in `handle_line'
8: /Users/dentarg/.gem/ruby/2.2.5/gems/pry-0.10.4/lib/pry/pry_instance.rb:243:in `block (2 levels) in eval'
9: /Users/dentarg/.gem/ruby/2.2.5/gems/pry-0.10.4/lib/pry/pry_instance.rb:242:in `catch'
[4] pry(main)> Addressable::VERSION::STRING
=> "2.5.0"
Still an error upstream:
=> Addressable::VERSION
irb(main):005:0> Addressable::VERSION::STRING
=> "2.5.2"
irb(main):006:0> Addressable::URI.parse("http://example.com%C2").display_uri
ArgumentError: invalid byte sequence in UTF-8
from /Users/dentarg/.gem/ruby/2.4.2/gems/addressable-2.5.2/lib/addressable/idna/native.rb:36:in `split'
from /Users/dentarg/.gem/ruby/2.4.2/gems/addressable-2.5.2/lib/addressable/idna/native.rb:36:in `to_ascii'
from /Users/dentarg/.gem/ruby/2.4.2/gems/addressable-2.5.2/lib/addressable/uri.rb:1092:in `normalized_host'
from /Users/dentarg/.gem/ruby/2.4.2/gems/addressable-2.5.2/lib/addressable/uri.rb:1210:in `normalized_authority'
from /Users/dentarg/.gem/ruby/2.4.2/gems/addressable-2.5.2/lib/addressable/uri.rb:2133:in `normalize'
from /Users/dentarg/.gem/ruby/2.4.2/gems/addressable-2.5.2/lib/addressable/uri.rb:2158:in `display_uri'
from (irb):6
from /Users/dentarg/.rubies/ruby-2.4.2/bin/irb:11:in `<main>'
without idn-ruby:
irb(main):001:0> Addressable::URI.parse("http://example.com%C2").display_uri
ArgumentError: invalid byte sequence in UTF-8
from /Users/dentarg/.gem/ruby/2.4.2/gems/addressable-2.5.2/lib/addressable/uri.rb:440:in `gsub'
from /Users/dentarg/.gem/ruby/2.4.2/gems/addressable-2.5.2/lib/addressable/uri.rb:440:in `unencode'
from /Users/dentarg/.gem/ruby/2.4.2/gems/addressable-2.5.2/lib/addressable/uri.rb:536:in `normalize_component'
from /Users/dentarg/.gem/ruby/2.4.2/gems/addressable-2.5.2/lib/addressable/uri.rb:1099:in `normalized_host'
from /Users/dentarg/.gem/ruby/2.4.2/gems/addressable-2.5.2/lib/addressable/uri.rb:1210:in `normalized_authority'
from /Users/dentarg/.gem/ruby/2.4.2/gems/addressable-2.5.2/lib/addressable/uri.rb:2133:in `normalize'
from /Users/dentarg/.gem/ruby/2.4.2/gems/addressable-2.5.2/lib/addressable/uri.rb:2158:in `display_uri'
from (irb):1
from /Users/dentarg/.rubies/ruby-2.4.2/bin/irb:11:in `<main>'
2.5 years later, Bob has replied :-) Pasting sporkmonger/addressable#224 (comment) for your convenience
According to the "preferred format" used by DNS. See https://en.wikipedia.org/wiki/Domain_Name_System#Domain_name_syntax,_internationalization
Moves one invalid URL to the set of invalid URLs (if you enter http://www..twingly..com/ in the address bar in Chrome, it does a search, doesn't try to visit any site).
Close twingly#62
Yes, you're right, we'll probably never be able to remove that rescue. It was less temporary than we first thought :)
When
Another one:
http://+%D5d.some_site.net