Commit

Merge pull request #459 from jarthod/iso-encoding-problem
fix `invalid byte sequence in UTF-8` exception when unencoding URLs containing non UTF-8 characters
sporkmonger committed Jul 30, 2022
2 parents 08d27e8 + b4c9882 commit 068f673
Showing 2 changed files with 8 additions and 9 deletions.
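
The exception named in the commit message can be reproduced without loading Addressable. The following is a minimal sketch in plain Ruby, assuming the failure comes from running a regexp over a string that has been force-tagged as UTF-8 while holding bytes that are not valid UTF-8. The variable names (`raw`, `forced`, `error_message`) are illustrative only and do not appear in the library.

```ruby
# Standalone reproduction sketch; it does not load or call Addressable.
# Input taken from the spec added in this commit: a binary (ASCII-8BIT)
# string holding ISO-8859-1 bytes.
raw = "/M%E9/\xE9?p=\xFC".b

# Assumption: this mirrors the removed `uri.force_encoding("utf-8")` step.
forced = raw.dup.force_encoding("utf-8")
still_valid = forced.valid_encoding? # the bytes are not well-formed UTF-8

error_message = nil
begin
  # Matching a regexp against a string with a broken encoding raises.
  forced.gsub(/%[0-9a-f]{2}/iu) { |sequence| sequence[1..3].to_i(16).chr }
rescue ArgumentError => e
  error_message = e.message
end

puts still_valid.inspect
puts error_message.inspect
```

Running the same `gsub` on `raw` itself, without the forced re-tagging, does not raise, which is what the patched code relies on.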
12 changes: 3 additions & 9 deletions lib/addressable/uri.rb
@@ -468,19 +468,13 @@ def self.unencode(uri, return_type=String, leave_encoded='')
           "Expected Class (String or Addressable::URI), " +
           "got #{return_type.inspect}"
       end
-      uri = uri.dup
-      # Seriously, only use UTF-8. I'm really not kidding!
-      uri.force_encoding("utf-8")
-
-      unless leave_encoded.empty?
-        leave_encoded = leave_encoded.dup.force_encoding("utf-8")
-      end

-      result = uri.gsub(/%[0-9a-f]{2}/iu) do |sequence|
+      result = uri.gsub(/%[0-9a-f]{2}/i) do |sequence|
         c = sequence[1..3].to_i(16).chr
-        c.force_encoding("utf-8")
+        c.force_encoding(sequence.encoding)
         leave_encoded.include?(c) ? sequence : c
       end
+
       result.force_encoding("utf-8")
       if return_type == String
         return result
5 changes: 5 additions & 0 deletions spec/addressable/uri_spec.rb
@@ -5992,6 +5992,11 @@ def to_str
     expect(Addressable::URI.unencode_component("ski=%BA%DAɫ")).to eq("ski=\xBA\xDAɫ")
   end

+  it "should not fail with UTF-8 incompatible string" do
+    url = "/M%E9/\xE9?p=\xFC".b
+    expect(Addressable::URI.unencode_component(url)).to eq("/M\xE9/\xE9?p=\xFC")
+  end
+
   it "should result in correct percent encoded sequence as a URI" do
     expect(Addressable::URI.unencode(
       "/path?g%C3%BCnther", ::Addressable::URI
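
For readers who want to try the patched loop in isolation, here is a rough sketch that copies the three changed lines into a standalone method. `unencode_sketch` is a hypothetical name chosen for illustration; it is not part of Addressable's API and omits the argument checks and the `return_type` handling that the real `unencode` performs.

```ruby
# Hypothetical standalone helper mirroring the patched loop in this commit.
# Not Addressable's API: no type checks, no Addressable::URI return type.
def unencode_sketch(uri, leave_encoded = "")
  result = uri.gsub(/%[0-9a-f]{2}/i) do |sequence|
    # Decode the two hex digits after "%" into a single byte.
    c = sequence[1..3].to_i(16).chr
    # Tag the byte with the encoding of the matched text instead of
    # assuming UTF-8, so binary input stays binary during substitution.
    c.force_encoding(sequence.encoding)
    leave_encoded.include?(c) ? sequence : c
  end
  # The final string is still labelled UTF-8, as in the patched method.
  result.force_encoding("utf-8")
end

binary_result = unencode_sketch("/M%E9/\xE9?p=\xFC".b)
utf8_result = unencode_sketch("/path?g%C3%BCnther")

puts binary_result.inspect
puts utf8_result
```

The design choice visible here is that the input is no longer re-tagged before matching; only the result is labelled UTF-8, so a caller may still receive a string for which `valid_encoding?` is false when the input bytes were not UTF-8.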
