Merge pull request #163 from rails/flavorjones-ensure-utf8-encoding-from-all-sanitizers

fix: ensure LinkSanitizer returns utf-8 encoded strings
flavorjones committed May 18, 2023
2 parents 2b0dcb5 + 47f6255 commit b14f89d
Showing 3 changed files with 25 additions and 12 deletions.
9 changes: 7 additions & 2 deletions CHANGELOG.md
@@ -32,12 +32,17 @@

   *Mike Dalessio*

+* `LinkSanitizer` always returns UTF-8 encoded strings. `SafeListSanitizer` and `FullSanitizer`
+  already ensured this encoding.
+
+  *Mike Dalessio*
+
 * `SafeListSanitizer` allows `time` tag and `lang` attribute by default.

   *Mike Dalessio*

-* `Rails::Html::XPATHS_TO_REMOVE` has been removed. It's not necessary with the existing sanitizers,
-  and should have been a private constant all along anyway.
+* The constant `Rails::Html::XPATHS_TO_REMOVE` has been removed. It's not necessary with the
+  existing sanitizers, and should have been a private constant all along anyway.

   *Mike Dalessio*

10 changes: 2 additions & 8 deletions lib/rails/html/sanitizer.rb
@@ -182,12 +182,6 @@ def serialize(fragment)
             properly_encode(fragment, encoding: "UTF-8")
           end
         end
-
-        module SimpleString
-          def serialize(fragment)
-            fragment.to_s
-          end
-        end
       end
     end
   end
@@ -242,7 +236,7 @@ class LinkSanitizer < Rails::HTML::Sanitizer
       include HTML::Concern::ComposedSanitize
       include HTML::Concern::Parser::HTML4
       include HTML::Concern::Scrubber::Link
-      include HTML::Concern::Serializer::SimpleString
+      include HTML::Concern::Serializer::UTF8Encode
     end

     # == Rails::HTML4::SafeListSanitizer
@@ -352,7 +346,7 @@ class LinkSanitizer < Rails::HTML::Sanitizer
       include HTML::Concern::ComposedSanitize
       include HTML::Concern::Parser::HTML5
       include HTML::Concern::Scrubber::Link
-      include HTML::Concern::Serializer::SimpleString
+      include HTML::Concern::Serializer::UTF8Encode
     end

     # == Rails::HTML5::SafeListSanitizer
18 changes: 16 additions & 2 deletions test/sanitizer_test.rb
@@ -174,6 +174,13 @@ def test_full_sanitize_respect_html_escaping_of_the_given_string
       assert_equal "omg &lt;script&gt;BOM&lt;/script&gt;", full_sanitize("omg &lt;script&gt;BOM&lt;/script&gt;")
     end

+    def test_sanitize_ascii_8bit_string
+      full_sanitize("<div><a>hello</a></div>".encode("ASCII-8BIT")).tap do |sanitized|
+        assert_equal "hello", sanitized
+        assert_equal Encoding::UTF_8, sanitized.encoding
+      end
+    end
+
     protected
       def full_sanitize(input, options = {})
         module_under_test::FullSanitizer.new.sanitize(input, options)
@@ -223,6 +230,13 @@ def test_strip_links_with_linkception
       assert_equal "Magic", link_sanitize("<a href='http://www.rubyonrails.com/'>Mag<a href='http://www.ruby-lang.org/'>ic")
     end

+    def test_sanitize_ascii_8bit_string
+      link_sanitize("<div><a>hello</a></div>".encode("ASCII-8BIT")).tap do |sanitized|
+        assert_equal "<div>hello</div>", sanitized
+        assert_equal Encoding::UTF_8, sanitized.encoding
+      end
+    end
+
     protected
       def link_sanitize(input, options = {})
         module_under_test::LinkSanitizer.new.sanitize(input, options)
@@ -671,8 +685,8 @@ def test_x03a_legitimate
     end

     def test_sanitize_ascii_8bit_string
-      safe_list_sanitize("<a>hello</a>".encode("ASCII-8BIT")).tap do |sanitized|
-        assert_equal "<a>hello</a>", sanitized
+      safe_list_sanitize("<div><a>hello</a></div>".encode("ASCII-8BIT")).tap do |sanitized|
+        assert_equal "<div><a>hello</a></div>", sanitized
         assert_equal Encoding::UTF_8, sanitized.encoding
       end
     end
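
The effect of swapping the serializer can be sketched with plain Ruby strings, without loading the gem. The snippet below is only an illustration under stated assumptions: `FragmentSketch`, `serialize_simple`, and `serialize_utf8` are hypothetical names, and `String#encode` stands in for the gem's `properly_encode`, whose internals are not shown in this diff.

```ruby
# Hypothetical stand-in for a parsed fragment: its to_s returns markup that
# still carries whatever encoding the input string had.
FragmentSketch = Struct.new(:markup) do
  def to_s
    markup
  end
end

# Mirrors the removed SimpleString strategy: the input encoding leaks through.
def serialize_simple(fragment)
  fragment.to_s
end

# Illustrative UTF-8 strategy. This is an assumption for the sketch, not the
# gem's properly_encode: it transcodes the serialized string to UTF-8.
def serialize_utf8(fragment)
  fragment.to_s.encode(Encoding::UTF_8)
end

# Same kind of input the new tests use: ASCII-only markup tagged as binary.
binary_fragment = FragmentSketch.new("<div>hello</div>".encode("ASCII-8BIT"))

simple_result = serialize_simple(binary_fragment)
utf8_result = serialize_utf8(binary_fragment)

puts simple_result.encoding == Encoding::UTF_8 # false: still binary
puts utf8_result.encoding == Encoding::UTF_8   # true
```

Note that `String#encode` from binary to UTF-8 only succeeds for bytes below 0x80, so this sketch covers the ASCII-only case exercised by the tests and nothing more.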