test: actually test against a working unicode-encoded exploit #205

flavorjones · 2021-04-08T15:37:33Z

This commit expands tests to cover an attack via unicode-encoded strings.

Let's work backwards. First, the failing test from rails-html-sanitizer (see rails/rails-html-sanitizer#111):

def test_should_sanitize_div_background_image_unicode_encoded
  raw = %(background-image:\0075\0072\006C\0028'\006a\0061\0076\0061\0073\0063\0072\0069\0070\0074\003a\0061\006c\0065\0072\0074\0028.1027\0058.1053\0053\0027\0029'\0029)
  assert_equal '', sanitize_css(raw)
end

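This test started failing in Loofah v2.9.0. Why? Because the string shouldn't have been filtered in the first place. Ruby's %(...) literal processes backslash escapes the same way a double-quoted string does, so each \007, \006, and so on is read as an octal escape for a control character, and the digit after it is left behind as literal text. Decoded that way (and ignoring the unprintable control characters), this string is simply background-image:52C8'a161332904a1c5248.10278.1053379'9 which admittedly is garbage, but not potentially harmful garbage.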

🤷 I'm fixing that shortly in a PR to rails-html-sanitizer.

But where did this test come from?

It appeared along with a lot of other tests in rails/rails-html-sanitizer@121fd2d which were originally in rails via this 2007 commit: rails/rails@2d02199.

Loofah has a suspiciously similar test. Here's the input and expected output:

"input": "<div style=\"background-image:\u00a5\u00a2\u006C\u0028'\u006a\u0061\u00a6\u0061\u00a3\u0063\u00a2\u0069\u00a0\u00a4\u003a\u0061\u006c\u0065\u00a2\u00a4\u0028.1027\u0058.1053\u0053\u0027\u0029'\u0029\">foo</div>",
"output": "<div>foo</div>"

This string was copied from the html5lib test suite in 80703f6. Here's the original version from that project, as ported from python in 2007:

"input": "<div style=\"background-image:\a5\a2\006C\0028'\006a\0061\a6\0061\a3\0063\a2\0069\a0\a4\003a\0061\006c\0065\a2\a4\0028.1027\0058.1053\0053\0027\0029'\0029\">foo</div>",
"output": "<div style=''>foo</div>"

But that '\a5' is clearly an error and was later "fixed" to be proper JSON:

diff --git a/sanitizer/tests1.dat b/sanitizer/tests1.dat
index 44db572..cc8d3c9 100644
--- a/sanitizer/tests1.dat
+++ b/sanitizer/tests1.dat
@@ -35,7 +35,7 @@
 
   {
     "name": "div_background_image_unicode_encoded",
-    "input": "<div style=\"background-image:\a5\a2\006C\0028'\006a\0061\a6\0061\a3\0063\a2\0069\a0\a4\003a\0061\006c\0065\a2\a4\0028.1027\0058.1053\0053\0027\0029'\0029\">foo</div>",
+    "input": "<div style=\"background-image:\u00a5\u00a2\u006C\u0028'\u006a\u0061\u00a6\u0061\u00a3\u0063\u00a2\u0069\u00a0\u00a4\u003a\u0061\u006c\u0065\u00a2\u00a4\u0028.1027\u0058.1053\u0053\u0027\u0029'\u0029\">foo</div>",
     "output": "<div style=''>foo</div>"
   },
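Why the scare quotes around "fixed"? Here is a quick check with Python's standard json module, as a sketch rather than anything taken from the html5lib history. `\a` is not a legal JSON escape, so the old fixture line could not be parsed as JSON at all, while `\u00a5` parses but names U+00A5 (the yen sign). My assumption, labeled as such, is that `\a5` was originally Ruby's printed form of a BEL character followed by a literal "5", which is a different string entirely.

```python
import json

# "\a" is not one of JSON's legal escapes, so the pre-"fix" form is rejected.
try:
    json.loads(r'"\a5"')
    legacy_parses = True
except json.JSONDecodeError:
    legacy_parses = False

# The "fixed" form parses, but to a single character: U+00A5.
fixed = json.loads(r'"\u00a5"')

# Assumption: the string the fixture was meant to hold is BEL followed by "5".
bel_then_five = "\x07" + "5"

print(legacy_parses)
print(len(fixed), len(bel_then_five))
print(fixed == bel_then_five)
```

So the change made the file valid JSON without restoring what the test was supposed to contain.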

Before that, though, this test was originally in python:

self.sanitize_html("""<div style="background-image:\0075\0072\006C\0028'\006a\0061\0076\0061\0073\0063\0072\0069\0070\0074\003a\0061\006c\0065\0072\0074\0028.1027\0058.1053\0053\0027\0029'\0029">foo</div>"""))

So it was somehow garbled in the translation from Python to Ruby via JSON.

Here's that string, decoded:

<div style="background-image:52C8'a161332904a1c5248.10278.1053379'9">foo</div>
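The garbling is easy to reproduce, because a non-raw Python string literal (like a double-quoted Ruby one) treats a backslash followed by up to three octal digits as a single character. A minimal sketch using only the first four escapes; the variable names are mine and not from any of the test suites:

```python
# Non-raw literal: "\007" is read as one octal escape (BEL), and the "5"
# that follows is left behind as ordinary text.
garbled = "\0075\0072\006C\0028"

# The same string, spelled with explicit hex escapes for the control characters.
assert garbled == "\x075\x072\x06C\x028"

# Dropping the unprintable control characters leaves only the leftover digits.
visible = "".join(ch for ch in garbled if ch.isprintable())
print(visible)

# A raw literal keeps every backslash, which is what a CSS parser needs to see.
intact = r"\0075\0072\006C\0028"
print(len(garbled), len(intact))
```

The visible remainder matches the start of the decoded string above, so the backslash sequences never reach the sanitizer as CSS escapes.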

What is that even supposed to be testing? This doesn't look right, either. Let's take a guess and make all of those real unicode characters by using the \uXXXX escape code:

print("""<div style="background-image:\u0075\u0072\u006C\u0028'\u006a\u0061\u0076\u0061\u0073\u0063\u0072\u0069\u0070\u0074\u003a\u0061\u006c\u0065\u0072\u0074\u0028.1027\u0058.1053\u0053\u0027\u0029'\u0029">foo</div>""")

<div style="background-image:url('javascript:alert(.1027X.1053S')')">foo</div>

Now we're getting somewhere. This looks like an actual attempt at an exploit! So even the original html5lib test from 2007 was encoded improperly.

But: this is the exact same (incorrect) string that was in the Rails tests from 2007. Where did it come from?

And: what is that .1027X.1053S bit? That ... doesn't look right, and is a syntax error in javascript.

Some googling brings me to OWASP's XSS cheatsheet, which names this exploit as "DIV Background-image with Unicoded XSS Exploit":

# This has been modified slightly to obfuscate the url parameter. The original vulnerability was found by Renaud Lifchitz as a vulnerability in Hotmail:
<DIV STYLE="background-image:\0075\0072\006C\0028'\006a\0061\0076\0061\0073\0063\0072\0069\0070\0074\003a\0061\006c\0065\0072\0074\0028.1027\0058.1053\0053\0027\0029'\0029">

Some more googling brings me to this disclosure email from 2006 which uses this as the exploit:

<body bgcolor="#CCCCCC; background-image:
url\0028'\006a\0061\0076\0061\0073\0063\0072\0069\0070\0074\003a\0061\006c\0065\0072\0074\0028\0064\006f\0063\0075\006d\0065\006e\0074\002e\0063\006f\006f\006b\0069\0065\0029'\0029">
<p>Found by http://www.sysdream.com !!!</p>
</body>

That CSS property, decoded, becomes:

background-image: url('javascript:alert(document.cookie)')
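For anyone who wants to check these decodings by hand, here is a rough sketch of a CSS hex-escape decoder in Python. The helper name `decode_css_escapes` is hypothetical, and this is a simplification of what a real CSS tokenizer does (for example, it does not treat a CRLF pair after an escape as a single whitespace), so treat it as an illustration and not as how Loofah or any browser implements it.

```python
import re

# A CSS hex escape: a backslash, 1 to 6 hex digits, then one optional whitespace.
CSS_HEX_ESCAPE = re.compile(r"\\([0-9a-fA-F]{1,6})[ \t\r\n\f]?")


def decode_css_escapes(css: str) -> str:
    """Replace CSS hex escapes with the characters they name (hypothetical helper)."""

    def replace(match: re.Match) -> str:
        codepoint = int(match.group(1), 16)
        # Zero, surrogates, and out-of-range values become U+FFFD.
        if codepoint == 0 or codepoint > 0x10FFFF or 0xD800 <= codepoint <= 0xDFFF:
            return "\ufffd"
        return chr(codepoint)

    return CSS_HEX_ESCAPE.sub(replace, css)


# Raw literals, so Python itself leaves the backslashes alone.
disclosure = r"url\0028'\006a\0061\0076\0061\0073\0063\0072\0069\0070\0074\003a\0061\006c\0065\0072\0074\0028\0064\006f\0063\0075\006d\0065\006e\0074\002e\0063\006f\006f\006b\0069\0065\0029'\0029"
owasp = r"\0075\0072\006C\0028'\006a\0061\0076\0061\0073\0063\0072\0069\0070\0074\003a\0061\006c\0065\0072\0074\0028.1027\0058.1053\0053\0027\0029'\0029"

print(decode_css_escapes(disclosure))
print(decode_css_escapes(owasp))
```

Run against the 2006 disclosure payload and the OWASP variant, this makes the difference visible: the first decodes to a working url(...) call, the second still carries the stray .1027 and .1053 fragments.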

Now that is a real test. So once again, we find that an error was made (or copied from yet another source that made an error) in this string that made it an invalid test.

Guessing, again, that .1027 should have been \u0027 and .1053 should have been \u0053, this is the string that the OWASP test could and should have been:

print("""<div style="background-image:\u0075\u0072\u006C\u0028'\u006a\u0061\u0076\u0061\u0073\u0063\u0072\u0069\u0070\u0074\u003a\u0061\u006c\u0065\u0072\u0074\u0028\u0027\u0058\u0053\u0053\u0027\u0029'\u0029">foo</div>""")

<div style="background-image:url('javascript:alert('XSS')')">foo</div>

Which, you know, would have been a nice try if it wasn't for the improperly-escaped single-quotes.

TL;DR: I'm SMDH at the errors that have propagated in the test strings over the years.


valscion · 2021-06-10T12:19:48Z

This was an interesting journey through the test string history! Thank you for this unexpected history lesson 😅

flavorjones · 2021-06-10T13:53:44Z

@valscion Thank you for coming to my TED talk. 🤣

This adds onto #205. The original reported exploit in 2006 used CSS hex encoding (e.g., "\0075" for "u"), which was ...

- mistakenly put into a double-quoted Ruby string in the Instiki test suite in 2007,
- then copied into html5lib-ruby's test suite,
- then copied into html5lib-python's suite,
- then finally copied into the html5lib shared suite,
- which was imported into Loofah

test: actually test against a working unicode-encoded exploit

895b5f4


flavorjones mentioned this pull request Apr 8, 2021

fix css scrubbing tests rails/rails-html-sanitizer#113

Merged

flavorjones merged commit db365d0 into main Apr 8, 2021

flavorjones mentioned this pull request Apr 8, 2021

Regressions in Loofah 2.9.0 and 2.9.1 #204

Closed

flavorjones deleted the flavorjones-test-unicode-encoded-exploit branch April 8, 2021 16:01

flavorjones mentioned this pull request Oct 29, 2021

test: use CSS hex-encoded strings to test sanitization #220

Merged
