test: actually test against a working unicode-encoded exploit #205

Merged
merged 1 commit into from Apr 8, 2021


flavorjones
Owner

This commit expands tests to cover an attack via unicode-encoded strings.

Let's work backwards. First, the failing test from rails-html-sanitizer (see rails/rails-html-sanitizer#111):

def test_should_sanitize_div_background_image_unicode_encoded
  raw = %(background-image:\0075\0072\006C\0028'\006a\0061\0076\0061\0073\0063\0072\0069\0070\0074\003a\0061\006c\0065\0072\0074\0028.1027\0058.1053\0053\0027\0029'\0029)
  assert_equal '', sanitize_css(raw)
end

This test started failing in Loofah v2.9.0. Why? Because it should never have been filtered in the first place. When decoded, this string is simply background-image:52C8'a161332904a1c5248.10278.1053379'9 (interleaved with non-printing control characters that don't show up here), which admittedly is garbage, but not potentially harmful garbage.

🤷 I'm fixing that shortly in a PR to rails-html-sanitizer.


But where did this test come from?

It appeared along with a lot of other tests in rails/rails-html-sanitizer@121fd2d which were originally in rails via this 2007 commit: rails/rails@2d02199.

Loofah has a suspiciously similar test, here's the input and expected output:

"input": "<div style=\"background-image:\u00a5\u00a2\u006C\u0028'\u006a\u0061\u00a6\u0061\u00a3\u0063\u00a2\u0069\u00a0\u00a4\u003a\u0061\u006c\u0065\u00a2\u00a4\u0028.1027\u0058.1053\u0053\u0027\u0029'\u0029\">foo</div>",
"output": "<div>foo</div>"

This string was copied from the html5lib test suite in 80703f6. Here's the original version from that project, as ported from python in 2007:

"input": "<div style=\"background-image:\a5\a2\006C\0028'\006a\0061\a6\0061\a3\0063\a2\0069\a0\a4\003a\0061\006c\0065\a2\a4\0028.1027\0058.1053\0053\0027\0029'\0029\">foo</div>",
"output": "<div style=''>foo</div>"

But that '\a5' is clearly an error and was later "fixed" to be proper JSON:

diff --git a/sanitizer/tests1.dat b/sanitizer/tests1.dat
index 44db572..cc8d3c9 100644
--- a/sanitizer/tests1.dat
+++ b/sanitizer/tests1.dat
@@ -35,7 +35,7 @@
 
   {
     "name": "div_background_image_unicode_encoded",
-    "input": "<div style=\"background-image:\a5\a2\006C\0028'\006a\0061\a6\0061\a3\0063\a2\0069\a0\a4\003a\0061\006c\0065\a2\a4\0028.1027\0058.1053\0053\0027\0029'\0029\">foo</div>",
+    "input": "<div style=\"background-image:\u00a5\u00a2\u006C\u0028'\u006a\u0061\u00a6\u0061\u00a3\u0063\u00a2\u0069\u00a0\u00a4\u003a\u0061\u006c\u0065\u00a2\u00a4\u0028.1027\u0058.1053\u0053\u0027\u0029'\u0029\">foo</div>",
     "output": "<div style=''>foo</div>"
   },
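To make the "proper JSON" point concrete, here is a minimal sketch using Python's standard `json` module (my choice for illustration, not anything html5lib itself ran). It checks two things: `\a` is not an escape JSON accepts, and the "fixed" `\u00a5` form parses but names a completely different character than the `u` the exploit needs.

```python
import json

# "\a" is not one of JSON's escape sequences, so a strict parser rejects
# the 2007 form of the fixture outright.
try:
    json.loads(r'"\a5"')
    parsed_ok = True
except json.JSONDecodeError:
    parsed_ok = False

# The "fixed" form parses, but U+00A5 is the yen sign, not "u" (U+0075).
fixed_prefix = json.loads(r'"\u00a5\u00a2\u006C\u0028"')

print(parsed_ok)
print(fixed_prefix)
```

So the fix made the file valid JSON without making the test string any closer to the original exploit.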

Before that, though, this test was originally in python:

self.sanitize_html("""<div style="background-image:\0075\0072\006C\0028'\006a\0061\0076\0061\0073\0063\0072\0069\0070\0074\003a\0061\006c\0065\0072\0074\0028.1027\0058.1053\0053\0027\0029'\0029">foo</div>"""))

so it was somehow garbled in the translation from python to ruby via json.

Here's that string, decoded (control characters not shown):

<div style="background-image:52C8'a161332904a1c5248.10278.1053379'9">foo</div>
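A quick sketch of why the decoded string looks like that, assuming an ordinary (non-raw) Python string literal: `\0075` is read as the three-digit octal escape `\007` (BEL, which can also be spelled `\a`) followed by a literal `5`. The variable names below are mine, and only the first four escapes are shown.

```python
# In a non-raw string literal, "\0075" is the octal escape "\007" (BEL)
# followed by the literal character "5", and so on for the other escapes.
prefix = "\0075\0072\006C\0028"

code_points = [hex(ord(ch)) for ch in prefix]

# Drop the non-printing characters to see what survives on screen.
visible = "".join(ch for ch in prefix if ch.isprintable())

# "\a" is another spelling of BEL, so the literal starts with "\a5\a2".
starts_with_bell_form = prefix.startswith("\a5\a2")

print(code_points)
print(visible)
print(starts_with_bell_form)
```

That last check is at least consistent with the `\a5\a2` spelling in the html5lib fixture above: it is what these escapes look like once a language has already interpreted them as octal.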

What is that even supposed to be testing? This doesn't look right, either. Let's take a guess and make all of those real unicode characters by using the \uXXXX escape code:

print("""<div style="background-image:\u0075\u0072\u006C\u0028'\u006a\u0061\u0076\u0061\u0073\u0063\u0072\u0069\u0070\u0074\u003a\u0061\u006c\u0065\u0072\u0074\u0028.1027\u0058.1053\u0053\u0027\u0029'\u0029">foo</div>""")

<div style="background-image:url('javascript:alert(.1027X.1053S')')">foo</div>

Now we're getting somewhere. This looks like an actual attempt at an exploit! So even the original html5lib test from 2007 was encoded improperly.


But: this is the exact same (incorrect) string that was in the Rails tests from 2007. Where did it come from?

And: what is that .1027X.1053S bit? That ... doesn't look right, and is a syntax error in javascript.

Some googling brings me to OWASP's XSS cheatsheet, which names this exploit as "DIV Background-image with Unicoded XSS Exploit":

# This has been modified slightly to obfuscate the url parameter. The original vulnerability was found by Renaud Lifchitz as a vulnerability in Hotmail:
<DIV STYLE="background-image:\0075\0072\006C\0028'\006a\0061\0076\0061\0073\0063\0072\0069\0070\0074\003a\0061\006c\0065\0072\0074\0028.1027\0058.1053\0053\0027\0029'\0029">

Some more googling brings me to this disclosure email from 2006 which uses this as the exploit:

<body bgcolor="#CCCCCC; background-image:
url\0028'\006a\0061\0076\0061\0073\0063\0072\0069\0070\0074\003a\0061\006c\0065\0072\0074\0028\0064\006f\0063\0075\006d\0065\006e\0074\002e\0063\006f\006f\006b\0069\0065\0029'\0029">
<p>Found by http://www.sysdream.com !!!</p>
</body>

That CSS property, decoded, becomes:

background-image: url('javascript:alert(document.cookie)')

Now that is a real test. So once again, we find that an error was made (or copied from yet another source that made an error) in this string that made it an invalid test.
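For anyone who wants to reproduce the decoding, here is a minimal sketch of CSS hex-escape handling: a backslash, one to six hex digits, and one optional trailing whitespace character. The function name `decode_css_hex_escapes` is hypothetical, and this is only an approximation of what a browser's CSS tokenizer does, not what Loofah or any sanitizer actually runs.

```python
import re

# A CSS hex escape: backslash, 1-6 hex digits, then one optional
# whitespace character that is consumed as part of the escape.
CSS_HEX_ESCAPE = re.compile(r"\\([0-9a-fA-F]{1,6})[ \t\n\r\f]?")


def _replace_escape(match: re.Match) -> str:
    code_point = int(match.group(1), 16)
    # Zero, surrogates, and out-of-range values become U+FFFD.
    if code_point == 0 or code_point > 0x10FFFF or 0xD800 <= code_point <= 0xDFFF:
        return "\ufffd"
    return chr(code_point)


def decode_css_hex_escapes(css: str) -> str:
    """Replace each CSS hex escape with the character it names."""
    return CSS_HEX_ESCAPE.sub(_replace_escape, css)


# Raw strings keep the backslashes intact instead of letting Python
# interpret them as octal escapes.
payload = (
    r"url\0028'\006a\0061\0076\0061\0073\0063\0072\0069\0070\0074\003a"
    r"\0061\006c\0065\0072\0074\0028\0064\006f\0063\0075\006d\0065\006e"
    r"\0074\002e\0063\006f\006f\006b\0069\0065\0029'\0029"
)

print(decode_css_hex_escapes(payload))
# url('javascript:alert(document.cookie)')
```

Running the same function over the OWASP string gives `url('javascript:alert(.1027X.1053S')')`, the same garbled result as the `\uXXXX` guess earlier.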

Guessing, again, that .1027 should have been \u0027 and .1053 should have been \u0053, this is the string that the OWASP test could and should have been:

print("""<div style="background-image:\u0075\u0072\u006C\u0028'\u006a\u0061\u0076\u0061\u0073\u0063\u0072\u0069\u0070\u0074\u003a\u0061\u006c\u0065\u0072\u0074\u0028\u0027\u0058\u0053\u0053\u0027\u0029'\u0029">foo</div>""")

<div style="background-image:url('javascript:alert('XSS')')">foo</div>

Which, you know, would have been a nice try if it wasn't for the improperly-escaped single-quotes.


TL;DR: I'm SMDH at the errors that have propagated in the test strings over the years.

@flavorjones flavorjones merged commit db365d0 into main Apr 8, 2021
@flavorjones flavorjones deleted the flavorjones-test-unicode-encoded-exploit branch April 8, 2021 16:01
@valscion

This was an interesting journey through the test string history! Thank you for this unexpected history lesson 😅

@flavorjones
Owner Author

@valscion Thank you for coming to my TED talk. 🤣

flavorjones added a commit that referenced this pull request Oct 29, 2021
This adds onto #205. The original reported exploit in 2006 used CSS
hex encoding (e.g., "\0075" for "u"), which was ...

- mistakenly put into a double-quoted Ruby string in the Instiki test
  suite in 2007,
- then copied into html5lib-ruby's test suite,
- then copied into html5lib-python's suite,
- then finally copied into the html5lib shared suite,
- which was imported into Loofah