test: actually test against a working unicode-encoded exploit #205
This commit expands tests to cover an attack via unicode-encoded strings.
Let's work backwards. First, the failing test from rails-html-sanitizer (see rails/rails-html-sanitizer#111):
This test started failing in Loofah v2.9.0. Why? Because it shouldn't have been filtered in the first place. When decoded, this string is simply
background-image:52C8'a161332904a1c5248.10278.1053379'9
which admittedly is garbage, but not potentially harmful garbage. 🤷 I'm fixing that shortly in a PR to rails-html-sanitizer.
But where did this test come from?
It appeared along with a lot of other tests in rails/rails-html-sanitizer@121fd2d which were originally in rails via this 2007 commit: rails/rails@2d02199.
Loofah has a suspiciously similar test. Here's the input and expected output:
This string was copied from the html5lib test suite in 80703f6. Here's the original version from that project, as ported from python in 2007:
But that '\a5' is clearly an error and was later "fixed" to be proper JSON:
Before that, though, this test was originally in python:
so it was somehow garbled in the translation from python to ruby via json.
Here's that string, decoded:
What is that even supposed to be testing? This doesn't look right, either. Let's take a guess and make all of those real unicode characters by using the \uXXXX escape code:
Now we're getting somewhere. This looks like an actual attempt at an exploit! So even the original html5lib test from 2007 was encoded improperly.
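To make the "literal text versus real character" distinction concrete, here is a minimal Ruby sketch. The sample value and the helper name `decode_u_escapes` are illustrative assumptions, not the string or code from any of the test suites mentioned here.

```ruby
# Single quotes keep the backslash: this is 12 literal characters.
literal = '\u0061\u006c'

# Double quotes interpret the escape: this is the 2-character string "al".
real = "\u0061\u006c"

# Hypothetical helper: turn literal \uXXXX sequences into real characters.
def decode_u_escapes(text)
  text.gsub(/\\u(\h{4})/) { [Regexp.last_match(1).hex].pack("U") }
end

puts literal.length             # 12
puts real                       # al
puts decode_u_escapes(literal)  # al
```

A test string that carries the literal form by accident never exercises the sanitizer with the characters it was meant to contain.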
But: this is the exact same (incorrect) string that was in the Rails tests from 2007. Where did it come from?
And: what is that .1027X.1053S bit? That ... doesn't look right, and is a syntax error in JavaScript.
Some googling brings me to OWASP's XSS cheatsheet, which names this exploit as "DIV Background-image with Unicoded XSS Exploit":
Some more googling brings me to this disclosure email from 2006 which uses this as the exploit:
That CSS property, decoded, becomes:
Now that is a real test. So once again, we find that an error was made (or copied from yet another source that made an error) in this string that made it an invalid test.
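For anyone who wants to repeat the decoding step, this is a rough sketch of how CSS hex escapes can be unescaped in Ruby. It assumes the simple form of a backslash followed by one to six hex digits and an optional trailing whitespace character. The helper name `decode_css_escapes` and the sample input are made up for illustration; this is not how Loofah itself processes CSS.

```ruby
# Hypothetical helper: decode CSS escapes such as \0075 into characters.
# Assumes 1-6 hex digits after the backslash, optionally followed by a
# single whitespace character that terminates the escape.
def decode_css_escapes(css)
  css.gsub(/\\(\h{1,6})\s?/) { [Regexp.last_match(1).hex].pack("U") }
end

# Illustrative input only (single quotes keep the backslashes literal).
sample = 'background-image:\0075\0072\006C'

puts decode_css_escapes(sample)  # background-image:url
```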
Guessing, again, that .1027 should have been \u0027 and .1053 should have been \u0053, this is the string that the OWASP test could and should have been:
Which, you know, would have been a nice try if it weren't for the improperly-escaped single-quotes.
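The guess can be checked mechanically. The sketch below applies it to the `.1027X.1053S` fragment quoted earlier, under the assumption that every `.1NNN` run was meant to be `\u0NNN`. Both helper names are hypothetical.

```ruby
# Assumption: ".1NNN" is a corrupted "\u0NNN". Rewrite it back to that form.
def repair_guess(fragment)
  fragment.gsub(/\.1(\h{3})/) { "\\u0#{Regexp.last_match(1)}" }
end

# Turn literal \uXXXX sequences into the characters they name.
def decode_u_escapes(text)
  text.gsub(/\\u(\h{4})/) { [Regexp.last_match(1).hex].pack("U") }
end

fragment = ".1027X.1053S"
repaired = repair_guess(fragment)

puts repaired                    # \u0027X\u0053S
puts decode_u_escapes(repaired)  # 'XSS
```

U+0027 is the single quote and U+0053 is a capital S, so under this guess the fragment decodes to `'XSS`.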
TL;DR: I'm SMDH at the errors that have propagated in the test strings over the years.