rework how bleach.clean sanitizes css #633

willkg · 2022-02-07T17:31:13Z

Bleach's Cleaner has a sanitize_css that's a complete pain in the ass to maintain. Not only is it an inscrutable regex for a complex specification, but it's got lots of edge cases and places where it just plain doesn't work. So far, the way we've been "handling" this is by modifying the regex to fix edge cases.

@g-k mentioned that maybe it's time we took a step back and re-evaluated the goals for sanitizing CSS. What browsers are we targeting? What attacks are we preventing? From there, we can figure out a different course of action. Maybe it's writing or switching to a real css parser. Maybe it's dropping css sanitizing altogether.

The text was updated successfully, but these errors were encountered:

g-k · 2022-02-08T13:09:59Z

What browsers are we targeting?

I'd say either relatively recent Firefox, Safari, and Chromium-based browsers (say back to the last Fx ESR release or two) or not legacy browsers like Trident-based IE and other browsers from that era.

What attacks are we preventing?

I think this should basically be XSS, DOM Clobbering, and Structural Damage (bleach is Python so XSS via jQuery and Prototype Pollution) i.e. what DOMPurify handles minus the client side specific stuff. Since we expect bleach to run server side mXSS is a larger risk and bleach output should be safe for all supported browsers.

Maybe it's writing or switching to a real css parser. Maybe it's dropping css sanitizing altogether.

👍 in #248 we looked at tinycss2 and some other options. It'll be easier to switch now that we're only on Python 3. Ideally we can provide a "use at your own risk" interface for people to swap in their own logic.

willkg · 2022-02-08T13:36:36Z

If there aren't CSS vectors, then I'd rather not sanitize CSS at all and just take all that code out. CSS only comes into play if someone is using .clean() and allowing style tags or style attributes. Otherwise this isn't an issue with our default safety stuff.

g-k · 2022-02-08T13:48:13Z

My only concern would be that we want to prevent damage to html outside the style attr or style elem.

For the attr, we'd want to validate that it can't be escaped. Doesn't contain quotes or double quotes is probably ok. IIRC we do this for other attrs.

The element is trickier, but maybe we can check that it closes and doesn't contain invalid CSS chars. There might be potential for mXSS if there's a way to use CSS to get html5lib out of sync with how browsers parse the style tag.

willkg · 2022-02-08T13:54:01Z

I don't have time right now to do a comprehensive analysis of CSS attack vectors, browser support, and use that to come up with a plan of action. Is that something you or someone else could do? If it's something we could do soon, then I'd be game for including this work in 5.0.0. Otherwise, I want to leave the sanitize_css stuff as is for 5.0.0 and target a future version.

g-k · 2022-02-09T13:58:22Z

I don't have time right now to do a comprehensive analysis of CSS attack vectors, browser support, and use that to come up with a plan of action. Is that something you or someone else could do?

Maybe? Probably not soon.

Just to be clear for 5.0 I'm thinking we could:

fix CI (mostly done? I think we just have to sort out missing hashes for some dev deps or drop hashes for dev deps)
drop 3.6 support (done)
remove the CSS url sanitizing regex to fix support dashes in style attributes #529 and leave the rest of the regex-based CSS parsing in place for now. I can see a case for removing the regex based CSS parsing entirely, but like you say it'd require more analysis then we have time for.
drop support for legacy browsers & protection against the CSS url-based XSS for them
any other smaller bugfixes or refactoring you want to get in like splitting out the dev deps to another package and making CI less brittle

Does that sound reasonable? What are you thinking for 5.x?

willkg · 2022-02-09T14:23:55Z

I like everything you said. I created discussion topic #639 . We can do general release planning discussion there.

For reworking sanitize_css, it sounds like we need to do this:

figure out the target browsers (covered in issue update goals / non-goals #627
determine what we can drop from our css sanitization stuff; currently we think we want to:
1. remove css url sanitizing regex
2. leave everything else as is, but maybe extend the .clean() api to let people provide their own css sanitization function

Does that sound right?

switch to tinycss2 (#633)

willkg · 2022-04-07T14:30:57Z

This is all set now.

willkg added the enhancement label Feb 7, 2022

This was referenced Feb 7, 2022

redo sanitize_css to use a css parser #248

Closed

Add style attr sanitizer #535

Closed

g-k added the css label Feb 8, 2022

willkg added this to the v5.0.0 milestone Feb 10, 2022

willkg closed this as completed Apr 7, 2022

This was referenced Apr 18, 2022

fix: upgrade bleach version to 5.0.0 openedx/django-wiki#118

Merged

fix: upgrade bleach to 5.0.0 openedx/edx-platform#30263

Merged

eht16 mentioned this issue Apr 25, 2022

bug: possible regression in CSS parsing of advanced expressions #657

Closed

xmo-odoo mentioned this issue Oct 29, 2024

Add support for sanitizing (filtering) style properties rust-ammonia/ammonia#208

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

rework how bleach.clean sanitizes css #633

rework how bleach.clean sanitizes css #633

willkg commented Feb 7, 2022

g-k commented Feb 8, 2022

willkg commented Feb 8, 2022

g-k commented Feb 8, 2022

willkg commented Feb 8, 2022

g-k commented Feb 9, 2022

willkg commented Feb 9, 2022

willkg commented Apr 7, 2022

rework how bleach.clean sanitizes css #633

rework how bleach.clean sanitizes css #633

Comments

willkg commented Feb 7, 2022

g-k commented Feb 8, 2022

willkg commented Feb 8, 2022

g-k commented Feb 8, 2022

willkg commented Feb 8, 2022

g-k commented Feb 9, 2022

willkg commented Feb 9, 2022

willkg commented Apr 7, 2022