Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rework how bleach.clean sanitizes css #633

Closed
willkg opened this issue Feb 7, 2022 · 7 comments
Closed

rework how bleach.clean sanitizes css #633

willkg opened this issue Feb 7, 2022 · 7 comments

Comments

@willkg
Copy link
Member

willkg commented Feb 7, 2022

Bleach's Cleaner has a sanitize_css that's a complete pain in the ass to maintain. Not only is it an inscrutable regex for a complex specification, but it's got lots of edge cases and places where it just plain doesn't work. So far, the way we've been "handling" this is by modifying the regex to fix edge cases.

@g-k mentioned that maybe it's time we took a step back and re-evaluated the goals for sanitizing CSS. What browsers are we targeting? What attacks are we preventing? From there, we can figure out a different course of action. Maybe it's writing or switching to a real css parser. Maybe it's dropping css sanitizing altogether.

@g-k
Copy link
Collaborator

g-k commented Feb 8, 2022

What browsers are we targeting?

I'd say either relatively recent Firefox, Safari, and Chromium-based browsers (say back to the last Fx ESR release or two) or not legacy browsers like Trident-based IE and other browsers from that era.

What attacks are we preventing?

I think this should basically be XSS, DOM Clobbering, and Structural Damage (bleach is Python so XSS via jQuery and Prototype Pollution) i.e. what DOMPurify handles minus the client side specific stuff. Since we expect bleach to run server side mXSS is a larger risk and bleach output should be safe for all supported browsers.

Maybe it's writing or switching to a real css parser. Maybe it's dropping css sanitizing altogether.

👍 in #248 we looked at tinycss2 and some other options. It'll be easier to switch now that we're only on Python 3. Ideally we can provide a "use at your own risk" interface for people to swap in their own logic.

@willkg
Copy link
Member Author

willkg commented Feb 8, 2022

If there aren't CSS vectors, then I'd rather not sanitize CSS at all and just take all that code out. CSS only comes into play if someone is using .clean() and allowing style tags or style attributes. Otherwise this isn't an issue with our default safety stuff.

@g-k
Copy link
Collaborator

g-k commented Feb 8, 2022

My only concern would be that we want to prevent damage to html outside the style attr or style elem.

For the attr, we'd want to validate that it can't be escaped. Doesn't contain quotes or double quotes is probably ok. IIRC we do this for other attrs.

The element is trickier, but maybe we can check that it closes and doesn't contain invalid CSS chars. There might be potential for mXSS if there's a way to use CSS to get html5lib out of sync with how browsers parse the style tag.

@willkg
Copy link
Member Author

willkg commented Feb 8, 2022

I don't have time right now to do a comprehensive analysis of CSS attack vectors, browser support, and use that to come up with a plan of action. Is that something you or someone else could do? If it's something we could do soon, then I'd be game for including this work in 5.0.0. Otherwise, I want to leave the sanitize_css stuff as is for 5.0.0 and target a future version.

@g-k
Copy link
Collaborator

g-k commented Feb 9, 2022

I don't have time right now to do a comprehensive analysis of CSS attack vectors, browser support, and use that to come up with a plan of action. Is that something you or someone else could do?

Maybe? Probably not soon.

Just to be clear for 5.0 I'm thinking we could:

  1. fix CI (mostly done? I think we just have to sort out missing hashes for some dev deps or drop hashes for dev deps)
  2. drop 3.6 support (done)
  3. remove the CSS url sanitizing regex to fix support dashes in style attributes  #529 and leave the rest of the regex-based CSS parsing in place for now. I can see a case for removing the regex based CSS parsing entirely, but like you say it'd require more analysis then we have time for.
  4. drop support for legacy browsers & protection against the CSS url-based XSS for them
  5. any other smaller bugfixes or refactoring you want to get in like splitting out the dev deps to another package and making CI less brittle

Does that sound reasonable? What are you thinking for 5.x?

@willkg
Copy link
Member Author

willkg commented Feb 9, 2022

I like everything you said. I created discussion topic #639 . We can do general release planning discussion there.

For reworking sanitize_css, it sounds like we need to do this:

  1. figure out the target browsers (covered in issue update goals / non-goals #627
  2. determine what we can drop from our css sanitization stuff; currently we think we want to:
    1. remove css url sanitizing regex
    2. leave everything else as is, but maybe extend the .clean() api to let people provide their own css sanitization function

Does that sound right?

@willkg willkg added this to the v5.0.0 milestone Feb 10, 2022
willkg added a commit that referenced this issue Apr 4, 2022
@willkg
Copy link
Member Author

willkg commented Apr 7, 2022

This is all set now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants