Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

redo sanitize_css to use a css parser #248

Closed
willkg opened this issue Feb 21, 2017 · 9 comments
Closed

redo sanitize_css to use a css parser #248

willkg opened this issue Feb 21, 2017 · 9 comments

Comments

@willkg
Copy link
Member

willkg commented Feb 21, 2017

html5lib has this issue for replacing parts of the sanitize_css() method with a better CSS parser:

html5lib/html5lib-python#152

Bleach has its own sanitize_css() which probably has some of the same problems, but not all of them. Might be worth switching to a library instead.

@willkg willkg changed the title redo sanitize_css redo sanitize_css to use tinycss Feb 21, 2017
@willkg willkg added this to the v2.0 milestone Feb 21, 2017
@willkg
Copy link
Member Author

willkg commented Feb 21, 2017

I'm putting this in the 2.0 milestone for now because I'm changing lots of things, but will push it out if I don't have time.

@willkg
Copy link
Member Author

willkg commented Feb 21, 2017

@willkg
Copy link
Member Author

willkg commented Mar 6, 2017

I got tinycss sort of working with Bleach. Something like this:

    def sanitize_css(self, style):
        """Sanitizes css in style tags"""
        parsed, errors = self.css_parser.parse_style_attr(style)

        if not parsed:
            return u''

        def decl_to_css(decl):
            pri = '!' + decl.priority if decl.priority else ''
            return '%s: %s%s' % (decl.name, decl.value.as_css(), pri)

        parsed = [
            decl
            for decl in parsed
            if (decl.name.lower() in self.allowed_css_properties or
                decl.name.lower() in self.allowed_svg_properties)
        ]

        if not parsed:
            return u''

        return u'; '.join([decl_to_css(decl) for decl in parsed]) + u';'

We have a test that verifies that bogus CSS gets dropped:

def test_valid_css():
    """The sanitizer should fix missing CSS values."""
    styles = ['color', 'float']
    assert (
        clean('<p style="float: left; color: ">foo</p>', styles=styles) ==
        '<p style="float: left;">foo</p>'
    )
    assert (
        clean('<p style="color: float: left;">foo</p>', styles=styles) ==
        '<p style="">foo</p>'
    )

It fails the second assertion in that test. For some reason tinycss thinks color: float: left; is valid and lets it through. I'm not sure whether that's important or not.

I think I'm going to leave things as they are and think about this issue some more another time. Possible options:

  1. fix that bug upstream
  2. look at tinycss2: https://github.com/SimonSapin/tinycss2
  3. find another library
  4. write our own parser that does the bare minimum for what we need to do--we kind of have one already

@willkg willkg removed this from the v2.0 milestone Mar 6, 2017
@stucox
Copy link

stucox commented Apr 7, 2017

I recently tried replacing html5lib’s sanitize_css() with something using tinycss myself.

FWIW, I realised it made sense to also have a whitelist of CSS property functions (rgba(), blur(), etc), to keep out the nasties like expression() yet allow some degree of modern styling.

@willkg
Copy link
Member Author

willkg commented Apr 7, 2017

Oo--supporting CSS property functions is pretty interesting. Maybe sanitize_css should get adjusted to work like attributes do where it can take a callable for filtering?

@g-k
Copy link
Collaborator

g-k commented Mar 26, 2020

@peterbe suggested some other options too:

and willkg suggested rolling a minimal parser (e.g. just of declarations) might suffice.

@g-k g-k changed the title redo sanitize_css to use tinycss redo sanitize_css to use a css parser Mar 26, 2020
@FelixSchwarz
Copy link

just wanted to mention that there is also tinycss2 which handles some edge cases much better than tinycss did.

@g-k
Copy link
Collaborator

g-k commented Jun 11, 2020

just wanted to mention that there is also tinycss2 which handles some edge cases much better than tinycss did.

Yep! The current plan is to drop Python 2 support in a major release so we can use tinycss2.

@willkg
Copy link
Member Author

willkg commented Feb 7, 2022

I don't want to move forward with this. Instead I want to switch gears and focus our efforts on issue #633. I'm going to close this out.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants