Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature goals compared to Bleach (/ full ammonia API)? #10

Open
xmo-odoo opened this issue Jan 27, 2023 · 18 comments
Open

Feature goals compared to Bleach (/ full ammonia API)? #10

xmo-odoo opened this issue Jan 27, 2023 · 18 comments

Comments

@xmo-odoo
Copy link

xmo-odoo commented Jan 27, 2023

Discussions are not enabled so opening it here, sorry 'bout it.

With the recent deprecation of bleach (mostly on grounds of html5lib being unmaintained), unless someone has the time to e.g. rebuild the html5lib API on top of an existing html5 parser and the maintainer of bleach decides to use that, ammonia/nh3 seems well positioned as a migration target (there's already one package which has done that visible from the linked Bleach PR).

One issue there is that nh3 currently provides rather limited tuning knobs compared to Ammonia and Bleach (not sure how the two relate as I have not looked yet), but the readme doesn't really say what your eventual goals would be on that front as maintainer. If you do aim to favor such support & migration, maybe an issue or even project (kanban) about full Ammonia support and / or Bleach features parity (if not API compatibility) could be a consideration?

An other possible issue (though more internal) is for exposing customisations which allow arbitrary callables (attribute_filter seems to be the only one currently): nh3 currently releases the GIL during cleanup, which wouldn't allow calling Python functions, and thus exposing a generic attribute_filter, I don't know whether Ammonia has parallelism built-in or how much you care about parallel cleaning (though I figure having two paths and only keeping the GIL if callbacks were actually provided would always be an option if a somewhat more annoying one).

@messense messense changed the title Fdature goals compared to Bleach (/ full ammonia API)? Feature goals compared to Bleach (/ full ammonia API)? Jan 27, 2023
@messense
Copy link
Owner

Neither feature parity with Bleach nor full ammonia API is the goal for nh3 at the moment, but I'm happy to add more knobs for projects that want to migrate from bleach to nh3.

@xmo-odoo
Copy link
Author

Others will likely opine but FWIW looking at our current use of bleach, Ammonia seems to support most of what we need (though some of it by hand-rolling through the ultra generic attibute_filter which may not be so useful), however nh3 does not currently expose it

We also customise the serialization compared to the bleach default (which is just html5lib's as far as I can tell), but html5ever doesn't seem to have tuning knobs there (or at least not any which is relevant to what we configured) so there's definitely nothing you could do.

FWIW for the first two items nh3 might be able to provide bespoke whitelists from which it'd compose the relevant attribute_filter internally, but that may or may not be desirable (and for the first it would add a dependency on something like cssparser to implement a rust-level sanitizer).

@messense
Copy link
Owner

Added attribute_filter in #11.

@messense
Copy link
Owner

Added clean_content_tags in #12

@xmo-odoo
Copy link
Author

❤️

@camflan
Copy link

camflan commented Apr 19, 2023

Bleach offers the ability to pass callbacks which can modify/augment/remove attributes from each element. Is there a chance to get something similar here? One of our use-cases is to add target="_blank" to some anchor elements

@messense
Copy link
Owner

messense commented Apr 22, 2023

Bleach offers the ability to pass callbacks which can modify/augment/remove attributes from each element. Is there a chance to get something similar here?

@camflan Added tag_attribute_values for this in v0.2.11.

See also a2ec808

@MikeVL
Copy link

MikeVL commented Jun 7, 2023

Bleach offers the ability to pass callbacks which can modify/augment/remove attributes from each element. Is there a chance to get something similar here?

@camflan Added tag_attribute_values for this in v0.2.11.

It's not entirely clear how to add target="_blank" to a only if href value is https://github.com for example.

And is possible to rename element, for example h1 -> h2 ?

@xmo-odoo
Copy link
Author

xmo-odoo commented Jun 8, 2023

It's not entirely clear how to add target="_blank" to a only if href value is https://github.com for example.

And is possible to rename element, for example h1 -> h2 ?

I don't think you can do either of those via Ammonia (and thus nh3), especially for the second request it's not a general-purpose HTML-rewriting device.

You can see all the operations Ammonia supports at ammonia::Builder, and nh3 handles a subset of those.

Your first request would I think be rust-ammonia/ammonia#163.

@jasperfirecai2
Copy link

Bleach offers the ability to pass callbacks which can modify/augment/remove attributes from each element. Is there a chance to get something similar here?

@camflan Added tag_attribute_values for this in v0.2.11.

See also a2ec808

Would it be possible to allow a * entry to this map just like is possible in the attributes arg?
from docs:

attributes (dict[str, set[str]], optional) – Sets the HTML attributes that are allowed on specific tags, * key means the attributes are allowed on any tag.

I'm trying to sanitize data from CKEditor5 and multiple tags can for example contain style="text-align:right;". I cannot however allow all values of style, as that would make it vulnerable to xss

@xmo-odoo
Copy link
Author

Would it be possible to allow a * entry to this map just like is possible in the attributes arg? from docs:

nh3 is a thin layer over ammonia so it's limited to what ammonia provides.

For attributes, messense dispatches to generic_attributes or tag_attributes based on the * key rather than replicate the two parameters.

But there's no such generic_attribute_values, messense would have to create a custom attribute_filter which may then have to be reconciled with a user-provided attribute_filter, plus unexpectedly changing the concurrency caracteristics of the call, and possibly having different performance caracteristics. Maybe not something to hide from the user.

Unless you can get Ammonia to add a generic_attribute_values, I'd suggest just using attribute_filter directly.

@jasperfirecai2
Copy link

jasperfirecai2 commented Jun 14, 2023

For attributes, messense dispatches to generic_attributes or tag_attributes based on the * key rather than replicate the two parameters.

Ah, right i missed that in the rust docs.

unexpectedly changing the concurrency caracteristics of the call, and possibly having different performance caracteristics. Maybe not something to hide from the user.

Luckily performance is not an issue for me so i'll do it myself

I'd suggest just using attribute_filter directly.

That is fair enough. I assume I'd implement that by checking if attribute == "style" and value within a custom whitelist?

Does this filter apply before or after cleaning?

Thank you for the quick reply

Edit: I've resorted to just implementing a loop to update the tag_attribute_values with a tag whitelist

    for tag in tag_whitelist:
        tag_attribute_values.update([
            (tag, tag_attribute_values['*']),
        ])

@frwickst
Copy link

One thing from bleach that I'm missing (or maybe I'm just missing it in the docs) is the strip attribute from Bleach. It removes tags which are not in allowed list. This is not the same as clean_content_tags since that is an explicit list of the tags to remove, I'm looking for something that would just remove all tags that are not allowed.

@xmo-odoo
Copy link
Author

One thing from bleach that I'm missing (or maybe I'm just missing it in the docs) is the strip attribute from Bleach. It removes tags which are not in allowed list. This is not the same as clean_content_tags since that is an explicit list of the tags to remove, I'm looking for something that would just remove all tags that are not allowed.

Isn't that just tags? Unlike bleach, ammonia always just removes the blacklisted tags, it doesn't escape them:

>>> bleach.clean('<div><foo>xxx</foo></div>', tags={'div'})
'<div>&lt;foo&gt;xxx&lt;/foo&gt;</div>'
>>> bleach.clean('<div><foo>xxx</foo></div>', tags={'div'}, strip=True)
'<div>xxx</div>'
>>> nh3.clean('<div><foo>xxx</foo></div>', tags={'div'})
'<div>xxx</div>'

clean_content_tags is used to remove not just the tag but the tag's contents as well, which is why it's opt-in, and additional to tags (you can't clean_content_tags a whitelisted tag).

@ThiefMaster
Copy link

Unlike bleach, ammonia always just removes the blacklisted tags, it doesn't escape them:

That makes it unusable for usecases where you want to tread the string as plaintext where people may be writing stuff that looks like HTML (ie it has < and >) but isn't HTML... Being able to choose between stripping and escaping would be very much appreciated.

@messense
Copy link
Owner

Being able to choose between stripping and escaping would be very much appreciated.

See rust-ammonia/ammonia#145

@mattiaverga
Copy link

Is there an equivalent for bleach.linkify()?

@xmo-odoo
Copy link
Author

Is there an equivalent for bleach.linkify()?

No.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

8 participants