bleach is deprecated; statement on project going forward (2023-01-23) #698

Open
willkg opened this issue Jan 23, 2023 · 11 comments

@willkg
Member

willkg commented Jan 23, 2023

Summary

As of now, Bleach is deprecated.

We will continue to support Bleach with:

  1. security updates
  2. support for new Python versions
  3. fixes for egregious bugs

I figure that's one release a year or something like that.

Why?

Bleach sits on top of--and heavily relies on--html5lib, which is no longer in active development. It is increasingly difficult to maintain Bleach in that context, and I think it's nuts to build a security library on top of a library that's not in active development. There are some options (switch to something else, take over html5lib, etc.), but I don't particularly like any of them. I think instead, someone new should explore the options with a brand new library and a fresh start.

@hugovk
Contributor

hugovk commented Jan 29, 2023

Thank you for all your work on Bleach, and for announcing this clearly and with plenty of notice!

@Lukas-J

Lukas-J commented Mar 5, 2023

Hi @willkg,
It seems there is some activity going on in html5lib. So maybe not all hope is lost for bleach. Is there any way it could become un-deprecated sometime?

If not, I'm struggling to find a suitable replacement for bleach, to be honest. Does anyone have recommendations for safely sanitizing HTML user input?

Cheers!

@g1-1-1

g1-1-1 commented Mar 6, 2023

> Hi @willkg,
> It seems there is some activity going on in html5lib. So maybe not all hope is lost for bleach. Is there any way it could become un-deprecated sometime?
>
> If not, I'm struggling to find a suitable replacement for bleach, to be honest. Does anyone have recommendations for safely sanitizing HTML user input?
>
> Cheers!

ammonia has a Python binding (nh3) you can use, with similar features and faster speeds.

@jsocol
Contributor

jsocol commented Jul 31, 2023

Thank you, @willkg, for your stewardship of Bleach for so long—much longer than I had it or than I bet you expected to 😅

And thank you to all of the other contributors over the past dozen years. It's still amazing to me to see how much this project spread and influenced other libraries. nh3/ammonia look like fantastic, fast alternatives built on a rock solid foundation in html5ever.

@willkg
Member Author

willkg commented Oct 6, 2023

Update: I just pushed out Bleach 6.1.0 which drops Python 3.7 support, picks up Python 3.12 support, and fixes a handful of issues. I also closed out a lot of old issues and issues for things we're not going to fix.

I will continue fixing security issues as they come up.

Barring anything exciting, I'll probably do the next release when Python releases 3.13.

@aclark4life

@willkg Could this library or html5lib-python benefit from any "lifting efforts" as provided by Tidelift? E.g. https://tidelift.com/subscription/pkg/pypi-pillow. I am potentially looking for something new to "lift"… thanks for any info or guidance towards where to put my efforts.

VitaliStupin added a commit to VitaliStupin/X-Road-Metrics that referenced this issue Nov 29, 2023
Corrector currently uses bleach, which is slow and deprecated (mozilla/bleach#698). Since X-Road metrics should not contain HTML, it would be more beneficial to just use Python's translate method to replace potentially dangerous HTML characters. translate does not parse HTML and is estimated to be 100 times faster than bleach.

Using translate method instead of bleach.clean.

Renaming sanitise -> sanitize to be consistent with the rest of the code.
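
For readers wondering what the translate approach in that commit message might look like, here is a minimal sketch. It is not the code from the X-Road-Metrics commit: the table name, the function name, and the choice to escape characters to entities (rather than strip them) are all assumptions made for illustration.

```python
# Hypothetical sketch, not the actual X-Road-Metrics code.
# Assumption: "replace potentially dangerous HTML characters" means
# escaping them to entities; the real commit may do something else.
_HTML_ESCAPES = str.maketrans({
    "&": "&amp;",
    "<": "&lt;",
    ">": "&gt;",
    '"': "&quot;",
    "'": "&#x27;",
})


def sanitize(value: str) -> str:
    """Escape HTML-significant characters in one pass, without parsing."""
    # str.translate maps each input character once, so the "&" inside
    # a replacement such as "&lt;" is not escaped a second time.
    return value.translate(_HTML_ESCAPES)


print(sanitize('<script>alert("x")</script>'))
```

This only suits fields where no markup is expected at all. It is not a substitute for allowlist-based sanitizing such as bleach.clean, because every tag is escaped rather than selectively kept. The standard library's html.escape covers the same five characters if a hand-rolled table is not wanted.
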
@kylepollina

As of January 2024, there have been 3 commits to the html5lib repo: https://github.com/html5lib/html5lib-python/commits/master/

html5lib is not entirely dead.

@Alex3917

Alex3917 commented Feb 24, 2024

FWIW, I tried nh3 as a replacement and found it to be unusable. I reported one issue here, but I've also found others that are at least as severe and probably much harder to fix:

messense/nh3#36

There also don't seem to be any Python libraries for linkifying HTML documents, only markdown. I understand why maintaining this library is problematic, but the functionality it provides is essential and there doesn't seem to be any viable replacement right now for either sanitization or linkification. I don't know what the right answer is, but wanted to share my experience since I've now put a few hours into trying to find a viable replacement for each set of functionality.

@simonw

simonw commented Mar 16, 2024

html5lib does have active maintainers at the moment: see this discussion thread for details html5lib/html5lib-python#560

@MikeHiett

As @Alex3917 mentions, I've also tried implementing nh3, and the allowlists for tags and attributes do not work as described by the documentation. Noting this for anyone else who may stumble into this problem.
