bug: \n when stripping nested tags #663

Closed
drjova opened this issue May 16, 2022 · 5 comments
Labels
untriaged Bug reports that haven't been triaged

Comments

@drjova

drjova commented May 16, 2022

Describe the bug

With strip=True, bleach.clean inserts a newline where it strips a disallowed nested tag (here h1), instead of joining the surrounding text.

Python and Bleach versions:

  • Python Version: 3.8.9
  • Bleach Version: 5.0.0

To Reproduce

Steps to reproduce the behavior:

from bleach import clean
text = "<div>example<h1> example</h1></div>"
result = clean(text, attributes=[], tags=['div'], strip=True)
print(result)
"""
<div>example
 example</div>
"""

Expected behavior

from bleach import clean
text = "<div>example<h1> example</h1></div>"
result = clean(text, attributes=[], tags=['div'], strip=True)
print(result)
"""
<div>example example</div>
"""

Thank you 🙏

@drjova drjova added the untriaged Bug reports that haven't been triaged label May 16, 2022
@willkg
Member

willkg commented May 16, 2022

h1 is a block-level tag. Bleach 5.0.0 fixed sanitizing so that when it strips block-level tags, it adds a \n, because that's what HTML parsers would do in those circumstances. The problem was covered in issue #369.
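To make the behavior concrete, here is a minimal sketch built on Python's standard-library html.parser, not on Bleach's code. It keeps allowed tags (dropping attributes, as attributes=[] does), strips all other tags, and, when a stripped block-level start tag follows earlier output, emits a newline in its place. That reproduces the output shown in the report. The names BLOCK_LEVEL, StripSanitizer, strip_clean and the newline_on_block switch are hypothetical illustrations (newline_on_block is not a Bleach parameter), and the tag list is abbreviated. This is not a security-grade sanitizer.

```python
from html import escape
from html.parser import HTMLParser

# Abbreviated, hypothetical list of block-level tag names (not Bleach's own list).
BLOCK_LEVEL = {
    "address", "article", "aside", "blockquote", "div", "dl", "fieldset",
    "footer", "form", "h1", "h2", "h3", "h4", "h5", "h6", "header",
    "li", "main", "nav", "ol", "p", "pre", "section", "table", "ul",
}


class StripSanitizer(HTMLParser):
    """Keep allowed tags (without attributes) and strip all others.

    If newline_on_block is True, a stripped block-level start tag that
    follows earlier output is replaced by a newline so the text around it
    is not run together.
    """

    def __init__(self, allowed, newline_on_block=True):
        super().__init__(convert_charrefs=True)
        self.allowed = set(allowed)
        self.newline_on_block = newline_on_block
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag in self.allowed:
            self.out.append(f"<{tag}>")  # attributes dropped, like attributes=[]
        elif self.newline_on_block and tag in BLOCK_LEVEL and self.out:
            self.out.append("\n")  # stand-in for the stripped block-level tag
        # any other disallowed start tag is simply dropped

    def handle_endtag(self, tag):
        if tag in self.allowed:
            self.out.append(f"</{tag}>")
        # disallowed end tags are dropped without adding a newline

    def handle_data(self, data):
        # The parser unescaped character references; escape again for HTML output.
        self.out.append(escape(data, quote=False))


def strip_clean(text, allowed, newline_on_block=True):
    parser = StripSanitizer(allowed, newline_on_block)
    parser.feed(text)
    parser.close()  # flush any buffered trailing text
    return "".join(parser.out)


text = "<div>example<h1> example</h1></div>"
print(repr(strip_clean(text, ["div"])))  # -> '<div>example\n example</div>'
print(repr(strip_clean(text, ["div"], newline_on_block=False)))  # -> '<div>example example</div>'
```

With the switch on, the sketch yields the reported output; with it off, it yields the output the reporter expected.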

@willkg willkg closed this as completed May 16, 2022
@drjova
Author

drjova commented May 16, 2022

@willkg Thank you for the explanation. It would be nice to have an option to disable this, since not every use case needs the text reformatted for readability. Would you consider a PR for this?

@willkg
Member

willkg commented May 16, 2022

What's your use case where this is a problem?

@drjova
Author

drjova commented May 17, 2022

In our case we would like to clean specific tags, including block-level tags, without formatting the content.
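One possible workaround for this use case, sketched under the assumption that every newline in the cleaned output can be treated as a word break: post-process the result of bleach.clean and collapse each newline, together with adjacent spaces and tabs, into one space. join_block_newlines is a hypothetical helper, not part of Bleach. It cannot distinguish newlines that bleach.clean inserted from newlines already present in the input, so it would also flatten text inside elements such as pre or textarea.

```python
import re


def join_block_newlines(cleaned: str) -> str:
    """Replace each newline, plus any spaces/tabs around it, with one space.

    Hypothetical helper, not part of Bleach. It cannot tell newlines that
    bleach.clean inserted from newlines already present in the input.
    """
    return re.sub(r"[ \t]*\n[ \t]*", " ", cleaned)


# Output reported in the issue for clean(text, attributes=[], tags=['div'], strip=True)
reported = "<div>example\n example</div>"
print(join_block_newlines(reported))  # -> <div>example example</div>
```

In practice this would wrap the call from the report, e.g. join_block_newlines(clean(text, attributes=[], tags=['div'], strip=True)).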

@willkg
Member

willkg commented May 17, 2022

That doesn't really answer my question; it mostly restates the bug. What's the use case here? Why is adding a \n problematic?
