Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Accessibility bug] Semantic whitespace implied by block elements isn't retained properly #369

Closed
ghost opened this issue May 1, 2018 · 6 comments · Fixed by #651
Closed

Comments

@ghost
Copy link

ghost commented May 1, 2018

Separate block elements imply a visual spacing when seen on screen, which should be retained when removing those elements to keep proper word separation. Bleach doesn't seem to handle this right now:

>>> import bleach
>>> html = "<p>Te<b>st</b>!</p><p>Hello</p>"
>>> bleach.clean(html, tags=[], strip=True)
'Test!Hello'
>>>

The expected result would be: 'Test! Hello' (since <p> is a block element, and therefore two of them after each other implies a visual line break that is vital for proper readability of the text)

Edit: just to make this clear, I am not proposing parsing the CSS to find out what is a block element or something over-the-top like that. But at least reasonable default behavior would be nice which covers proper semantic HTML (without support for rogue CSS that unreasonably makes <p> inline or nonsense like that). That would work properly for 99% of the web content out there, unlike the current implementation which seems destined to produce missing vital whitespace on any non-trivial page.

@willkg
Copy link
Member

willkg commented Aug 26, 2018

Can you make a list of which block elements you want to handle?

@ghost
Copy link
Author

ghost commented Aug 26, 2018

https://developer.mozilla.org/en-US/docs/Web/HTML/Block-level_elements#Elements using this or a similar list would be a good idea I think

@willkg
Copy link
Member

willkg commented Aug 26, 2018

Transcribing that list here:

  • address
  • article
  • aside
  • blockquote
  • canvas
  • dd
  • div
  • dl
  • dt
  • fieldset
  • figcaption
  • figure
  • footer
  • form
  • h1, h2, h3, h4, h5, h6
  • header
  • hgroup
  • hr
  • li
  • main
  • nav
  • noscript
  • ol
  • output
  • p
  • pre
  • section
  • table
  • tfoot
  • ul
  • video

Can you take a stab at implementing this? I don't think I'm going to get to this for a while.

@ghost
Copy link
Author

ghost commented Aug 26, 2018

I wrote a rich text layouter recently that imports HTML where I got this as a by product, so right now I have no immediate need for this. I just thought it'd be a nice thing to add at some point

@willkg willkg modified the milestone: v3.0.0 Oct 2, 2018
jvanasco added a commit to jvanasco/bleach that referenced this issue Jul 11, 2019
This was referenced Jul 11, 2019
@jvanasco
Copy link
Contributor

I took a stab at this ticket. The solution is not perfect but allows for readable and accessible text.

While discussion on my approach should continue on the PR I wanted to bring up on this discussion:

The whitespace character should be a NEWLINE and not a SPACE.

input: <p>Te<b>st</b>!</p><p>Hello</p>
output:
- 'Test! Hello'
+ 'Test!\nHello'

The main reason is because more complex use cases, such as lists, become unreadable with a space character -- all the blocks bleed together.

@ghost
Copy link
Author

ghost commented Jul 12, 2019

The whitespace character should be a NEWLINE and not a SPACE.

Yeah you are correct I got that wrong, between block elements there should be a newline since that is also how it is rendered in a browser 👍

jvanasco added a commit to jvanasco/bleach that referenced this issue Jul 23, 2021
block elements are tracked and a newline is inserted when they are stripped.

new tests are included.
willkg added a commit that referenced this issue Apr 7, 2022
fix stripping block-level tags (#369)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants