Newline whitespace between spans is too aggressively removed in HTML output #1787

Closed
leekowalkowski-hmrc opened this issue Jun 9, 2022 · 2 comments
@leekowalkowski-hmrc
This worked correctly in 1.14.3 but is broken in 1.15.1.

The HTML

<p>
<span>Words</span>
<span>should</span>
<span>have</span>
<span>spaces</span>
<span>between</span>
<span>them</span>
</p>

would render in a browser as "Words should have spaces between them", because a newline character between the spans is treated as a space. See fiddle: https://jsfiddle.net/uL4h1x0v/

When this is parsed with Jsoup.parse() and then output with doc.toString(), 1.14.3 produces an equivalent representation that still renders correctly in a browser:

<html>
 <head></head>
 <body>
  <p> <span>Words</span> <span>should</span> <span>have</span> <span>spaces</span> <span>between</span> <span>them</span> </p> 
 </body>
</html>

…but in 1.15.1 it is incorrect, and renders as Wordsshouldhavespacesbetweenthem:

<html>
 <head></head>
 <body>
  <p><span>Words</span><span>should</span><span>have</span><span>spaces</span><span>between</span><span>them</span></p>
 </body>
</html>
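
The difference between the two outputs can be checked without a browser. The following is a minimal sketch, not jsoup code: `WhitespaceCheck` and `renderedText` are hypothetical names, and the helper only roughly approximates how a browser collapses whitespace in inline content (strip tags, collapse each whitespace run to one space, trim). It is enough to show that the 1.14.3-style markup keeps the word boundaries while the 1.15.1-style markup loses them.

```java
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of jsoup.
class WhitespaceCheck {
    private static final Pattern TAGS = Pattern.compile("<[^>]*>");
    private static final Pattern WHITESPACE = Pattern.compile("\\s+");

    // Rough approximation of rendered inline text:
    // drop tags, collapse whitespace runs to a single space, trim the ends.
    static String renderedText(String html) {
        String withoutTags = TAGS.matcher(html).replaceAll("");
        return WHITESPACE.matcher(withoutTags).replaceAll(" ").trim();
    }

    public static void main(String[] args) {
        // The original input: spans separated by newlines.
        String source = "<p>\n<span>Words</span>\n<span>should</span>\n<span>have</span>\n"
                + "<span>spaces</span>\n<span>between</span>\n<span>them</span>\n</p>";
        // Shape of the 1.14.3 output: newlines became single spaces.
        String spaced = "<p> <span>Words</span> <span>should</span> <span>have</span> "
                + "<span>spaces</span> <span>between</span> <span>them</span> </p>";
        // Shape of the 1.15.1 output: whitespace between spans removed.
        String stripped = "<p><span>Words</span><span>should</span><span>have</span>"
                + "<span>spaces</span><span>between</span><span>them</span></p>";

        System.out.println(renderedText(source));
        System.out.println(renderedText(spaced));
        System.out.println(renderedText(stripped));
    }
}
```

Comparing `renderedText` of the input against `renderedText` of the serialized output gives a quick regression check for this class of bug: the two should be equal whenever serialization preserves significant whitespace.
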
@jhy jhy added this to the 1.15.2 milestone Jun 13, 2022
@jhy jhy self-assigned this Jun 13, 2022
@jhy jhy added the bug Confirmed bug that we should fix label Jun 13, 2022
@jhy
Owner

jhy commented Jun 13, 2022

Thanks for the report! This looks similar to the issue that was fixed in f6d9aa0, which was about preformatted text, but it appears here in a different context.

@jhy jhy changed the title Jsoup 1.14.3 to 1.15.1 output error Newline whitespace between spans is too aggressively removed in HTML output Jun 13, 2022
@jhy jhy closed this as completed in e714ef1 Jun 17, 2022
@jhy jhy added the fixed label Jun 17, 2022
@jhy
Owner

jhy commented Jun 17, 2022

Thanks, fixed now - and I cleaned up some other whitespace issues in the same commit
