jsoup 1.15.2 appears to insert new spaces #1802

Closed
henricook opened this issue Jul 5, 2022 · 5 comments

@henricook

henricook commented Jul 5, 2022

Hi all,

When upgrading from 1.15.1 to 1.15.2 I appear to have encountered unexpected insertion of spaces - is this a bug or desired behaviour?

In this test with a multiline string:

<h1>This is my comment</h1>

<p>Lorem ipsum</p>

<span>Thanks</span>

1.15.1 output of val parsed = Jsoup.parse(textWithHtml):

<h1>This is my comment</h1>
<p>Lorem ipsum</p><span>Thanks</span>

1.15.2 output of val parsed = Jsoup.parse(textWithHtml):

<h1>This is my comment</h1>
<p>Lorem ipsum</p> <span>Thanks</span>

(space inserted between </p> and <span>)

EDITED: To remove references to using .text() on the output of Jsoup.parse

@jeffthomasweb

Thanks for sharing the examples. I don't have Kotlin installed on my machine, so I'm trying to reproduce this using jsoup 1.15.2 and Java 17.0.4. The following code snippet also produces a space between </p> and <span> when I use the Jsoup.parse() method without the text() method.

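import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

// Parse the multi-line HTML string and return the resulting Document.
// Jsoup.parse(String) does not throw IOException, so no throws clause is needed.
Document jsoupHtml() {
    String multiLineHtml = """
        <h1>This is my comment</h1>

        <p>Lorem ipsum</p>

        <span>Thanks</span> """;
    Document resultingHtml = Jsoup.parse(multiLineHtml);

    return resultingHtml;
}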

The above code produces this HTML for me:

<html>
 <head></head>
 <body>
  <h1>This is my comment</h1>
  <p>Lorem ipsum</p> <span>Thanks</span>
 </body>
</html>

When I try using the text() method like you do in your example, I don't see an extra space in the final String result.

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

// Parse the same HTML, then return only its text content via text().
String jsoupHtml() {
    String multiLineHtml = """
        <h1>This is my comment</h1>

        <p>Lorem ipsum</p>

        <span>Thanks</span> """;
    Document resultingHtml = Jsoup.parse(multiLineHtml);
    String textOfHtml = resultingHtml.text();

    return textOfHtml;
}

The above code snippet produces the following String for me, without extra spaces, when using text() and System.out.println() to print the result to an Ubuntu Linux terminal:

This is my comment Lorem ipsum Thanks

Thanks for sharing your example but maybe I'm missing something when trying to reproduce this issue?

@dcremonini

Hi,
I checked as well and got the same result as @jeffthomasweb.
My Java version is: IBM Semeru Runtime Open Edition 17.0.2.0 (build 17.0.2+8)

@henricook
Author

henricook commented Aug 22, 2022

Thanks so much for your time, both. I've edited my post to remove references to using .text(), as it didn't line up with the HTML outputs I pasted... I'm not sure what I was smoking there.

I've created a tiny Scala repro case, located here and also attached as two jars in a zip (each one built with a different jsoup version):

https://github.com/henricook/jsoup-1802-repro

jsoup-bug-1802-jsoup-1.15.x.zip

I'm on Ubuntu / Java 11.

@jhy jhy closed this as completed in f2913bd Jan 6, 2023
@jhy jhy self-assigned this Jan 6, 2023
@jhy jhy added this to the 1.15.4 milestone Jan 6, 2023
@jhy
Owner

jhy commented Jan 6, 2023

I know it looks like it, but jsoup is not inserting a space here. It is actually collapsing a newline into a single space - and would collapse multiples of those if present.
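
As a rough illustration of that collapsing, here is a minimal sketch that uses only the Java standard library. It is not jsoup's actual implementation; the class name and the `collapseWhitespace` helper are hypothetical, and it simply treats any run of whitespace in the input as one space, which is the effect seen between `</p>` and `<span>` in the report.

```java
import java.util.regex.Pattern;

public class WhitespaceCollapseSketch {
    // Matches any run of spaces, tabs, or line breaks.
    private static final Pattern WHITESPACE_RUN = Pattern.compile("\\s+");

    // Hypothetical helper (an assumption for illustration, not jsoup code):
    // replaces every whitespace run with a single space.
    public static String collapseWhitespace(String input) {
        return WHITESPACE_RUN.matcher(input).replaceAll(" ");
    }

    public static void main(String[] args) {
        // The blank line between </p> and <span> is two newline characters.
        String source = "<p>Lorem ipsum</p>\n\n<span>Thanks</span>";
        // prints: <p>Lorem ipsum</p> <span>Thanks</span>
        System.out.println(collapseWhitespace(source));
    }
}
```

The space in the output was already present in the input as whitespace between the two elements; nothing new is added, the run is only shortened.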

I have improved the pretty-printer to now also collapse this space, similar to the earlier behavior.

Thanks for the report!

@henricook
Author

Thanks @jhy!
