In version 1.9.2, processing instructions are no longer parsed correctly.
Here is sample code that reproduces the issue:
package jsoupbug;

import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Node;
import org.jsoup.parser.Parser;

public class JsoupBug {
    // An XML declaration, a newline, then a custom processing instruction whose data contains spaces
    private static final String XML = "<?xml version=\"1.0\" encoding=\"utf-8\"?>\n<?myProcessingInstruction My Processing instruction.?>";

    public static void main(String[] args) {
        Document document = Jsoup.parse(XML, "", Parser.xmlParser());
        document.outputSettings().prettyPrint(false);
        List<Node> nodes = document.childNodes();
        // [0] XML declaration, [1] "\n" text node, [2] the custom processing instruction
        Node node = nodes.get(2);
        String outerHtml = node.outerHtml();
        System.out.println(outerHtml);
    }
}
If I understand the spec (https://www.w3.org/TR/REC-xml/#sec-pi) correctly, everything after the processing instruction's target is free-form data in which spaces are valid characters, but jsoup mangles it.
With version 1.9.2, this prints: <?myprocessingInstruction my="" processing="" instruction.=""?>
However, in 1.9.1, the behavior is as I would expect: <?myProcessingInstruction My Processing instruction.?>
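To make the spec point concrete, here is a minimal sketch that splits a processing instruction into its target and its opaque data, following the shape of production [16] in XML 1.0. `PiSplitter` and its methods are hypothetical names, not jsoup API. The target is simplified to "a run of non-whitespace, non-`?` characters" rather than the full XML `Name` production, and the nested `record` assumes Java 16 or later.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical helper (not jsoup API): splits a processing instruction into
 * target and data, following the shape of XML 1.0 production [16]:
 *   PI ::= '<?' PITarget (S (Char* - (Char* '?>' Char*)))? '?>'
 * Simplification: the target is any run of non-whitespace, non-'?' characters.
 */
public final class PiSplitter {

    /** Target plus opaque data; the data is kept verbatim, spaces included. */
    public record Pi(String target, String data) {
        /** Re-serializes without interpreting the data as attributes. */
        public String toXml() {
            return data.isEmpty() ? "<?" + target + "?>" : "<?" + target + " " + data + "?>";
        }
    }

    // Group 1: target. Group 2 (optional): everything after the first whitespace run, up to the final '?>'.
    private static final Pattern PI =
            Pattern.compile("^<\\?([^\\s?]+)(?:\\s+(.*?))?\\?>$", Pattern.DOTALL);

    public static Pi split(String raw) {
        Matcher m = PI.matcher(raw);
        if (!m.matches()) {
            throw new IllegalArgumentException("Not a processing instruction: " + raw);
        }
        String data = m.group(2) == null ? "" : m.group(2);
        if (data.contains("?>")) { // the spec forbids '?>' inside the data
            throw new IllegalArgumentException("'?>' inside PI data: " + raw);
        }
        return new Pi(m.group(1), data);
    }

    public static void main(String[] args) {
        Pi pi = split("<?myProcessingInstruction My Processing instruction.?>");
        System.out.println(pi.target()); // myProcessingInstruction
        System.out.println(pi.data());   // My Processing instruction.
        System.out.println(pi.toXml());  // <?myProcessingInstruction My Processing instruction.?>
    }
}
```

Because the data is never tokenized, the round trip reproduces the 1.9.1 output for this input.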
Reviewed and agree that this is a bug. The root cause is that we treat XML processing instructions as an odd hybrid of a comment and a tag with attributes. Sometimes we want the attributes (e.g. to read the encoding from the XML declaration), and other times it would be better to treat the content as an opaque string (as in this example).
I'd suggest fixing this by treating these as boolean attributes, so that they are emitted without the empty ="" component.
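A rough sketch of the suggested boolean-attribute serialization follows. It assumes the PI content is held as an ordered map of attribute keys to values. `BooleanAttrPi` and `toXml` are hypothetical names, not jsoup API. An attribute with an empty value is written as a bare key instead of `key=""`.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical sketch (not jsoup API): a processing instruction whose content
 * is held as attributes; an empty value is written as a bare "boolean" key.
 * Assumptions: keys keep the case they were parsed with, and values are not escaped.
 */
public final class BooleanAttrPi {

    /** Serializes target + attributes; empty values become bare keys (no ="" suffix). */
    public static String toXml(String target, Map<String, String> attrs) {
        StringBuilder sb = new StringBuilder("<?").append(target);
        for (Map.Entry<String, String> e : attrs.entrySet()) {
            sb.append(' ').append(e.getKey());
            if (!e.getValue().isEmpty()) {
                // A real key/value pair, e.g. version="1.0" in the XML declaration
                sb.append("=\"").append(e.getValue()).append('"');
            }
        }
        return sb.append("?>").toString();
    }

    public static void main(String[] args) {
        Map<String, String> words = new LinkedHashMap<>(); // keeps insertion order
        words.put("My", "");
        words.put("Processing", "");
        words.put("instruction.", "");
        System.out.println(toXml("myProcessingInstruction", words));
        // <?myProcessingInstruction My Processing instruction.?>

        Map<String, String> decl = new LinkedHashMap<>();
        decl.put("version", "1.0");
        decl.put("encoding", "utf-8");
        System.out.println(toXml("xml", decl));
        // <?xml version="1.0" encoding="utf-8"?>
    }
}
```

This sketch has limits. A map collapses repeated words (e.g. `a a`) and cannot preserve the original spacing. For free-form PIs like the one in this report, keeping the content as an opaque string would likely be more faithful.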