How to avoid Nokogiri::HTML.parse behavior #2998

Answered by flavorjones
YusukeSuzuki asked this question in Q&A

@YusukeSuzuki Thanks for asking this question. What you're seeing is how libxml2 (the underlying HTML4 parser) constructs a document around this fragment.

I'm curious why you don't want to use DocumentFragment, since this is exactly the use case it addresses. Neither <div>text</div> nor text is a Document.

You may also want to use Nokogiri::HTML5, which uses libgumbo instead of libxml2; that library follows the precise rules in the HTML5 spec around document structure:

Nokogiri::HTML5.parse('<div>text</div>').to_html
# => "<html><head></head><body><div>text</div></body></html>"

Nokogiri::HTML5.parse('text').to_html
# => "<html><head></head><body>text</body></html>"

But really, again, I su…

Answer selected by flavorjones