New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support passing serialization_options to Nokogiri #171
Support passing serialization_options to Nokogiri #171
Conversation
This is a great start. Very well done! I think I might want to not have this option, however. I think it complicates the API, and it feels like it should not be needed. I don't see why The linked issue seem to say that the problem was only for a specific version of Nokogiri and that it would be fixed, but since it's a very old issue I take it the behavior wasn't ever restored. In either case, I would love it if you could set some default options for the different modes and use them. It's great with a test or two that fail without the save option set. |
Thanks!
Well, this was my first thoughts as well. But then I started thinking, what if someone needs it for something (even if I don't see why)? So that's why I proposed this PR, it does complicate things a little but it's guaranteed not to require anyone to do anything while migrating Roadie to a new version (unless there is a bug of course).
It was fixed in September of 2021 in Nokogiri, meaning they changed their defaults for the case of XHTML to NOT include
You mean something like this:
? |
Those choices sound very reasonable. I'm all for it. :-) |
Okay, I started implementing a PR to add new defaults and I found a problem. Basically, now Nokogiri doesn't specify NO_EMPTY_TAGS for XML by default. Also W3 recommends using self-closing tags (
It seems if there is no specific DTD or schema the preferred way is not defined. However we do have a Roadie user who has a use case for XML to have empty tags specified in the format of start tag+end tag ( So, what do we do now? If we choose Nokogiri's default approach we do break the existing code for at least one user, if we choose the said user's approach we go against Nokogiri's default and maybe even W3 recommendations in some cases. |
Good research. You've changed my mind on the matter. 👍 |
Codecov ReportBase: 97.97% // Head: 97.99% // Increases project coverage by
Additional details and impacted files@@ Coverage Diff @@
## master #171 +/- ##
==========================================
+ Coverage 97.97% 97.99% +0.02%
==========================================
Files 57 57
Lines 1973 1994 +21
==========================================
+ Hits 1933 1954 +21
Misses 40 40
Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here. ☔ View full report at Codecov. |
Version 5.1.0 was just released with this new feature in it. Thank you for your contribution! ❤️ |
This is great, thank you! Now it's released and I was thinking to make a PR in roadie-rails to allow specifying So I think I am not gonna propose a PR there for now but let me know if I am missing something. |
Nokogiri upgrade to 1.11.4 introduced an implicit upgrade of internal libxml2 library to 2.9.12 (for MRI at least).
However on that version of libxml2 they fixed handling of
Nokogiri::XML::Node::SaveOptions::NO_EMPTY_TAGS
flag (reference) so that it takes effect and forces Nokogiri parser to transform this (for example)<meta ...>
into this<meta ...></meta>
whendocument.mode = :xhtml
is used.The problem is that Roadie has these options hardcoded and there is no way for a Roadie user to change them without monkey patching. Hence this PR. It should allow Roadie users to remove
NO_EMPTY_TAGS
to stop Roadie and Nokogiri from transforming empty tags. This PR however is not only about this particular option, it would allow rewriting other options as well: for example, you could removeNokogiri::XML::Node::SaveOptions::NO_DECLARATION
to make Roadie and Nokogiri add <?xml ...> declaration tag at the very top of the document.If this is merged I can prepare a PR for roadie-rails as well.