Closed
Description
MRI nokogiri version
$ nokogiri -v
# Nokogiri (1.8.3)
---
warnings: []
nokogiri: 1.8.3
ruby:
version: 2.5.1
platform: x86_64-darwin16
description: ruby 2.5.1p57 (2018-03-29 revision 63029) [x86_64-darwin16]
engine: ruby
libxml:
binding: extension
source: packaged
libxml2_path: "/Users/jrochkind/.gem/ruby/2.5.1/gems/nokogiri-1.8.3/ports/x86_64-apple-darwin16.7.0/libxml2/2.9.8"
libxslt_path: "/Users/jrochkind/.gem/ruby/2.5.1/gems/nokogiri-1.8.3/ports/x86_64-apple-darwin16.7.0/libxslt/1.1.32"
libxml2_patches:
- 0001-Revert-Do-not-URI-escape-in-server-side-includes.patch
libxslt_patches: []
compiled: 2.9.8
loaded: 2.9.8
Jruby nokogiri version
$ nokogiri -v
Ignoring jruby-launcher-1.1.5-java because its extensions are not built. Try: gem pristine jruby-launcher --version 1.1.5
# Nokogiri (1.8.3)
---
warnings: []
nokogiri: 1.8.3
ruby:
version: 2.5.0
platform: java
description: jruby 9.2.0.0 (2.5.0) 2018-05-24 81156a8 Java HotSpot(TM) 64-Bit Server
VM 25.121-b13 on 1.8.0_121-b13 +jit [darwin-x86_64]
engine: jruby
jruby: 9.2.0.0
xerces: Xerces-J 2.11.0
nekohtml: NekoHTML 1.9.21
reproduction
require 'nokogiri'
xml_src = <<-EOF
<top xmlns="http://example.org/top/default">
<inside>
value
</inside>
</top>
EOF
doc = Nokogiri::XML.parse(xml_src)
new_doc = Nokogiri::XML::Document.new
inside_node = doc.at_xpath("//default:inside", default: "http://example.org/top/default")
puts "inside_node.namespace.href: #{inside_node.namespace.href}"
# in both, it knows this node is in the declared namespace:
#
# MRI => "http://example.org/top/default"
# JRuby => "http://example.org/top/default"
new_doc.root = inside_node
puts "new_doc.root.namespace.href: #{inside_node.namespace.href}"
# in both, it still knows the node, now in new doc, is in a namespace if you ask:
#
# MRI => "http://example.org/top/default"
# Jruby => "http://example.org/top/default"
puts "new_doc.to_xml: #{new_doc.to_xml}"
# In MRI if you serialize it to XML, the namespace it is in is an xmlns attrib
#
# <inside xmlns="http://example.org/top/default">
# value
# </inside>
#
# In Jruby it is not:
#
# <?xml version="1.0"?>
# <inside>
# value
# </inside>
puts "new_doc.at_xpath('//inside') " +
(new_doc.at_xpath('//inside') ? "present" : "absent")
# In MRI, can't find it without a prefix, it really is in a namespace.
# In MRI => "absent"
#
# In Jruby, the same: "absent", JRuby really does treat it like it's in a namespace
puts "new_doc.at_xpath('//default:inside', default: 'http://example.org/top/default') " +
(new_doc.at_xpath('//default:inside', default: 'http://example.org/top/default') ? "present" : "absent")
# In MRI can find it with the right namespace
# In MRI => "present"
# In Jruby => "present"
# Jruby really does treat it as in that namespace, just doesn't serialize it accordingly
So in JRuby, doing Nokogiri::XML.parse(new_doc.to_xml)
would actually result in a different internal document as it would lose the namespace. In MRI, it would be round-trippable.
Activity
flavorjones commentedon Jul 2, 2018
@jrochkind Thanks for the thorough bug report.
jrochkind commentedon Jul 2, 2018
This seems possibly related to #1469.
Except, if I'm reading that code correctly (I have not executed it), I believe MRI Nokogiri is already doing what the issue-opener there is asking for (which I agree is correct and consistent with best practices exhibited by other top-of-the-line XML libraries), just not in JRuby. Maybe it changed (accidentally?) in between then and now in MRI, but not JRuby?
But I haven't excecuted that code in #1469 or totally wrapped my head around it, it could be a slightly different case.
Add namespace attributes when necessary
jvshahid commentedon Jul 15, 2018
Fixed in #1778. I would like someone to review the newly added tests to make sure they are idiomatic and make sense.
Add namespace attributes when necessary