Skip to content

Commit

Permalink
fix(jruby): Node#line returns position in the final DOM
Browse files Browse the repository at this point in the history
- document this difference from CRuby/libxml2
- add 1 to account for the prolog
- update tests to explicitly specify this difference
  • Loading branch information
flavorjones committed Dec 17, 2021
1 parent 187a4f5 commit bdcb9d7
Show file tree
Hide file tree
Showing 4 changed files with 67 additions and 11 deletions.
1 change: 1 addition & 0 deletions CHANGELOG.md
Expand Up @@ -39,6 +39,7 @@ A related discussion about Trust exists at [#2357](https://github.com/sparklemot
* [CRuby] Handle abruptly-closed HTML comments as WHATWG recommends for browsers. (Thanks to HackerOne user [tehryanx](https://hackerone.com/tehryanx?type=user) for reporting this!)
* [CRuby] `Node#line` is no longer capped at 65535. libxml v2.9.0 and later support a new parse option, exposed as `Nokogiri::XML::ParseOptions::PARSE_BIG_LINES` and set in `ParseOptions::DEFAULT_XML`, `::DEFAULT_XSLT`, `::DEFAULT_HTML`, and `::DEFAULT_SCHEMA`. (Note that JRuby never had this problem.) [[#1764](https://github.com/sparklemotion/nokogiri/issues/1764), [#1493](https://github.com/sparklemotion/nokogiri/issues/1493), [#1617](https://github.com/sparklemotion/nokogiri/issues/1617), [#1505](https://github.com/sparklemotion/nokogiri/issues/1505), [#1003](https://github.com/sparklemotion/nokogiri/issues/1003), [#533](https://github.com/sparklemotion/nokogiri/issues/533)]
* [CRuby] If a cycle is introduced when reparenting a node (i.e., the node becomes its own ancestor), a `RuntimeError` is raised. libxml2 does no checking for this, which means cycles would otherwise result in infinite loops on subsequent operations. (Note: JRuby/Xerces already does this.) [[#1912](https://github.com/sparklemotion/nokogiri/issues/1912)]
* [JRuby] `Node#line` behavior has been modified to return the line number of the node in the _final DOM structure_ (the value returned in JRuby is increased by 1 to count the XML prolog). This behavior is different from CRuby, which returns the node's position in the _input string_. This difference is not ideal, but at least is now officially documented and tested. The real-world impact of this change is that the value returned in JRuby is greater by 1 to account for the XML prolog in the output. [[#2380](https://github.com/sparklemotion/nokogiri/issues/2380)] (Thanks, [@dabdine](https://github.com/dabdine)!)


## 1.12.5 / 2021-09-27
Expand Down
9 changes: 8 additions & 1 deletion ext/java/nokogiri/XmlNode.java
Expand Up @@ -1580,14 +1580,21 @@ public class XmlNode extends RubyObject
return getNokogiriClass(context.runtime, "Nokogiri::XML::Node").getConstant(type);
}

/*
* NOTE that the behavior of this function is very difference from the CRuby implementation, see
* the docstring in ext/nokogiri/xml_node.c for details.
*/
@JRubyMethod
public IRubyObject
line(ThreadContext context)
{
Node root = getOwnerDocument();
int[] counter = new int[1];
count(root, counter);
return RubyFixnum.newFixnum(context.runtime, counter[0] + 1);
// offset of 2:
// - one because humans start counting at 1 not zero
// - one to account for the XML declaration present in the output
return RubyFixnum.newFixnum(context.runtime, counter[0] + 2);
}

private boolean
Expand Down
27 changes: 24 additions & 3 deletions ext/nokogiri/xml_node.c
Expand Up @@ -1698,10 +1698,31 @@ native_write_to(
}

/*
* call-seq:
* line
* :call-seq:
* line() → Integer
*
* [Returns] The line number of this Node.
*
* ---
*
* <b> ⚠ The CRuby and JRuby implementations differ in important ways! </b>
*
* Semantic differences:
* - The CRuby method reflects the node's line number <i>in the parsed string</i>
* - The JRuby method reflects the node's line number <i>in the final DOM structure</i> after
* corrections have been applied
*
* Performance differences:
* - The CRuby method is {O(1)}[https://en.wikipedia.org/wiki/Time_complexity#Constant_time]
* (constant time)
* - The JRuby method is {O(n)}[https://en.wikipedia.org/wiki/Time_complexity#Linear_time] (linear
* time, where n is the number of nodes before/above the element in the DOM)
*
* Returns the line for this Node
* If you'd like to help improve the JRuby implementation, please review these issues and reach out
* to the maintainers:
* - https://github.com/sparklemotion/nokogiri/issues/1223
* - https://github.com/sparklemotion/nokogiri/pull/2177
* - https://github.com/sparklemotion/nokogiri/issues/2380
*/
static VALUE
rb_xml_node_line(VALUE rb_node)
Expand Down
41 changes: 34 additions & 7 deletions test/xml/test_node.rb
Expand Up @@ -1242,19 +1242,33 @@ def test_wrap
end

describe "#line" do
it "properly numbers lines with documents containing XML prolog" do
it "counts lines" do
xml = Nokogiri::XML(<<~XML)
<?xml version="1.0" ?>
<a>
<b>Test</b>
</a>
XML

pending_if("https://github.com/sparklemotion/nokogiri/issues/2380", Nokogiri.jruby?) do
if Nokogiri.jruby?
# in the output
assert_equal(3, xml.at_css("b").line)
else
# in the input
assert_equal(2, xml.at_css("b").line)
end
end

it "properly numbers lines with documents containing XML prolog" do
xml = Nokogiri::XML(<<~XML)
<?xml version="1.0" ?>
<a>
<b>Test</b>
</a>
XML

assert_equal(3, xml.at_css("b").line)
end

it "properly numbers lines with documents containing XML comments" do
xml = Nokogiri::XML(<<~XML)
<a>
Expand All @@ -1265,7 +1279,11 @@ def test_wrap
</a>
XML

assert_equal(4, xml.at_css("c").line)
if Nokogiri.jruby?
assert_equal(5, xml.at_css("c").line)
else
assert_equal(4, xml.at_css("c").line)
end
end

it "properly numbers lines with documents containing XML multiline comments" do
Expand All @@ -1282,7 +1300,11 @@ def test_wrap
</a>
XML

assert_equal(6, xml.at_css("c").line)
if Nokogiri.jruby?
assert_equal(7, xml.at_css("c").line)
else
assert_equal(6, xml.at_css("c").line)
end
end

it "returns a sensible line number for each node" do
Expand All @@ -1298,8 +1320,13 @@ def test_wrap
XML

set = xml.css("b")
assert_equal(2, set[0].line)
assert_equal(5, set[1].line)
if Nokogiri.jruby?
assert_equal(3, set[0].line)
assert_equal(6, set[1].line)
else
assert_equal(2, set[0].line)
assert_equal(5, set[1].line)
end
end

it "supports a line number greater than a short int" do
Expand Down

0 comments on commit bdcb9d7

Please sign in to comment.