Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add tests to cover line numbering edge cases with JRuby #2379

Merged
merged 5 commits into from Dec 22, 2021
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Jump to
Jump to file
Failed to load files.
Diff view
Diff view
1 change: 1 addition & 0 deletions CHANGELOG.md
Expand Up @@ -41,6 +41,7 @@ A related discussion about Trust exists at [#2357](https://github.com/sparklemot
* [CRuby] `Node#line` is no longer capped at 65535. libxml v2.9.0 and later support a new parse option, exposed as `Nokogiri::XML::ParseOptions::PARSE_BIG_LINES` and set in `ParseOptions::DEFAULT_XML`, `::DEFAULT_XSLT`, `::DEFAULT_HTML`, and `::DEFAULT_SCHEMA`. (Note that JRuby never had this problem.) [[#1764](https://github.com/sparklemotion/nokogiri/issues/1764), [#1493](https://github.com/sparklemotion/nokogiri/issues/1493), [#1617](https://github.com/sparklemotion/nokogiri/issues/1617), [#1505](https://github.com/sparklemotion/nokogiri/issues/1505), [#1003](https://github.com/sparklemotion/nokogiri/issues/1003), [#533](https://github.com/sparklemotion/nokogiri/issues/533)]
* [CRuby] If a cycle is introduced when reparenting a node (i.e., the node becomes its own ancestor), a `RuntimeError` is raised. libxml2 does no checking for this, which means cycles would otherwise result in infinite loops on subsequent operations. (Note: JRuby/Xerces already does this.) [[#1912](https://github.com/sparklemotion/nokogiri/issues/1912)]
* [CRuby] Source builds will download zlib and libiconv via HTTPS. [[#2391](https://github.com/sparklemotion/nokogiri/issues/2391)] (Thanks, [@jmartin-r7](https://github.com/jmartin-r7)!)
* [JRuby] `Node#line` behavior has been modified to return the line number of the node in the _final DOM structure_ (the value returned in JRuby is increased by 1 to count the XML prolog). This behavior is different from CRuby, which returns the node's position in the _input string_. This difference is not ideal, but at least is now officially documented and tested. The real-world impact of this change is that the value returned in JRuby is greater by 1 to account for the XML prolog in the output. [[#2380](https://github.com/sparklemotion/nokogiri/issues/2380)] (Thanks, [@dabdine](https://github.com/dabdine)!)


## 1.12.5 / 2021-09-27
Expand Down
17 changes: 15 additions & 2 deletions ext/java/nokogiri/XmlNode.java
Expand Up @@ -36,6 +36,7 @@
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.w3c.dom.Text;
import org.w3c.dom.Comment;

import nokogiri.internals.HtmlDomParserContext;
import nokogiri.internals.NokogiriHelpers;
Expand Down Expand Up @@ -1579,14 +1580,21 @@ public class XmlNode extends RubyObject
return getNokogiriClass(context.runtime, "Nokogiri::XML::Node").getConstant(type);
}

/*
* NOTE that the behavior of this function is very difference from the CRuby implementation, see
* the docstring in ext/nokogiri/xml_node.c for details.
*/
@JRubyMethod
public IRubyObject
line(ThreadContext context)
{
Node root = getOwnerDocument();
int[] counter = new int[1];
count(root, counter);
return RubyFixnum.newFixnum(context.runtime, counter[0] + 1);
// offset of 2:
// - one because humans start counting at 1 not zero
// - one to account for the XML declaration present in the output
return RubyFixnum.newFixnum(context.runtime, counter[0] + 2);
}

private boolean
Expand All @@ -1599,9 +1607,14 @@ public class XmlNode extends RubyObject
NodeList list = node.getChildNodes();
for (int jchild = 0; jchild < list.getLength(); jchild++) {
Node child = list.item(jchild);
String text = null;

if (child instanceof Text) {
String text = ((Text)child).getData();
text = ((Text)child).getData();
} else if (child instanceof Comment) {
text = ((Comment)child).getData();
}
if (text != null) {
int textLength = text.length();
for (int jchar = 0; jchar < textLength; jchar++) {
if (text.charAt(jchar) == '\n') {
Expand Down
27 changes: 24 additions & 3 deletions ext/nokogiri/xml_node.c
Expand Up @@ -1698,10 +1698,31 @@ native_write_to(
}

/*
* call-seq:
* line
* :call-seq:
* line() → Integer
*
* [Returns] The line number of this Node.
*
* ---
*
* <b> ⚠ The CRuby and JRuby implementations differ in important ways! </b>
*
* Semantic differences:
* - The CRuby method reflects the node's line number <i>in the parsed string</i>
* - The JRuby method reflects the node's line number <i>in the final DOM structure</i> after
* corrections have been applied
*
* Performance differences:
* - The CRuby method is {O(1)}[https://en.wikipedia.org/wiki/Time_complexity#Constant_time]
* (constant time)
* - The JRuby method is {O(n)}[https://en.wikipedia.org/wiki/Time_complexity#Linear_time] (linear
* time, where n is the number of nodes before/above the element in the DOM)
*
* Returns the line for this Node
* If you'd like to help improve the JRuby implementation, please review these issues and reach out
* to the maintainers:
* - https://github.com/sparklemotion/nokogiri/issues/1223
* - https://github.com/sparklemotion/nokogiri/pull/2177
* - https://github.com/sparklemotion/nokogiri/issues/2380
*/
static VALUE
rb_xml_node_line(VALUE rb_node)
Expand Down
14 changes: 14 additions & 0 deletions test/helper.rb
Expand Up @@ -196,6 +196,20 @@ def assert_not_send(send_ary, m = nil)
refute(recv.__send__(msg, *args), m)
end unless method_defined?(:assert_not_send)

def pending(msg)
begin
yield
rescue MiniTest::Assertion
skip("pending #{msg}")
end
flunk("pending test unexpectedly passed: #{msg}")
end

def pending_if(msg, pend_eh, &block)
return yield unless pend_eh
pending(msg, &block)
end

def i_am_ruby_matching(gem_version_requirement_string)
Gem::Requirement.new(gem_version_requirement_string).satisfied_by?(Gem::Version.new(RUBY_VERSION))
end
Expand Down