Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add tests to cover line numbering edge cases with JRuby #2379

Merged
merged 5 commits into from Dec 22, 2021

Conversation

dabdine
Copy link
Contributor

@dabdine dabdine commented Dec 8, 2021

What problem is this PR intended to solve?
While working on line numbering with a ruby-based project that is tested against CRuby and JRuby, I noticed JRuby was not returning the same line numbers. Specifically, when XML prolog was included in a source XML document, the line numbers generated by JRuby were off by one. You can see detail in this PR: rapid7/recog#390.

This DRAFT PR serves as a discussion point and method to execute these tests using the Nokogiri CI, given my issues obtaining a suitable development environment to validate them. I will similarly open an issue linking to this PR.

Have you included adequate test coverage?
I've added three new tests. Two currently fail for JRuby when running bundle exec rake compile test under JRuby 9.2.20.1. However, I'm having an issue getting a CRuby environment set up with nokogiri.

Does this change affect the behavior of either the C or the Java implementations?
It should impact the Java implementation once fixed. The CRuby implementation that relies on libxml appears to work fine.

@flavorjones
Copy link
Member

Hi, thank you for getting involved in Nokogiri development!

For previous work done in this area, please read #1223 and #2177.

Please don't worry about your CRuby environment, the JRuby and CRuby implementation handle line numbers completely differently, so you can focus on JRuby.

flavorjones added a commit to dabdine/nokogiri that referenced this pull request Dec 16, 2021
See context in sparklemotion#2380. The fix on JRuby is expensive and is probably
not worth the investment of developer time right now.

But let's preserve those failures as a known issue by introducing the
`pending` and `pending_if` test helpers.
@flavorjones
Copy link
Member

I've pushed a commit that marks these tests as "pending" on JRuby, per discussion at #2380. Marking as "ready for review" and we'll see what CI says.

@flavorjones flavorjones marked this pull request as ready for review December 16, 2021 23:12
@flavorjones
Copy link
Member

Pushed a commit that fixes multiline comments.

@flavorjones
Copy link
Member

The remaining pending test, "properly numbers lines with documents containing XML prolog", we should talk more about, because it reveals a significant difference between the JRuby and CRuby implementations than I realized ...

Libxml2's concept of "node line number" is the line number in the original document. Here's some code that demonstrates how this works. Note that the final output is identical, but the line value changes. We can conclude that the node line is the line in the original text and not in the final output form.

doc = Nokogiri::XML.parse(<<~XML)
  <root>
    <b>Test</b>
  </root>
XML
doc.at_css("b").line # => 2
doc.to_xml
# => "<?xml version=\"1.0\"?>\n" +
#    "<root>\n" +
#    "  <b>Test</b>\n" +
#    "</root>\n"

doc = Nokogiri::XML.parse(<<~XML)
  <?xml version="1.0" ?>
  <root>
    <b>Test</b>
  </root>
XML
doc.at_css("b").line # => 3
doc.to_xml
# => "<?xml version=\"1.0\"?>\n" +
#    "<root>\n" +
#    "  <b>Test</b>\n" +
#    "</root>\n"

This same code run through JRuby/Xerces demonstrates what we already know from staring at the implementation: the node line number reflects the final structure of the document, after corrections and formatting have been applied.

doc = Nokogiri::XML.parse(<<~XML)
  <root>
    <b>Test</b>
  </root>
XML
doc.at_css("b").line # => 2
doc.to_xml
# => "<?xml version=\"1.0\"?>\n" +
#    "<root>\n" +
#    "  <b>Test</b>\n" +
#    "</root>"

doc = Nokogiri::XML.parse(<<~XML)
  <?xml version="1.0" ?>
  <root>
    <b>Test</b>
  </root>
XML
doc.at_css("b").line # => 2
doc.to_xml
# => "<?xml version=\"1.0\"?>\n" +
#    "<root>\n" +
#    "  <b>Test</b>\n" +
#    "</root>"

I think I'd like to reflect this difference by updating the documentation and putting this expectation into the test explicitly (rather than "pend" the test).

@flavorjones
Copy link
Member

I've taken the liberty of pushing one more commit that embraces the behavior of JRuby, documents it and encodes the expectations in the tests. WDYT?

@flavorjones
Copy link
Member

Ping, let me know if you think I'm taking the API in the wrong direction. I'm planning to merge this today or tomorrow.

dabdine and others added 5 commits December 22, 2021 10:13
See context in sparklemotion#2380. The fix on JRuby is expensive and is probably
not worth the investment of developer time right now.

But let's preserve those failures as a known issue by introducing the
`pending` and `pending_if` test helpers.
- document this difference from CRuby/libxml2
- add 1 to account for the prolog
- update tests to explicitly specify this difference
@flavorjones
Copy link
Member

Rebased on latest main. Will merge when it goes green (unless someone raises an objection).

@flavorjones flavorjones merged commit 1710285 into sparklemotion:main Dec 22, 2021
@dabdine
Copy link
Contributor Author

dabdine commented Dec 22, 2021

@flavorjones sorry for the delay. I think this was the right decision for now. Would be nice to make nokogiri one day have consistent results no matter the engine, but this is a decent compromise for where we're at today.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants