investigate using XML_PARSE_BIG_LINES to tell libxml2 to track line numbers bigger than a short int #1764

Closed
flavorjones opened this issue May 21, 2018 · 1 comment · Fixed by #2309

@flavorjones
Member

flavorjones commented May 21, 2018

Context

libxml2 has a limitation: it tracks line numbers in the input with a short int, so long files behave oddly, with lines reporting a number of either 0 or 65535. See related issues #1493, #1617, #1505, etc.
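To make the two symptoms concrete, here is a simplified model in plain Ruby, with no libxml2 involved. It assumes, as a hypothesis rather than a description of any particular libxml2 release, that a 16-bit line field either wraps around (which can yield 0) or saturates at 65535. The helper names are hypothetical.

```ruby
# Simplified model of storing a line number in a 16-bit unsigned field.
# Hypothetical helpers for illustration; these are not libxml2 or Nokogiri APIs.
USHRT_MAX = 0xFFFF # 65535, the largest value an unsigned short can hold

# Truncation: keep only the low 16 bits, so the count wraps around to 0.
def wrapped_line(line)
  line & USHRT_MAX
end

# Saturation: clamp anything too large to the maximum value.
def clamped_line(line)
  [line, USHRT_MAX].min
end

[100, 65_535, 65_536, 70_000].each do |line|
  puts "line #{line}: wrapped=#{wrapped_line(line)} clamped=#{clamped_line(line)}"
end
```

Under this model, line 65,536 wraps to 0 and every later line clamps to 65535, which matches the two kinds of odd values described above.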

New Information

Since libxml 2.9.5, parser.h has a parse option, XML_PARSE_BIG_LINES, which stores big line numbers in a separate field of the node struct, psvi. I think this might be seamless, because Nokogiri's xml_node.c calls xmlGetLineNo, and all the logic we need is in there.
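A minimal Ruby model of the fallback this idea relies on, under two assumptions: a node's `line` field saturates at 65535, and with XML_PARSE_BIG_LINES the true line of a text node is kept in `psvi`. `ModelNode` and `model_line_no` are hypothetical stand-ins; the real xmlGetLineNo handles more node types and sibling/parent cases than this sketch does.

```ruby
# Hypothetical stand-in for libxml2's xmlNode, with only the fields this sketch needs.
ModelNode = Struct.new(:type, :line, :psvi, :children, keyword_init: true)

SATURATED = 65_535

# Simplified model of the lookup: trust `line` unless it has saturated.
def model_line_no(node)
  return node.line unless node.line == SATURATED

  case node.type
  when :text
    # With BIG_LINES, the full line number is assumed to live in psvi for text nodes.
    node.psvi || node.line
  when :element
    # Elements have no psvi slot for this, so approximate with the first child's answer.
    child = node.children&.first
    child ? model_line_no(child) : node.line
  else
    node.line
  end
end

text  = ModelNode.new(type: :text, line: SATURATED, psvi: 70_000)
para  = ModelNode.new(type: :element, line: SATURATED, children: [text])
small = ModelNode.new(type: :element, line: 42)
puts model_line_no(para)
puts model_line_no(small)
```

If the C side already delegates to xmlGetLineNo, the Ruby-facing `Node#line` would pick up this behavior without further changes, which is why the option might be close to seamless.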

Idea

Let's explore:

  • add XML_PARSE_BIG_LINES to XML::ParseOptions
  • see if we can affect the related issues (linked above) by setting this option.

If it seems to work, then:

  • add some tests - xml doc, html doc, xsd schema, relaxng schema ... (any others?)
  • add that bit to the XML::ParseOptions::DEFAULT_* masks to make the tests pass
  • update the CHANGELOG
  • close all the issues
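Adding the bit to the default masks is a plain bitwise OR. This standalone sketch copies the relevant bit positions from libxml2's `xmlParserOption` enum (RECOVER is 1<<0, NOERROR 1<<5, NOWARNING 1<<6, NONET 1<<11, BIG_LINES 1<<22). The module name is hypothetical, and the "before" masks are an assumption about Nokogiri's defaults at the time (RECOVER | NONET for XML, RECOVER | NOERROR | NOWARNING | NONET for HTML).

```ruby
# Hypothetical module mirroring a few libxml2 xmlParserOption bit values.
module SketchParseOptions
  RECOVER   = 1 << 0
  NOERROR   = 1 << 5
  NOWARNING = 1 << 6
  NONET     = 1 << 11
  BIG_LINES = 1 << 22

  # Assumed "before" defaults, then the same masks with BIG_LINES OR-ed in.
  OLD_DEFAULT_XML  = RECOVER | NONET
  OLD_DEFAULT_HTML = RECOVER | NOERROR | NOWARNING | NONET
  DEFAULT_XML  = OLD_DEFAULT_XML | BIG_LINES
  DEFAULT_HTML = OLD_DEFAULT_HTML | BIG_LINES

  # True when every bit in `flag` is set in `mask`.
  def self.set?(mask, flag)
    (mask & flag) == flag
  end
end

puts SketchParseOptions::DEFAULT_XML
puts SketchParseOptions.set?(SketchParseOptions::DEFAULT_HTML, SketchParseOptions::BIG_LINES)
```

Because BIG_LINES occupies its own bit, OR-ing it in leaves every existing default option unchanged; the tests in the plan would then check behavior on documents longer than 65,535 lines.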
@flavorjones flavorjones added this to the v1.11.0 milestone Jan 5, 2019
@flavorjones flavorjones modified the milestones: v1.12.0, v1.13.0 Aug 2, 2021
flavorjones added a commit that referenced this issue Aug 14, 2021
- set BIG_LINES parse option by default which will allow Node#line to return large integers
- allow Node#line= to set large line numbers on text nodes

Fixes #1764, #1493, #1617, #1505, #1003, #533
@flavorjones
Member Author

See #2309.

flavorjones added a commit that referenced this issue Aug 14, 2021
feat(cruby): support line numbers larger than a short

---

**What problem is this PR intended to solve?**

As noted in #1493, #1617, #1505, #1003, and #533, libxml2 has not historically supported line numbers greater than a `short int` can hold. Starting in libxml2 v2.9.0, setting the parse option `BIG_LINES` allows tracking line numbers in longer documents.

Specifically this PR makes the following changes:

- set `BIG_LINES` parse option by default which will allow `Node#line` to return large integers
- allow `Node#line=` to set large line numbers on text nodes
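The `Node#line=` change can be modeled the same way. This sketch assumes the setter mirrors what BIG_LINES parsing does: store the number directly when it fits, otherwise saturate `line` at 65535 and, for text nodes only, keep the full value in `psvi`. `SetterNode` and `assign_line` are hypothetical names, not Nokogiri's implementation.

```ruby
# Hypothetical stand-in for the two xmlNode fields involved.
SetterNode = Struct.new(:type, :line, :psvi)

LINE_LIMIT = 65_535

# Store a line number the way BIG_LINES parsing is assumed to.
def assign_line(node, line_number)
  if line_number < LINE_LIMIT
    node.line = line_number
  else
    node.line = LINE_LIMIT
    # Only text nodes keep the full value in psvi; other node types stay saturated.
    node.psvi = line_number if node.type == :text
  end
  line_number
end

text = SetterNode.new(:text, 0, nil)
elem = SetterNode.new(:element, 0, nil)
assign_line(text, 70_000)
assign_line(elem, 70_000)
p text
p elem
```

This restriction to text nodes is why the change only promises large line numbers on text nodes: under this model, other node types have nowhere to put the extra bits.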

Fixes #1764 

**Have you included adequate test coverage?**

Yes!

**Does this change affect the behavior of either the C or the Java implementations?**

JRuby's Xerces-based implementation did not suffer from this particular shortcoming, although its line number functionality is questionable in other ways (see #2177 / b32c875).