XML line limit #533

Closed
leejarvis opened this issue Sep 7, 2011 · 3 comments

Comments

@leejarvis
Member

I'm having trouble fetching line numbers when parsing an XML feed that exceeds 65535 lines. The maximum value of an unsigned short integer is 65535, which I assume is why this fails. Once that limit is hit, the line number stops increasing, which makes the line information we return wrong. I haven't been able to look into this further, or to see whether it's reasonable to change the integer type (or whether this is a libxml issue?), but I thought I'd post it anyway. Thanks!
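To make the symptom concrete, here is a minimal sketch in plain C. It assumes the parser keeps each node's line in a 16-bit field and saturates at 65535 rather than wrapping, which matches the behavior described above. `clamp_line` is a hypothetical helper for illustration, not a libxml2 function.

```c
/* Largest value the 16-bit line field is assumed to hold. */
#define MAX_SHORT_LINE 65535u

/* Hypothetical helper: store a running line counter into a 16-bit
 * field, saturating at the maximum instead of wrapping around. */
unsigned short clamp_line(unsigned long line)
{
    if (line < MAX_SHORT_LINE)
        return (unsigned short)line;
    return (unsigned short)MAX_SHORT_LINE;
}
```

With this scheme, every node past line 65535 reports the same number, so `clamp_line(70000)` and `clamp_line(80000)` are indistinguishable.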

@leejarvis
Member Author

Never mind, it looks like this is a libxml issue: https://bugzilla.gnome.org/show_bug.cgi?id=325533

@fulldecent
Contributor

fulldecent commented Jun 20, 2017

Sample implementation for storing higher numbers:

https://github.com/rubys/nokogumbo/pull/55/files

Saving here for reference.

+  if (line < 65535)
+    output_node->line = (unsigned short)line;
+  else {
+    output_node->line = 65535;
+    if (output_node->type == XML_TEXT_NODE)
+      output_node->psvi = (void *)line;
+  }
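The excerpt above covers only the write side. As a rough sketch of how the value could be read back, the block below pairs the same branching with a getter. It assumes a simplified stand-in struct (`sketch_node`, with an `is_text` flag in place of the `XML_TEXT_NODE` check) rather than libxml2's real node type, and the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LINE_SENTINEL 65535u

/* Hypothetical stand-in for the node fields the patch touches;
 * this is NOT libxml2's xmlNode. */
typedef struct {
    int is_text;          /* stands in for type == XML_TEXT_NODE */
    unsigned short line;  /* 16-bit line field */
    void *psvi;           /* spare pointer slot reused for large lines */
} sketch_node;

/* Write side: same branching as the patch excerpt. */
void sketch_set_line(sketch_node *node, unsigned long line)
{
    assert(node != NULL);
    if (line < LINE_SENTINEL) {
        node->line = (unsigned short)line;
    } else {
        node->line = LINE_SENTINEL;
        if (node->is_text)
            node->psvi = (void *)(uintptr_t)line;
    }
}

/* Read side: the sentinel on a text node means the real value
 * lives in psvi; anything else is read from the short field. */
unsigned long sketch_get_line(const sketch_node *node)
{
    assert(node != NULL);
    if (node->line == LINE_SENTINEL && node->is_text && node->psvi != NULL)
        return (unsigned long)(uintptr_t)node->psvi;
    return node->line;
}
```

Under these assumptions, a text node set to line 70000 reads back as 70000, while a non-text node still reads back as 65535, because only text nodes have the spare slot repurposed in this scheme.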

flavorjones added a commit that referenced this issue Aug 14, 2021
- set BIG_LINES parse option by default which will allow Node#line to return large integers
- allow Node#line= to set large line numbers on text nodes

Fixes #1764, #1493, #1617, #1505, #1003, #533
flavorjones added a commit that referenced this issue Aug 14, 2021
feat(cruby): support line numbers larger than a short

---

**What problem is this PR intended to solve?**

As noted in #1493, #1617, #1505, #1003, and #533, libxml2 has not historically supported line numbers larger than an `unsigned short` can hold (65535). Starting in libxml2 v2.9.0, setting the parse option `BIG_LINES` allows tracking line numbers in longer documents.

Specifically this PR makes the following changes:

- set `BIG_LINES` parse option by default which will allow `Node#line` to return large integers
- allow `Node#line=` to set large line numbers on text nodes

Fixes #1764 

**Have you included adequate test coverage?**

Yes!

**Does this change affect the behavior of either the C or the Java implementations?**

JRuby's Xerces-based implementation did not suffer from this particular shortcoming, although its line number functionality is questionable in other ways (see #2177 / b32c875).
@flavorjones
Member

This will be fixed in v1.13.
