Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Parser locks up on long input lines #97

Closed
mdierolf opened this issue Aug 5, 2020 · 2 comments · Fixed by #314
Closed

Parser locks up on long input lines #97

mdierolf opened this issue Aug 5, 2020 · 2 comments · Fixed by #314
Labels
bug Something isn't working performance
Milestone

Comments

@mdierolf
Copy link

mdierolf commented Aug 5, 2020

The attached test code fails to parse the attached XML document in a reasonable timeframe:
failing.zip

The issue causing this appears to be a regular expression to split the XML on newlines, which either never returns, or is incredibly slow to return:

At: xmldom/lib/sax.js:69:

function position(p,m){
  while(p>=lineEnd && (m = linePattern.exec(source))){
   lineStart = m.index;
   lineEnd = lineStart + m[0].length;
   locator.lineNumber++;
   //console.log('line++:',locator,startPos,endPos)
  }
  locator.columnNumber = p-lineStart+1;
 }
 var lineStart = 0;
 var lineEnd = 0;
 var linePattern = /.*(?:\r\n?|\n)|.*$/g

It appears that the solution would be to re-engineer the parser to process the XML in stream mode, not by splitting lines with a regular expression

@brodybits brodybits added help-wanted External contributions welcome needs investigation Information is missing and needs to be researched labels Aug 17, 2020
@brodybits
Copy link
Member

I have added the needs-investigation and help wanted labels.

It appears that the solution would be to re-engineer the parser to process the XML in stream mode
[...]

I am adding reference to issue #55 to replace lib/sax.js. I suspect there should be an existing SAX parser that we can reuse to both avoid bugs on our side and reduce the maintenance burden.

@karfau karfau added bug Something isn't working performance labels Jan 21, 2021
@karfau karfau added this to the 1.0.0 milestone Jan 21, 2021
@karfau karfau modified the milestones: planning 1.0.0, 0.8.0 Sep 16, 2021
@karfau
Copy link
Member

karfau commented Sep 16, 2021

After fixing duplicate attributes in the provided xml sample (which fails since 0.5.0) I could successfully parse the file when running against master. I assume this was also fixed by normalizing all kinds of line breaks, which landed in #314 and will be part of the 0.8.0 release.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working performance
Projects
Status: Done
Development

Successfully merging a pull request may close this issue.

3 participants